Tom Cleghorn tcleghorn@xxxxxxxxxxxxx wrote:
I tried the following with Saxon 9.6 PE:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xf="http://www.w3.org/2005/xpath-functions"
    xmlns:new="http://example.com/new"
    exclude-result-prefixes="xs xf">

  <xsl:param name="size" as="xs:integer" select="20"/>

  <xsl:variable name="regex" as="xs:string"
      select="concat('^(\w+[\s\p{P}]+){', $size, '}')"/>

  <xsl:param name="file-name" as="xs:string" select="'test2014111202Text.xml'"/>

  <xsl:variable name="start-node" as="text()?"
      select="descendant::text()[normalize-space()][1]"/>

  <xsl:variable name="end-node" as="text()?"
      select="descendant::text()[normalize-space() and matches(string-join((preceding::text()[normalize-space()], .), ''), $regex)][1]"/>

  <xsl:variable name="end-words" as="xs:string?"
      select="replace(string-join(($end-node/preceding::text()[normalize-space()], $end-node), ''), $regex, '')"/>

  <xsl:template match="/">
    <xsl:variable name="d1">
      <xsl:apply-templates/>
    </xsl:variable>
    <xsl:copy-of select="$d1"/>
    <xsl:result-document href="{$file-name}">
      <xsl:variable name="split" select="$d1//new:end"/>
      <xsl:variable name="copy" select="$split/(ancestor-or-self::node() | preceding::node())"/>
      <xsl:apply-templates select="($copy//sec)[1]" mode="sep">
        <xsl:with-param name="nodes" select="$copy" tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:result-document>
  </xsl:template>

  <xsl:template match="node()" mode="sep">
    <xsl:param name="nodes" tunnel="yes"/>
    <xsl:if test=". intersect $nodes">
      <xsl:copy>
        <xsl:apply-templates select="@* , node()" mode="sep"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <xsl:template match="new:start" mode="sep"/>

  <xsl:template match="new:end" mode="sep">
    <xsl:text>[...]</xsl:text>
  </xsl:template>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* , node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="$start-node" priority="5">
    <new:start/>
    <!-- would like <xsl:next-match/> to either use the identity
         transformation template if $start-node and $end-node are different
         or the template below if they are the same but ran into a problem
         with Saxon 9.6 PE -->
    <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="$end-node">
    <xsl:value-of select="substring-before(., $end-words)"/>
    <new:end/>
    <xsl:value-of select="$end-words"/>
  </xsl:template>

</xsl:stylesheet>

I think it produces the output you want for the input you posted, but I have not tried it on other samples. Obviously, part of the approach is writing a regular expression that identifies the "words". I used

  <xsl:variable name="regex" as="xs:string"
      select="concat('^(\w+[\s\p{P}]+){', $size, '}')"/>

which works on your sample but would fail, for instance, if the first text node containing words starts with white space or punctuation characters.
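
One possible way around that limitation, as an untested sketch rather than part of the stylesheet above, is to let the expression skip leading white space or punctuation before it starts counting words. The standalone stylesheet below compares the two expressions on a made-up string. The names `lenient-regex` and `sample`, the sample text, and the reduced `size` of 3 are assumptions chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Small word count so the sample string stays short (illustrative value). -->
  <xsl:param name="size" as="xs:integer" select="3"/>

  <!-- Expression as posted: the first word must start at position 1. -->
  <xsl:variable name="regex" as="xs:string"
      select="concat('^(\w+[\s\p{P}]+){', $size, '}')"/>

  <!-- Hypothetical variant: the added [\s\p{P}]* prefix consumes any
       white space or punctuation that precedes the first word. -->
  <xsl:variable name="lenient-regex" as="xs:string"
      select="concat('^[\s\p{P}]*(\w+[\s\p{P}]+){', $size, '}')"/>

  <!-- Run against any input document; only the sample string is tested. -->
  <xsl:template match="/">
    <!-- Made-up text that begins with a space and a quotation mark. -->
    <xsl:variable name="sample" as="xs:string"
        select="' &quot;One two, three four'"/>
    <result original="{matches($sample, $regex)}"
            lenient="{matches($sample, $lenient-regex)}"/>
  </xsl:template>

</xsl:stylesheet>
```

With this sample, the posted expression should not match because the string does not begin with a word character, while the variant should. If the variant were swapped into the full stylesheet, the `replace()` call that computes `$end-words` would also strip the leading white space or punctuation along with the counted words, which appears to be what the rest of the stylesheet expects, but I have not verified that on real input.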