Subject: RE: XSLT Solution for hyphenation
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 22 Dec 2006 09:26:43 -0000
You seem to be doing exact matching on the words in your dictionary, not
regular expression matching as your use of matches() would suggest. With
exact matching you can use a key for the lookup, which will be dramatically
faster.
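
As a rough illustration, a key-based lookup might look something like the sketch below. It is untested against your data and rests on several assumptions: the look-up table is in a separate file (the name wordlist.xml is made up), a "word" is any run of letters, matching is case-sensitive, and the key name "hyphenate" is arbitrary. The element names entry, search and replace are taken from your sample.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the look-up table is stored in wordlist.xml (hypothetical name) -->
  <xsl:variable name="wordlist" select="document('wordlist.xml')"/>

  <!-- Index each entry by the exact string value of its search child -->
  <xsl:key name="hyphenate" match="entry" use="search"/>

  <!-- Copy everything except text nodes unchanged -->
  <xsl:template match="@*|*|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <!-- Split the text into runs of letters (candidate words) and the rest.
         The braces are doubled because the regex attribute is an
         attribute value template. -->
    <xsl:analyze-string select="." regex="\p{{L}}+">
      <xsl:matching-substring>
        <!-- One key lookup per word instead of a scan of the whole list -->
        <xsl:variable name="hit"
            select="key('hyphenate', ., $wordlist)/replace"/>
        <xsl:value-of select="if (exists($hit)) then $hit[1] else ."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The point of the design is that the index over the word list is built once, so the cost per text node depends on the number of words in that node rather than on the size of the dictionary.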
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Jeff Sese [mailto:jsese@xxxxxxxxxxxx]
> Sent: 22 December 2006 06:10
> To: Xsl-List
> Subject: XSLT Solution for hyphenation
>
> Hi list,
>
> I have this project that applies hyphenation to an XML
> document using a list of words as a reference. The list of
> words can reach up to a million entries.
> My XSLT solution has a template that matches text()
> nodes and inserts hyphens into the words that appear in
> the list. However, the transformation takes too long to finish
> even for a relatively small file (around 1 MB). Is there
> any way to speed this up, or is there a better solution?
>
> Here's my stylesheet:
>
> <xsl:template match="/">
> <xsl:apply-templates/>
> </xsl:template>
> <xsl:template match="@*|element()|comment()|processing-instruction()">
> <xsl:copy>
> <xsl:apply-templates select="@*|node()"/>
> </xsl:copy>
> </xsl:template>
> <xsl:template match="text()">
> <xsl:variable name="str" select="."/>
> <xsl:variable name="searchStrs" as="xs:string*"
> select="$search-words[matches($str,.)]/replace(.,
> '[.\\?*+{}()\[\]\^\$|]', '\\$0')"/>
> <xsl:value-of
> select="ati:replace-all($str,$searchStrs,$replaceStr)"/>
> </xsl:template>
> <xsl:function name="ati:replace-all">
> <xsl:param name="input" as="xs:string"/>
> <xsl:param name="words-to-replace" as="xs:string*"/>
> <xsl:sequence select="if (exists($words-to-replace)) then
> ati:replace-all(replace($input, $words-to-replace[1],
> key('replace',$words-to-replace[1],$search-words)),
> remove($words-to-replace,1))
> else $input"/>
> </xsl:function>
>
> Here's a sample of the look-up table:
>
> <root>
> <wordlist>
> <entry>
> <search>abaissassent</search>
> <replace>abais­sassent</replace>
> </entry>
> <entry>
> <search>abaissèrent</search>
> <replace>abais­sèrent</replace>
> </entry>
> <entry>
> <search>abandonnent</search>
> <replace>aban­donnent</replace>
> </entry>
> </wordlist>
> </root>
>
> So if I have "abaissassent" in a text() node, it will be
> replaced with "abais­sassent".
>
> --
> *Jeff*