RE: XSLT Solution for hyphenation
You seem to be doing exact matching on the words in your dictionary, not regular expression matching as your use of matches() would suggest. With exact matching you can use a key for the lookup, which will be dramatically faster.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Jeff Sese [mailto:jsese@xxxxxxxxxxxx]
> Sent: 22 December 2006 06:10
> To: Xsl-List
> Subject: XSLT Solution for hyphenation
>
> Hi list,
>
> I have this project that applies hyphenation to an XML
> document using a list of words as a reference. The list of
> words can reach up to a million entries.
> My XSLT solution was to have a template that matches text()
> nodes and then inserts hyphens into the matching words that are in
> the list. However, the transformation takes too long to finish
> even for a relatively small file (around 1 MB). Is there
> any way to speed this up, or is there a better solution?
>
> Here's my stylesheet:
>
> <xsl:template match="/">
>   <xsl:apply-templates/>
> </xsl:template>
>
> <xsl:template match="@*|element()|comment()|processing-instruction()">
>   <xsl:copy>
>     <xsl:apply-templates select="@*|node()"/>
>   </xsl:copy>
> </xsl:template>
>
> <xsl:template match="text()">
>   <xsl:variable name="str" select="."/>
>   <xsl:variable name="searchStrs" as="xs:string*"
>     select="$search-words[matches($str,.)]/replace(.,'[.\\?*+{}()\[\]\^\$|]','\\$0')"/>
>   <xsl:value-of
>     select="ati:replace-all($str,$searchStrs,$replaceStr)"/>
> </xsl:template>
>
> <xsl:function name="ati:replace-all">
>   <xsl:param name="input" as="xs:string"/>
>   <xsl:param name="words-to-replace" as="xs:string*"/>
>   <xsl:sequence select="if (exists($words-to-replace)) then
>     ati:replace-all(replace($input, $words-to-replace[1],
>     key('replace',$words-to-replace[1],$search-words)),remove($words-to-replace,1))
>     else $input"/>
> </xsl:function>
>
> Here's a sample of the look-up table:
>
> <root>
>   <wordlist>
>     <entry>
>       <search>abaissassent</search>
>       <replace>abais­sassent</replace>
>     </entry>
>     <entry>
>       <search>abaisshrent</search>
>       <replace>abais­shrent</replace>
>     </entry>
>     <entry>
>       <search>abandonnent</search>
>       <replace>aban­donnent</replace>
>     </entry>
>   </wordlist>
> </root>
>
> So if I have "abaissassent" in a text() node, it will be
> replaced with "abais­sassent".
>
> --
> *Jeff*
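
The key-based lookup suggested in the reply could look roughly like the sketch below. It is only an illustration under several assumptions that are not in the original posts: the word list is loaded from a separate file (the parameter wordlist-uri and the file name wordlist.xml are hypothetical), the key name entry-by-word is made up, and a "word" is taken to be any run of Unicode letters, so words containing apostrophes or hyphens would be looked up in pieces. Matching is exact and case-sensitive.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: the look-up table is a separate document; the URI is hypothetical. -->
  <xsl:param name="wordlist-uri" as="xs:string" select="'wordlist.xml'"/>
  <xsl:variable name="wordlist" select="document($wordlist-uri)"/>

  <!-- Index every <entry> by the exact string value of its <search> child. -->
  <xsl:key name="entry-by-word" match="entry" use="search"/>

  <!-- Copy everything that is not a text node unchanged. -->
  <xsl:template match="@*|element()|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split each text node into letter runs and everything else,
       then look up each letter run in the key. -->
  <xsl:template match="text()">
    <!-- The regex attribute is an attribute value template, hence the doubled braces. -->
    <xsl:analyze-string select="." regex="\p{{L}}+">
      <xsl:matching-substring>
        <!-- Three-argument key(): search the word list document, not the source. -->
        <xsl:variable name="hit"
            select="key('entry-by-word', ., $wordlist)[1]/replace"/>
        <!-- Emit the hyphenated form if the word is listed, else the word itself. -->
        <xsl:value-of select="if ($hit) then $hit else ."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this arrangement each word in the input costs one key lookup, instead of every dictionary entry being tested with matches() against every text node. The processor typically builds the index once, the first time the key is used on the word list document. An XSLT 2.0 processor is required for xsl:analyze-string and the three-argument form of key().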






