XSLT Solution for hyphenation

Subject: XSLT Solution for hyphenation
From: Jeff Sese <jsese@xxxxxxxxxxxx>
Date: Fri, 22 Dec 2006 14:09:50 +0800
Hi list,

I have a project that applies hyphenation to an XML document, using a list of words as a reference. The list can reach up to a million entries.
My XSLT solution is a template that matches text() nodes and inserts hyphens into the words that appear in the list. However, the transformation takes too long to finish, even for a relatively small file (around 1 MB). Is there any way to speed this up, or is there a better solution?


Here's my stylesheet:

<xsl:template match="/">
    <xsl:apply-templates/>
</xsl:template>

<!-- Identity template: copy everything except text nodes unchanged. -->
<xsl:template match="@*|element()|comment()|processing-instruction()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- For each text node, collect the list words it contains (regex-escaped),
     then replace them one by one. $search-words is declared elsewhere. -->
<xsl:template match="text()">
    <xsl:variable name="str" select="."/>
    <xsl:variable name="searchStrs" as="xs:string*" select="$search-words[matches($str,.)]/replace(.,'[.\\?*+{}()\[\]\^\$&#x007C;]', '\\$0')"/>
    <xsl:value-of select="ati:replace-all($str,$searchStrs)"/>
</xsl:template>

<!-- Recursively replace the first remaining word with its hyphenated form,
     looked up through the 'replace' key (declared elsewhere). -->
<xsl:function name="ati:replace-all">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="words-to-replace" as="xs:string*"/>
    <xsl:sequence select="if (exists($words-to-replace)) then ati:replace-all(replace($input, $words-to-replace[1], key('replace',$words-to-replace[1],$search-words)),remove($words-to-replace,1)) else $input"/>
</xsl:function>


Here's a sample of the lookup table:

<root>
   <wordlist>
       <entry>
           <search>abaissassent</search>
           <replace>abais&#x00AD;sassent</replace>
       </entry>
       <entry>
           <search>abaissèrent</search>
           <replace>abais&#x00AD;sèrent</replace>
       </entry>
       <entry>
           <search>abandonnent</search>
           <replace>aban&#x00AD;donnent</replace>
       </entry>
   </wordlist>
</root>

So if I have "abaissassent" in a text() node, it will be replaced with "abais&#x00AD;sassent".
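
One direction that might be worth trying, sketched here under several assumptions: instead of testing every list entry against every text node, split each text node into words with xsl:analyze-string and look each word up through a key, so the work grows with the length of the text and not with the size of the list. The file name wordlist.xml and the key name "hyphenated" are made up for illustration, and the sketch assumes the lookup table has the root/wordlist/entry structure shown above. It is an untested sketch, not a drop-in replacement.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Assumption: the lookup table is stored in wordlist.xml (hypothetical
         file name) next to this stylesheet. -->
    <xsl:variable name="lookup" select="doc('wordlist.xml')"/>

    <!-- Index every entry by its search word, so a lookup does not scan the list. -->
    <xsl:key name="hyphenated" match="entry" use="search"/>

    <!-- Identity template for everything except text nodes. -->
    <xsl:template match="@*|element()|comment()|processing-instruction()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Split each text node into runs of letters (candidate words) and the rest.
         The regex attribute is an attribute value template, hence the doubled braces. -->
    <xsl:template match="text()">
        <xsl:analyze-string select="." regex="\p{{L}}+">
            <xsl:matching-substring>
                <!-- One key lookup per word; keep the word as-is when it is not listed. -->
                <xsl:variable name="hit" select="key('hyphenated', ., $lookup)[1]/replace"/>
                <xsl:value-of select="if (exists($hit)) then $hit else ."/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>
```

Note that this behaves differently from the stylesheet above in a few ways: it only replaces whole words, the match is case-sensitive, and a word containing an apostrophe or hyphen is looked up as separate pieces. Whether that is acceptable depends on how the word list was built.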

--
*Jeff*
