Subject: Re: XSLT Solution for hyphenation
From: Jeff Sese <jsese@xxxxxxxxxxxx>
Date: Thu, 28 Dec 2006 09:56:15 +0800
Yes, I'm doing an exact match. I was thinking of using keys, but I don't know how to use them for this kind of lookup; I'm more familiar with using keys for grouping.
Suppose I have this input:


<root>
   <p>I have some text that has the words abaissassent and abandonnent.</p>
</root>

How do I use keys so that I get this output?

<root>
<p>I have some text that has the words abais&#x00AD;sassent and aban&#x00AD;donnent.</p>
</root>


Here's a sample of the lookup table:

<root>
<wordlist>
  <entry>
    <search>abaissassent</search>
    <replace>abais&#x00AD;sassent</replace>
  </entry>
  <entry>
    <search>abaisshrent</search>
    <replace>abais&#x00AD;shrent</replace>
  </entry>
  <entry>
    <search>abandonnent</search>
    <replace>aban&#x00AD;donnent</replace>
  </entry>
</wordlist>
</root>

-- Jeff
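
A minimal XSLT 2.0 sketch of the key-based lookup for the input and output above, under these assumptions: the word list sits in a separate file whose location is passed in the hypothetical `wordlist-uri` parameter, a "word" is a maximal run of Unicode letters (`\p{L}+`), and matching is exact and case-sensitive. Rather than testing every dictionary entry against every text node, it splits each text node into words and does one `key()` lookup per word. The key name `hyph` is also illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: the word list is a separate file; this URI is a placeholder -->
  <xsl:param name="wordlist-uri" as="xs:string" select="'wordlist.xml'"/>
  <xsl:variable name="wordlist" select="doc($wordlist-uri)"/>

  <!-- Index every entry by the exact string value of its search child -->
  <xsl:key name="hyph" match="entry" use="search"/>

  <!-- Identity copy for attributes, elements, comments and PIs;
       the document node falls through to the built-in template -->
  <xsl:template match="@*|element()|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <!-- regex is an attribute value template, so the braces are doubled: \p{L}+ -->
    <xsl:analyze-string select="." regex="\p{{L}}+">
      <xsl:matching-substring>
        <!-- One exact-match key lookup per word in the text -->
        <xsl:variable name="hit" select="key('hyph', ., $wordlist)"/>
        <xsl:value-of select="if ($hit) then $hit[1]/replace else ."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Spaces and punctuation are copied unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Because the lookup is exact, a capitalized form such as "Abandonnent" at the start of a sentence will not match unless the word list contains it as well. The soft hyphen in the result is typically serialized as the literal U+00AD character rather than as `&#x00AD;`, depending on the output encoding.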


Michael Kay wrote:
You seem to be doing exact matching on the words in your dictionary, not
regular expression matching as your use of matches() would suggest. With
exact matching you can use a key for the lookup which will be dramatically
faster.

Michael Kay
http://www.saxonica.com/
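
To illustrate the difference Michael describes, here is a small XSLT 2.0 sketch that looks up one word in two ways. Assumptions: it is run with the word list from this thread as the source document, and the `word` parameter and `hyph` key name are hypothetical. Both expressions select the same `replace` element; the difference is that a predicate is typically evaluated by scanning every `entry`, while `key()` is usually answered from an index the processor builds once per document (how keys are implemented is processor-specific).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: the word to look up -->
  <xsl:param name="word" as="xs:string" select="'abandonnent'"/>

  <!-- Index every entry by the exact string value of its search child -->
  <xsl:key name="hyph" match="entry" use="search"/>

  <xsl:template match="/">
    <result>
      <!-- Predicate: typically a linear scan over all entries -->
      <scan><xsl:value-of select="//entry[search = $word]/replace"/></scan>
      <!-- key(): typically an indexed, exact-match lookup -->
      <keyed><xsl:value-of select="key('hyph', $word)/replace"/></keyed>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

With a million entries, doing the scan once per word (or once per entry per text node, as with matches()) is what makes the original approach slow; the keyed lookup avoids that repeated scan.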


-----Original Message-----
From: Jeff Sese [mailto:jsese@xxxxxxxxxxxx]
Sent: 22 December 2006 06:10
To: Xsl-List
Subject: XSLT Solution for hyphenation


Hi list,

I have a project that applies hyphenation to an XML document using a list of words as a reference. The list can contain up to a million entries.
My XSLT solution uses a template that matches text() nodes and then inserts hyphens into the words that appear in the list. However, the transformation takes too long to finish, even for a relatively small file (around 1 MB). Is there any way to speed this up, or is there a better solution?


Here's my stylesheet:

<!-- $search-words and the 'replace' key are declared elsewhere in the stylesheet (not shown) -->
<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Identity copy for attributes, elements, comments and PIs -->
<xsl:template match="@*|element()|comment()|processing-instruction()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- For each text node, test every dictionary word with matches(), then
     escape regex metacharacters in the words that were found -->
<xsl:template match="text()">
  <xsl:variable name="str" select="."/>
  <xsl:variable name="searchStrs" as="xs:string*"
      select="$search-words[matches($str,.)]/replace(.,'[.\\?*+{}()\[\]\^\$&#x007C;]','\\$0')"/>
  <!-- ati:replace-all takes two arguments -->
  <xsl:value-of select="ati:replace-all($str,$searchStrs)"/>
</xsl:template>

<!-- Recursively replace each found word with its hyphenated form -->
<xsl:function name="ati:replace-all">
  <xsl:param name="input" as="xs:string"/>
  <xsl:param name="words-to-replace" as="xs:string*"/>
  <xsl:sequence select="if (exists($words-to-replace))
      then ati:replace-all(replace($input, $words-to-replace[1],
               key('replace',$words-to-replace[1],$search-words)),
             remove($words-to-replace,1))
      else $input"/>
</xsl:function>

Here's a sample of the lookup table:

<root>
    <wordlist>
        <entry>
            <search>abaissassent</search>
            <replace>abais&#x00AD;sassent</replace>
        </entry>
        <entry>
            <search>abaisshrent</search>
            <replace>abais&#x00AD;shrent</replace>
        </entry>
        <entry>
            <search>abandonnent</search>
            <replace>aban&#x00AD;donnent</replace>
        </entry>
    </wordlist>
</root>

So if I have "abaissassent" in a text() node, it will be replaced with "abais&#x00AD;sassent".

--
*Jeff*
