
Washington method

Subject: Washington method
From: Dave Pawson <davep@xxxxxxxxxxxxx>
Date: Sun, 10 Apr 2011 09:45:44 +0100
(Was: Processing two documents. Which order?)

I finally got my tiny mind round this one, and I believe it is worth
spending some time explaining it.

Problem: some text, preferably in XML, in which certain words are required
to be marked up as XML elements in the output.

The approach.
An external file contains the word list, as xml.
The main input file contains the text needing marking up.

The word list looks something like this:

<x>
<word>target:word</word>
...
</x>

The xslt contains the following

<xsl:key name="words" match="word" use="."/>
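A minimal sketch of how such a key can be queried against the external
word list, assuming XSLT 2.0 and a word list like the one above stored at
'../props.xml' (the path used in the main template later in this post).
The looked-up string 'target:word' is only an illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <!-- Index every word element by its string value -->
 <xsl:key name="words" match="word" use="."/>

 <!-- The external word list, resolved relative to the stylesheet -->
 <xsl:variable name="wordlist" select="doc('../props.xml')"/>

 <xsl:template match="/">
  <!-- The third argument of key() (XSLT 2.0) names the tree to search,
       so the lookup runs against the word list, not the main input -->
  <xsl:value-of select="exists(key('words', 'target:word', $wordlist))"/>
 </xsl:template>

</xsl:stylesheet>
```

This should print true when the word list contains
<word>target:word</word>. The comparison is on the exact string value,
so stray whitespace inside a word element would stop it matching.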

Options:
I wanted simply to do the markup, with no further processing, hence
the stylesheet had

<xsl:template match="node()">
 <xsl:copy>
  <xsl:copy-of select="@*"/>
  <xsl:apply-templates/>
 </xsl:copy>
</xsl:template>

If you want other processing then add templates as needed.

The work is done in this template

<xsl:template match="text()[not(parent::a or
		     parent::b or
		     parent::c)]" priority="2">
 <xsl:analyze-string select="." regex="[a-z][a-z\-:.]+">
  <xsl:matching-substring>
   <xsl:choose>
    <xsl:when test="key('words',.,doc('../props.xml'))">
     <tag>
      <xsl:value-of select="."/>
     </tag>
    </xsl:when>
    <xsl:otherwise>
     <xsl:value-of select="."/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
   <xsl:value-of select="."/>
  </xsl:non-matching-substring>   
 </xsl:analyze-string>
</xsl:template>

1. The regex should match on any character group that *may* contain
one of the wanted words. I had to include - : and . since
the text contained those characters.

2. The 'tag' element is used to mark up matches.
   A candidate match occurs when the regex makes a hit, in the
   matching-substring element.
   A further selection is then made by looking the candidate up in the
   key (built from the external document). Only then does markup happen.

3. I required that text in some elements not be marked up, hence the filtering
not(parent::a or
		     parent::b or
		     parent::c)
which excludes the text children of all these elements.
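
The candidate step from note 1 can be tried in isolation. The following
is a minimal sketch, assuming XSLT 2.0; the sample string is hypothetical
and not taken from the real input. It lists every substring the regex
offers up for the key lookup.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <!-- Hypothetical sample text, for illustration only -->
 <xsl:variable name="sample"
     select="'see target:word and word-nextword here.'"/>

 <xsl:template match="/">
  <xsl:analyze-string select="$sample" regex="[a-z][a-z\-:.]+">
   <xsl:matching-substring>
    <!-- Each candidate on its own line, in square brackets -->
    <xsl:value-of select="concat('[', ., ']&#10;')"/>
   </xsl:matching-substring>
   <!-- Non-matching text (here, the spaces) is dropped in this demo -->
  </xsl:analyze-string>
 </xsl:template>

</xsl:stylesheet>
```

Run against any input document, this prints [see], [target:word], [and],
[word-nextword] and [here.], one per line. Note that the last candidate
keeps its trailing full stop because . is in the character class, so a
listed word directly followed by a full stop would not match the key
exactly.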

In hindsight, the method does not use character class subtraction;
it was just the escaping needed (since I needed to match on word-nextword)
that confused me.

For the case I had, repetition against a parameter took 15 minutes.
Using this method, it took 4 seconds.


In retrospect, it is a valuable addition to any toolkit IMHO.

Washington method? From David Carlisle of course :-)
Thanks David.







-- 

regards 

-- 
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
