Re: Re: Re: Using XSLT to add markup to a document

Subject: Re: Re: Re: Using XSLT to add markup to a document
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 8 Jul 2003 10:56:05 +0100
Dimitre wrote:
> Another problem with this solution is that it finds the strings not
> strictly from left to right (when we search for words as opposed to
> generally strings this may not be a problem -- my knowledge of
> English does not allow me to make a strong conclusion).

All of Dimitre's observations about the inadequacy, in the general
case, of the solution David and I were discussing are correct.
Flexible, general solutions to marking up a string using XSLT 1.0 are
not straightforward.

It's interesting to see what the regular expression processing in XSLT
2.0 can do to help here. With the words hard-coded into the
stylesheet, it would look like:

  <xsl:analyze-string select="$text" regex="relation|core">
    <xsl:matching-substring>
      <special><xsl:value-of select="." /></special>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="." />
    </xsl:non-matching-substring>
  </xsl:analyze-string>

<xsl:analyze-string> is defined such that the match that starts
earliest in the string gets picked up, so if you have:

  There is a strong corelation...

then you get:

  There is a strong <special>core</special>lation...

The spec doesn't define what happens when more than one alternative
matches at the same position in the string, for example:

  <xsl:analyze-string select="$text" regex="relation|core|corelation">
    ...
  </xsl:analyze-string>

(Saxon 7 picks the one that appears first in the regex.) I think that
this is a bug in the spec, and I'll raise it as an issue; I think
probably it should select the longest match.

You can generate the regular expression that's used for the string
dynamically, with an attribute value template. So for example, you
could have:

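<!-- as="node()*" because the result is a mix of text nodes and
     <special> elements rather than a single string -->
<xsl:template name="markup" as="node()*">
  <xsl:param name="text" as="xs:string" />
  <xsl:param name="replacements" as="xs:string*" />
  <!-- join the replacement strings into one alternation, a|b|c -->
  <xsl:variable name="regex" as="xs:string">
    <xsl:value-of select="$replacements" separator="|" />
  </xsl:variable>
  <xsl:analyze-string select="$text" regex="{$regex}">
    ...
  </xsl:analyze-string>
</xsl:template>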

in which case the markup template can be called with:

  <xsl:call-template name="markup">
    <xsl:with-param name="text"
                    select="'There is a strong corelation...'" />
    <xsl:with-param name="replacements"
                    select="('core', 'relation', 'corelation')" />
  </xsl:call-template>

though to be thorough, you'd need to make sure that you escaped any
regex-significant characters in the replacement strings.
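
As a rough sketch of that escaping step, assuming the replace()
function and regex syntax of XPath 2.0: the function below puts a
backslash in front of every regex-significant character in a string.
The local:escape-regex and build-regex names and the local namespace
URI are invented for illustration, and the list of characters is an
assumption about what counts as regex-significant rather than
something checked against a particular processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="http://example.com/local"
    exclude-result-prefixes="xs local">

  <!-- Hypothetical helper: escape regex metacharacters so that the
       string is matched literally. -->
  <xsl:function name="local:escape-regex" as="xs:string">
    <xsl:param name="literal" as="xs:string" />
    <!-- In the replacement string, \\ is a literal backslash and $0
         is the character that was matched. -->
    <xsl:sequence
      select="replace($literal,
                      '[\\\|\.\?\*\+\(\)\{\}\[\]\^\$\-]',
                      '\\$0')" />
  </xsl:function>

  <!-- Hypothetical template: build the alternation from the escaped
       replacement strings. -->
  <xsl:template name="build-regex" as="xs:string">
    <xsl:param name="replacements" as="xs:string*" />
    <xsl:sequence
      select="string-join(for $r in $replacements
                          return local:escape-regex($r), '|')" />
  </xsl:template>

</xsl:stylesheet>
```

With this in place, a replacement such as 'a.b|c' would go into the
regex as a\.b\|c, so the dot and the bar are matched as ordinary
characters.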

To get the longest match first, at least using Saxon 7, you can sort
the replacements by length, with the longer ones first (sorting
alphabetically in reverse order gives the same result, because a
string always sorts after any of its own prefixes):

  <!-- no as="xs:string" here: the loop produces several text nodes,
       which only merge into one string value in a temporary tree -->
  <xsl:variable name="regex">
    <xsl:for-each select="$replacements">
      <xsl:sort select="string-length(.)" order="descending" />
      <xsl:value-of select="." />
      <xsl:if test="position() != last()">|</xsl:if>
    </xsl:for-each>
  </xsl:variable>
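
Putting the pieces together, a complete stylesheet might look like the
sketch below. It assumes that the elided body of <xsl:analyze-string>
is the same <special> wrapping as in the hard-coded example, it uses
<xsl:perform-sort> and string-join() in place of the <xsl:for-each>
loop, and it starts from a named template called main (an invented
name). Treat it as untested against Saxon 7.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template name="markup" as="node()*">
    <xsl:param name="text" as="xs:string" />
    <xsl:param name="replacements" as="xs:string*" />
    <!-- Longest alternatives first, so that the longest one is listed
         first when several match at the same position. -->
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort select="$replacements">
        <xsl:sort select="string-length(.)" order="descending" />
      </xsl:perform-sort>
    </xsl:variable>
    <!-- Assumes the replacements contain no regex-significant
         characters, so no escaping is done here. -->
    <xsl:variable name="regex" as="xs:string"
                  select="string-join($sorted, '|')" />
    <xsl:analyze-string select="$text" regex="{$regex}">
      <xsl:matching-substring>
        <special><xsl:value-of select="." /></special>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Hypothetical entry point for trying the template out. -->
  <xsl:template name="main">
    <p>
      <xsl:call-template name="markup">
        <xsl:with-param name="text"
                        select="'There is a strong corelation...'" />
        <xsl:with-param name="replacements"
                        select="('core', 'relation', 'corelation')" />
      </xsl:call-template>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

On a processor that prefers the earlier alternative, as Saxon 7 is
described as doing above, the whole of "corelation" would end up
inside a single <special> element rather than just "core".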

Getting whole-word-only matches is much more complicated. In fact, I
can't think of a good approach right now, but perhaps someone else
can?
  
Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

