
Re: Troubleshooting XSLT replace()

Subject: Re: Troubleshooting XSLT replace()
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Tue, 3 Dec 2013 18:23:13 -0500
Hi,

In general this problem is difficult because the requirement defies
the XML data model. It specifies a set of changes to make to a run of
text that does not exist as such in the tree that XPath sees. That run
can be inferred from the mixed content (as string() does, or indeed
any request for the element's string value), but it is not there to
be operated on. If you operate on your "view" of it, you will
typically find it hard to keep any inline markup, since by definition
working on the string wipes the element structure away.

Usually the better part of valor is to accept that partial solutions
are enough. For example, if you can assume that no token to be
processed (such as "Undated") will ever be split by markup, then a
solution like Graydon's (operating on each text node discretely, not
on the whole string at once) is the best course. When content can be
split by markup (if you are so fortunate as to have such markup, as
some applications of XML do), things are not so easy, because of the
XPath data model. XSLT is designed to work the other way around: from
the tree of elements and text nodes, not from a flattened string.

This suggests to me another approach to the problem, something like this:

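<xsl:template match="unittitle">
  <xsl:variable name="with-text-as-elements" as="element()">
    <xsl:apply-templates select="." mode="text-as-elements"/>
  </xsl:variable>
  <!-- Process the children of the temporary copy, not the copy itself:
       matching its unittitle again would re-enter this template forever. -->
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates select="$with-text-as-elements/node()"/>
  </xsl:copy>
</xsl:template>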

Mode 'text-as-elements' generates a temporary tree in which all
text nodes are transformed into element-based representations:
sequences of 't' (token) elements, something like this:

<unittitle>
  <n:t str="Here"/><n:t str="is"/><n:t str="my"/><n:t str="title"/>
</unittitle>

This is created by a near-identity transformation that operates on the
text node descendants.
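A minimal sketch of that near-identity transformation, assuming XSLT 2.0 and tokens delimited by whitespace. The namespace URI urn:example:tokens is a placeholder for whatever the n: prefix is really bound to, and the match="/" template exists only so the mode can be tried on its own:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:n="urn:example:tokens">

  <!-- Entry point, only for trying the mode standalone -->
  <xsl:template match="/">
    <xsl:apply-templates mode="text-as-elements"/>
  </xsl:template>

  <!-- Near-identity: elements are copied along with their attributes -->
  <xsl:template match="*" mode="text-as-elements">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="text-as-elements"/>
    </xsl:copy>
  </xsl:template>

  <!-- ...but each text node becomes one n:t per whitespace-delimited
       token. The whitespace itself is discarded, and punctuation stays
       attached to its word (e.g. str="title,"). Whitespace-only text
       nodes yield no tokens at all. -->
  <xsl:template match="text()" mode="text-as-elements">
    <xsl:for-each select="tokenize(normalize-space(.), ' ')">
      <n:t str="{.}"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against <unittitle>Here is my title</unittitle>, this should give the token form shown above, with each n:t carrying its own namespace declaration.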

The model could be extended to handle punctuation etc. (Maybe you want
to represent them with their own elements.) Or you can plan to work
around any punctuation with your regular expressions.

This temporary tree could then be processed to do whatever you need to
with the text (or rather, token) content. Templates to match these
elements should be super-easy to write, test and extend with new
substitution and filtering rules. By default:

<xsl:template match="n:t">
  <!-- the built-in rule for attributes outputs @str as text -->
  <xsl:apply-templates select="@str"/>
</xsl:template>

Then

<xsl:template match="n:t/@str[matches(., 'Undated', 'i')]">undated</xsl:template>

and so on for each substitution. (This is where the regular expressions come in.)

This works well as long as your substitution rules are confined to
working with single words or tokens. Matching and processing sequences
of them is possible but harder.

It also assumes that the result need not have whitespace fidelity to
the source. Add yet another token type when it must. (You will also
need logic to insert whatever whitespace you want into the result
when you serialize this back into text.)
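For the simplest case, a single space between adjacent tokens, the serializing logic might look like this sketch. It assumes XSLT 2.0 and a placeholder namespace URI (urn:example:tokens) for the n: prefix, and it only spaces tokens that share a parent, so tokens on either side of inline markup would need extra handling. The substitution rule here is anchored (^undated$) so that it matches a whole token only:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:n="urn:example:tokens">

  <!-- Copy ordinary elements, dropping the no-longer-used n: namespace -->
  <xsl:template match="*">
    <xsl:copy copy-namespaces="no">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Each token: a space before every token except the first among
       its siblings, then the (possibly rewritten) token text -->
  <xsl:template match="n:t">
    <xsl:if test="preceding-sibling::n:t">
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:apply-templates select="@str"/>
  </xsl:template>

  <!-- Otherwise the built-in rule outputs @str unchanged; substitution
       rules like this one override it for particular tokens -->
  <xsl:template match="n:t/@str[matches(., '^undated$', 'i')]">undated</xsl:template>

</xsl:stylesheet>
```

Given <unittitle xmlns:n="urn:example:tokens"><n:t str="Undated"/><n:t str="photographs"/></unittitle> as input, it should produce <unittitle>undated photographs</unittitle>.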

It involves more overhead than doing it directly in templates à la
Graydon, so it might be worth doing only if you have to do a lot of
this.
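For comparison, a minimal sketch of the direct, per-text-node approach, assuming XSLT 2.0. This is not Graydon's actual stylesheet (only his match pattern is quoted below); it just shows the shape of that solution and its limit: "Undated" is found only when it sits whole inside a single text node.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity: copy everything not matched more specifically -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Each text node inside unittitle is rewritten on its own, so any
       inline markup around it survives. A token split across text nodes,
       e.g. Un<emph>dated</emph>, is NOT caught. -->
  <xsl:template match="text()[ancestor::unittitle]">
    <xsl:value-of select="replace(., 'Undated', 'undated')"/>
  </xsl:template>

</xsl:stylesheet>
```

On <unittitle>Undated <emph>Undated</emph> letters</unittitle> this should yield <unittitle>undated <emph>undated</emph> letters</unittitle>, while Un<emph>dated</emph> would pass through unchanged.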

Cheers, Wendell

Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^


On Tue, Dec 3, 2013 at 4:59 PM, Graydon <graydon@xxxxxxxxx> wrote:
> On Tue, Dec 03, 2013 at 04:48:32PM -0500, Nathan Tallman scripsit:
>> Thank you, Graydon. I am cleaning up a huge stack of XMLs;
>> unfortunately I cannot use lower-case() because there may be other text in
>> <unitdate> that needs to remain capitalized.
>
> Well, bother.  replace() it is, then.
>
>> > <xsl:template match="text()[ancestor::unittitle]">
>
> If there's lots, you might want
>
> <xsl:template match="text()[ancestor::unittitle][normalize-space()]">
>
> instead; the optimizer will _probably_ figure out that you're only
> interested in text nodes with some non-whitespace contents, but it
> rarely hurts to provide a hint.
>
> -- Graydon
