
Re: XML text search & replace

Subject: Re: XML text search & replace
From: Martin Honnen <Martin.Honnen@xxxxxx>
Date: Wed, 23 Mar 2011 15:10:19 +0100
a kusa wrote:
Hello

There is a requirement to search for a particular pattern in XML
documents and replace it by reading another XML file and copying
over the replacement text corresponding to the original text. I have
been trying to use <xsl:analyze-string> in XSLT 2.0, but I am not sure
how to read another XML file with it.

As an example, suppose I have some text tagged within <para> tags:

<para> this is a simple text</para>

I have an external xml file of the form:

<matchtext>simple</matchtext>
<replacetext>hard</replacetext>

In my <xsl:matching-substring>, can I use doc() to read the external
XML file and replace the text?

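Yes. Read the external file with doc() (once, in a global variable), build a single regular expression from all the matchtext values, and use that in xsl:analyze-string. Here is a sample: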


<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xsd"
  version="2.0">

  <!-- Load the external replacement file once. -->
  <xsl:param name="rep-file" as="xsd:string" select="'test2011032302.xml'"/>
  <xsl:variable name="rep-doc" as="document-node()" select="doc($rep-file)"/>

  <!-- All matchtext values joined into one alternation, e.g. 'one|two|three'. -->
  <xsl:variable name="rep-pattern" as="xsd:string"
    select="string-join($rep-doc/replacements/replacement/matchtext, '|')"/>

  <!-- Finds a replacement element by its matchtext. -->
  <xsl:key name="rep-key" match="replacement" use="matchtext"/>

  <xsl:template match="para">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="para//text()">
    <xsl:analyze-string select="." regex="{$rep-pattern}">
      <xsl:matching-substring>
        <!-- The third argument makes key() search the external document. -->
        <xsl:value-of select="key('rep-key', ., $rep-doc)/replacetext"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text that matches no matchtext is copied unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

This assumes you have a file test2011032302.xml like the following (a relative rep-file value is resolved against the stylesheet's base URI):

<replacements>
  <replacement>
    <matchtext>simple</matchtext>
    <replacetext>hard</replacetext>
  </replacement>
</replacements>
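As a worked example, assuming the input document consists of just the para element from the question, the text node " this is a simple text" is split into " this is a ", "simple" and " text"; only "simple" matches, and the key lookup swaps it for "hard". The result should therefore be (possibly preceded by an XML declaration, depending on the serializer):

<para> this is a hard text</para>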


There are some shortcomings. Word-boundary anchors like \b are not supported by the XSLT/XPath regular expression language, so it is difficult to prevent e.g. "simple" in "simpleminds" from being replaced. If your XSLT 2.0 processor is AltovaXML Tools, then I think it does support \b, however.
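One possible workaround, sketched here as an untested assumption rather than a recommended solution, is to stop matching the replacement strings directly and instead split each text node into runs of word characters (\w+), looking each whole word up in the key. "simpleminds" is then a single word with no entry, so it is left alone. This only works if every matchtext is a single word (no spaces or punctuation). Because the regex no longer contains the matchtext values, regex metacharacters in them do no harm. The parameter, variable and key names are the same as in the sample above:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xsd"
  version="2.0">

  <xsl:param name="rep-file" as="xsd:string" select="'test2011032302.xml'"/>
  <xsl:variable name="rep-doc" as="document-node()" select="doc($rep-file)"/>
  <xsl:key name="rep-key" match="replacement" use="matchtext"/>

  <xsl:template match="para">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Split text into words (\w+) and non-words; each whole word is looked
       up in the key. Assumes every matchtext is a single word. -->
  <xsl:template match="para//text()">
    <xsl:analyze-string select="." regex="\w+">
      <xsl:matching-substring>
        <xsl:variable name="rep" select="key('rep-key', ., $rep-doc)"/>
        <xsl:choose>
          <xsl:when test="$rep">
            <!-- Whole word has an entry: output its replacement. -->
            <xsl:value-of select="$rep[1]/replacetext"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- No entry for this word: keep it. -->
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

The trade-off is that multi-word matchtext entries (e.g. "simple text") are never found, since each lookup sees only one word.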
Another problem occurs if a matchtext contains characters that are metacharacters in regular expressions, like '?' or ')': you would first need to escape them with a function like http://www.xsltfunctions.com/xsl/functx_escape-for-regex.html.
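A sketch of that escaping step follows. It assumes an xsl:function modelled on the FunctX escape-for-regex definition, declared in the http://www.functx.com namespace; please compare it with the page linked above before relying on it. Each matchtext is escaped before the values are joined with '|'; otherwise the stylesheet is the sample from above:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:functx="http://www.functx.com"
  exclude-result-prefixes="xsd functx"
  version="2.0">

  <!-- Prefix every regex metacharacter with a backslash (FunctX-style). -->
  <xsl:function name="functx:escape-for-regex" as="xsd:string">
    <xsl:param name="arg" as="xsd:string?"/>
    <xsl:sequence select="replace($arg, '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))', '\\$1')"/>
  </xsl:function>

  <xsl:param name="rep-file" as="xsd:string" select="'test2011032302.xml'"/>
  <xsl:variable name="rep-doc" as="document-node()" select="doc($rep-file)"/>

  <!-- Escape each matchtext before joining, so '?' or ')' match literally. -->
  <xsl:variable name="rep-pattern" as="xsd:string"
    select="string-join(
              for $m in $rep-doc/replacements/replacement/matchtext
              return functx:escape-for-regex($m),
              '|')"/>

  <xsl:key name="rep-key" match="replacement" use="matchtext"/>

  <xsl:template match="para">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="para//text()">
    <xsl:analyze-string select="." regex="{$rep-pattern}">
      <xsl:matching-substring>
        <!-- The matched substring is the unescaped text, so the key lookup still works. -->
        <xsl:value-of select="key('rep-key', ., $rep-doc)/replacetext"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>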



--


	Martin Honnen
	http://msmvps.com/blogs/martin_honnen/
