
Re: marking up text when term from other file is found

Subject: Re: marking up text when term from other file is found
From: Mukul Gandhi <gandhi.mukul@xxxxxxxxx>
Date: Thu, 22 Apr 2010 11:51:10 +0530
I would try to solve this as follows:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml" indent="yes" />

  <!-- The term list; a relative URI here is resolved against the stylesheet -->
  <xsl:variable name="index-terms" select="document('indexTerms.xml')" />

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- Wrap every occurrence of a term in a ph element. The regex is an
       alternation of all terms; matching is case-sensitive as written. -->
  <xsl:template match="text()" priority="10">
    <xsl:analyze-string select="."
                        regex="{string-join(for $term in $index-terms/terms/term
                                            return concat('(', $term, ')'), '|')}">
      <xsl:matching-substring>
        <!-- Build the id from the index* attributes of the matched term -->
        <xsl:variable name="idVal"
                      select="string-join(for $attrVal in
                                $index-terms/terms/term[. = regex-group(0)]/@*[starts-with(name(), 'index')]
                              return $attrVal, '_')" />
        <ph id="{$idVal}">
          <xsl:value-of select="." />
        </ph>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

You may adapt this to suit your requirements.
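
One adaptation the question calls for is matching regardless of capitalization. The following is only a sketch under some assumptions, not a tested solution: it uses flags="i" on xsl:analyze-string, escapes regex metacharacters in the terms, lets any run of whitespace in the text stand for a space in a term, puts longer terms first (assuming the processor tries alternatives in order, as a Java-based regex engine does), and skips text that is already inside a ph. The variable names term-regex, patterns, hit and term are mine. It does not match inflected forms such as "Curing fever" for the term "Cure fever", it does not check word boundaries, and the order of the index attributes in the id relies on the processor returning them in document order.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <xsl:output method="xml" />

  <xsl:variable name="index-terms" select="document('indexTerms.xml')" />

  <!-- One alternation of all terms, longest first. Metacharacters are
       escaped, and each space in a term becomes \s+ so that a term can
       match across a line break in the narrative. -->
  <xsl:variable name="term-regex" as="xs:string">
    <xsl:variable name="patterns" as="xs:string*">
      <xsl:for-each select="$index-terms/terms/term">
        <xsl:sort select="string-length(normalize-space(.))" order="descending" />
        <xsl:sequence select="replace(
                                replace(normalize-space(.),
                                        '[\\\.\[\]\(\)\{\}\|\?\*\+\^\$\-]', '\\$0'),
                                ' ', '\\s+')" />
      </xsl:for-each>
    </xsl:variable>
    <xsl:sequence select="string-join($patterns, '|')" />
  </xsl:variable>

  <!-- Identity template -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- Text that is already inside a ph is copied unchanged -->
  <xsl:template match="ph//text()" priority="20">
    <xsl:copy />
  </xsl:template>

  <xsl:template match="text()" priority="10">
    <xsl:analyze-string select="." regex="{$term-regex}" flags="i">
      <xsl:matching-substring>
        <!-- Look the term up again, ignoring case and whitespace differences -->
        <xsl:variable name="hit" select="lower-case(normalize-space(.))" />
        <xsl:variable name="term"
                      select="($index-terms/terms/term
                                 [lower-case(normalize-space(.)) = $hit])[1]" />
        <ph id="{string-join($term/@*[starts-with(name(), 'index')], '_')}">
          <xsl:value-of select="." />
        </ph>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With the sample files from the question, this should wrap "Sick children" inside the b element as ph id="children_illness" and leave the two existing ph elements alone.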

On Thu, Apr 22, 2010 at 8:38 AM, Hoskins & Gretton
<hoskgret@xxxxxxxxxxxxxxxx> wrote:
>
> Hi, I need help finding resources (examples and/or XSL) for this situation,
> for which I haven't found quite the right recipe in my searches of the list
> archives.
> Given an XML file containing a list of terms and another file containing a
> mix of elements containing text (narrative content, some inline markup for
> emphasis and footnotes), I was asked if I could find occurrences of each
> term wherever it appeared in the narrative content, and wrap each occurrence
> with a tag. So my first thought is to load up each document into a variable.
> But then I don't know what the most effective method of string comparison
> would be, given that the narrative document might have the term's words with
> different capitalization. If anyone can point me in the right direction, I'd
> appreciate it. Also I would like to know if there is a practical limit to
> how large a narrative file I can run with about 150 terms to find in the
> text. And if a different approach would work better, such as writing Java
> to do brute force search and replace, please tell me so. (I work with a
> Java programmer. Everything looks like a Java problem to her and an XSL
> problem to me.)
> -- Dorothy
> Note: Using Saxon B 9.1.0.7. I just made up a set of terms and a bad
> sentence as an example.
> Example of terms (indexTerms.xml):
> <?xml version="1.0" encoding="UTF-8"?>
> <terms>
>   <term index1="anxiety">Anxiety</term>
>   <term index1="children">Children</term>
>   <term index1="children" index2="illness">Children, illness</term>
>   <term index1="children" index2="nightmare">Children, nightmare</term>
>   <term index1="cure" index2="fever">Cure fever</term>
>   <term index1="cure" index2="illness">Cure illness</term>
>   <term index1="anxiety" index2="nightmare">Nightmare</term>
>   <term index1="children" index2="illness">Sick children</term>
>   <term index1="anxiety" index2="phobia">Worries, phobias and anxiety</term>
> </terms>
>
> Example of narrative (sampleTopic.xml):
> <?xml version='1.0' encoding='UTF-8'?>
> <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN"
> "http://docs.oasis-open.org/dita/v1.1/OS/dtd/topic.dtd">
> <topic id="sampleTopic">
>   <title>sampleTopic</title>
>   <body>
>     <p>markup for sample terms testing a set of phrases to match to the
> content of index terms:</p>
>     <p>Texttexttext text some of the terms are already in &lt;ph&gt; i.e. <ph
> id="cure_fever">curing fever</ph>, <ph id="children_illness">sick
> children</ph> and sometime the same terms occur, <i>but different case</i>,
> not in a ph: Curing fever and <b>Sick children</b>. I need to get all the
> occurrences of each of the term element strings marked up with &lt;ph&gt;
> </p>
>   </body>
> </topic>
>
> Desired result:
> <?xml version='1.0' encoding='UTF-8'?>
> <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN"
> "http://docs.oasis-open.org/dita/v1.1/OS/dtd/topic.dtd">
> <topic id="sampleTopic">
>   <title>sampleTopic</title>
>   <body>
>     <p>markup for sample terms testing a set of phrases to match to the
> content of index terms:</p>
>     <p>Texttexttext text some of the terms are already in &lt;ph&gt; i.e. <ph
> id="cure_fever">curing fever</ph>, <ph id="children_illness">sick
> children</ph> and sometime the same terms occur, <i>but different case</i>,
> not in a ph: <ph id="cure_fever">Curing fever</ph> and <b><ph
> id="children_illness">Sick children</ph></b>. I need to get all the
> occurrences of each of the term element strings marked up with &lt;ph&gt;
> </p>
>   </body>
> </topic>
>
> XSL:
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> version="2.0">
> <xsl:param name="indexFile">indexTerms.xml</xsl:param>
> <xsl:param name="textFile">sampleTopic.xml</xsl:param>
> <xsl:variable name="termsDocument"
> select="document($indexFile)"></xsl:variable>
> <xsl:variable name="textDocument"
> select="document($textFile)"></xsl:variable>
> <xsl:template match="*" name="test1"><xsl:result-document
> href="matchText-test.xml" method="xml">
> <!-- proof that I can get the terms -->
> <xsl:text>&#10;</xsl:text><xsl:comment><xsl:text>first term is
> </xsl:text><xsl:value-of
> select="$termsDocument/terms/term[1]"/></xsl:comment>
> <xsl:text>&#10;</xsl:text><xsl:comment><xsl:text>second term is
> </xsl:text><xsl:value-of
> select="$termsDocument/terms/term[2]"/></xsl:comment>
> <xsl:text>&#10;</xsl:text><xsl:comment><xsl:text>third term is
> </xsl:text><xsl:value-of
> select="$termsDocument/terms/term[3]"/></xsl:comment>
> <!-- now how do I find them in the $textDocument elements and mark them up?
> -->
> </xsl:result-document>
> </xsl:template>
> </xsl:stylesheet>



--
Regards,
Mukul Gandhi
