Subject: Search and replace many strings that may not be present in target
From: Zack Brown <zbrown@xxxxxxxxxxxxxxx>
Date: Fri, 17 May 2002 18:43:02 -0700
Hi folks,

I'm trying to reproduce a feature using XSLT that I had working when I used my
deeply broken home-grown XML parser. I'm moving to 'xsltproc' and GNU Make,
which have so far shown themselves equal to all challenges (thanks to some
help ;-).

Situation:

I have a number of files that each contain a root element <kc>, with a number
of <section> elements. Within each <section> element there may be a number
of <quote who="firstname lastname">text</quote> elements.

Several instances of the raw text "firstname lastname" may also appear
in the raw text of each <section> tag. A "firstname lastname" text is only
significant to this feature if it has also appeared identically in a <quote>'s
"who" attribute in at least one of the files under consideration.

Problem:

Here is the feature: for each <section> tag in each file, I would like
to do a search and replace on the first occurrence of each "firstname
lastname" appearing in raw text.

Example:

Assume that the <quote>'s "who" attributes in the various files have named
"Tom Jones", "Terry Haywood", and "Isaac Asimov". And assume the
following <section> tag in one of the files:

------sample input------
<section>

<p>this is a section containing a name, George Eliot, that has not
appeared in a &lt;quote&gt; tag. Therefore it will not be acted on by
this feature.</p>

<p>This paragraph contains a &lt;quote&gt; tag naming Isaac Asimov,
thus: <quote who="Isaac Asimov">And here he is saying something. Hi
Mom!</quote></p>

<p>this paragraph contains a reference to Terry Haywood, who appears in
a &lt;quote&gt; tag in a different file. Here is another reference to
Isaac Asimov, but it should not be matched, because only the first
occurrence of a given name in a section should be matched.</p>

</section>
------------------------

In the above sample, only Isaac Asimov and Terry Haywood should be
identified. Tom Jones does not appear in the sample, so the search-and-replace
will not find him. Also, George Eliot appears in the sample, but is not in
the list of names that have appeared in <quote> tags in one of the files,
so she will also not be found by the search and replace. Assuming that the
search and replace will insert a link to another page corresponding to the
name, then the output from the sample input would look like this:

---- sample output -----
<section>

<p>this is a section containing a name, George Eliot, that has not
appeared in a &lt;quote&gt; tag. Therefore it will not be acted on by
this feature.</p>

<p>This paragraph contains a &lt;quote&gt; tag naming Isaac Asimov [<a
href="people/Isaac_Asimov.html">*</a>],
thus: <quote who="Isaac Asimov">And here he is saying something. Hi
Mom!</quote></p>

<p>this paragraph contains a reference to Terry Haywood [<a
href="people/Terry_Haywood.html">*</a>], who appears in a &lt;quote&gt;
tag in a different file. Here is another reference to Isaac Asimov, but it
should not be matched, because only the first occurrence of a given name in
a section should be matched.</p>

</section>
------------------------
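
To pin down the intended behaviour, here is a rough sketch in Python rather
than XSLT. It works on a plain string and ignores the surrounding markup, which
is a simplification. The function name link_first_mentions is made up for
illustration; the link format is copied from the sample output above.

```python
import re

def link_first_mentions(text, names):
    """Append a link marker after the first occurrence of each known name."""
    if not names:
        return text
    # Longest names first, so a longer name wins over a shorter name it contains.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    seen = set()

    def replace(match):
        name = match.group(0)
        if name in seen:
            return name  # later occurrences are left untouched
        seen.add(name)
        href = "people/" + name.replace(" ", "_") + ".html"
        return '%s [<a href="%s">*</a>]' % (name, href)

    return pattern.sub(replace, text)
```

Names that never appear in the text (Tom Jones) and names that are not in the
list (George Eliot) simply fall through unchanged.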

Partial solution:

The assumption I've been making is that I will do a first pass through
all files to create metafiles, containing lists of all names appearing
in <quote> tags in all files. Then these files will be concatenated into
a single XML file.
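
A minimal sketch of that first pass, again in Python for illustration only.
It takes the XML documents as strings instead of reading files, and the
<names>/<name> layout of the combined file is an assumption, not something I
have settled on.

```python
import xml.etree.ElementTree as ET

def collect_quoted_names(xml_texts):
    """Return the set of distinct who="..." values found on <quote> elements."""
    names = set()
    for xml_text in xml_texts:
        root = ET.fromstring(xml_text)
        for quote in root.iter("quote"):
            who = quote.get("who")
            if who:
                names.add(who)
    return names

def names_to_xml(names):
    """Serialize the names as <names><name>...</name></names> (hypothetical layout)."""
    root = ET.Element("names")
    for name in sorted(names):
        ET.SubElement(root, "name").text = name
    return ET.tostring(root, encoding="unicode")
```

Collecting into a set removes duplicates, so a name quoted in many files
appears only once in the combined list.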

I will then do a second pass, in which I process all files for HTML output. The
XSLT will also use document() to read in the large file just created. That
will theoretically give it all the data it needs to do the search and replace.
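
For comparison, here is how the second pass might look outside XSLT, as a
Python sketch over an ElementTree. It scans each text node once against all
names at the same time, instead of searching the section once per name, and
keeps one "seen" set per <section>. All function names are hypothetical, and
this says nothing about how to express the same thing in XSLT.

```python
import re
import xml.etree.ElementTree as ET

def build_pattern(names):
    # Longest first, so a longer name wins over a shorter name it contains.
    ordered = sorted(names, key=len, reverse=True)
    if not ordered:
        return re.compile(r"(?!)")  # a pattern that never matches
    return re.compile("|".join(re.escape(n) for n in ordered))

def split_at_first_mentions(text, pattern, seen):
    """Return (head_text, [new <a> elements]) for one text or tail string."""
    cuts = []
    for match in pattern.finditer(text or ""):
        name = match.group(0)
        if name not in seen:
            seen.add(name)
            cuts.append((match.end(), name))
    if not cuts:
        return text, []
    links = []
    for i, (end, name) in enumerate(cuts):
        link = ET.Element("a", href="people/" + name.replace(" ", "_") + ".html")
        link.text = "*"
        last = i + 1 == len(cuts)
        next_end = len(text) if last else cuts[i + 1][0]
        # The tail carries the text up to and including the next linked name.
        link.tail = "]" + text[end:next_end] + ("" if last else " [")
        links.append(link)
    return text[:cuts[0][0]] + " [", links

def link_section(section, pattern):
    """Link the first mention of each name in one <section>, in document order."""
    seen = set()

    def visit(elem):
        children = list(elem)  # snapshot, so inserted <a> elements are not visited
        elem.text, links = split_at_first_mentions(elem.text, pattern, seen)
        for offset, link in enumerate(links):
            elem.insert(offset, link)
        for child in children:
            visit(child)
            child.tail, links = split_at_first_mentions(child.tail, pattern, seen)
            at = list(elem).index(child) + 1
            for offset, link in enumerate(links):
                elem.insert(at + offset, link)

    visit(section)
```

The who attribute is never touched, because only element text and tail
strings are scanned.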

At that point my ideas break down. I can think of some very slow
solutions, but nothing that would be feasible for a situation in which
there are hundreds of files and thousands of names and a Pentium III
processor.

Thanks a lot for any help.

Zack

-- 
Zack Brown

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

