Re: tricky string matching
Subject: Re: tricky string matching
From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Mon, 14 Mar 2011 10:20:14 +0100
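Another approach:
- in every element that contains a 'tief' element:
- use xsl:analyze-string to replace each run of whitespace characters
with an element (let's call it 'ws')
- in a second pass, group with group-starting-with="ws" and wrap the
rest of each group in a 'word' element, e.g.,
<ws string=" "/>CO<tief>2</tief> => <ws string=" "/><word>CO<tief>2</tief></word>
- in a third pass, replace each word[tief] with <alias kw="{.}"> (kw is
the word's string value, e.g. "CO2") and use word/node() as its content
- in the same pass, dissolve each word without tief to plain text and
turn each ws back into the whitespace it stands for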
Gerrit
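A minimal sketch of the three passes described above, written for XSLT 2.0. It is one possible reading of those steps, not a tested production stylesheet. The mode names mark-ws and resolve are made up for illustration, and the sketch assumes that the input itself contains no elements named ws or word and that words are delimited by whitespace only, so adjacent punctuation would end up in the kw attribute.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity: copy anything the templates below do not handle. -->
  <xsl:template match="@* | node()" mode="#all">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[tief]">
    <!-- Pass 1: whitespace runs in direct text children become <ws/>. -->
    <xsl:variable name="marked">
      <xsl:apply-templates select="node()" mode="mark-ws"/>
    </xsl:variable>
    <!-- Pass 2: every group starts at a <ws/>; the rest of the group
         (text and tief nodes) is wrapped in one <word>. -->
    <xsl:variable name="grouped">
      <xsl:for-each-group select="$marked/node()"
                          group-starting-with="ws">
        <xsl:copy-of select="current-group()[self::ws]"/>
        <word>
          <xsl:copy-of select="current-group()[not(self::ws)]"/>
        </word>
      </xsl:for-each-group>
    </xsl:variable>
    <!-- Pass 3: resolve the helper elements again. -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="$grouped/node()" mode="resolve"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only text directly inside the element is split; text inside
       tief itself falls through to the identity template. -->
  <xsl:template match="*[tief]/text()" mode="mark-ws">
    <xsl:analyze-string select="." regex="\s+">
      <xsl:matching-substring>
        <ws string="{.}"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- A word containing tief becomes an alias; kw is its string value. -->
  <xsl:template match="word[tief]" mode="resolve">
    <alias kw="{.}">
      <xsl:copy-of select="node()"/>
    </alias>
  </xsl:template>

  <!-- A word without tief dissolves back to plain text. -->
  <xsl:template match="word" mode="resolve">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- A ws element turns back into the whitespace it replaced. -->
  <xsl:template match="ws" mode="resolve">
    <xsl:value-of select="@string"/>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the original whitespace in the string attribute means line breaks and multiple spaces survive the round trip, and no list of known formulas such as CO2 or H2O has to be maintained.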
On 2011-03-14 09:52, Szabo, Patrick (LNG-VIE) wrote:
Hi,
I'm using XSLT 2 and Saxon 9
Example-snippet from my input:
...
<absatz>text text text text text text text text CO<tief>2</tief> text
text text text text text</absatz>
<absatz>text text text text text text text text H<tief>2</tief>O text
text text text text text</absatz>
...
What I have to do is make it look like this:
...
<absatz>text text text text <alias kw="CO2">CO<tief>2</tief></alias>
text text text text text text</absatz>
<absatz>text text text text <alias kw="H2O">H<tief>2</tief>O</alias>
text text text text text text</absatz>
...
I do have an idea of how to solve this problem, but it sounds very
inefficient to me.
What would you suggest?
I would compile a list of all the possible strings, like
...
CO2
H2O
...
Then I would flatten the absatz so that there are no <tief> elements anymore.
After that I would tokenize all the text() nodes and check whether any
token matches an entry in my list.
Is there a better way?
Kind regards
. . . . . . . . . . . . . . . . . . . . . . . . . .
Patrick Szabo
XSLT Developer
LexisNexis
Marxergasse 25, 1030 Wien
mailto:patrick.szabo@xxxxxxxxxxxxx
Tel.: +43 (1) 534 52 - 1573
Fax: +43 (1) 534 52 - 146
--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler