
Re: tricky string matching

Subject: Re: tricky string matching
From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Mon, 14 Mar 2011 10:20:14 +0100
Another approach:

- in every element that contains a 'tief' element:
  - in a first pass, use analyze-string to replace each run of whitespace characters with an element (let's call it 'ws')
  - in a second pass, group starting with ws, e.g.,
    <ws string=" "/>CO<tief>2</tief> => <word>CO<tief>2</tief></word>
  - in a third pass, replace each word that contains tief with <alias kw="{word}"> (kw being the word's string value), using word/node() as its content
  - in the same pass, dissolve each word without tief to plain text
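
The three passes above could be sketched roughly as follows in XSLT 2.0. This is only a minimal sketch, not tested against the real data: the mode names 'mark-ws' and 'resolve' and the variables $pass1 and $pass2 are made up for illustration, and it assumes the input is in no namespace and does not already use elements named 'ws' or 'word'. It also keeps each ws element in front of its word so that the third pass can turn it back into the original whitespace, a detail the outline leaves open.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Every element with a tief child goes through the three passes -->
  <xsl:template match="*[tief]">
    <!-- Pass 1: whitespace runs in the text children become ws elements -->
    <xsl:variable name="pass1">
      <xsl:apply-templates select="node()" mode="mark-ws"/>
    </xsl:variable>
    <!-- Pass 2: each group starts at a ws; the rest of the group is a word -->
    <xsl:variable name="pass2">
      <xsl:for-each-group select="$pass1/node()" group-starting-with="ws">
        <xsl:copy-of select="current-group()[self::ws]"/>
        <word>
          <xsl:copy-of select="current-group()[not(self::ws)]"/>
        </word>
      </xsl:for-each-group>
    </xsl:variable>
    <!-- Pass 3: turn words into alias elements or plain content -->
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="$pass2/node()" mode="resolve"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 1: split a text child at whitespace runs -->
  <xsl:template match="text()" mode="mark-ws">
    <xsl:analyze-string select="." regex="\s+">
      <xsl:matching-substring>
        <ws string="{.}"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Pass 1: other children (tief included) are processed in the default
       mode, so nested elements with their own tief get the same treatment -->
  <xsl:template match="* | comment() | processing-instruction()" mode="mark-ws">
    <xsl:apply-templates select="." mode="#default"/>
  </xsl:template>

  <!-- Pass 3: a word with a tief child becomes an alias whose kw is the
       word's string value -->
  <xsl:template match="word[tief]" mode="resolve">
    <alias kw="{.}">
      <xsl:copy-of select="node()"/>
    </alias>
  </xsl:template>

  <!-- Pass 3: any other word dissolves into its content -->
  <xsl:template match="word" mode="resolve">
    <xsl:copy-of select="node()"/>
  </xsl:template>

  <!-- Pass 3: a ws element turns back into the whitespace it replaced -->
  <xsl:template match="ws" mode="resolve">
    <xsl:value-of select="@string"/>
  </xsl:template>

</xsl:stylesheet>
```

Because a word is delimited by whitespace only, punctuation directly attached to a formula (e.g. a trailing comma) would end up inside the alias and in kw; stripping it would need an extra step.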


Gerrit

On 2011-03-14 09:52, Szabo, Patrick (LNG-VIE) wrote:
Hi,

I'm using XSLT 2 and Saxon 9.

Example-snippet from my input:

...
<absatz>text text text text text text text text CO<tief>2</tief>  text
text text text text text</absatz>
<absatz>text text text text text text text text H<tief>2</tief>O text
text text text text text</absatz>
...

What I have to do is turn this into the following:

...
<absatz>text text text text<alias kw="CO2">CO<tief>2</tief></alias>
text text text text text text</absatz>
<absatz>text text text text<alias kw="H2O">H<tief>2</tief>O</alias>
text text text text text text</absatz>
...

I do have an idea for how to solve this problem, but it sounds very
inefficient to me.

What would you suggest?

I would compile a list of all the possible strings, like

...
CO2
H2O
...

Then I would flatten the absatz so that no <tief> elements remain.
After that I would tokenize all the text() nodes and check whether any
token matches an entry in my list.

Is there a better way?

Kind regards

. . . . . . . . . . . . . . . . . . . . . . . . . .
Patrick Szabo
  XSLT Developer
LexisNexis
Marxergasse 25, 1030 Wien

mailto:patrick.szabo@xxxxxxxxxxxxx
Tel.: +43 (1) 534 52 - 1573
Fax: +43 (1) 534 52 - 146


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
