
Re: Processing two documents, which order?

Subject: Re: Processing two documents, which order?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Fri, 8 Apr 2011 10:49:58 +0200
On 8 April 2011 09:15, Dave Pawson <davep@xxxxxxxxxxxxx> wrote:
>
> > > Given
> > >      <property>absolute-position</property>
> > >      <property>bottom</property>
> > >      <property>left</property>
> > >      <property>right</property>
> > >      <property>top</property>
> > > as the input... what would the keys look like?
> >
>
> The 'list to be marked up' is as above
> The other document is xml, containing, in other elements those words
>
> Required output
>
> <para> Blah blah blah <property>right</property>
>
> 'items' must be followed by [\s\p{{P}}]  so left-handed doesn't get
> marked up  etc.

If, given "left", "left-handed" should not match, the set of stoppers must
include space and non-letters (\PL) and not punctuation characters (\pP).
If a regular expression is used, the pattern may also have to include the
anchor $.

And possibly the symmetric pattern (using '^') should precede the word pattern.
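A minimal Python sketch of the anchored pattern, with the stopper set passed in as a parameter, makes it easy to check which choice leaves "left-handed" unmarked. Python's built-in re module does not support \p{P}, so the sketch assumes an explicit character class (ASCII string.punctuation) as a stand-in; mark_words is a hypothetical helper name.

```python
import re
import string


def mark_words(text, words, stoppers):
    """Wrap each listed word in <x>...</x> when it is delimited on both sides.

    A word counts as delimited if it is preceded by the start of the text,
    whitespace, or a character from `stoppers`, and followed by the end of
    the text, whitespace, or a character from `stoppers`.
    """
    # Explicit character class standing in for a Unicode class such as \p{P};
    # re.escape keeps characters like '-', ']' and '\' literal inside [...].
    stop = "[" + re.escape(stoppers) + "]"
    alternation = "|".join(re.escape(w) for w in words)
    # Same shape as (^|\s|\p{P})(word)($|\s|\p{P}), matched case-insensitively.
    pattern = re.compile(
        rf"(^|\s|{stop})({alternation})($|\s|{stop})", re.IGNORECASE
    )
    return pattern.sub(r"\1<x>\2</x>\3", text)


# ASCII punctuation (which includes '-') versus the same set minus the hyphen.
with_hyphen = string.punctuation
without_hyphen = string.punctuation.replace("-", "")

print(mark_words("left-handed, left.", ["left"], with_hyphen))
# <x>left</x>-handed, <x>left</x>.
print(mark_words("left-handed, left.", ["left"], without_hyphen))
# left-handed, <x>left</x>.
```

Because the hyphen-minus is itself a punctuation character (Unicode category Pd), a stopper set that contains it marks the "left" in "left-handed"; only a set without '-' skips it.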

I suspect that a regular expression substitution applied to each text node in
its entirety could well compete with any other approach.
A simple algorithm can be used to optimize the regular expression, moving away
from the "brute force" pattern that joins all words with '|'.

Example:
Given the words

   bee-bonnet-bounce-bounty-burn-burst-sea-seal

the optimized and anchored regex is

  (^|\s|\p{P})((?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l)))($|\s|\p{P})

Here is a text:

   <p>Bee in my bonnet bounces from bounty. Burst on a bee-line into the sea as a seal</p>

Applying global case-insensitive substitution with $1<x>$2</x>$3 produces:

   <p><x>Bee</x> in my <x>bonnet</x> bounces from <x>bounty</x>. <x>Burst</x> on a <x>bee</x>-line into the <x>sea</x> as a <x>seal</x></p>
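The substitution can be replayed in a few lines of Python as a sanity check. This is a sketch, not the original setup: Python's re module has no \p{P}, so ASCII punctuation is assumed as a stand-in, and only the text content of the <p> element is processed.

```python
import re
import string

# Python's re module has no \p{P}, so ASCII punctuation stands in for it here.
PUNCT = "[" + re.escape(string.punctuation) + "]"

# Optimized alternation for bee, bonnet, bounce, bounty, burn, burst, sea, seal.
WORDS = r"(?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l))"

# (^|\s|\p{P})(words)($|\s|\p{P}), matched case-insensitively.
PATTERN = re.compile(rf"(^|\s|{PUNCT})({WORDS})($|\s|{PUNCT})", re.IGNORECASE)

# Text content of the <p> element from the example.
text = "Bee in my bonnet bounces from bounty. Burst on a bee-line into the sea as a seal"

# Global substitution with $1<x>$2</x>$3 (written \1<x>\2</x>\3 in Python).
marked = PATTERN.sub(r"\1<x>\2</x>\3", text)
print("<p>" + marked + "</p>")
```

Each match consumes its trailing stopper, so two listed words separated by a single space (e.g. "sea seal") would leave the second one unmarked; zero-width lookarounds for the stoppers would avoid that.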

Disclaimer: My XSLT skills aren't sufficient to create the optimized
regex from the word list. If someone is interested enough, I can
provide the details.
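The optimization itself isn't spelled out in the thread. One technique that reproduces the optimized regex in the example, offered as an assumption rather than as the algorithm meant here, is to build a character trie of the words and emit it as nested non-capturing groups, collapsing chains of single-child nodes into literal prefixes. The function names below are hypothetical.

```python
import re


def build_trie(words):
    """Character trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def emit(node):
    """Turn a trie node into an alternation of the suffixes below it."""
    alternatives = []
    for key in sorted(node):  # "" sorts first, giving e.g. sea(?:|l)
        if key == "":
            alternatives.append("")  # a word ends at this node
            continue
        # Collapse a chain of single-child nodes into one literal prefix.
        prefix, child = key, node[key]
        while len(child) == 1 and "" not in child:
            (ch, child), = child.items()
            prefix += ch
        if child == {"": {}}:
            alternatives.append(re.escape(prefix))  # leaf: plain literal
        else:
            alternatives.append(re.escape(prefix) + "(?:" + emit(child) + ")")
    return "|".join(alternatives)


def optimize(words):
    """Factor common prefixes out of the '|'-joined word list."""
    return "(?:" + emit(build_trie(words)) + ")"


words = "bee-bonnet-bounce-bounty-burn-burst-sea-seal".split("-")
print(optimize(words))
# (?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l))
```

Wrapping the result in a capturing group between the two stopper groups gives the full anchored pattern. The empty alternative in sea(?:|l) is tried first, which is harmless here because the trailing stopper group forces a backtrack to "seal" when the next character is "l".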

-W

> regards
> --
> Dave Pawson
> XSLT XSL-FO FAQ.
> http://www.dpawson.co.uk
