

Subject: Re: Regular expression functions (Was: Re: comments on December F&O draft)
From: David Carlisle <davidc@xxxxxxxxx>
Date: Sat, 12 Jan 2002 17:04:21 GMT


>   \para{\italic{this} is \bold{bold \italic{and italic}} text.}

Ohh looks just like TeX, we'll get you using that yet...

I can think of two ways of attacking the above with regexp.

* Plan A (which is the way I'd do it in emacs) is to 

have a regexp replace

\\\([a-z]*\){\([^{}]*\)}  to <\1>\2</\1>

This matches innermost groups first, they don't have any nested {} so
you can easily find the matching }.
As the replace also removes the {} you just need a loop which terminates
once the regexp no longer matches, so the replacements go

\para{\italic{this} is \bold{bold \italic{and italic}} text.}

\para{<italic>this</italic> is \bold{bold <italic>and italic</italic>} text.}

\para{<italic>this</italic> is <bold>bold <italic>and italic</italic></bold> text.}

<para><italic>this</italic> is <bold>bold <italic>and italic</italic></bold> text.</para>

(generated the above using emacs:-)

That's fine, but it requires either that you consider the XML markup just
to be part of the string (which is what I did here, but what we want to
avoid in XSLT) or that your regexps can match across mixed content
models, i.e. instead of [^{}]* meaning any character other than a brace
you'd need something that says any character-or-node other than a brace.
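
The Plan A loop can be sketched outside emacs as well. The following is only an illustration in Python, assuming (as in the emacs run above) that the generated markup is just part of the string; the function name plan_a is made up for the example, and the pattern is the emacs one rewritten in Python regex syntax.

```python
import re

# An innermost group: \name{...} where the braces contain no further braces.
# The backslash sits outside the first capture group so that \1 is the bare name.
INNERMOST = re.compile(r"\\([a-z]*)\{([^{}]*)\}")


def plan_a(text):
    """Return every intermediate string, replacing innermost groups
    until the pattern no longer matches (hypothetical helper)."""
    steps = [text]
    while True:
        text, count = INNERMOST.subn(r"<\1>\2</\1>", text)
        if count == 0:
            # Nothing left to rewrite, so the loop terminates.
            return steps
        steps.append(text)


if __name__ == "__main__":
    source = r"\para{\italic{this} is \bold{bold \italic{and italic}} text.}"
    for step in plan_a(source):
        print(step)
```

Because each pass removes the braces it matched, the loop is guaranteed to stop: every replacement strictly reduces the number of braces left in the string.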

The alternative to Plan A is of course:

Plan 2:
work from the outside in: (This is the way I'd do it in omnimark)
Basically the plan here is not to try to match a whole matching brace
clause but just to match each start and end in turn, maintaining a
counter that increments on { and decrements on } so you know what
matches with what.
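
As a rough illustration of that outside-in scan (in Python rather than omnimark, and again working on a plain string), the sketch below keeps a list of open names whose length plays the role of the counter. The function name plan_2 and the error handling are assumptions of the example, not anything from a real tool.

```python
import re

# A token is either an opening "\name{" or a closing "}".
TOKEN = re.compile(r"\\([a-z]*)\{|\}")


def plan_2(text):
    """Single left-to-right pass; len(open_names) is the brace counter."""
    out = []
    open_names = []  # grows on "{", shrinks on "}"
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(text[pos:m.start()])  # copy the text between tokens
        if m.group(0) == "}":
            if not open_names:
                raise ValueError("unmatched '}' at offset %d" % m.start())
            out.append("</%s>" % open_names.pop())
        else:
            open_names.append(m.group(1))
            out.append("<%s>" % m.group(1))
        pos = m.end()
    if open_names:
        raise ValueError("unclosed group: %s" % open_names[-1])
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(plan_2(r"\para{\italic{this} is \bold{bold \italic{and italic}} text.}"))
```

Remembering the names rather than a bare number is a small extension of the counter idea: it is what lets each } be turned into the right closing tag.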

It's a bit hard to fit that counter model into the XSLT world view but
there is a variant, 

plan 2':
I suspect that one way to attack this in xslt2 is just to have two
simple regexp replaces

\\\([a-z]*\){  -> <start name="\1"/>

}              -> <end/>

so after doing the regexp matching I'd have:

<start name="para"/><start name="italic"/>this<end/> is <start name="bold"/>bold <start name="italic"/>and italic<end/><end/> text.<end/>

so now we've got rid of that flat string and replaced it by something
that's still flat but is mixed content with empty element nodes and
text nodes.
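
The two replaces of plan 2' are simple enough to try on a string. This is a sketch only, in Python, and it produces the markers as text rather than as real element nodes (which is exactly the step an XSLT regexp replace would have to do differently); the function name flatten is invented for the example.

```python
import re


def flatten(text):
    """Apply the two plan 2' replaces, yielding a flat marker string."""
    # \name{  ->  <start name="name"/>
    text = re.sub(r"\\([a-z]*)\{", r'<start name="\1"/>', text)
    # }  ->  <end/>   (the first replace introduces no "}" of its own)
    return text.replace("}", "<end/>")


if __name__ == "__main__":
    print(flatten(r"\para{\italic{this} is \bold{bold \italic{and italic}} text.}"))
```

Neither replace needs to know anything about nesting, which is why both patterns stay so simple.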

Getting from that flat mixed content to a hierarchical element tree is
just the famous xslt grouping problem, which a typical Gumbie Cat ought
to be able to do in her sleep, especially if given the xslt2 grouping
constructs.
So while I'm tempted to see if plan A can be made to work, as the
two-stage plan 2' doesn't seem so clean in some ways, I suspect that
integrating plan 2' would be much simpler: you wouldn't have to extend
regexp searching to search mixed content, just extend regexp replace so
it can generate mixed content.



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
