

Subject: RE: Regular expression functions (Was: Re: comments on December F&O draft)
From: "Steven Noels" <stevenn@xxxxxxxxxxxxxxxx>
Date: Thu, 10 Jan 2002 00:16:45 +0100
> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx]On Behalf Of
> Jeni Tennison
> Sent: Wednesday, 9 January 2002 23:32
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Regular expression functions (Was: Re:  comments on
> December F&O draft)
>
>
> Hi Steven,
>
> Very interesting :)

thanks Jeni, we hope so :-)

> Could you explain a little more about how the matchers work? You call
> them by name - does each of them search over the entire string, or do
> later matchers only match on what's left after matching the earlier
> ones? Did you try any other designs? What made you choose this one?

the principal matcher included with the root <element> is matched
against the entire input document

depending on its outcome, nodes (atts and elems) are generated, or
additional matchers are called: implicitly on the entire matched region
(for-each like), or explicitly using regex groups (comparable to the
tokenization of requests in Cocoon): each "parenthesized" pattern region
can be addressed individually using an integer

this way, you can define which matcher has to be applied to which region
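
a rough sketch of that dispatch idea, in Python rather than in our own syntax: the matcher names, the patterns and the MATCHERS table below are all made up for illustration and are not the actual implementation. Group 0 stands for the entire matched region, higher integers for the parenthesized regions.

```python
import re

# Hypothetical matcher table (illustration only):
# name -> (pattern, {group number: name of the matcher applied to that group})
MATCHERS = {
    "root": (r"(\w+)=(\d+(?:,\d+)*)", {2: "numbers"}),
    "numbers": (r"\d+", {}),
}

def apply_matcher(name, text):
    """Apply the named matcher to text and recurse into the configured groups."""
    pattern, group_matchers = MATCHERS[name]
    results = []
    for m in re.finditer(pattern, text):
        node = {"matcher": name, "text": m.group(0), "children": []}
        # Each parenthesized region is addressed by its integer index;
        # index 0 would hand the whole matched region to the sub-matcher.
        for group, sub_name in group_matchers.items():
            node["children"].extend(apply_matcher(sub_name, m.group(group)))
        results.append(node)
    return results

# The root matcher runs over the entire input; "numbers" only sees group 2.
tree = apply_matcher("root", "a=1,2 b=30")
```

here the root matcher finds two regions, and the "numbers" matcher is applied only to the second group of each.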

> > One of the things which doesn't work well currently is the
> > specification of the regex as an attribute to the <matcher> element.
> > We will avoid this by putting the regex inside a CDATA section of a
> > <regex> subelement (will be optional, we are testing this right
> > now). Not sure whether this is good practice, advice welcome. It is
> > only partially related to this discussion of course.
>
> I can see why you'd want to do that, given that you're matching HTML
> tags. Note that you're doing more escaping than you have to in the
> attribute value, though. Consider:

yes, I got lazy after a while and started to escape everything ;-)

> delimited the attribute with single-quotes. So you could have:
>
> <matcher
> regex='CLASS="story3">([^&lt;]+)&lt;BR>&lt;/SPAN>&lt;/FONT>&lt;/STRONG>
> &lt;FONT\sCOLOR="#333333"\sFACE="sans-serif,\sarial">&lt;SPAN\sCLASS="story">([^&lt;]+)&amp;nbsp;(.+)&lt;A\sHREF="([^"]+)">More'
> name="items">

I find this mixture even less readable somehow :-) but on the ' and ",
you are absolutely correct - it was just my XML IDE that uses double
quotes by default
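
to check the quoting point for myself, a small Python sketch (the shortened regex is made up for illustration, and xml.etree is just a convenient parser, not what we use): a fully escaped double-quoted attribute and a single-quoted one with only the necessary escaping should give the same value.

```python
import xml.etree.ElementTree as ET

# "Escape everything" style: double-quoted attribute, so " must be &quot;
over_escaped = '<matcher regex="HREF=&quot;([^&quot;]+)&quot;&gt;More" name="items"/>'

# Single-quoted attribute: " and > can appear literally;
# only < and & (and ' itself) would still need escaping.
minimal = """<matcher regex='HREF="([^"]+)">More' name="items"/>"""

a = ET.fromstring(over_escaped).get("regex")
b = ET.fromstring(minimal).get("regex")

# Both spellings reach the application as the same regex string.
print(a == b)
print(b)
```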

> But I agree - if you've got regular expressions like this, it's best
> to put them in an element where you can use CDATA sections to at least
> make it look like the stuff you're matching.

and that is what we will do - a pity one cannot declare an attribute as
CDATA in the sense of the CDATA sections that are available at the
document content level
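
roughly what the <regex> subelement would buy us, again as a Python sketch under assumptions: the shortened pattern and the sample input are invented, and the parsing code is only there to show that the CDATA content arrives as the raw regex.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical <matcher> with the regex in a CDATA section: the pattern
# looks like the markup it matches, with no &lt; or &amp; escaping.
doc = r"""<matcher name="items">
  <regex><![CDATA[<SPAN\sCLASS="story">([^<]+)&nbsp;(.+)<A\sHREF="([^"]+)">More]]></regex>
</matcher>"""

pattern = ET.fromstring(doc).findtext("regex")

# Invented sample input, for illustration only.
sample = ('<SPAN CLASS="story">Headline&nbsp;teaser text '
          '<A HREF="http://example.org/1">More')

m = re.search(pattern, sample)
title, teaser, href = m.groups()
```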

> For XSLT, I think that attributes are more natural because attributes
> are used for this kind of thing elsewhere (matching nodes, for

indeed, and exactly the reason why we started off with atts for our
regexes

> instance). It would be handy if the regular expressions could be held
> in (global) variables because then they could be defined in content
> (with CDATA sections) rather than in an attribute. However, that would
> run up against the dynamic regular expression problem that David and I
> talked about yesterday. I don't think it'll be too big a problem,
> though - the regular expressions in XSLT are likely to be a lot
> smaller than these, and not include tags (hopefully!).

I will try to read and understand your discussion - we had already
thought of storing the regexes that way, but dropped the idea because
it hurt the readability of the regexslt transformation sheet

I like all parameters to a certain action to be contained in the same
area, and storing the regexes inside 'global variables' would conflict
with that

thanks for your reaction,

Steven Noels
http://outerthought.org/
(+32)478 292900



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

