
Re: two regexp related questions

Subject: Re: two regexp related questions
From: Julian Reschke <julian.reschke@xxxxxx>
Date: Thu, 19 May 2011 22:45:30 +0200
On 2011-05-19 22:24, Imsieke, Gerrit, le-tex wrote:


On 2011-05-19 21:16, Julian Reschke wrote:
On 2011-05-19 20:51, Brandon Ibach wrote:
For 2), if you're using the regex to both validate the input (making
sure it conforms to the required syntax) and parse/extract the
name/value pairs, you might be able to make the job easier by breaking
these two tasks apart. Use the regex as you have it now to validate
the input and then, if it matches, use a shorter regex that matches
just a single name/value pair with analyze-string to do the actual
processing.

-Brandon :)

That's more or less what I do now. But as long as the regex contains a repeating pattern, <xsl:matching-substring> will only be invoked once, and the regex-group function will only return the contents of the last match, right?

I think it depends on the implementation. I couldn't see anything in the spec about what regex-group(3) of ([a-z]+)=([a-z]+)(;([a-z]+)=([a-z]+))* should be. In Saxon, it's ';e=f' for your example, but in principle it could also be ';c=d'.
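
As an illustration outside XSLT, Python's re module shows the same "last iteration wins" behaviour that is reported for Saxon above. This is only a sketch: the input string a=b;c=d;e=f is assumed from the ';e=f' and ';c=d' values mentioned, and the helper name last_repeated_group is made up for the example. It says nothing about what other XSLT processors do.

```python
import re

# Same pattern as discussed above; group 3 sits inside the repeated part.
PATTERN = re.compile(r"([a-z]+)=([a-z]+)(;([a-z]+)=([a-z]+))*")

def last_repeated_group(text):
    """Return group 3 of a full match, or None if there is no match
    or the repeated part did not take part in the match."""
    m = PATTERN.fullmatch(text)
    return m.group(3) if m else None

# Assumed example input (not quoted in this message).
print(last_repeated_group("a=b;c=d;e=f"))  # prints ;e=f
```

Only the final repetition is retained, so the intermediate pair c=d cannot be recovered from the groups of a single match.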

As Brandon pointed out, using analyze-string with a repeating pattern
that matches the entire string is not the best approach. There are more
natural approaches that work without recursion. I sketched two of them
below.
..

Wow, thanks for the feedback.


What I did not mention in my mail is that I simplified things. First of all, tokenize() won't work, because the separator has to take context into account: the right-hand side can be a quoted string, which may itself contain a ";".
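
To make the problem concrete, here is a small Python sketch (not the XSLT in question) that compares a plain split on ";" with a quote-aware scan. The header value and the names naive, PIECE and aware are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical header value with a ";" inside the quoted string.
header = 'attachment; filename="a;b.txt"'

# Plain splitting cuts the quoted string in two.
naive = [part.strip() for part in header.split(";")]

# Quote-aware scan: a piece is a run of quoted strings and of
# characters that are neither ";" nor a double quote.
PIECE = re.compile(r'(?:"(?:[^"\\]|\\.)*"|[^;"])+')
aware = [part.strip() for part in PIECE.findall(header)]

print(naive)  # ['attachment', 'filename="a', 'b.txt"']
print(aware)  # ['attachment', 'filename="a;b.txt"']
```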

Also, the syntax is slightly more complex; the first component differs from the other components.

What I'm trying to parse is an HTTP header field syntax, shared by header fields like Content-Type or Content-Disposition:

  value = name ( ";" param )*
  name = token
  param = token "=" (token | quoted-string)
  ...

(in IETF ABNF speak).
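
A possible reading of this grammar, sketched in Python along the lines Brandon suggested: one regex validates the whole value, a shorter one then extracts each param. The definitions of token and quoted-string are elided above, so the ones used here are assumptions taken from RFC 2616. The optional whitespace around ";" and the name parse_header_value are also assumptions.

```python
import re

# Assumed definitions (elided in the grammar above), following RFC 2616.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'

# Step 1: validate  value = name ( ";" param )*
VALUE = re.compile(
    rf"\s*({TOKEN})\s*((?:;\s*{TOKEN}=(?:{TOKEN}|{QUOTED})\s*)*)"
)
# Step 2: extract one  param = token "=" (token | quoted-string)
PARAM = re.compile(rf";\s*({TOKEN})=({TOKEN}|{QUOTED})")

def parse_header_value(text):
    """Return (name, [(param_name, param_value), ...]) or raise ValueError."""
    m = VALUE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid header value: {text!r}")
    params = []
    for name, raw in PARAM.findall(m.group(2)):
        if raw.startswith('"'):
            # Strip the quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params.append((name, raw))
    return m.group(1), params

print(parse_header_value('attachment; filename="a;b.txt"; size=12'))
```

Because the input has already been validated, every param match starts right after the previous one, so a ";" inside a quoted string is never mistaken for a separator.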

The code I currently have, which works, is at

http://greenbytes.de/tech/tc2231/tc2231.xslt

to be applied to

http://greenbytes.de/tech/tc2231/tc2231.xml

I currently have one template for matching the whole expression, which delegates to another one for

( ";" param )*

which itself matches the first param, and then recurses. This probably can be simplified as in your "as" example.
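
The "match the first param, then recurse" structure can be sketched in Python as follows. This is not a translation of tc2231.xslt. The name parse_params is hypothetical, and the token definition is deliberately simplified to "anything but whitespace, ;, = and double quote".

```python
import re

# One leading ";" followed by a single param; simplified token definition.
FIRST_PARAM = re.compile(
    r'\s*;\s*([^\s;="]+)=("(?:[^"\\]|\\.)*"|[^\s;="]+)'
)

def parse_params(rest):
    """Parse ( ";" param )* by consuming one param and recursing on the tail."""
    if not rest.strip():
        return []  # nothing left: end of recursion
    m = FIRST_PARAM.match(rest)
    if m is None:
        raise ValueError(f"cannot parse param at: {rest!r}")
    return [(m.group(1), m.group(2))] + parse_params(rest[m.end():])

print(parse_params('; a=b; c="d;e"'))  # [('a', 'b'), ('c', '"d;e"')]
```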

Thanks for the feedback, Julian
