Re: {} quantifiers in regex

Subject: Re: {} quantifiers in regex
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Sun, 13 Jan 2008 02:24:47 +0100
Geert Bormans wrote:

If I change it to this (removing \d{2} in favour of \d\d)

[...]
it works

Am I overlooking something?

The regex attribute of xsl:analyze-string is an attribute value template (AVT). Curly braces (accolades) have a special meaning in both an AVT and a regular expression, and the way to use a curly brace in an AVT without it being interpreted as the start or end of an XPath expression is to double it. Because curly braces are used often in regexes and because their content is usually a number, which is itself a valid XPath expression, the result is not an illegal AVT and no error is raised:


\d{2}

is interpreted as the regular expression:

\d2

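which matches a digit followed by a literal 2. It will quite likely match sometimes and sometimes not, but not when you want it to. The resulting behavior has all the features of a buggy regular expression parser, while in fact it is the expression itself that is buggy... ;) Written as \d{{2}} in the attribute, the quantifier survives the AVT.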
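To make the difference concrete, here is a minimal sketch of a complete XSLT 2.0 stylesheet that runs both forms against the same string. The input string '2008-01-13' and the square brackets around each match are only my illustration, not taken from Geert's stylesheet, and any source document will do since the template ignores its content:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Broken: the AVT evaluates {2} as the XPath number 2,
         so the regex the processor sees is \d2 -->
    <xsl:analyze-string select="'2008-01-13'" regex="\d{2}">
      <xsl:matching-substring>[<xsl:value-of select="."/>]</xsl:matching-substring>
    </xsl:analyze-string>

    <!-- Fixed: doubled braces are AVT escapes for literal braces,
         so the regex the processor sees is \d{2} -->
    <xsl:analyze-string select="'2008-01-13'" regex="\d{{2}}">
      <xsl:matching-substring>[<xsl:value-of select="."/>]</xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

As far as I can tell, the first instruction should output nothing for this input (no digit in it is followed by a 2), while the second should output one bracketed pair of digits for each of 20, 08, 01 and 13.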

Because I used to make this mistake often (and because escaped quotes and doubled accolades look ugly), I started to put the regular expression into a variable in all but the most trivial cases. The added benefit of this is that you can now use comments in a regular expression:

<xsl:variable name="regex" as="xs:string">
     \d      <!-- a digit -->
     {2}     <!-- must occur twice and only twice -->
</xsl:variable>
<xsl:analyze-string regex="{$regex}" flags="x">
  ...
</xsl:analyze-string>

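Note the use of the 'x' flag, which is necessary here: it makes the processor ignore the whitespace in the regular expression, and the XML comments never reach the string at all. Note also that the braces are not doubled inside the variable, because the content of xsl:variable is not an AVT. Regular expressions have the tendency to be the most unreadable of existing mini-languages, so comments and whitespace are often very welcome. The as="xs:string" is there because we don't need a document node but a string.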
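For a slightly larger case, the variable approach could look like the sketch below. The variable name date-regex, the date pattern and the input string are hypothetical and only there to show the technique; regex-group() and the separator attribute of xsl:value-of are standard XSLT 2.0:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <!-- Braces are NOT doubled here: variable content is not an AVT -->
  <xsl:variable name="date-regex" as="xs:string">
    (\d{4})   <!-- group 1: four-digit year -->
    -
    (\d{2})   <!-- group 2: two-digit month -->
    -
    (\d{2})   <!-- group 3: two-digit day -->
  </xsl:variable>

  <xsl:template match="/">
    <!-- flags="x" strips the whitespace that was used for layout above -->
    <xsl:analyze-string select="'2008-01-13'" regex="{$date-regex}" flags="x">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(3), regex-group(2), regex-group(1)"
                      separator="/"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

If I read the spec correctly, this should turn the ISO-style date into 13/01/2008. The only AVT left is {$regex}-style variable reference in the attribute, which has no regex braces in it to trip over.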

For the fun of it and to complete this little story, note that in the world of obfuscation a lot is possible, if you set your mind to it. If you want it and you like fun code, you *can* put comments inside a regular expression (but only inside an AVT) using the following, imo rather silly construction:

<xsl:analyze-string flags="x" regex="
      \d         {()(: a digit :)}
      {{2}}   {()(: must occur twice and only twice :)}">

The () is there because an XPath expression cannot be empty; it evaluates to the empty sequence, which contributes nothing to the attribute value. The (: and :) are, of course, the comment delimiters for an XPath 2.0 expression. I don't know about others' opinions on this, but from my point of view this doesn't add much to readability, so I still prefer the "best practice" of putting the regex in a variable (what also helps that decision is that some XSLT 2.0 processors do not allow the smiley comments).

Cheers,
-- Abel Braaksma
