RE: lookaheads in XSLT2 regexes

Subject: RE: lookaheads in XSLT2 regexes
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 4 Mar 2010 17:39:04 -0000
I feel that \b is very much tied to a specific set of characters which might
not be exactly the set you want. I'd be more comfortable providing
general-purpose zero-width look-ahead and look-behind:

regex="(?<=\P{{L}})\p{{Lu}}{{2,}}(?=\P{{L}})"

which seems far more powerful.
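
XSLT 2.0 regexes do not offer lookarounds, so the pattern above cannot be tried in xsl:analyze-string as it stands. As a rough illustration only, here is a minimal sketch in Java, whose java.util.regex supports both lookarounds and the \p{..} categories. The class name AcronymTagger, the tag helper and the EDGE_AWARE variant are hypothetical and not part of the proposal; the doubled braces are dropped because Java has no attribute value templates.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcronymTagger {
    // The pattern from the message, without AVT brace doubling:
    // two or more uppercase letters with a non-letter on each side.
    public static final Pattern STRICT =
        Pattern.compile("(?<=\\P{L})\\p{Lu}{2,}(?=\\P{L})");

    // Hypothetical variant (an assumption, not from the thread): negative
    // lookarounds, which also succeed at the start and end of the input.
    public static final Pattern EDGE_AWARE =
        Pattern.compile("(?<!\\p{L})\\p{Lu}{2,}(?!\\p{L})");

    // Wraps every match in a span. Because the lookarounds are zero-width,
    // the neighbouring characters are not consumed and need no regex groups.
    public static String tag(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.replaceAll("<span class=\"majusc\">$0</span>");
    }

    public static void main(String[] args) {
        String text = "XML and (XSLT) but not XPath";
        System.out.println(tag(STRICT, text));
        System.out.println(tag(EDGE_AWARE, text));
    }
}
```

With this input the strict form tags only XSLT, because the leading XML has no preceding character for the positive look-behind to match. The negative form tags both XML and XSLT, and neither form tags XPath. That is the same start-of-string and end-of-string concern raised for \b in the quoted message below.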

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay

> -----Original Message-----
> From: Imsieke, Gerrit, le-tex [mailto:gerrit.imsieke@xxxxxxxxx]
> Sent: 04 March 2010 17:12
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  lookaheads in XSLT2 regexes
>
> Dear Liam,
>
> Thanks for promoting the \b case. As an illustration for \b's
> usefulness, let me show how I tag acronyms for a recent project:
>
>    <xsl:template match="text()" mode="majuscules">
>      <xsl:analyze-string select="."
>        regex="(^|[\p{{P}}\p{{Z}}\p{{C}}])(\p{{Lu}}{{2,}})([\p{{P}}\p{{Z}}\p{{C}}]|$)">
>        <xsl:matching-substring>
>          <xsl:value-of select="regex-group(1)"/>
>          <span class="majusc">
>            <xsl:value-of select="regex-group(2)"/>
>          </span>
>          <xsl:value-of select="regex-group(3)"/>
>        </xsl:matching-substring>
>        <xsl:non-matching-substring>
>          <xsl:value-of select="."/>
>        </xsl:non-matching-substring>
>      </xsl:analyze-string>
>    </xsl:template>
>
> With (a reasonably defined) \b, this could be simplified to
>
>    <xsl:template match="text()" mode="majuscules">
>      <xsl:analyze-string select="." regex="\b\p{{Lu}}{{2,}}\b">
>        <xsl:matching-substring>
>          <span class="majusc">
>            <xsl:value-of select="."/>
>          </span>
>        </xsl:matching-substring>
>        <xsl:non-matching-substring>
>          <xsl:value-of select="."/>
>        </xsl:non-matching-substring>
>      </xsl:analyze-string>
>    </xsl:template>
>
> Please note that \b should not only match the \w/\W boundary,
> but also the beginning or end of the string (or line, when
> the 'm' flag is in force). Speaking of the 'm' flag, and in
> Michael's direction: I regard \b as much more useful than the
> 'm' flag when processing XML.
>
> Gerrit
>
>
>
> On 04.03.2010 06:59, Liam R E Quin wrote:
> > On Wed, 2010-03-03 at 21:27 +0000, Michael Kay wrote:
> >>> On the subject of \b I'll note we do have \W and \w
> >>
> >> So we do, I overlooked that. And we define it a little differently
> >> from Perl:
> >>
> >> [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}]
> >>
> >> So for example "+" is regarded as part of a word, while "-" isn't.
> >> Which strikes me as totally useless, to be honest.
> >
> > I agree.
> >
> > We could fix that for XPath 2.1 I think.  I'm not sure what the most
> > useful fix would be, I admit.
> >
> > The Perl definition of "alphanumeric" plus "_" would probably work for
> > \w, if one took alphanumeric to mean Letters|Numbers, \p{L}|\p{N}, and
> > is coincidentally closer to what you get in Perl if you do
> >      use locale;
> > and your locale is (say) en_UK.UTF8, as it's then the same as the
> > POSIX fragment [[:alpha:][:digit:]_]
> >
> > There are lots of things that could be added to regular expressions;
> > but \b is hard to emulate, useful, and also we seem to have a rather
> > odd \w.  If \w is there, I think \b was omitted by mistake.  Or that
> > \w was included by mistake!
> >
> > Liam
> >
>
> --
> Gerrit Imsieke
> Geschäftsführer / Managing Director
> le-tex publishing services GmbH
> Weissenfelser Str. 84, 04229 Leipzig, Germany
> Phone +49 341 355356 110, Fax +49 341 355356 510
> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>
> Registergericht / Commercial Register: Amtsgericht Leipzig
> Registernummer / Registration Number: HRB 24930
>
> Geschäftsführer: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt,
> Dr. Reinhard Vöckler
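
To make the point about the odd \w in the quoted thread concrete, the following is a small sketch, again in Java as a stand-in since it can be run directly. It approximates the quoted definition [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] with a negated character class and compares it with the Letters|Numbers plus "_" alternative suggested by Liam. The class and method names are hypothetical, and Java's category tables may differ in detail from those of any particular XSLT processor.

```java
import java.util.regex.Pattern;

public class XsdWordChar {
    // Approximation of the quoted \w definition: any character that is
    // not punctuation (P), a separator (Z) or an "other" character (C).
    public static final Pattern XSD_WORD =
        Pattern.compile("[^\\p{P}\\p{Z}\\p{C}]");

    // The alternative floated in the thread: letters, numbers and "_".
    public static final Pattern PROPOSED_WORD =
        Pattern.compile("[\\p{L}\\p{N}_]");

    public static boolean isXsdWord(String ch) {
        return XSD_WORD.matcher(ch).matches();
    }

    public static boolean isProposedWord(String ch) {
        return PROPOSED_WORD.matcher(ch).matches();
    }

    public static void main(String[] args) {
        // Print how each sample character is classified by both definitions.
        for (String ch : new String[] {"a", "1", "+", "-", "_"}) {
            System.out.printf("%s xsd=%b proposed=%b%n",
                ch, isXsdWord(ch), isProposedWord(ch));
        }
    }
}
```

Under the first definition "+" counts as a word character because it is a math symbol (category Sm), while "-" and "_" do not because they are punctuation (Pd and Pc). Under the second definition "+" and "-" are both excluded and "_" is included, which is closer to the Perl behaviour described above.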
