Subject: Re: Regular expression functions (Was: Re: comments on December F&O draft)
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Fri, 11 Jan 2002 11:25:00 +0000

Hi Marc,

> assume we have some :z: == (c1)(:x:){2} then the selection of index
> x[2] would have no meaning, since there is only one x noted in the
> regex
>
> and in normal regex behavior the numbered index 2 (2nd parenthesis) will
> only hold the second occurrence of the :x: matching part of the string... it
> is as writing (c1):x:(:x:)

That's an interesting point. Assuming x matched 'c2', then that would
mean a structure of:

  <z>
    <rxp:match>c1</rxp:match>
    c2
    <x>c2</x>
  </z>

> this is how regexes are working I'm afraid... (other hand, the
> notations :z: == (c1)(:x:)(:x:) and/or :z: == (c1)((:x:){2}) would
> possibly tackle what you really need)

Yes - with the second of these, you would get something like:

  <z>
    <rxp:match>c1</rxp:match>
    <rxp:match>
      c2
      <x>c2</x>
    </rxp:match>
  </z>

which would at least allow you to get the result of the two xs
combined.
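
The capture behaviour Marc describes can be checked in any conventional regex engine. The following is a minimal sketch using Python's re module, assuming that :x: is stood in for by the pattern c\d and that the input is "c1c2c3" (both are illustrative choices, not part of the proposal):

```python
import re

# Assumption: :x: is represented by c\d, and the input is "c1c2c3".
text = "c1c2c3"

# (c1)(x){2}: group 2 is overwritten on each repetition, so only the
# last occurrence of x is retained.
repeated = re.fullmatch(r"(c1)(c\d){2}", text)
print(repeated.groups())  # ('c1', 'c3')

# (c1)((x){2}): the extra parentheses capture both occurrences combined
# in group 2, while group 3 still holds only the last occurrence.
wrapped = re.fullmatch(r"(c1)((c\d){2})", text)
print(wrapped.groups())  # ('c1', 'c2c3', 'c3')
```

Neither form gives separate access to the first x on its own; for that you would need the (c1)(:x:)(:x:) form.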

> oh and by the way, I started off this :subregex: notation, based on bad
> memory of long-past Perl days
> just opened some doc again, and understand now that it used to be the
> [:name:] notation for the POSIX characters... with added possible stuff like
> [:^name:] and the like

Hmm... Perl uses that notation for named character classes. The
equivalent in the XML Schema regular expression language is roughly:

  \p{name}     (characters in the named class)
  \P{name}     (characters not in the named class)

That's a different kind of thing to what we're doing here (where the
named expressions are complete regular expressions rather than
character classes). I'd be tempted to introduce a different escape
character to do it, for example e (for expression):

  \e(name)     (the named subexpression)
  \E(name)     (not the named subexpression, if that's appropriate?)

So something like:

  \e(mantissa)\e(exponent)?
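
To make the idea concrete, here is a rough Python sketch of how \e(name) references might be expanded into an ordinary regular expression before matching. The expand function and the two definitions are hypothetical: only the names mantissa and exponent come from the example above, and the patterns bound to them are assumptions for illustration.

```python
import re

# Hypothetical definitions: the names come from the example above, but
# the patterns bound to them are assumptions made for illustration.
DEFINITIONS = {
    "mantissa": r"[+-]?\d+(?:\.\d+)?",
    "exponent": r"[eE][+-]?\d+",
}

# Matches a literal backslash, "e", and a parenthesised name.
REFERENCE = re.compile(r"\\e\((\w+)\)")

def expand(pattern, definitions):
    # Replace each reference with its definition, wrapped in a
    # non-capturing group so that a following "?" or "*" applies to
    # the whole subexpression.
    def substitute(match):
        return "(?:" + definitions[match.group(1)] + ")"
    return REFERENCE.sub(substitute, pattern)

expanded = expand(r"\e(mantissa)\e(exponent)?", DEFINITIONS)
number = re.compile(expanded)

print(bool(number.fullmatch("1.5e10")))  # True
print(bool(number.fullmatch("1.5")))     # True
print(bool(number.fullmatch("e10")))     # False
```

This sketch expands only one level of references and does not handle an escaped backslash in front of e; a real design would have to settle both points.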
  
> revoking my own introduction: maybe $name makes more sense in any
> case?

Using $name in the regular expression might be confusing - you'd need
to make sure you could detect the end of the name, so probably ($name)
would be better. (I think that if $ is introduced as matching the end
of the string then you could safely state that it only matched the end
of the string if it was at the end of the regular expression.)

So something like:

  ($mantissa)($exponent)
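
As a sketch of how the two uses of $ could be told apart, the Python below treats $ as a reference only when it appears as ($name), and leaves any other $ (here, one at the end of the expression) as the end-of-string anchor. The function name and the definitions are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical definitions, assumed for illustration only.
DEFINITIONS = {
    "mantissa": r"\d+(?:\.\d+)?",
    "exponent": r"[eE][+-]?\d+",
}

# A reference is "$name" wrapped in parentheses; the closing ")" is
# what marks the end of the name.
REFERENCE = re.compile(r"\(\$(\w+)\)")

def expand_dollar_refs(pattern, definitions):
    # Each ($name) becomes a capturing group holding the named
    # expression. A "$" that is not part of ($name) is left untouched
    # and keeps its end-of-string meaning.
    return REFERENCE.sub(
        lambda m: "(" + definitions[m.group(1)] + ")", pattern
    )

expanded = expand_dollar_refs(r"($mantissa)($exponent)$", DEFINITIONS)
print(expanded)

match = re.search(expanded, "1.5e10")
print(match.groups())  # ('1.5', 'e10')
```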

I'd suggest {$name}, but only if regular expression support wasn't
ever available through functions (because {$name} looks a lot like an
AVT, and would make people think that they could put AVTs in
attributes that held expressions).
  
If the references look like variable references then they should
probably be set with variable-binding elements (e.g. xsl:variable).

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

