
Subject: RE: Regular expression functions (Was: Re: comments on December F&O draft)
From: "Marc Portier" <mpo@xxxxxxxxxxxxxxxx>
Date: Sat, 12 Jan 2002 12:38:38 +0100
Hi Jeni,

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx]On Behalf Of Jeni Tennison
> Sent: Friday 11 January 2002 12:25
> To: Marc Portier
> Cc: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Regular expression functions (Was: Re:  comments on
> December F&O draft)
>
>
> Hi Marc,
>
> > assume we have some :z: == (c1)(:x:){2} then the selection of index
> > x[2] would have no meaning, since there is only one x noted in the
> > regex
> >
> > and in normal regex behavior the numbered index 2 (2nd parenthesis) will
> > only hold the second occurrence of the :x: matching part of the string...
> > it is as writing (c1):x:(:x:)
>
> That's an interesting point. Assuming x matched 'c2', then that would
> mean a structure of:
>
>   <z>
>     <rxp:match>c1</rxp:match>
>     c2
>     <x>c2</x>
>   </z>
>

(referring to the nested-regex vs nested-matcher discussion)
I should check it out, but I'm really afraid that for (c1)(:x:){3}, with :x:
matching c2, the match-result groups[] would actually be:
[0] c1c2c2c2
[1] c1
[2] c2	(the last of the 3)

and even the start-end positions would not be of more help... it's the regex
engine's way of saying you should write it differently if you want it to
behave differently
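
The behaviour described above can be checked quickly. This is only a sketch under assumptions: Python's `re` module stands in for "a typical regex engine" (other engines may expose more), and the literal `c2` stands in for the named subexpression :x:.

```python
import re

# Assumption: Python's `re` is used as a representative engine, and the
# literal "c2" replaces the named subexpression :x: from the discussion.
m = re.fullmatch(r"(c1)(c2){3}", "c1c2c2c2")

print(m.group(0))   # the whole match
print(m.groups())   # one entry per parenthesis pair, not per repetition
print(m.span(2))    # start/end offsets of the one repetition that was kept
```

With this engine, group 2 holds only the last of the three repetitions, and its span covers only that last 'c2', so the earlier repetitions cannot be recovered from the match result.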

that would turn the structure into:
  <z>
    <rxp:match>c1</rxp:match>
    c2c2
    <x>c2</x>
  </z>

leaving little XPath-natural feel for getting at the 1st or 2nd 'c2'... which
might go against natural XSLT feelings?

and it only gets worse when adding {n,m} kind of things in there :-(

somewhere internally the regex engine needs to keep track of the earlier
matches though... the different notations only tell it when it can forget
about them...

> > this is how regexes are working I'm afraid... (other hand, the
> > notations :z: == (c1)(:x:)(:x:) and/or :z: == (c1)((:x:){2}) would
> > possibly tackle what you really need)
>
> Yes - with the second of these, you would get something like:
>
>   <z>
>     <rxp:match>c1</rxp:match>
>     <rxp:match>
>       c2
>       <x>c2</x>
>     </rxp:match>
>   </z>
>
> which would at least allow you to get the result of the two xs
> combined.

yep.
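
For comparison, the two alternative notations can be tried the same way. Again a sketch only, assuming Python's `re` as the engine and the literal `c2` in place of :x:.

```python
import re

text = "c1c2c2"

# Notation 1: write the repetition out, one parenthesis pair per occurrence.
separate = re.fullmatch(r"(c1)(c2)(c2)", text)

# Notation 2: wrap the quantified group in an outer group.
wrapped = re.fullmatch(r"(c1)((c2){2})", text)

print(separate.groups())  # each occurrence gets its own group
print(wrapped.groups())   # outer group holds both, inner group only the last
```

The first form keeps every occurrence addressable by index; the second gives the combined text of both occurrences plus the last one on its own.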

>
> > oh and by the way, I started off this :subregex: notation, based on bad
> > memory of long-past perl days
> > just opened some doc again, and understand now that it used to be the
> > [:name:] notation for the posix characters... with added possible stuff
> > like [:^name:] and the like
>
> Hmm... Perl uses that notation for named character classes. The
> equivalent in the XML Schema regular expression language is roughly:
>
>   \p(name)     (characters in the named class)
>   \P(name)     (characters not in the named class)
>
> That's a different kind of thing to what we're doing here (where the
> named expressions are complete regular expressions rather than
> character classes). I'd be tempted to introduce a different escape
> character to do it, for example e (for expression):
>
>   \e(name)     (the named subexpression)
>   \E(name)     (not the named subexpression, if that's appropriate?)
>

wow, great idea, sounds like something to bounce off some Perl
mailing list as well...
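
One way to prototype the proposed \e(name) notation is to expand the references textually before handing the pattern to an existing engine. The sketch below is hypothetical throughout: the `NAMED` table, the `expand` helper and the sample definitions are illustrative inventions, not part of any draft or library, and each reference is wrapped in a non-capturing group as one possible design choice.

```python
import re

# Hypothetical table of named subexpressions; names and patterns are
# illustrative only.
NAMED = {
    "digits": r"[0-9]+",
    "mantissa": r"[+-]?\e(digits)(\.\e(digits))?",
    "exponent": r"[eE][+-]?\e(digits)",
}

# Matches a literal \e(name) reference inside a pattern string.
REF = re.compile(r"\\e\((\w+)\)")

def expand(pattern, seen=()):
    """Replace each \\e(name) with its definition in a non-capturing group."""
    def substitute(match):
        name = match.group(1)
        if name in seen:
            raise ValueError(f"recursive reference to {name!r}")
        return "(?:" + expand(NAMED[name], seen + (name,)) + ")"
    return REF.sub(substitute, pattern)

number = re.compile(expand(r"\e(mantissa)\e(exponent)?"))

print(bool(number.fullmatch("1.5e10")))
print(bool(number.fullmatch("e10")))
```

Wrapping each expansion in `(?:...)` keeps a following quantifier such as `?` applied to the whole subexpression, without adding numbered groups of its own.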

> So something like:
>
>   \e(mantissa)\e(exponent)?
>
> > revoking my own introduction: maybe $name makes more sense in any
> > case?
>
> Using $name in the regular expression might be confusing - you'd need
> to make sure you could detect the end of the name, so probably ($name)
> would be better. (I think that if $ is introduced as matching the end
> of the string then you could safely state that it only matched the end
> of the string if it was at the end of the regular expression.)
>
> So something like:
>
>   ($mantissa)($exponent)
>
> I'd suggest {$name}, but only if regular expression support wasn't
> ever available through functions (because {$name} looks a lot like an
> AVT, and would make people think that they could put AVTs in
> attributes that held expressions).
>
> If the references look like variable references then they should
> probably be set with variable-binding elements (e.g. xsl:variable).

yep, also assuming you read and go along with the remark on parentheses in
these variables being literally matched as \( and \)?
and thus keep these alongside the regex nesting with \e()
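
A rough sketch of that reading, where a ($name) reference inserts the variable's value as literal text: the `variables` mapping and the `expand_variables` helper are hypothetical, and Python's `re.escape` is used as a stand-in, which escapes every regex metacharacter rather than only the parentheses.

```python
import re

# Hypothetical variable bindings, such as xsl:variable might supply.
variables = {"unit": "(kg)"}

def expand_variables(pattern, bindings):
    """Replace each ($name) with a group holding the escaped variable value."""
    return re.sub(
        r"\(\$(\w+)\)",
        lambda m: "(" + re.escape(bindings[m.group(1)]) + ")",
        pattern,
    )

expanded = expand_variables(r"[0-9]+ ($unit)", variables)
match = re.fullmatch(expanded, "12 (kg)")

print(expanded)
print(match.group(1))
```

Here the parentheses in the variable's value are matched literally, while the parentheses of the ($unit) reference itself still act as a capturing group.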

>
> Cheers,
>
> Jeni
>
> ---
> Jeni Tennison
> http://www.jenitennison.com/
>
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

