Subject: RE: Regular expression functions (Was: Re: comments on December F&O draft)
From: "Chris Bayes" <chris@xxxxxxxxxxx>
Date: Mon, 14 Jan 2002 15:02:48 -0000
 
> 
> Chris,
> 
> > I've been a bit tied up with one thing and another (and I think you 
> > might have discussed this before) but aren't regex matches just 
> > predicates on text nodes, a la
> >
> > <xsl:template match="text()['\(.*\)']">
> >   <x><xsl:apply-templates select=".[1]" /></x>
> > </xsl:template>
> >
> > Which applies templates to whatever is not matched (child texts)
> > (but which matches the template).
> 
> Not all strings that you might deal with are text nodes, so I 
> think that you need to provide something that allows you to 
> match other strings as well. Indeed, your example above 
> demonstrates this - when you do .[1], then presumably you're 
> applying templates to the matched substring of the current 
> text node. I think that there are three possibilities:
> 
>   - assume that when you apply templates to a string, it's
>     automatically converted to a text node, and apply templates to
>     that
>   - open up normal templates so that they can match things other than
>     nodes

What is wrong with that? A template that matches text is pretty much the
end of the line anyway.

>   - introduce specific regexp templates
> 
> > So that template on a text node
> > "(a(b(c)d)e)" (assuming greedy) would produce
> > <x>
> >   a 
> >   <x>
> >     b
> >     <x>
> >      c
> >     </x>
> >     d
> >   </x>
> >   e
> > </x>
> 
> Unfortunately, assuming greedy, (a)(b) would produce:
> 
>   <x>a)(b</x>
> 
Yeah, but it doesn't have to be greedy:

<xsl:template match="\((.*?)\)(.*)">
	<x><xsl:apply-templates select=".[1]" /></x>
	<xsl:apply-templates select=".[2]" />
</xsl:template>
Or
<xsl:template match="\((.*?)\)">
	<x><xsl:apply-templates select=".[1]" /></x>
	<xsl:apply-templates select="$'" />
</xsl:template>
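
The greedy versus non-greedy difference discussed here can be checked with an ordinary regex engine. The following is a minimal sketch using Python's `re` module rather than the proposed XSLT syntax (which is hypothetical in this thread); the helper name `groups` is made up for illustration. It also shows that a non-greedy match alone pairs the first "(" with the first ")", so nested input is still split in the wrong place.

```python
import re

# Patterns taken from the thread: greedy, and non-greedy with a "rest" group.
GREEDY = re.compile(r"\((.*)\)")
LAZY = re.compile(r"\((.*?)\)(.*)")

def groups(pattern, text):
    """Return the captured subexpressions, or None if there is no match."""
    m = pattern.match(text)
    return m.groups() if m else None

print(groups(GREEDY, "(a)(b)"))       # ('a)(b',)
print(groups(LAZY, "(a)(b)"))         # ('a', '(b)')
print(groups(LAZY, "(a(b(c)d)e)"))    # ('a(b(c', 'd)e)')
```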

> which is probably not what you want. This is why I suggested 
> the bracket-balancing tokenize() function. For example, you'd have:
> 
>   <xsl:apply-regexp-templates select="'(a(b(c)(d))e)'" />
> 
> and then:
>   
> <xsl:regexp-template match="\((.*)\)">
>   <x>
>     <xsl:apply-regexp-templates
>       select="tokenize(current-match()[1], '\(', '\)')" />
>   </x>
> </xsl:regexp-template>
> 
> would give:
> 
>  <x>a<x>b<x>c</x><x>d</x></x>e</x>
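
To make the bracket-balancing idea concrete, here is a rough sketch in Python rather than XSLT. The functions `tokenize_balanced` and `wrap` are hypothetical stand-ins for the proposed tokenize() function and the recursive regexp template; they are an assumption about the intended behaviour, not a definition of it.

```python
def tokenize_balanced(text, open_ch="(", close_ch=")"):
    """Split text into top-level tokens: plain runs and balanced groups."""
    tokens = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                if i > start:
                    tokens.append(text[start:i])  # plain run before the group
                start = i
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                raise ValueError("unbalanced closing bracket")
            depth -= 1
            if depth == 0:
                tokens.append(text[start:i + 1])  # complete balanced group
                start = i + 1
    if depth != 0:
        raise ValueError("unbalanced opening bracket")
    if start < len(text):
        tokens.append(text[start:])  # trailing plain run
    return tokens


def wrap(text):
    """Wrap each balanced group in <x>...</x>, recursing into its content."""
    out = []
    for tok in tokenize_balanced(text):
        if tok.startswith("(") and tok.endswith(")"):
            out.append("<x>" + wrap(tok[1:-1]) + "</x>")
        else:
            out.append(tok)
    return "".join(out)


print(wrap("(a(b(c)(d))e)"))  # <x>a<x>b<x>c</x><x>d</x></x>e</x>
```

Because the tokenizer counts nesting depth instead of relying on greedy or non-greedy matching, sibling groups such as (c)(d) and nested groups are both handled.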
> 
> > Maybe it's rubbish but it doesn't look too alien to me. What other 
> > useful predicates can you put on a text node?
> 
> Commonly, I'd guess:
> 
>   text()[1]
>   text()[normalize-space()]
>   text()[starts-with(., 'foo')]
>   text()[contains(., 'foo')]
> 
> The second one is the one that would clash with what you're 
> suggesting (where any string used as the predicate to a text 
> node acts as an implicit regexp test on the value of the text node).

Yeah, but those are integers or booleans, except the second one, which
would be false for <x>a  b</x>. Hmmmm.
> 
> But you could always have a test() function that does the 
> test explicitly instead:
> 
>   text()[test('\(.*\)')]
> 
> Or the other option is to have a special syntax to refer to a 
> regular expression, 

You mean like text()['regexp']?
That can't be confused with text()[normalize-space()].

> or even to make regular expressions first-class
> objects.
> 
> > Surely it isn't going to clash with anything. There are nearly 1000
> > pages of WDs to look at here, so, looking at it another way, is there
> > anything that says that . can't be a sequence and that I can't index
> > into it with .[x]?
> 
> . is defined as being the context item (or a singleton 
> sequence containing the context item, 

Which it would be for a node, but for a regex it wouldn't be.

> depending on how you 
> want to view it), so logically .[2] should never return 
> anything. Currently, as in XPath 1.0, . is an abbreviated 
> step and cannot take any StepQualifiers (which includes predicates).
> 
> The way I (and I think David) was thinking, you'd use 
> current-match() or some other function to get information 
> about the subexpression matches when you were inside the 
> template. So perhaps:
> 
>   current-match()[x]
> 
> rather than .[x].

Well if you like typing ;-)

Ciao Chris

XML/XSL Portal
http://www.bayes.co.uk/xml


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

