
Subject: Re: Regular expression functions (Was: Re: comments on December F&O draft)
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Mon, 14 Jan 2002 10:11:51 +0000
Chris,

> I've been a bit tied up with one thing and another (and I think you
> might have discussed this before) but aren't regex matches just
> predicates on text nodes ala
> <xsl:template match="text()['\(.*\)']">
>         <x><xsl:apply-templates select=".[1]" /></x>
> </xsl:template>
> Which applies templates to whatever is not matched (child texts) (but
> which matches the template).

Not all strings that you might deal with are text nodes, so I think
that you need to provide something that allows you to match other
strings as well. Indeed, your example above demonstrates this - when
you do .[1], then presumably you're applying templates to the
matched substring of the current text node. I think that there are
three possibilities:

  - assume that when you apply templates to a string, it's
    automatically converted to a text node, and apply templates to
    that
  - open up normal templates so that they can match things other than
    nodes
  - introduce specific regexp templates

> So that template on a text node
> "(a(b(c)d)e)" (assuming greedy) would produce
> <x>
>   a 
>   <x>
>     b
>     <x>
>      c
>     </x>
>     d
>   </x>
>   e
> </x>

Unfortunately, assuming greedy matching, the text node "(a)(b)" would produce:

  <x>a)(b</x>

which is probably not what you want. This is why I suggested the
bracket-balancing tokenize() function. For example, you'd have:

  <xsl:apply-regexp-templates select="'(a(b(c)(d))e)'" />

and then:
  
<xsl:regexp-template match="\((.*)\)">
  <x>
    <xsl:apply-regexp-templates
      select="tokenize(current-match()[1], '\(', '\)')" />
  </x>
</xsl:regexp-template>

would give:

 <x>a<x>b<x>c</x><x>d</x></x>e</x>
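
To make the greedy problem and the bracket-balancing idea concrete, here
is a rough sketch in Python rather than XSLT, since none of the proposed
constructs exist. Everything in it is an assumption made for
illustration: balanced_tokens() stands in for the three-argument
tokenize(), render() stands in for the regexp template, a template
"match" is taken to mean that the pattern matches the whole string, and
strings that match no template are assumed to be copied through
unchanged.

```python
import re

# The pattern from the regexp-template above. Greedy ".*" is assumed.
GROUP = re.compile(r"\((.*)\)", re.DOTALL)


def balanced_tokens(text, open_ch="(", close_ch=")"):
    """Split text into top-level tokens: balanced bracketed groups
    and the plain runs between them (hypothetical tokenize())."""
    tokens, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                if i > start:
                    tokens.append(text[start:i])  # plain run before a group
                start = i
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                tokens.append(text[start:i + 1])  # complete balanced group
                start = i + 1
    if start < len(text):
        tokens.append(text[start:])  # trailing run or unbalanced remainder
    return tokens


def render(text):
    """Stand-in for applying the regexp template to one string."""
    m = GROUP.fullmatch(text)
    if m is None:
        return text  # assumed default: copy unmatched strings through
    inner = "".join(render(tok) for tok in balanced_tokens(m.group(1)))
    return "<x>" + inner + "</x>"


# Greedy matching alone swallows the middle brackets of "(a)(b)":
greedy_capture = GROUP.fullmatch("(a)(b)").group(1)

# Tokenizing first keeps the two groups apart:
siblings = "".join(render(tok) for tok in balanced_tokens("(a)(b)"))

nested = render("(a(b(c)(d))e)")
```

Under those assumptions, greedy_capture is "a)(b", siblings is
"<x>a</x><x>b</x>", and nested is the output shown above. The point is
only that the regular expression finds the outermost brackets while the
tokenizer does the counting that a regular expression cannot.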

> Maybe it's rubbish but it doesn't look too alien to me. What other
> useful predicates can you put on a text node?

Commonly, I'd guess:

  text()[1]
  text()[normalize-space()]
  text()[starts-with(., 'foo')]
  text()[contains(., 'foo')]

The second one is the one that would clash with what you're suggesting
(where any string used as the predicate to a text node acts as an
implicit regexp test on the value of the text node).
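
The clash can be sketched in Python. In XPath 1.0 a string-valued
predicate is converted to a boolean, and a string is true exactly when
it is non-empty. The function names below are made up for the
illustration, and normalize_space() only approximates the XPath
function of the same name.

```python
import re


def normalize_space(s):
    """Approximation of XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(s.split())


def xpath1_string_predicate(value):
    """XPath 1.0 rule: a string predicate is true iff the string is non-empty."""
    return len(value) > 0


def implicit_regexp_predicate(node_text, value):
    """The suggested rule: treat the string as a regexp tested against the node."""
    return re.search(value, node_text) is not None


def compare(node_text):
    """Evaluate text()[normalize-space()] for one text node under both rules."""
    value = normalize_space(node_text)
    return (xpath1_string_predicate(value),
            implicit_regexp_predicate(node_text, value))


whitespace_only = compare(" \n ")
looks_like_regexp = compare("a+b")
ordinary = compare("foo")
```

Here whitespace_only is (False, True), because the empty string is false
as a boolean but matches anywhere as a regexp; looks_like_regexp is
(True, False), because "a+b" read as a regexp does not match the literal
text "a+b"; and ordinary is (True, True). So the same predicate would
select different text nodes depending on which rule applied.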

But you could always have a test() function that does the test
explicitly instead:

  text()[test('\(.*\)')]

Or the other option is to have a special syntax to refer to a regular
expression, or even to make regular expressions first class objects.

> Surely it isn't going to clash with anything. There are nearly 1000
> pages of wd's to look at here so looking at it another way is there
> anything that says that . can't be a sequence and that I can't index
> into it with .[x]?

. is defined as being the context item (or a singleton sequence
containing the context item, depending on how you want to view it), so
logically .[2] should never return anything. Currently, as in XPath
1.0, . is an abbreviated step and cannot take any StepQualifiers
(which includes predicates).

The way I (and I think David) was thinking, you'd use current-match()
or some other function to get information about the subexpression
matches when you were inside the template. So perhaps:

  current-match()[x]

rather than .[x].
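
As a loose analogy in Python (again, only a sketch: current_match() here
is a hypothetical stand-in, assumed to return the subexpression matches
as a sequence, and item() imitates a 1-based positional predicate):

```python
import re


def current_match(pattern, text):
    """Hypothetical stand-in: subexpression matches as a list, empty if no match."""
    m = re.fullmatch(pattern, text)
    return list(m.groups()) if m else []


def item(seq, x):
    """XPath-style 1-based positional predicate: seq[x], or None if out of range."""
    return seq[x - 1] if 1 <= x <= len(seq) else None


matches = current_match(r"\((.*)\)", "(a(b))")
first = item(matches, 1)          # the one parenthesised subexpression
second = item(matches, 2)         # nothing there: only one subexpression

context = ["the context item"]    # "." viewed as a singleton sequence
dot_two = item(context, 2)        # so .[2] has nothing to return
```

With one subexpression in the pattern, first is "a(b)" and second is
None; dot_two is None for the same reason, whatever the context item is.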

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

