

Subject: Re: Regular expression functions (Was: Re: comments on December F&O draft)
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 8 Jan 2002 15:13:59 +0000

>> Most regular expression languages don't find overlapping matches,
>> do they? It seems to add a lot of extra complexity if they do.
> No, but then they don't return a list of all matches either.

Some do, if it's a global match. From some JScript documentation:

 "If the global flag (g) is not set, Element zero of the array
  contains the entire match, while elements 1 - n contain any
  submatches that have occurred within the match.... If the global
  flag is set, elements 0 - n contain all matches that occurred."
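For comparison, a minimal sketch in Python's `re` module (not XPath, and not the JScript API quoted above) shows the same split between a single match with its submatches and a global list of all matches. It also bears on the overlapping-match question: a global scan resumes after each match, so overlapping matches only appear if you ask for them explicitly, here with a lookahead trick. The variable names are illustrative.

```python
import re

text = "aaaa"

# Non-global: only the first match; group(0) is the entire match and
# group(1)..group(n) are the submatches within it.
first = re.search(r"(a)(a)", text)
first_parts = [first.group(0), first.group(1), first.group(2)]  # ['aa', 'a', 'a']

# Global: every match, each scan resuming where the previous match ended,
# so the matches never overlap.
all_matches = [m.group(0) for m in re.finditer(r"aa", text)]  # ['aa', 'aa']

# Overlapping matches need an explicit trick: an empty lookahead match at
# every position, capturing the text ahead of it.
overlapping = re.findall(r"(?=(aa))", text)  # ['aa', 'aa', 'aa']

print(first_parts, all_matches, overlapping)
```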

> In Xpath you can't do that. So a replace function that only lets you
> replace one set of unstructured input by some more unstructured
> output is not particularly useful.

I agree with your analysis about regexp replace in general, though
it's not altogether useless - when global, at least it goes some way
towards helping with the classic multi-string-replacement problem. For
example, to escape newline characters with "\n", tabs with "\t" and
carriage returns with "\r":

  replace(replace(replace($string, '&#xA;', '\n'),
                  '&#x9;', '\t'),
          '&#xD;', '\r')

(or more manageably with a simple mapping operator:

  $string -> replace(., '&#xA;', '\n')
          -> replace(., '&#x9;', '\t')
          -> replace(., '&#xD;', '\r')

Sorry, couldn't resist.)
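The same multi-string replacement can be sketched in Python, assuming plain global replacement semantics. The function names are illustrative. The chained version mirrors the nested `replace()` calls; the single-pass version makes one global regex match and lets a callback choose each replacement.

```python
import re

# Character to escape -> escaped form (a literal backslash plus a letter).
ESCAPES = {"\n": "\\n", "\t": "\\t", "\r": "\\r"}


def escape_chained(s: str) -> str:
    # One global replace per character, like replace(replace(replace(...))).
    return s.replace("\n", "\\n").replace("\t", "\\t").replace("\r", "\\r")


def escape_single_pass(s: str) -> str:
    # One global regex pass; the callback picks the replacement per match.
    return re.sub(r"[\n\t\r]", lambda m: ESCAPES[m.group(0)], s)


print(escape_single_pass("a\nb\tc\r"))  # prints a\nb\tc\r (literal backslashes)
```

Chaining is safe here only because no replacement text contains a character that a later step searches for. The single-pass form avoids that ordering concern in general.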

But as you've illustrated this doesn't help with the other classic in
this genre, which is replacing &#xA; characters with <br /> elements.

> If however the match function returned the sequence of substrings
> matched or equivalently a sequence of the match positions, then the
> string could be broken up and nodes added as required.

I think that you need a sequence of match positions *and lengths* in
the latter case, to make it possible to pull out the matched string?
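A minimal Python sketch suggests why positions plus lengths are enough to break a string up and add nodes. It tackles the newline-to-`<br />` case, using `re.finditer` for the match spans and `xml.etree.ElementTree` for the nodes. `match_spans` and `newlines_to_br` are hypothetical helpers, not any proposed XPath function.

```python
import re
import xml.etree.ElementTree as ET


def match_spans(pattern: str, s: str) -> list[tuple[int, int]]:
    """(start, length) of each non-overlapping match; starts are 0-based."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(pattern, s)]


def newlines_to_br(s: str) -> ET.Element:
    """Build a <p> holding s, with each newline replaced by a <br/> element."""
    p = ET.Element("p")
    last = None  # most recent <br/>; text that follows it goes in its .tail
    pos = 0
    for start, length in match_spans("\n", s):
        chunk = s[pos:start]  # unmatched text before this match
        if last is None:
            p.text = chunk
        else:
            last.tail = chunk
        last = ET.SubElement(p, "br")
        pos = start + length  # the length is what lets us skip past the match
    rest = s[pos:]
    if last is None:
        p.text = rest
    else:
        last.tail = rest
    return p


print(ET.tostring(newlines_to_br("one\ntwo\nthree"), encoding="unicode"))
# prints <p>one<br />two<br />three</p>
```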

Hmm... can't help thinking that these flat sequences are going to
make processing quite difficult - extracting a list of the matched
strings from the sequence would mean:

  for $i in (1 to count($matches) idiv 2)
  return substring($string, $matches[2 * $i - 1], $matches[2 * $i])

or a recursive function, neither of which is particularly practical.
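To show the index arithmetic this involves, here is a Python sketch of the for-expression above. It assumes a hypothetical flat result of the form (pos1, len1, pos2, len2, ...) with 1-based positions, as XPath's `substring()` uses. All function names are illustrative.

```python
import re


def flat_matches(pattern: str, s: str) -> list[int]:
    """Hypothetical flat result: [pos1, len1, pos2, len2, ...], 1-based positions."""
    out: list[int] = []
    for m in re.finditer(pattern, s):
        out += [m.start() + 1, m.end() - m.start()]
    return out


def xpath_substring(s: str, start: int, length: int) -> str:
    """substring($s, $start, $length) for positive integer arguments."""
    return s[start - 1 : start - 1 + length]


def matched_strings(s: str, matches: list[int]) -> list[str]:
    # for $i in (1 to count($matches) idiv 2)
    # return substring($s, $matches[2 * $i - 1], $matches[2 * $i])
    # XPath's 1-based $matches[k] is Python's matches[k - 1].
    return [
        xpath_substring(s, matches[2 * i - 2], matches[2 * i - 1])
        for i in range(1, len(matches) // 2 + 1)
    ]


def matched_strings_pairs(s: str, matches: list[int]) -> list[str]:
    # Same result, walking the flat list two items at a time.
    return [xpath_substring(s, p, n) for p, n in zip(matches[0::2], matches[1::2])]


s = "cat hat bat"
print(matched_strings(s, flat_matches(r"[a-z]at", s)))  # ['cat', 'hat', 'bat']
```

Python can zip the list into pairs, but an XPath 2.0 sequence cannot hold pairs, so the `2 * $i - 1` / `2 * $i` bookkeeping (or recursion) is unavoidable with a flat sequence.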

On the other hand, I think it's impossible to reliably go from the
matched subexpression string to the location of the subexpression
within the original string.
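A tiny Python example illustrates the problem, assuming only the standard `re` module. Searching for the captured substring finds its first occurrence, which need not be where the subexpression actually matched. Only the engine's own offset is reliable.

```python
import re

s = "a=a"
m = re.search(r"=(a)", s)

sub = m.group(1)       # 'a', the matched subexpression string
guessed = s.find(sub)  # 0: the first 'a' in the string, not the submatch
actual = m.start(1)    # 2: where group 1 really matched

print(sub, guessed, actual)  # a 0 2
```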

> Actually it might be interesting (and more in the xpath style) to
> allow omnimark style named variable binding (the found-text in the
> above) within the search string which would then be accessed by
> normal xpath variable reference, $found-text, in any functions
> triggered by the replacement code.

You *could* do this implicitly by setting the variables $1..$N, since
authors cannot set these variables themselves (invalid names). But
either seems a bit messy to me - how do you define the scope, for one?
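As a point of comparison, Python's named groups are one existing analogue of both binding styles. Group names must be identifiers, so `found_text` stands in for `found-text`. In this sketch the "scope" is simply one call of the replacement callback per match. This is Python's mechanism, offered as an illustration, not a proposal for XPath; the function names are illustrative.

```python
import re

# The named group is also group 1, so both styles below see the same text.
PATTERN = r"\*(?P<found_text>\w+)\*"


def embolden(s: str) -> str:
    # Named binding: found_text is bound afresh for each match and is
    # visible only inside this callback.
    return re.sub(PATTERN, lambda m: "<b>" + m.group("found_text") + "</b>", s)


def embolden_numbered(s: str) -> str:
    # Positional binding, like $1..$N: group 1 referenced in the template.
    return re.sub(PATTERN, r"<b>\1</b>", s)


print(embolden("say *hi* and *bye*"))  # say <b>hi</b> and <b>bye</b>
```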



Jeni Tennison

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
