Re: Regular expression functions (Was: Re: comments on
Chris,

> I've been a bit tied up with one thing and another (and I think you
> might have discussed this before) but aren't regex matches just
> predicates on text nodes ala
>
>   <xsl:template match="text()['\(.*\)']">
>     <x><xsl:apply-templates select="." /></x>
>   </xsl:template>
>
> Which applies templates to whatever is not matched (child texts) (but
> which matches the template).

Not all strings that you might deal with are text nodes, so I think
that you need to provide something that allows you to match other
strings as well. Indeed, your example above demonstrates this - when
you do ., then presumably you're applying templates to the matched
substring of the current text node.

I think that there are three possibilities:

  - assume that when you apply templates to a string, it's
    automatically converted to a text node, and apply templates to
    that
  - open up normal templates so that they can match things other than
    nodes
  - introduce specific regexp templates

> So that template on a text node
> "(a(b(c)d)e)" (assuming greedy) would produce
>
>   <x>
>     a
>     <x>
>       b
>       <x>
>         c
>       </x>
>       d
>     </x>
>     e
>   </x>

Unfortunately, assuming greedy, (a)(b) would produce:

  <x>a)(b</x>

which is probably not what you want. This is why I suggested the
bracket-balancing tokenize() function. For example, you'd have:

  <xsl:apply-regexp-templates select="'(a(b(c)(d))e)'" />

and then:

  <xsl:regexp-template match="\((.*)\)">
    <x>
      <xsl:apply-regexp-templates
        select="tokenize(current-match(), '\(', '\)')" />
    </x>
  </xsl:regexp-template>

would give:

  <x>a<x>b<x>c</x><x>d</x></x>e</x>

> Maybe it's rubbish but it doesn't look too alien to me. What other
> useful predicates can you put on a text node?

Commonly, I'd guess:

  text()
  text()[normalize-space()]
  text()[starts-with(., 'foo')]
  text()[contains(., 'foo')]

The second one is the one that would clash with what you're suggesting
(where any string used as the predicate to a text node acts as an
implicit regexp test on the value of the text node).
But you could always have a test() function that does the test
explicitly instead:

  text()[test('\(.*\)')]

Or the other option is to have a special syntax to refer to a regular
expression, or even to make regular expressions first class objects.

> Surely it isn't going to clash with anything. There are nearly 1000
> pages of wd's to look at here so looking at it another way is there
> anything that says that . can't be a sequence and that I can't index
> into it with .[x]?

. is defined as being the context item (or a singleton sequence
containing the context item, depending on how you want to view it), so
logically . should never return more than one item. Currently, as in
XPath 1.0, . is an abbreviated step and cannot take any StepQualifiers
(which includes predicates).

The way I (and I think David) was thinking, you'd use current-match()
or some other function to get information about the subexpression
matches when you were inside the template. So perhaps:

  current-match()[x]

rather than .[x].

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
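
As an illustration of the greedy-match problem and the bracket-balancing
tokenize() idea discussed above, here is a minimal sketch in Python rather
than in the proposed XSLT syntax. Everything in it is an assumption made for
the example: the names greedy_match, tokenize_balanced and render are
hypothetical, tokenize_balanced is assumed to return top-level pieces with
each balanced group keeping its own brackets, and render is assumed to strip
the outer brackets of a group before recursing, which is one plausible
reading of what the regexp template would do.

```python
import re


def greedy_match(text):
    """Return what the greedy pattern \\((.*)\\) captures, or None."""
    m = re.fullmatch(r"\((.*)\)", text)
    return m.group(1) if m else None


def tokenize_balanced(text, open_ch="(", close_ch=")"):
    """Split text into top-level tokens.

    Each balanced bracketed group (brackets included) is one token, and
    each run of text between groups is one token. This is only a guess at
    how a bracket-balancing tokenize() might behave.
    """
    tokens = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                # Flush any plain text that precedes this group.
                if i > start:
                    tokens.append(text[start:i])
                start = i
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing bracket")
            if depth == 0:
                # The group that opened at 'start' closes here.
                tokens.append(text[start:i + 1])
                start = i + 1
    if depth != 0:
        raise ValueError("unbalanced opening bracket")
    if start < len(text):
        tokens.append(text[start:])
    return tokens


def render(text):
    """Wrap every balanced group in <x>...</x>, recursing into its contents."""
    out = []
    for token in tokenize_balanced(text):
        if token.startswith("("):
            # Strip the outer brackets, then process what is inside.
            out.append("<x>" + render(token[1:-1]) + "</x>")
        else:
            out.append(token)
    return "".join(out)


if __name__ == "__main__":
    print(greedy_match("(a)(b)"))
    print(render("(a)(b)"))
    print(render("(a(b(c)(d))e)"))
```

With the greedy pattern, "(a)(b)" captures "a)(b", which is the unwanted
result shown earlier. Splitting on balanced brackets first treats "(a)" and
"(b)" as separate groups, and the nested input "(a(b(c)(d))e)" renders to the
same nested <x> structure as in the example above.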