Re: Regular expression functions (Was: Re: comments on
Chris,

> I've been a bit tied up with one thing and another (and I think you
> might have discussed this before) but aren't regex matches just
> predicates on text nodes ala
> <xsl:template match="text()['\(.*\)']">
>   <x><xsl:apply-templates select=".[1]" /></x>
> </xsl:template>
> Which applies templates to whatever is not matched (child texts) (but
> which matches the template).

Not all strings that you might deal with are text nodes, so I think
that you need to provide something that allows you to match other
strings as well. Indeed, your example above demonstrates this - when
you do .[1], then presumably you're applying templates to the matched
substring of the current text node.

I think that there are three possibilities:

  - assume that when you apply templates to a string, it's
    automatically converted to a text node, and apply templates to
    that
  - open up normal templates so that they can match things other than
    nodes
  - introduce specific regexp templates

> So that template on a text node
> "(a(b(c)d)e)" (assuming greedy)would produce
> <x>
>   a
>   <x>
>     b
>     <x>
>       c
>     </x>
>     d
>   </x>
>   e
> </x>

Unfortunately, assuming greedy, (a)(b) would produce:

  <x>a)(b</x>

which is probably not what you want. This is why I suggested the
bracket-balancing tokenize() function. For example, you'd have:

  <xsl:apply-regexp-templates select="'(a(b(c)(d))e)'" />

and then:

  <xsl:regexp-template match="\((.*)\)">
    <x>
      <xsl:apply-regexp-templates
        select="tokenize(current-match()[1], '\(', '\)')" />
    </x>
  </xsl:regexp-template>

would give:

  <x>a<x>b<x>c</x><x>d</x></x>e</x>

> Maybe it's rubbish but it doesn't look too alien to me. What other
> useful predicates can you put on a text node?

Commonly, I'd guess:

  text()[1]
  text()[normalize-space()]
  text()[starts-with(., 'foo')]
  text()[contains(., 'foo')]

The second one is the one that would clash with what you're suggesting
(where any string used as the predicate to a text node acts as an
implicit regexp test on the value of the text node). But you could
always have a test() function that does the test explicitly instead:

  text()[test('\(.*\)')]

Or the other option is to have a special syntax to refer to a regular
expression, or even to make regular expressions first class objects.

> Surely it isn't going to clash with anything. There are nearly 1000
> pages of wd's to look at here so looking at it another way is there
> anything that says that . can't be a sequence and that I can't index
> into it with .[x]?

. is defined as being the context item (or a singleton sequence
containing the context item, depending on how you want to view it), so
logically .[2] should never return anything. Currently, as in XPath
1.0, . is an abbreviated step and cannot take any StepQualifiers
(which includes predicates).

The way I (and I think David) was thinking, you'd use current-match()
or some other function to get information about the subexpression
matches when you were inside the template. So perhaps:

  current-match()[x]

rather than .[x].

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
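
The greedy-matching point in the message above can be tried out in a small sketch. It is written in Python rather than XSLT because the regexp templates, current-match() and tokenize() discussed in this thread are proposals, not existing functions. The names greedy_inner, tokenize_balanced and transform are hypothetical, and tokenize_balanced is only one guess at what a bracket-balancing tokenize() might do: it splits a string into top-level balanced groups and the plain text between them.

```python
import re


def greedy_inner(s):
    """Return what the greedy pattern \\((.*)\\) captures for s, or None."""
    m = re.fullmatch(r"\((.*)\)", s)
    return m.group(1) if m else None


def tokenize_balanced(s, open_ch="(", close_ch=")"):
    """Hypothetical bracket-balancing tokenizer (an assumption, not a real API).

    Splits s into top-level tokens: balanced bracketed groups, with their
    delimiters kept, and the runs of plain text between those groups.
    """
    tokens, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == open_ch:
            if depth == 0:
                if i > start:
                    tokens.append(s[start:i])  # plain text before the group
                start = i
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing bracket")
            if depth == 0:
                tokens.append(s[start:i + 1])  # one complete balanced group
                start = i + 1
    if depth != 0:
        raise ValueError("unbalanced opening bracket")
    if start < len(s):
        tokens.append(s[start:])  # trailing plain text
    return tokens


def transform(s):
    """Wrap the content of each balanced group in <x>...</x>, recursively."""
    out = []
    for tok in tokenize_balanced(s):
        if tok.startswith("("):
            out.append("<x>" + transform(tok[1:-1]) + "</x>")
        else:
            out.append(tok)
    return "".join(out)


print(greedy_inner("(a)(b)"))      # the greedy capture spans both groups
print(transform("(a)(b)"))         # balanced tokenizing keeps them apart
print(transform("(a(b(c)(d))e)"))
```

With greedy matching alone, "(a)(b)" is captured as the single string "a)(b". Tokenizing on balanced brackets first treats "(a)" and "(b)" as separate groups, and the nested input from the message comes out as <x>a<x>b<x>c</x><x>d</x></x>e</x>.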