Re: Regular expression functions (Was: Re: comments on
David,

>> Most regular expression languages don't find overlapping matches,
>> do they? It seems to add a lot of extra complexity if they do.
>
> No, but then they don't return a list of all matches either.

Some do, if it's a global match. From some JScript documentation:

  "If the global flag (g) is not set, Element zero of the array
  contains the entire match, while elements 1 - n contain any
  submatches that have occurred within the match.... If the global
  flag is set, elements 0 - n contain all matches that occurred."

> In Xpath you can't do that. So a replace function that only lets you
> replace one set of unstructured input by some more unstructured
> output is not particularly useful.

I agree with your analysis about regexp replace in general, though
it's not altogether useless - when global, at least it goes some way
towards helping with the classic multi-string-replacement problem. For
example, to escape newline characters with "\n", tabs with "\t" and
carriage returns with "\r":

  replace(replace(replace($string, '&#xA;', '\n'), '&#x9;', '\t'), '&#xD;', '\r')

(or more manageably with a simple mapping operator:

  $string -> replace(., '&#xA;', '\n')
          -> replace(., '&#x9;', '\t')
          -> replace(., '&#xD;', '\r')

Sorry, couldn't resist.)

But as you've illustrated this doesn't help with the other classic in
this genre, which is replacing &#xA; characters with <br /> elements.

> If however the match function returned the sequence of substrings
> matched or equivalently a sequence of the match positions, then the
> string could be broken up and nodes added as required.

I think that you need a sequence of match positions *and lengths* in
the latter case, to make it possible to pull out the matched string?

Hmm... can't help thinking that these flat sequences are going to
make processing quite difficult - with the sequence holding
alternating positions and lengths, extracting a list of the matched
strings would mean:

  for $i in (1 to count($matches) div 2)
    return substring($string, $matches[2 * $i - 1], $matches[2 * $i])

or a recursive function, neither of which is particularly practical.
On the other hand, I think it's impossible to reliably go from the
matched subexpression string to the location of the subexpression
within the original string.

> Actually it might be interesting (and more in the xpath style) to
> allow omnimark style named variable binding (the found-text in the
> above) within the search string which would then be accessed by
> normal xpath variable reference, $found-text, in any functions
> triggered by the replacement code.

You *could* do this implicitly by setting the variables $1..$N, since
authors cannot set these variables themselves (invalid names). But
either seems a bit messy to me - how do you define the scope, for one
thing?

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
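
The position-and-length idea discussed in the message can be tried out in the JScript family of languages it quotes. The following is only a sketch in modern JavaScript, not anything proposed for XPath: the helper names matchPositions, matchedStrings and splitWithMarkers are hypothetical, positions are 1-based to mirror XPath's substring(), and a plain string stands in for the <br /> element.

```javascript
// Hypothetical helper: flat sequence of 1-based (position, length) pairs,
// one pair per non-overlapping match. The pattern must carry the g flag,
// because String.prototype.matchAll requires a global regular expression.
function matchPositions(text, pattern) {
  const flat = [];
  for (const m of text.matchAll(pattern)) {
    flat.push(m.index + 1, m[0].length); // +1: XPath-style 1-based position
  }
  return flat;
}

// Recover the matched strings by walking the flat sequence two items at a time.
function matchedStrings(text, flat) {
  const out = [];
  for (let i = 0; i < flat.length; i += 2) {
    const start = flat[i] - 1;
    out.push(text.substring(start, start + flat[i + 1]));
  }
  return out;
}

// Break the string up at each match, emitting the text between matches and a
// marker (standing in for a <br /> element) in place of each match.
function splitWithMarkers(text, flat, marker) {
  const parts = [];
  let cursor = 0;
  for (let i = 0; i < flat.length; i += 2) {
    const start = flat[i] - 1;
    if (start > cursor) parts.push(text.substring(cursor, start));
    parts.push(marker);
    cursor = start + flat[i + 1];
  }
  if (cursor < text.length) parts.push(text.substring(cursor));
  return parts;
}

const sample = "one\ntwo\nthree";
const pairs = matchPositions(sample, /\n/g);            // [4, 1, 8, 1]
const pieces = splitWithMarkers(sample, pairs, "<br />");
// pieces: ["one", "<br />", "two", "<br />", "three"]
```

Walking the sequence in steps of two is the same pairing that the for-expression in the message has to do by index arithmetic, which is where the awkwardness of a flat sequence shows up.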