RE: [XSLT2.0] xsl:analyze-string@regex syntax too limi
> Hi, just FYI, I have made a petition to the XSLT and XPath 2.0
> public comments list to remove most of the artificial restrictions
> on the regex syntax in the match, replace functions and
> analyze-string instructions.

It doesn't seem to be there yet... Please note there's no need to comment separately on the two documents. XSLT will automatically pick up any changes made to the XPath functions.

> Michael Kay had to add a pretty complex piece of code to his
> Saxon processor just to cripple the available regex syntax which
> was previously supported. That's ridiculous.

It's very unlikely that XPath will support the whole of the Java regex syntax, for example the POSIX character classes won't get past the I18N scrutineers. Also, Java regexes match 16-bit UTF-16 values, not Unicode characters: so given a character outside the BMP, it counts as two characters in a Java regex but as one character in an XPath regex - a lot of the regex translation code in Saxon is designed to handle such differences, not to remove functionality. So any changes to the XPath syntax won't remove the need for the regex translator. (The translator, incidentally, was written by James Clark to implement the XML Schema regex syntax, and I extended it to handle the XPath extensions.)

As I've commented elsewhere, one of the main difficulties in "adding back" further Perl regex features is the need to write an unambiguous specification that is consistent with existing implementations. Writing a spec that turns out to be inconsistent with existing implementations would obviously be a disaster. This always turns out to be more difficult than you think. To take just one example that you want to add, in Perl:

"A word boundary (`\b') is a spot between two characters that has a `\w' on one side of it and a `\W' on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a `\W'. (Within character classes `\b' represents backspace rather than a word boundary, just as it normally does in any double-quoted string.)"

Firstly, that's too informal for the WGs to accept it as written (what is a "spot"? what is an "imaginary character"?). Secondly, Perl classifies \b as a "zero-width assertion" but it doesn't say clearly where in the overall scheme of things a zero-width assertion can appear. Thirdly, the exception doesn't apply, because backspace isn't a legal XML character.

So getting an agreed spec just for \b could easily take an hour of WG time, and the WG is getting pretty impatient about proposals that consume time unless there is a problem that absolutely must be solved.

Just warning you...

Michael Kay
http://www.saxonica.com/
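
The two technical points in the message (16-bit UTF-16 values versus Unicode characters, and the Perl description of `\b`) can be illustrated with a minimal Java sketch. The class name `RegexNotes` and its method names are hypothetical, chosen only for this illustration; nothing here is taken from Saxon or its regex translator. How a particular regex engine treats a character outside the BMP depends on the engine and its version, so the sketch deliberately counts on the string itself rather than asserting regex behaviour for that case. The word-boundary part only reports where `java.util.regex` finds `\b` in a plain ASCII string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class and method names are hypothetical, not Saxon code.
public class RegexNotes {

    // Number of 16-bit UTF-16 code units in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Offsets at which java.util.regex reports a \b word boundary.
    static List<Integer> wordBoundaries(String s) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) {
            offsets.add(m.start()); // zero-width match: start == end
        }
        return offsets;
    }

    public static void main(String[] args) {
        // U+1D538 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
        String nonBmp = new String(Character.toChars(0x1D538));
        System.out.println(codeUnits(nonBmp));        // 2
        System.out.println(codePoints(nonBmp));       // 1
        System.out.println(wordBoundaries("ab cd"));  // [0, 2, 3, 5]
    }
}
```

The boundaries reported at offsets 0 and 5 correspond to the "imaginary characters" off the beginning and end of the string in the Perl wording, which is exactly the part a formal specification would have to pin down.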