XQuery and \w, \W in regex (Saxon 8)
Michael Kay mhk at mhk.me.uk
Wed Nov 16 18:37:46 PST 2005
\W is defined in XML Schema Part 2 to match all characters in the Unicode Punctuation, Separator, and Other categories (P, Z, and C). 201C and 201D are in group P, so on the face of it, you appear to be right. I'll look into it.

(Saxon is relying partly on Java for its regular expression matching, but it does preprocess the regex first to ensure conformance with the XPath rules rather than the Java rules.)

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: http://xquery.com/mailman/listinfo/talk
> [mailto:http://xquery.com/mailman/listinfo/talk] On Behalf Of David Sewell
> Sent: 16 November 2005 17:22
> To: http://xquery.com/mailman/listinfo/talk
> Subject: XQuery and \w, \W in regex (Saxon 8)
>
> Given this code:
>
> let $string1 := '"quoted"'
> let $string2 := "“quoted”"
> return
>   ( replace($string1, "\W", ""),
>     replace($string2, "\W", "")
>   )
>
> Saxon 8.6b returns
>
> quoted
> "quoted"
>
> (where the " " in the second line are Unicode curly quotation marks).
>
> Is this a bug in the regex handling? U+201C and U+201D should be treated
> as separators, no? (Likewise single curly quotes, U+2018 and U+2019; I
> haven't tried other punctuation in that code block.)
>
> --
> David Sewell, Editorial and Technical Manager
> Electronic Imprint, The University of Virginia Press
> PO Box 400318, Charlottesville, VA 22904-4318 USA
> Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
> Email: http://xquery.com/mailman/listinfo/talk Tel: +1 434 924 9973
> Web: http://www.ei.virginia.edu/
> _______________________________________________
> http://xquery.com/mailman/listinfo/talk
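
For anyone who wants to check the category claim in the reply above without running Saxon, here is a minimal sketch in Python. It is not Saxon's implementation and not an XPath regex engine; it only tests each character against the Unicode general categories P, Z, and C using the standard unicodedata module. The helper names is_xsd_non_word and strip_non_word are made up for this illustration.

```python
import unicodedata


def is_xsd_non_word(ch: str) -> bool:
    """Hypothetical helper: True if ch is in Unicode general category P, Z, or C.

    This mirrors the XML Schema Part 2 definition of \\W quoted in the reply;
    it is an illustration, not how Saxon processes the regex.
    """
    return unicodedata.category(ch)[0] in "PZC"


def strip_non_word(text: str) -> str:
    """Drop every character that the definition above classes as non-word."""
    return "".join(ch for ch in text if not is_xsd_non_word(ch))


if __name__ == "__main__":
    # The two test strings from the original question:
    # straight quotes, then U+201C / U+201D curly quotes.
    for sample in ('"quoted"', "\u201cquoted\u201d"):
        print(repr(sample), "->", repr(strip_non_word(sample)))

    # General categories of the four curly quote characters mentioned.
    for cp in (0x2018, 0x2019, 0x201C, 0x201D):
        print(f"U+{cp:04X}", unicodedata.category(chr(cp)))
```

Under this reading of the definition, both test strings reduce to the bare word, which is the behaviour the original poster expected from replace($string2, "\W", "").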