XSL-LIST Mailing List Archive

XSLT 2.0 : Unicode hex notation in regular expressions

Subject: XSLT 2.0 : Unicode hex notation in regular expressions
From: Pierrick Brihaye <pierrick.brihaye@xxxxxxxxxx>
Date: Thu, 12 Aug 2004 11:38:08 +0200
Hi,

I don't know if my XSLT syntax is wrong or if it is a Saxon-related problem. Let's blame the XSLT writer rather than the XSLT processor first ;-)

Given the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<text>livre : كتاب</text>

And the following XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/text">
<xsl:comment><xsl:value-of select="system-property('xsl:vendor')" /></xsl:comment>
<words>
<xsl:for-each select="tokenize(.,'\s+')">
<word>
<xsl:attribute name="language">
<xsl:choose>
<xsl:when test="matches(.,'[a-z]+')">latin</xsl:when>
<xsl:when test="matches(.,'[\\u0600-\\u06FF]+')">arabic</xsl:when>
<xsl:otherwise>whatever</xsl:otherwise>
</xsl:choose>
</xsl:attribute>
<xsl:attribute name="codepoints"><xsl:value-of select="string-to-codepoints(.)"/></xsl:attribute>
<xsl:value-of select="."/>
</word>
</xsl:for-each>
</words>
</xsl:template>
</xsl:stylesheet>


I get:

<?xml version="1.0" encoding="UTF-8"?>
<!--SAXON 8.0 from Saxonica-->
<words>
  <word language="latin" codepoints="108 105 118 114 101">livre</word>
  <word language="arabic" codepoints="58">:</word>
  <word language="whatever" codepoints="1603 1578 1575 1576">كتاب</word>
</words>

Why this curious match for codepoint 58? And why no match for the Arabic characters?
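
The doubled pattern is still a legal regular expression; it just means something else. A rough way to check what the character class contains is to expand it by hand. The sketch below does that in Python, using `re.search` as a stand-in for XPath `matches()` (both succeed when the pattern matches anywhere in the string). It assumes Python's `re` reads `\\` inside brackets as an escaped backslash and `0-\\` as a range, as the XML Schema regex grammar does; it was not run against Saxon itself.

```python
import re

# Pattern text exactly as written in the stylesheet's second xsl:when.
PATTERN = r'[\\u0600-\\u06FF]+'

# Inside the brackets the engine sees: "\\" (a literal backslash), "u",
# "0", "6", "0", then the range "0-\\" (U+0030..U+005C), then "u", "0",
# "6", "F", "F". Expand that range to list its members.
range_members = ''.join(chr(cp) for cp in range(ord('0'), ord('\\') + 1))

print(':' in range_members)           # True: ':' is U+003A, inside 0..\
print(bool(re.search(PATTERN, ':')))  # True: so the colon token matches

# None of the Arabic letters (U+0643 U+062A U+0627 U+0628) are in the class.
print(bool(re.search(PATTERN, '\u0643\u062a\u0627\u0628')))  # False
```

Under that reading the class is just `u` plus everything from `0` to `\` (the digits, `:;<=>?@`, `A` to `Z`, `[` and `\`), and nothing from the Arabic block.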

BTW, I first tried matches(.,'[\u0600-\u06FF]+'), following the hex notation described at http://www.unicode.org/reports/tr18/#Hex_notation

But Saxon reported the following error:

Error at xsl:when on line 11 of file:/C:/...:
net.sf.saxon.type.RegexTranslator$RegexSyntaxException: Error at character 2 in regular expression: bad escape sequence


That's why I doubled the "\" character. Is this doubling spec-compliant?
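
As for the single-backslash form: XPath 2.0 regular expressions build on XML Schema regular expressions, which have no `\u` hex escape, so Saxon's "bad escape sequence" error looks expected; the doubled form is accepted, but each `\\` is then a literal backslash rather than a hex escape. One workaround is to let the XML parser supply the characters by writing the bounds as character references, e.g. `'^[&#x0600;-&#x06FF;]+$'`; the XML Schema block escape `\p{IsArabic}` should also cover U+0600 to U+06FF. The Python sketch below only illustrates the character-reference mechanism: the `xsl:when` fragment is hypothetical, `xml.etree.ElementTree` stands in for the stylesheet parser and `re` for the XPath regex engine. The `^` and `$` anchors are a hypothetical addition that makes the test apply to the whole token rather than to any substring.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stylesheet fragment: the range bounds are XML character
# references rather than TR18-style \u escapes.
FRAGMENT = (
    '<xsl:when xmlns:xsl="http://www.w3.org/1999/XSL/Transform" '
    'test="matches(., \'^[&#x0600;-&#x06FF;]+$\')">arabic</xsl:when>'
)

# The XML parser resolves the references, so the attribute value already
# contains the real characters U+0600 and U+06FF.
test_expr = ET.fromstring(FRAGMENT).get('test')
pattern = test_expr.split("'")[1]  # the text between the single quotes

print(pattern == '^[\u0600-\u06FF]+$')                       # True
print(bool(re.search(pattern, '\u0643\u062a\u0627\u0628')))  # True: Arabic token
print(bool(re.search(pattern, ':')))                         # False: colon no longer matches
```

The regex engine never sees an escape sequence here, which is why this route sidesteps the missing `\u` syntax.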

Cheers,

p.b.
