Subject: RE: XSLT 2.0 : Unicode hex notation in regular expressions
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Thu, 12 Aug 2004 12:12:08 +0100
The notation \u1234 is not supported in XPath 2.0 regular expressions. Use
&#x1234; instead.

Michael Kay
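
For illustration, here is a minimal sketch of the stylesheet from the quoted message with that one change applied. The vendor comment and the codepoints attribute are dropped for brevity; everything else is taken from the original. The XML parser expands &#x600; and &#x6FF; into the actual characters before the regex engine sees the pattern, so the class becomes a plain character range. This also accounts for the odd results with the doubled backslashes: in '[\\u0600-\\u06FF]+', \\ is an escaped backslash, so the class holds a backslash, the letters u and F, the digits 0 and 6, and the range 0-\ (U+0030 to U+005C), which includes ':' (U+003A) and no Arabic letters.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/text">
    <words>
      <xsl:for-each select="tokenize(., '\s+')">
        <word>
          <xsl:attribute name="language">
            <xsl:choose>
              <xsl:when test="matches(., '[a-z]+')">latin</xsl:when>
              <!-- Character references are resolved by the XML parser,
                   so the regex sees the range U+0600 to U+06FF. -->
              <xsl:when test="matches(., '[&#x600;-&#x6FF;]+')">arabic</xsl:when>
              <xsl:otherwise>whatever</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:value-of select="."/>
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input from the quoted message, this should label "livre" as latin, ":" as whatever, and the Arabic word as arabic.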
 

> -----Original Message-----
> From: Pierrick Brihaye [mailto:pierrick.brihaye@xxxxxxxxxx] 
> Sent: 12 August 2004 10:38
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  XSLT 2.0 : Unicode hex notation in regular expressions
> 
> Hi,
> 
> I don't know if my XSLT syntax is wrong or if it is a Saxon-related 
> problem. Let's blame the XSLT writer rather than the XSLT processor 
> first ;-)
> 
> Given the following XML :
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <text>livre : كتاب</text>
> 
> And the following XSLT :
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>    <xsl:template match="/text">
>      <xsl:comment><xsl:value-of select="system-property('xsl:vendor')"/></xsl:comment>
>      <words>
>        <xsl:for-each select="tokenize(.,'\s+')">
>          <word>
>            <xsl:attribute name="language">
>              <xsl:choose>
>                <xsl:when test="matches(.,'[a-z]+')">latin</xsl:when>
>                <xsl:when test="matches(.,'[\\u0600-\\u06FF]+')">arabic</xsl:when>
>                <xsl:otherwise>whatever</xsl:otherwise>
>              </xsl:choose>
>            </xsl:attribute>
>            <xsl:attribute name="codepoints"><xsl:value-of select="string-to-codepoints(.)"/></xsl:attribute>
>            <xsl:value-of select="."/>
>          </word>
>        </xsl:for-each>
>      </words>
>    </xsl:template>
> </xsl:stylesheet>
> 
> I get :
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!--SAXON 8.0 from Saxonica-->
> <words>
>    <word language="latin" codepoints="108 105 118 114 101">livre</word>
>    <word language="arabic" codepoints="58">:</word>
>    <word language="whatever" codepoints="1603 1578 1575 1576">كتاب</word>
> </words>
> 
> Why this curious match for codepoint 58? And why no match for the 
> Arabic characters?
> 
> BTW, I first tried : matches(.,'[\u0600-\u06FF]+') as stated by 
> http://www.unicode.org/reports/tr18/#Hex_notation
> 
> But Saxon returned the following error :
> 
> Error at xsl:when on line 11 of file:/C:/...:
>    net.sf.saxon.type.RegexTranslator$RegexSyntaxException: Error at 
> character 2 in regular expression: bad escape sequence
> 
> That's why I doubled the "\" character. Is this doubling 
> spec-compliant ?
> 
> Cheers,
> 
> p.b.
