Subject: RE: XSLT 2.0 : Unicode hex notation in regular expressions
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Thu, 12 Aug 2004 12:12:08 +0100

The notation \u1234 is not supported in XPath 2.0 regular expressions. Use
&#x1234; instead.
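
As a minimal sketch of that advice applied to the stylesheet quoted below: the XML parser expands the character references before the XPath expression is compiled, so the regex engine sees the literal characters U+0600 and U+06FF as the ends of the range. The element names are taken from the original post; the expected classification is an assumption based on how `matches()` is specified, not output captured from Saxon.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/text">
    <words>
      <xsl:for-each select="tokenize(., '\s+')">
        <word>
          <xsl:attribute name="language">
            <xsl:choose>
              <xsl:when test="matches(., '[a-z]+')">latin</xsl:when>
              <!-- &#x0600; and &#x06FF; are resolved by the XML parser,
                   so the character class is the Arabic block U+0600 to U+06FF -->
              <xsl:when test="matches(., '[&#x0600;-&#x06FF;]+')">arabic</xsl:when>
              <xsl:otherwise>whatever</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:value-of select="."/>
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

With this change the ":" token should fall through to "whatever" and the Arabic token should be labelled "arabic".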

Michael Kay
 

> -----Original Message-----
> From: Pierrick Brihaye [mailto:pierrick.brihaye@xxxxxxxxxx] 
> Sent: 12 August 2004 10:38
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  XSLT 2.0 : Unicode hex notation in regular expressions
> 
> Hi,
> 
> I don't know if my XSLT syntax is wrong or if it is a Saxon-related 
> problem. Let's blame the XSLT writer rather than the XSLT processor 
> first ;-)
> 
> Given the following XML :
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <text>livre : كتاب</text>
> 
> And the following XSLT :
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="2.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>    <xsl:template match="/text">
>      <xsl:comment><xsl:value-of 
> select="system-property('xsl:vendor')" 
> /></xsl:comment>
>      <words>
>        <xsl:for-each select="tokenize(.,'\s+')">
>          <word>
>            <xsl:attribute name="language">
>              <xsl:choose>
>                <xsl:when test="matches(.,'[a-z]+')">latin</xsl:when>
>                <xsl:when 
> test="matches(.,'[\\u0600-\\u06FF]+')">arabic</xsl:when>
>                <xsl:otherwise>whatever</xsl:otherwise>
>              </xsl:choose>
>            </xsl:attribute>
>            <xsl:attribute name="codepoints"><xsl:value-of 
> select="string-to-codepoints(.)"/></xsl:attribute>
>            <xsl:value-of select="."/>
>          </word>
>        </xsl:for-each>
>      </words>
>    </xsl:template>
> </xsl:stylesheet>
> 
> I get :
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!--SAXON 8.0 from Saxonica-->
> <words>
>    <word language="latin" codepoints="108 105 118 114 
> 101">livre</word>
>    <word language="arabic" codepoints="58">:</word>
>    <word language="whatever" codepoints="1603 1578 1575 
> 1576">كتاب</word>
> </words>
> 
> Why this curious match for codepoint 58 ? And why no match for the 
> arabic characters ?
> 
> BTW, I first tried : matches(.,'[\u0600-\u06FF]+') as stated by 
> http://www.unicode.org/reports/tr18/#Hex_notation
> 
> But Saxon returned the following error :
> 
> Error at xsl:when on line 11 of file:/C:/...:
>    net.sf.saxon.type.RegexTranslator$RegexSyntaxException: Error at 
> character 2 in regular expression: bad escape sequence
> 
> That's why I doubled the "\" character. Is this doubling 
> spec-compliant ?
> 
> Cheers,
> 
> p.b.
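
On the question of why the doubled backslash matched codepoint 58: XPath string literals do not process backslashes, so the regex engine receives `[\\u0600-\\u06FF]+` unchanged. There `\\` is an escaped literal backslash, and the class becomes the individual characters `\`, `u`, `0`, `6`, `F` plus the accidental range `0-\` (U+0030 to U+005C). The colon is U+003A, which lies inside that range, while the Arabic letters do not. The sketch below is an illustration only; the template name `main` and the result element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical entry point; invoke it as the initial named template -->
  <xsl:template name="main">
    <results>
      <!-- ':' (U+003A) lies between '0' (U+0030) and '\' (U+005C) -->
      <colon>
        <xsl:value-of select="matches(':', '[\\u0600-\\u06FF]+')"/>
      </colon>
      <!-- The same class reduced to its essentials: the range 0 to \ plus 'u' -->
      <colon-simplified>
        <xsl:value-of select="matches(':', '[0-\\u]+')"/>
      </colon-simplified>
      <!-- U+0643 is outside that range, so the doubled form does not match it -->
      <arabic>
        <xsl:value-of select="matches('&#x0643;', '[\\u0600-\\u06FF]+')"/>
      </arabic>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

Under that reading, the first two tests should yield true and the third false, which is consistent with the output reported in the original post.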
