
Re: XML Schema 1.1 xpath 2.0 regex question

  • From: Michael Kay <mike@saxonica.com>
  • To: Mukul Gandhi <mukulg@softwarebytes.org>
  • Date: Fri, 17 Dec 2021 11:09:05 +0000

Neither the XSD nor the XPath regex syntax permits \x. If Xerces accepts it, then it's a non-conformant extension. You'll probably find it works in Saxon if you use the "j" flag, which is also a non-conformant extension - it switches from using the XSD regex syntax to the Java regex syntax.

The conformant way to write this in XSD is `&#x20;` (but don't use this with the "x" flag, which strips whitespace from the regex).
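As an illustration only, the second assertion from the schema quoted below might be rewritten with that character reference roughly as follows. This is a sketch based on the suggestion above, not output checked against any particular processor; the rest of the schema is copied unchanged from the original post.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:sequence>
             <xs:element name="a" type="xs:string"/>
          </xs:sequence>
          <!-- The XML parser expands the character reference to a space
               before the XPath expression, and hence the regex, is evaluated -->
          <xs:assert test="matches(a, 'hello&#x20;+world')"/>
       </xs:complexType>
    </xs:element>

</xs:schema>
```

Because the reference is resolved by the XML parser, the regex engine sees an ordinary space character, which is why combining this with whitespace stripping would remove it.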

Michael Kay
Saxonica

On 17 Dec 2021, at 11:02, Mukul Gandhi <mukulg@softwarebytes.org> wrote:

Hi all,
   I have another question on the same topic.

I have the following XML instance document:

<?xml version="1.0"?>
<X>
  <a>hello   world</a>
</X>

And the following XML Schema 1.1 document,

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:sequence>
             <xs:element name="a" type="xs:string"/>
          </xs:sequence>
          <xs:assert test="matches(a, 'hello[ ]+world')"/>
          <xs:assert test="matches(a, 'hello\x{0020}+world')"/>
       </xs:complexType>
    </xs:element>

</xs:schema>

(The XSD validation requirement is that the string value of element "a" must be the word 'hello', followed by one or more space characters, followed by the word 'world'.)

The intent of both xs:assert elements is the same. The second refers to the space character by its Unicode code point in hex notation, following Java's regex convention, while the first specifies the space character as a literal.

Apache Xerces has no problem with either xs:assert and reports the XML instance document as valid, whereas Saxon reports a regex syntax error in the second xs:assert ("Syntax error at char 7 in regular expression: Escape character 'x' not allowed").

For the XSD validation example above, any thoughts on which behaviour is correct, and on what the relevant specs say about compliance?

Is it also acceptable for Xerces to state, as an implementation-defined feature, "we support specifying characters within XSD 1.1 regular expressions using Unicode code point hex notation (\x{...})"?

I'm also curious to know: does Saxon support specifying characters within XSD 1.1 regular expressions using Unicode code point notation?


--
Regards,
Mukul Gandhi


