Re: XML schema xs:string and non BMP character like 𐌀, length restriction
FWIW, I agree with your assessment.

Pete Cordell
Codalogic Ltd
Twitter: http://twitter.com/petecordell
Interface XML to C++ the easy way using C++ XML data binding to
convert XSD schemas to C++ classes.
Visit http://codalogic.com/lmx/ or http://www.xml2cpp.com for more info

----- Original Message -----
From: "Martin Honnen" <Martin.Honnen@gmx.de>
To: "xml-dev" <xml-dev@lists.xml.org>
Sent: Friday, October 12, 2012 11:16 AM
Subject: XML schema xs:string and non BMP character like 𐌀, length restriction

> Hi,
>
> I am seeing inconsistencies between different schema validating parsers
> when it comes to Unicode characters outside of the BMP, like 𐌀 for
> instance, and length restrictions on xs:string.
>
> For the sample http://home.arcor.de/martin.honnen/xml/oneCharInstance1.xml
> which has the contents
>
> <?xml version="1.0" encoding="utf-8" ?>
> <root>
>   <test>𐌀</test>
> </root>
>
> the XSV validator and Saxon 9.4 EE don't report any validation errors when
> validating against the schema
> http://home.arcor.de/martin.honnen/xml/oneCharSchema1.xsd (which has as its
> contents
>
> <?xml version="1.0" encoding="utf-8"?>
> <xs:schema attributeFormDefault="unqualified"
>            elementFormDefault="qualified"
>            xmlns:xs="http://www.w3.org/2001/XMLSchema">
>   <xs:element name="root">
>     <xs:complexType>
>       <xs:sequence>
>         <xs:element maxOccurs="unbounded" name="test" type="one-char" />
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>   <xs:simpleType name="one-char">
>     <xs:restriction base="xs:string">
>       <xs:length value="1"/>
>     </xs:restriction>
>   </xs:simpleType>
> </xs:schema>
>
> ).
>
> However, Xerces Java 2.11 reports "[Error] oneCharInstance1.xml:3:25:
> cvc-length-valid: Value '?' with length = '2'
> is not facet-valid with respect to length '1' for type 'one-char'." so it
> seems to consider the contents of the "test" element as a string with two
> characters.
>
> MSXML 6 and .NET's validating parser report similar errors.
>
> In my view Xerces, MSXML and .NET get it wrong, as in terms of the XML
> specification and the schema data type 𐌀 is a single XML character,
> but I would like confirmation by others on the list before filing bugs.
>
> --
>
> Martin Honnen --- MVP Data Platform Development
> http://msmvps.com/blogs/martin_honnen/
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
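
A small illustration of where a length of 2 could come from. This is only a sketch in Python, not taken from any of the validators discussed: it assumes (the thread does not confirm this) that the implementations reporting length 2 are counting UTF-16 code units, which is what Java's String.length() and .NET's String.Length return, rather than XML characters (Unicode code points). The helper names code_point_length and utf16_code_unit_length are made up for this example.

```python
# U+10300 OLD ITALIC LETTER A, the non-BMP character from the sample instance.
ch = "\U00010300"


def code_point_length(s: str) -> int:
    """Length in Unicode code points (a Python 3 str is a sequence of code points)."""
    return len(s)


def utf16_code_unit_length(s: str) -> int:
    """Length in UTF-16 code units; each code unit is 2 bytes in UTF-16LE."""
    return len(s.encode("utf-16-le")) // 2


# One code point, but a surrogate pair (two code units) in UTF-16.
print(code_point_length(ch))        # 1
print(utf16_code_unit_length(ch))   # 2
```

Under that assumption, a validator that applies xs:length to the code-unit count would reject the sample, while one that counts code points would accept it, which matches the split reported above.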