RE: Brain Teaser: Element Author is of type xsd:string, what's an illegal value of Author?
Roger Costello writes:

> Suppose element Author is declared to be of type xsd:string. In
> each example below the Author element has invalid content:

If you want to be picky, there's a subtlety here. To allow for direct validation of things like DOMs and SAX streams, which may have originated directly in a program as opposed to by parsing an XML document, XML Schema is defined to validate Infosets, not documents. The subtlety is that, as far as I know, characters like &#x0; are indeed not legal in serialized XML documents, but the corresponding characters (e.g. 0x0) are legal in XML Infosets! I'm indebted to Richard Tobin for having pointed this out to me some years ago.

So, insofar as your examples are as they appear to be, and really are serialized XML documents with angle brackets etc., the problem you have is not one of invalidity, but of having documents that are not well formed. A conforming XML parser will thus decline to interpret them as XML, and thus presumably decline to prepare any Infoset for validation.

If instead you meant to construct what the Infoset recommendation calls a synthetic Infoset [1], e.g. by consing up a DOM in your application and stuffing a null into some of its text nodes, and if you ask XML Schema to validate that, then XML Schema would indeed report the content as invalid lexical forms for xs:string.

Don't you just love this stuff? I told you it was subtle. At least in this case, the subtlety does not come particularly from XML Schema, but more from XML's decision to make some characters illegal, compounded by the Infoset's policy of not restricting Infosets to information items that could have resulted from the parse of a conforming XML document. I'm not saying those are necessarily bad decisions, but they do cause complexity in analyzing the examples you give.
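Noah's distinction between a serialized document and a synthetic Infoset can be illustrated with a minimal sketch, assuming Python's standard library. ElementTree is built on the Expat parser, which rejects character references to characters outside the XML Char production, so the failure happens at parse time, before any schema processor sees an Infoset. minidom stands in for "a DOM consed up in your application": it does not check characters, so a NUL can be placed in a text node. The helper name `parses` is hypothetical, and nothing here performs xs:string validation.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import Document


def parses(doc: str) -> bool:
    """Return True if the Expat-based ElementTree parser accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# Serialized documents: well-formedness is decided by the parser,
# before any schema validation could take place.
print(parses("<Author>&#x0;</Author>"))      # False: U+0000 is not an XML Char
print(parses("<Author>&#xFFFF;</Author>"))   # False: U+FFFF is excluded from Char
print(parses("<Author>&#xFFFF0;</Author>"))  # True: U+FFFF0 is in [#x10000-#x10FFFF]

# A synthetic tree built in the program: minidom does not check the
# Char production, so a NUL can be stuffed into a text node.
doc = Document()
author = doc.createElement("Author")
author.appendChild(doc.createTextNode("\x00"))
doc.appendChild(author)
print(repr(author.firstChild.data))  # '\x00'

# Serializing that tree produces text that a conforming parser rejects.
print(parses(doc.toxml()))  # False
```

The tree exists only because it never went through a parser; once serialized, it is no longer well-formed XML.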
For the record, I am not expert in the particular character ranges that can and cannot appear in XML 1.x documents, and I have not taken the trouble to check the particulars of the ranges you give below. I presume they're right.

Noah

[1] http://www.w3.org/TR/xml-infoset/#intro.synthetic

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Costello, Roger L." <costello@m...>
02/09/2007 12:06 PM
To: <xml-dev@l...>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: RE: Brain Teaser: Element Author is of type xsd:string, what's an illegal value of Author?

Thanks Michael and Noah. I would like to summarize.

1. This is the set of legal characters in XML 1.0:

   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
   /* excluding the surrogate blocks, FFFE, and FFFF. */

2. This is the set of legal characters in XML 1.1:

   [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
   /* excluding the surrogate blocks, FFFE, and FFFF. */

3. These are the characters which are not legal, regardless of whether XML 1.0 or XML 1.1 is used:

   #x0
   [#xD800-#xDFFF]
   [#xFFFF-#xFFFFF]
   [#x110000-#xFFFFFF]
   #xFFFE
   #xFFFF

4. Suppose element Author is declared to be of type xsd:string. In each example below the Author element has invalid content:

   <Author>&#x0;</Author>
   <Author></Author>
   <Author>&#xFFFF0;</Author>
   <Author>&#xFFFF1;</Author>

5. I tested this with XML Spy and Oxygen XML. Here are the results:

                               XML Spy    Oxygen XML
   -------------------------------------------------
   <Author>&#x0;</Author>      Valid      Invalid
   <Author></Author>           Valid      Invalid
   <Author>&#xFFFF0;</Author>  Valid      Valid
   <Author>&#xFFFF1;</Author>  Valid      Valid

XML Spy incorrectly validates the data in all four cases. Oxygen XML correctly validates the data in two cases, and incorrectly validates the data in the other two cases.

Is the above summary correct?

/Roger

-----Original Message-----
From: noah_mendelsohn@u... [mailto:noah_mendelsohn@u...]
Sent: Friday, February 09, 2007 10:48 AM
To: Michael Kay
Cc: Costello, Roger L.; xml-dev@l...
Subject: RE: Brain Teaser: Element Author is of type xsd:string, what's an illegal value of Author?

Michael Kay writes:

> There's a W3C note on handling XML 1.1 with XML Schema:
>
> http://www.w3.org/TR/xml11schema10/
>
> and it recommends (see the last line of the note) that with that
> combination, the definitions of the built-in types should be "stretched" to
> accommodate the characters allowed in XML 1.1. With that strategy, there
> will never be an invalid instance of xs:string.

Indeed. That's about the best we could do without appearing to make a retroactive incompatible change to Schema 1.0. With Schema 1.1, such incompatibilities are less of an issue. Note that the latest public working draft of Schema 1.1 says [1]:

"[XML Schema: Datatypes] defines some datatypes which depend on definitions in [XML 1.1] and [XML-Namespaces 1.1]; those definitions, and therefore the datatypes based on them, vary between version 1.0 ([XML 1.0], [XML-Namespaces 1.0]) and version 1.1 ([XML 1.1], [XML-Namespaces 1.1]) of those specifications. In any given schema-validity-·assessment· episode, the choice of the 1.0 or the 1.1 definition of those datatypes is implementation-defined."

The working draft for Schema 1.1 Datatypes provides more details [2]:

"This specification defines some datatypes which depend on definitions in [XML] and [Namespaces in XML]; those definitions, and therefore the datatypes based on them, vary between version 1.0 ([XML 1.0], [Namespaces in XML 1.0]) and version 1.1 ([XML], [Namespaces in XML]) of those specifications. In any given use of this specification, the choice of the 1.0 or the 1.1 definition of those datatypes is implementation-defined.

"Conforming implementations of this specification may provide either the 1.1-based datatypes or the 1.0-based datatypes, or both.
If both are supported, the choice of which datatypes to use in a particular assessment episode should be under user control.

Note: When this specification is used to check the datatype validity of XML input, implementations may provide the heuristic of using the 1.1 datatypes if the input is labeled as XML 1.1, and using the 1.0 datatypes if the input is labeled 1.0, but this heuristic should be subject to override by users, to support cases where users wish to accept XML 1.1 input but validate it using the 1.0 datatypes, or accept XML 1.0 input and validate it using the 1.1 datatypes."

Regarding "string" in particular, the Datatypes draft says [3]:

"It is implementation-defined whether an implementation of this specification supports the Char production from [XML], or that from [XML 1.0], or both. See Dependencies on Other Specifications (§1.3)."

So I think we've been quite careful with the details there, and Schema 1.1 can indeed be used to validate as strings the characters that Roger mentions. Note that there are some characters that are not allowed in either XML 1.0 or XML 1.1 infosets, and thus are disallowed even in Schema 1.1 strings. I believe I'm correct that NUL (0x0) is one of these.

Noah

[1] http://www.w3.org/TR/xmlschema11-1/#intro1.1
[2] http://www.w3.org/TR/xmlschema11-2/#intro-relatedWork
[3] http://www.w3.org/TR/xmlschema11-2/#string

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS to support XML implementation and development. To minimize spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@l...
subscribe: xml-dev-subscribe@l...
List archive: http://lists.xml.org/archives/xml-dev/ List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
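The Char productions summarized in this thread, and the Schema 1.1 drafts' choice between the 1.0 and 1.1 definitions, can be sketched as a plain character check. This is a minimal sketch under stated assumptions: `is_xml_char` and `chars_allowed` are hypothetical names, the `override` parameter models the drafts' "heuristic subject to override by users", and the code checks only the Char production. It is not an xs:string validator, and it ignores, for example, XML 1.1's separate rule that RestrictedChar characters such as U+0001 may appear in a document only as character references.

```python
from typing import Optional


def is_xml_char(cp: int, version: str = "1.0") -> bool:
    """Return True if code point cp matches the Char production of XML `version`."""
    if version == "1.0":
        # XML 1.0 begins: #x9 | #xA | #xD | [#x20-#xD7FF] | ...
        if cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF:
            return True
    elif version == "1.1":
        # XML 1.1 begins: [#x1-#xD7FF] | ...
        if 0x1 <= cp <= 0xD7FF:
            return True
    else:
        raise ValueError(f"unsupported XML version: {version!r}")
    # Ranges shared by both versions: [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    # NUL, the surrogate block, #xFFFE and #xFFFF fall outside every range.
    return 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


def chars_allowed(text: str, declared_version: str,
                  override: Optional[str] = None) -> bool:
    """Hypothetical check that every character of text is a Char.

    Uses the version the input is labeled with, unless the user
    supplies an override (assumed to be "1.0" or "1.1").
    """
    version = override or declared_version
    return all(is_xml_char(ord(ch), version) for ch in text)


for cp in (0x0, 0x1, 0xD800, 0xFFFE, 0xFFFF, 0xFFFF0, 0x110000):
    print(hex(cp), is_xml_char(cp, "1.0"), is_xml_char(cp, "1.1"))
```

Run as-is, the loop reports U+0000, U+D800, U+FFFE, U+FFFF and 0x110000 as excluded under both versions, U+0001 as allowed only under 1.1, and U+FFFF0 as allowed under both. Applying `override="1.0"` to input labeled 1.1 therefore rejects U+0001 again, which is the kind of user control the quoted Datatypes note describes.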