Re: XML 1.0 Conformance Test Results
> In looking at the sun/valid/not-sa02.xml file, I can't find any tokens
> that are separated _only_ by character references to whitespace.

You're right, my description was just a shorthand for a more complicated set of problems. Here is the long, historical version. There are two aspects to it: attribute value normalization, and validation of normalized attributes.

NORMALIZATION:

In the first edition of XML 1.0, the description of attribute normalization was unclear. Were the normalization actions listed in section 3.3.3 meant to be alternatives, or applied in sequence? They were meant to be alternatives, but this was not everyone's interpretation. Consider the example:

> nmtokens = " this&#x0d;&#x0a; also gets&#x20; normalized "

If the actions were applied sequentially, the &#x0d; would first be replaced by a carriage-return character, and then by a space, and similarly for the &#x0a;. The &#x20; would of course get replaced by a space. The result would be

    " this   also gets  normalized "

Assuming that the attribute was of a tokenized type, say NMTOKENS, it would then get normalized to

    "this also gets normalized"

and would be straightforwardly valid.

But that's wrong. The actions are meant to be alternatives. Character references are replaced by the corresponding characters, but if those characters happen to be whitespace this doesn't result in them being converted to spaces. So the result after the first stage of normalization should be

    " this<CR><LF> also gets  normalized "

where <CR> and <LF> represent the carriage-return and linefeed characters. The second stage of normalization would then produce

    "this<CR><LF> also gets normalized"    (*)

because it compresses strings of space characters, not strings of whitespace.

Erratum 70 (http://www.w3.org/XML/xml-19980210-errata#E70) attempted to make this clearer, explicitly stating that character references to CR, LF and TAB do not get normalized to spaces.

VALIDATION:

Normalization is intended to turn tokenized attributes into lists of tokens separated by single spaces, for easy processing by the application. To be valid, after normalization, NMTOKENS attributes must match the Nmtokens production, and ENTITIES and IDREFS attributes must match the Names production. Unfortunately these productions were given as

    [6] Names    ::= Name (S Name)*
    [8] Nmtokens ::= Nmtoken (S Nmtoken)*

("S" means whitespace). The effect of this is to make the normalized value marked (*) valid, even though normalization has not made it into a list of space-separated tokens! The intention was to follow SGML, and make such values invalid.

The mistake was corrected in erratum 62 (http://www.w3.org/XML/xml-19980210-errata#E62), which changed the productions to

    [6] Names    ::= Name (#x20 Name)*
    [8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*

where S has been replaced by #x20.

At this point, all was well.
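
To make the two readings concrete, here is a rough Python sketch of the normalization stages and of the two versions of production [8]. It is only an illustration under stated assumptions, not how any particular parser does it: the function and constant names are invented for this example, the character references in the sample value are assumed to be spelled &#x0d;, &#x0a; and &#x20;, entity references and end-of-line handling are ignored, and Nmtoken is approximated by an ASCII subset of NameChar.

```python
import re

# Hypothetical helper names; none of this comes from a real parser's API.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")
LITERAL_WS = re.compile(r"[\t\n\r]")

# Assumed spelling of the attribute text as it appears in the document.
RAW = " this&#x0d;&#x0a; also gets&#x20; normalized "


def _expand(match):
    """Return the character named by a numeric character reference."""
    hex_digits, dec_digits = match.groups()
    return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))


def stage1_alternatives(raw):
    """Intended reading: each piece of input gets exactly one action.

    Literal TAB/LF/CR become spaces; characters produced by character
    references are appended unchanged, even if they are whitespace.
    """
    out, pos = [], 0
    for m in CHAR_REF.finditer(raw):
        out.append(LITERAL_WS.sub(" ", raw[pos:m.start()]))
        out.append(_expand(m))
        pos = m.end()
    out.append(LITERAL_WS.sub(" ", raw[pos:]))
    return "".join(out)


def stage1_sequential(raw):
    """Mistaken reading: expand references, then convert all whitespace."""
    return LITERAL_WS.sub(" ", CHAR_REF.sub(_expand, raw))


def stage2_tokenized(value):
    """Non-CDATA step: trim and collapse runs of #x20 only, not all of S."""
    return re.sub(r" +", " ", value.strip(" "))


# Simplification: an ASCII-only stand-in for the real NameChar set.
NMTOKEN = r"[A-Za-z0-9._:-]+"
NMTOKENS_S = re.compile(rf"{NMTOKEN}(?:[ \t\r\n]+{NMTOKEN})*")  # (S Nmtoken)*
NMTOKENS_X20 = re.compile(rf"{NMTOKEN}(?: {NMTOKEN})*")         # (#x20 Nmtoken)*

if __name__ == "__main__":
    star = stage2_tokenized(stage1_alternatives(RAW))
    print(repr(star))
    print(repr(stage2_tokenized(stage1_sequential(RAW))))
    print(bool(NMTOKENS_S.fullmatch(star)), bool(NMTOKENS_X20.fullmatch(star)))
```

With the intended reading, the value marked (*) keeps its CR and LF, so it matches the S-based production but not the #x20-based one. The sequential reading yields plain space-separated tokens, which match both.
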
XML was compatible with SGML, and normalized valid tokenized values were always strings of tokens separated by single space characters.

Unfortunately, someone queried erratum 62, and in a fit of collective amnesia the XML Core WG forgot that the validity constraints applied *after* attribute value normalization. It seemed that perfectly reasonable cases like nmtokens="foo bar" had been ruled out (which of course they hadn't). Erratum 108 (http://www.w3.org/XML/xml-19980210-errata#E108) restored the faulty productions, and worse still this was done immediately before publication of the second edition.

The mistake was later realized, and erratum 20 to the second edition (http://www.w3.org/XML/xml-V10-2e-errata#E20) restored the old E62. In accordance with the law of cartoon amnesia, all is well if you get hit on the head an even number of times.

The OASIS test suite is particularly confused and the output files for not-sa02 and sa02 do not match any of the errata.

-- Richard