Re: XML 1.0 Conformance Test Results

  • From: Richard Tobin <richard@c...>
  • To: xml-dev@l...
  • Date: Tue, 12 Jun 2001 13:10:45 +0100 (BST)

> In looking at the sun/valid/not-sa02.xml file, I can't find any tokens that
> are separated _only_ by character references to whitespace.

You're right, my description was just a shorthand for a more complicated
set of problems.  Here is the long, historical version.

There are two aspects to it: attribute value normalization, and validation
of normalized attributes.

NORMALIZATION:

In the first edition of XML 1.0, the description of attribute
normalization was unclear.  Were the normalization actions listed in
section 3.3.3 meant to be alternatives, or applied in sequence?  They
were meant to be alternatives, but this was not everyone's
interpretation.

Consider the example:

>    nmtokens =  " this&#x0d;&#x0a; also  gets&#x20; normalized "

If the actions were applied sequentially, the &#x0d; would be first
replaced by a carriage-return character, and then by a space, and
similarly for the &#x0a;.  The &#x20; would of course get replaced by
a space.  The result would be

  " this   also  gets  normalized "

Assuming that the attribute was of a tokenized type, say NMTOKENS,
it would then get normalized to

  "this also gets normalized"

and would be straightforwardly valid.

But that's wrong.  The actions are meant to be alternatives.  Character
references are replaced by the corresponding characters, but if those
characters happen to be whitespace this doesn't result in them being
converted to spaces.  So the result after the first stage of normalization
should be

  " this<CR><LF> also  gets  normalized "

where <CR> and <LF> represent the carriage-return and linefeed characters.
The second stage of normalization would then produce

  "this<CR><LF> also gets normalized"                  (*)

because it compresses strings of space characters, not strings of whitespace.

Erratum 70 (http://www.w3.org/XML/xml-19980210-errata#E70) attempted to
make this clearer, explicitly stating that character references to
CR, LF and TAB do not get normalized to spaces.
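The two readings of section 3.3.3 can be compared with a small sketch. This is only an illustration, not a conforming processor: the function names are made up for this example, only character references are handled (general entity references are ignored), and line ends are assumed to have been normalized already.

```python
import re

# Hypothetical helpers for illustration; they handle character references only.
_PIECE = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|(.)", re.DOTALL)


def _second_stage(value):
    # Tokenized types: drop leading/trailing #x20 and collapse runs of #x20.
    # Only the space character is touched, not whitespace in general.
    return re.sub(" +", " ", value.strip(" "))


def normalize_alternatives(raw, tokenized=True):
    """Intended reading: each piece of the value gets exactly one action."""
    out = []
    for hex_ref, dec_ref, ch in _PIECE.findall(raw):
        if hex_ref:
            out.append(chr(int(hex_ref, 16)))   # referenced character, kept as is
        elif dec_ref:
            out.append(chr(int(dec_ref)))       # referenced character, kept as is
        elif ch in "\t\n\r":
            out.append(" ")                     # literal whitespace becomes a space
        else:
            out.append(ch)
    value = "".join(out)
    return _second_stage(value) if tokenized else value


def normalize_sequential(raw, tokenized=True):
    """Mistaken reading: expand references first, then map whitespace to spaces."""
    value = re.sub(r"&#x([0-9A-Fa-f]+);", lambda m: chr(int(m.group(1), 16)), raw)
    value = re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), value)
    value = re.sub(r"[\t\n\r]", " ", value)
    return _second_stage(value) if tokenized else value


raw = " this&#x0d;&#x0a; also  gets&#x20; normalized "
print(repr(normalize_sequential(raw)))
print(repr(normalize_alternatives(raw)))
```

With the example attribute above, the sequential version gives the "straightforwardly valid" string, while the alternatives version keeps the carriage return and linefeed and gives the value marked (*).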

VALIDATION:

Normalization is intended to turn tokenized attributes into lists of
tokens separated by single spaces, for easy processing by the
application.  To be valid, after normalization, NMTOKENS attributes
must match the Nmtokens production, and ENTITIES and IDREFS attributes
must match the Names production.  Unfortunately these productions were
given as

 [6] Names    ::= Name    (S Name)*
 [8] Nmtokens ::= Nmtoken (S Nmtoken)*

("S" means whitespace).

The effect of this is to make the normalized value marked (*) be
valid, even though normalization has not made it into a list of
space-separated tokens!  The intention was to follow SGML, and make
such values be invalid.  The mistake was corrected in erratum 62
(http://www.w3.org/XML/xml-19980210-errata#E62) which changed the
productions to

 [6] Names    ::= Name    (#x20 Name)*
 [8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*

where S has been replaced by #x20.

At this point, all was well.  XML was compatible with SGML, and
normalized valid tokenized values were always strings of tokens
separated by single space characters.
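The difference between the two forms of production [8] can be checked against the value marked (*) with a pair of regular expressions. This is a rough sketch: the NameChar class here is an ASCII-only approximation chosen for brevity, not the full production from the spec, and the variable names are invented for the example.

```python
import re

# ASCII-only stand-in for NameChar (an assumption; the real production is larger).
NAMECHAR = r"[A-Za-z0-9._:\-]"
NMTOKEN = NAMECHAR + "+"
S = r"[ \t\r\n]+"  # production [3]: one or more whitespace characters

# [8] as first published: Nmtoken (S Nmtoken)*
NMTOKENS_WITH_S = re.compile(rf"{NMTOKEN}(?:{S}{NMTOKEN})*")
# [8] as changed by erratum 62: Nmtoken (#x20 Nmtoken)*
NMTOKENS_WITH_X20 = re.compile(rf"{NMTOKEN}(?: {NMTOKEN})*")

starred = "this\r\n also gets normalized"   # the value marked (*)
clean = "this also gets normalized"

for label, value in (("(*)", starred), ("clean", clean)):
    with_s = NMTOKENS_WITH_S.fullmatch(value) is not None
    with_x20 = NMTOKENS_WITH_X20.fullmatch(value) is not None
    print(label, "S form:", with_s, "#x20 form:", with_x20)
```

The S form accepts both values; the #x20 form accepts only the one that really is a list of tokens separated by single spaces.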

Unfortunately, someone queried erratum 62, and in a fit of collective
amnesia the XML Core WG forgot that the validity constraints applied
*after* attribute value normalization.  It seemed that perfectly
reasonable cases like

  nmtokens="foo
            bar"

had been ruled out (which of course they hadn't).  Erratum 108
(http://www.w3.org/XML/xml-19980210-errata#E108) restored the faulty
productions, and worse still this was done immediately before
publication of the second edition.
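To see why the line-broken case was never ruled out by the #x20 form of the productions, here is a short sketch that normalizes the literal value first and only then checks it. The names and the ASCII-only token pattern are assumptions made for the example.

```python
import re

# The attribute value as written in the document: a literal line break
# followed by indentation, with no character references involved.
raw = "foo\n            bar"

# First stage: a literal whitespace character is replaced by a space.
stage1 = re.sub(r"[\t\n\r]", " ", raw)
# Second stage (tokenized types): trim and collapse runs of spaces.
normalized = re.sub(" +", " ", stage1.strip(" "))

# The #x20 form of Nmtokens, with an ASCII-only approximation of Nmtoken.
nmtokens_x20 = re.compile(r"[A-Za-z0-9._:\-]+(?: [A-Za-z0-9._:\-]+)*")

print(repr(normalized))
print(nmtokens_x20.fullmatch(normalized) is not None)
```

Because the validity constraint is applied to the normalized value, the literal newline has already become a single space by the time the production is matched.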

The mistake was later realized, and erratum 20 to the second edition
(http://www.w3.org/XML/xml-V10-2e-errata#E20) restored the old E62.
In accordance with the law of cartoon amnesia, all is well if you get
hit on the head an even number of times.

The OASIS test suite is particularly confused, and the output files
for not-sa02 and sa02 do not match any of the errata.

-- Richard
