Re: REC-xml-19980210: whitespace

From: emberson@f... (Richard Emberson)
To: cowan@l..., xml-dev@i...
Date: Thu, 15 Oct 1998 14:25:32 -0700

John Cowan wrote:
> David Brownell wrote:
> 
> > Moreover, 2.10 says (albeit oddly) that CRLF gets normalized
> > to LF everywhere, and the LF would get normalized to a single
> > space inside of an attribute (or Public Identifier) value.
> 
> Hmmm.  Real CRLFs get normalized to LFs, but does that apply
> to the appearance of "&#xD;&#xA;"?  I think not.
> In attribute values, however, 

At a minimum, Section 3.3.3 Attribute-Value Normalization 
and the other places where white space is discussed seem a little
short of exact.

In Section 3.3.3, bullet #1:

  "a character reference is processed by appending the referenced
  character to the attribute value"

None of the other bullets deals with character references. The
value of the character reference is appended to the attribute value.
It is part of the attribute value. 

  Note:
    When do characters in the attribute value get put into the 
    normalized value? Bullet #4 states that "other characters 
    are processed by appending them to the normalized value."
    It seems that character references are never explicitly 
    transferred from the attribute value to the normalized value.

What happens when the character reference is: #x20?
  If the attribute is not CDATA, then nothing.
  If the attribute is CDATA, then 
    if the #x20 is leading or trailing, then it is stripped, else
    if the #x20 is part of a sequence of #x20's, then only
      one #x20 takes the place of the sequence.

What happens when the character reference is: #x09?
    Nothing happens because bullet #3 does not apply to
      character references.

What happens when the character reference is: #x0A?
    Nothing happens because bullet #3 does not apply to
      character references.

What happens when the character reference is: #x0D?
    Nothing happens because bullet #3 does not apply to
      character references.

This implies that the sequence "&#xD;&#xA;" is not converted into a #x20
(this was noted by John Cowan above). Section 2.11, End-of-Line Handling,
does not apply since a single character reference cannot contain both
#x0D and #x0A.
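
To make the literal reading above concrete, here is a rough Python
sketch of bullets 1, 3 and 4 as I read them. The function name
normalize_literal is mine, not the spec's. Entity references (bullet 2)
and the special case for a "#xD#xA" pair inside an entity are left out
to keep it short, so treat it as an illustration only.

```python
import re

# Matches a hexadecimal or decimal character reference, e.g. &#xD; or &#13;
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def normalize_literal(raw):
    """Literal reading of bullets 1, 3 and 4 of Section 3.3.3.

    Hypothetical helper: bullet 2 (entity references) is omitted.
    """
    out = []
    i = 0
    while i < len(raw):
        m = CHAR_REF.match(raw, i)
        if m:
            # Bullet 1: append the referenced character unchanged.
            code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
            out.append(chr(code))
            i = m.end()
        elif raw[i] in " \t\r\n":
            # Bullet 3: a literal white space character becomes #x20.
            out.append(" ")
            i += 1
        else:
            # Bullet 4: any other character is appended as is.
            out.append(raw[i])
            i += 1
    return "".join(out)
```

Under this reading, normalize_literal("a&#xD;&#xA;b") gives back
"a\r\nb" with the #x0D#x0A pair intact, while a literal tab in "a\tb"
is turned into a single #x20.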

So, it is possible for normalized attribute values (of type CDATA and
not CDATA) to contain sequences of #x20s and the character sequence
#x0D#x0A. If this is not the intent of the spec, then Section 3.3.3
needs a little work.

I do not know if this is what was really desired by the authors of the
spec, but (at least to me) that's what the spec says.

If even white space that comes from character references is to be
processed/normalized, I would recommend a two-step description: first
a value is created by appending character references, recursively
appending entity references, and simply appending other characters,
and then CDATA/non-CDATA normalization takes place. That way the
sequence "&#xD;&#xA;" will be normalized, again, assuming that that's
what the spec is trying to say.
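
A rough Python sketch of that two-step description follows. All the
names here (expand, normalize_two_step, the entities table and the
collapse flag) are made up for illustration. Whether a #x0D#x0A pair
should become one #x20 or two is my assumption; the sketch uses one.
The collapse flag stands in for whatever the attribute's declared type
calls for.

```python
import re

# Character reference (hex or decimal) or a named entity reference.
REF = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z_][\w.\-]*));")

def expand(raw, entities):
    """Step 1: build a plain value by expanding every reference."""
    def repl(m):
        if m.group(1):
            return chr(int(m.group(1), 16))
        if m.group(2):
            return chr(int(m.group(2)))
        # Entity reference: recursively expand its replacement text.
        # An undeclared entity name raises KeyError in this sketch.
        return expand(entities[m.group(3)], entities)
    return REF.sub(repl, raw)

def normalize_two_step(raw, entities=None, collapse=False):
    """Step 2: normalize white space on the fully expanded value."""
    value = expand(raw, entities or {})
    # Assumption: a CR LF pair becomes one #x20; any other white
    # space character becomes one #x20 each.
    value = re.sub(r"\r\n|[ \t\r\n]", " ", value)
    if collapse:
        # Strip leading/trailing #x20 and collapse runs of #x20.
        value = re.sub(r" +", " ", value).strip(" ")
    return value
```

With this ordering, normalize_two_step("a&#xD;&#xA;b") yields "a b",
because the white space pass only runs after the character references
have been replaced by the characters they name.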

I am not out to criticize the spec or its authors; I'm just trying to
build a validating parser and, not being an SGML-techie and not having 
partaken in the year+ w3c xml spec development process, I only have
the spec to go on. I have the feeling that, since there are so many
semantics, not just syntax, in the spec, the authors ought to have 
someone like Guy Steele, for example, take a pass at it.

Richard Emberson

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
