Re: REC-xml-19980210: whitespace
John Cowan wrote:
> David Brownell wrote:
>
> > Moreover, 2.10 says (albeit oddly) that CRLF gets normalized
> > to LF everywhere, and the LF would get normalized to a single
> > space inside of an attribute (or Public Identifier) value.
>
> Hmmm.  Real CRLFs get normalized to LFs, but does that apply
> to the appearance of "&#x0D;&#x0A;"?  I think not.
> In attribute values, however,

At a minimum, Section 3.3.3 Attribute-Value Normalization, and the other places where white space is discussed, seem a little short of exact.

In Section 3.3.3, bullet #1 says:

    "a character reference is processed by appending the referenced character to the attribute value"

None of the other bullets deals with character references. The value of the character reference is appended to the attribute value; it is part of the attribute value.

Note: when do characters in the attribute value get put into the normalized value? Bullet #4 states that "other characters are processed by appending them to the normalized value." It seems that character references are never explicitly transferred from the attribute value to the normalized value.

What happens when the character reference is #x20? If the attribute is CDATA, then nothing. If the attribute is not CDATA, then a leading or trailing #x20 is stripped, and if the #x20 is part of a sequence of #x20s, a single #x20 takes the place of the sequence.

What happens when the character reference is #x09, #x0A, or #x0D? In each case nothing happens, because bullet #3 does not apply to character references.

This implies that the sequence "&#x0D;&#x0A;" is not converted into a #x20 (as John Cowan noted above). Section 2.11, End-of-Line Handling, does not apply, since a single character reference cannot contain both #x0D and #x0A.

So it is possible for a normalized attribute value of either type (CDATA or not) to contain the character sequence #x0D#x0A, and for a CDATA value to contain sequences of #x20s. If this is not the intent of the spec, then Section 3.3.3 needs a little work. I do not know whether this is what the authors of the spec really wanted, but (at least to me) that is what the spec says.

If even whitespace that comes from character references is to be processed/normalized, I would recommend a two-step description: first a value is created by appending the characters named by character references, recursively appending the replacement text of entity references, and simply appending other characters; then CDATA/non-CDATA normalization takes place. That way the sequence "&#x0D;&#x0A;" will be normalized, again assuming that is what the spec is trying to say.

I am not out to criticize the spec or its authors; I'm just trying to build a validating parser, and, not being an SGML techie and not having taken part in the year-plus W3C XML spec development process, I only have the spec to go on. Since there is so much semantics in the spec, not just syntax, I have the feeling that someone like Guy Steele, for example, ought to take a pass at it.

Richard Emberson

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
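The two readings of Section 3.3.3 discussed in the message above can be compared with a small Python sketch. It is only an illustration under stated assumptions, not a conforming implementation: the function names (`tokens`, `finish`, `normalize_literal_reading`, `normalize_two_step`) are hypothetical, entity references are left out entirely, and the second step of the two-step variant is assumed to mean "map every whitespace character to #x20, then apply the non-CDATA trimming". Treating an expanded CR LF pair as a single line end is likewise an assumption made for this sketch.

```python
import re

# Matches decimal (&#13;) and hexadecimal (&#xD;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")
WHITESPACE = "\x20\x0D\x0A\x09"


def tokens(raw):
    """Yield (is_char_ref, character) pairs for a raw attribute value.

    Entity references are not handled in this sketch.
    """
    pos = 0
    while pos < len(raw):
        m = CHAR_REF.match(raw, pos)
        if m:
            code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
            yield True, chr(code)
            pos = m.end()
        else:
            yield False, raw[pos]
            pos += 1


def finish(value, is_cdata):
    """Non-CDATA only: drop leading/trailing #x20 and collapse runs of #x20."""
    if is_cdata:
        return value
    return re.sub(" +", " ", value.strip(" "))


def normalize_literal_reading(raw, is_cdata=True):
    """Reading described in the post: bullet #3 skips character references."""
    raw = raw.replace("\r\n", "\n")  # end-of-line handling for literal CR LF
    out = []
    for is_ref, ch in tokens(raw):
        if not is_ref and ch in WHITESPACE:
            out.append(" ")   # literal whitespace becomes #x20
        else:
            out.append(ch)    # referenced characters are appended untouched
    return finish("".join(out), is_cdata)


def normalize_two_step(raw, is_cdata=True):
    """Two-step variant: expand references first, then normalize everything."""
    expanded = "".join(ch for _, ch in tokens(raw))
    # Assumption for this sketch: an expanded CR LF pair counts as one line end.
    expanded = expanded.replace("\r\n", "\n")
    mapped = "".join(" " if ch in WHITESPACE else ch for ch in expanded)
    return finish(mapped, is_cdata)


if __name__ == "__main__":
    sample = "a&#x0D;&#x0A;b"
    print(repr(normalize_literal_reading(sample)))  # 'a\r\nb'
    print(repr(normalize_two_step(sample)))         # 'a b'
```

Under the first function the CR and LF produced by the references survive into the normalized value, which is the outcome the post says the text of the spec implies. Under the second they end up as a single #x20.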