RE: CDATA by any other name... (was The raw and the cooked)
Henry Thompson wrote:
> The DOM made a serious mistake here in my opinion: it's
> stranded in no-person's-land between raw and cooked, without being
> either. It's not cooked, because it gives you EntityReference and
> CDATA nodes. It's not raw, because it DOESN'T give you character
> entity references.

CHARACTER REFERENCES

I think Henry means "numeric character reference", and this is the heart of the matter. A numeric character reference is not an entity, any more than a directly-entered character is. It is just an alternative encoding of the character, and should be of no more interest to a general API than the charset encoding of the document is. (I am putting words into his mouth: or does Henry mean the [XML 4.6] predefined entities?)

Even if you make

<!ENTITY example "&#123;">

the numeric character reference is not an entity: it is the value of an entity with the name "example".

MARKED SECTIONS

On the subject of marked sections, I personally think that (in SGML) marked sections should do more than just alter delimiter recognition: I think they delimit anonymous inline entities, and label the entity with text-type information. Underlying this is the idea that marked sections actually mark up notations: at ISO there has been discussion of whether to allow something like (for example)

<![JAVA[ java code here ]]>

This is not something that I would expect to make its way into XML (and I think the ISO people are now more keen to help XML/WebSGML than on tidying up SGML), but I think the idea that a marked section not only alters delimiter recognition but also labels the data can be seen (in embryo or residually) in the DOM's elevation of CDATASection to node-worthiness, which has so perplexed Henry.

If you take the view that a CDATA section labels the data as character data (i.e. not ignorable whitespace), then <![CDATA[ ]]> is clearly invalid in Henry's example, because the " " is marked as data and data is not allowed. But that is ephemera: what does the spec say?
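To make the raw/cooked complaint concrete, here is a minimal sketch using Python's xml.dom.minidom as a stand-in for "a DOM". This is an assumption for illustration: minidom is only one implementation, and the element name "doc" is made up. The entity "example" is the one declared in the text. The sketch lists the child nodes of the root element so you can see which constructs survive as distinct node types.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical document: the element name "doc" is invented for this sketch;
# the entity "example" follows the declaration discussed in the text.
DOC = (
    '<!DOCTYPE doc [<!ENTITY example "&#123;">]>'
    '<doc>a&#123;b&example;<![CDATA[ ]]></doc>'
)

def describe_children(xml_text):
    """Return (node kind, data) pairs for the children of the root element."""
    kinds = {Node.TEXT_NODE: "Text", Node.CDATA_SECTION_NODE: "CDATASection"}
    root = minidom.parseString(xml_text).documentElement
    return [(kinds.get(n.nodeType, str(n.nodeType)), n.data)
            for n in root.childNodes]

children = describe_children(DOC)
for kind, data in children:
    print(kind, repr(data))
```

With this parser the numeric character reference and the expanded entity arrive as ordinary text, indistinguishable from a directly-entered "{", while the CDATA section comes back as a node of its own. Other DOM implementations may differ, for instance by exposing EntityReference nodes.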
I think the answer is clear from the spec:

[43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*

so a CDSect is not CharData. Therefore a CDSect is only valid in mixed content, even though it is well-formed to have it in element content.

I think this is doubly clear from the discussion of "white-space" in [XML 2.10]: white-space for xml:space considerations (in element content) is space added for "greater readability". <![CDATA[ ]]> does not do this!! It disrupts readability. So from the purpose of valid whitespace in element content it is clear that <![CDATA[ ]]> is not legitimate. The text is just as important as the productions.

SPACES

Henry's problem brings up a further important consideration. XML gives an attribute "xml:space" by which an application can know whether white-space may be collapsed or not. Can <![CDATA[ ]]> be used to override xml:space='default'? The answer is NO, because

* an application is free to decide whether to collapse spaces inside CDATA marked sections or not;
* in PCDATA, ISO 10646 provides a specific character to indicate non-collapsible whitespace: IDEOGRAPHIC SPACE &#x3000;
* outside mixed content <![CDATA[ ]]> is not valid for the reasons above.

XML, by adopting ISO 10646, takes the line that the only way to overcome the problems that (ASCII) people have with spaces is to un-overload that damned space character. The basic principle of markup is that if a user wants something, they should unambiguously mark it up in their data: if they want non-collapsible space, the correct answer is "Use &#x3000;" or "Use xml:space='preserve'". (However, font issues are important here: IDEOGRAPHIC SPACE may be twice as wide as " " spaces, so the xml:lang attribute may be important.)

I urge developers to make sure that their products handle the 17 ISO 10646 spacing/hyphenation characters properly.
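The point about un-overloading the space character can be sketched in a few lines of Python. This is an illustration under assumptions, not anything the spec mandates: the collapsing policy and the function name collapse_xml_space are hypothetical. The only fixed facts used are that XML 1.0 white space is limited to space, tab, carriage return and line feed, and that IDEOGRAPHIC SPACE is U+3000.

```python
import re

# XML 1.0 white space (production S): space, tab, carriage return, line feed.
XML_WS_RUN = re.compile(r"[\x20\x09\x0D\x0A]+")
IDEOGRAPHIC_SPACE = "\u3000"

def collapse_xml_space(text):
    """Collapse each run of XML white space to one space.

    Hypothetical application policy for xml:space='default'; every other
    character, including U+3000, is left untouched.
    """
    return XML_WS_RUN.sub(" ", text)

sample = "a  \t\n b" + IDEOGRAPHIC_SPACE * 2 + "c"
collapsed = collapse_xml_space(sample)
print(repr(collapsed))
```

Note the explicit character class. A generic helper such as " ".join(sample.split()) would also swallow U+3000, because Python treats it as whitespace, which is exactly the kind of mishandling to watch for in products.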
There have been previous postings on this group (what happened to that XML jewels website: it was there too?), or get the Unicode book, or get ISO 10646, or (best option:-) get my book (XML & SGML Cookbook, p 3-90).

Rick Jelliffe

xml-dev: A list for W3C XML Developers. Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/