Dizzy (was Re: Why the Infoset?)

"Simon St.Laurent" wrote:
> I'm getting kind of dizzy here. You've objected rather violently to Common
> XML and Minimal XML's subsetting of XML syntax, but you seem to insist on
> the Infoset only providing an abstraction of just such a subset,
> deliberately ignoring the rest.

(Off-topic: Flannery O'Connor used to refer to her excellent book "The Violent Bear It Away" as "The Violant Bear".)

First, let me start by saying that the new draft of Common XML is really excellent. I read it last night and I was not subject to any of the fits of violence that must so terrify other, more mild-mannered, correspondents. Anyone starting to build a system that publishes XML blind (i.e., without control over the receiving systems) is well advised to read it. I believe Simon commented on SML-DEV that he is preparing to release the new version soon, and he will no doubt publish the URL.

As for Minimal XML, if I don't want Microsoft to extend XML or subset XML syntax, it is only consistent to hold that SML-DEV should not either (and still call it XML). Anyone reading the SML-DEV archives will see that almost all the absolute statements there (PIs bad, notations bad, attributes bad, comments burdensome) were found over time to be too extreme to be general, or to apply only to particular use-models or implementation techniques.

So yes, I do not agree with subsetting XML syntax. But I certainly agree that W3C specs should have a consistent view of what information markup should have. And this view should be consonant with XML as SGML: if we disconnect XML from SGML it will not fly free like a beautiful bird, it will be captured for hideous genetic experiments by the rich and powerful, or their hunchbacked .org fronts.

I have probably written on this list before that I think it is naive to think that large companies will not use any ammunition the public gives them to justify embrace-and-extend (or subset); agreeing on a conservative information set to be used in W3C specs is one way to corral them.

In SGML, whitespace in markup is "ignored"; consequently it should be ignored in general-purpose XML information sets. I hope the information set can be a way to capture some of the valuable semantics of markup that are clear from ISO 8879 but were left out of XML 1.0 (to simplify it).

So if I say that the number of whitespace characters between x and y in <x y="z"/> should not be part of the information set, what do I mean? I mean that the standard DOM should not support it with extra nodes on an Element node, that there should not be a special axis invented for XPath for it, that there need not be a way to XLink to it, that the c14n draft should not be constrained to keep it, that XSchemas should not have a way to constrain how many spaces can appear there or to give a regular expression for whether newlines can be put there, that XSLT does not need extra elements or attributes to handle it, that CSS does not need extra selectors for it, etc.

The XML Infoset has a specific purpose, given in the requirements document:

http://www.w3.org/TR/NOTE-xml-infoset-req

"It will provide a common reference set that other specifications can use and extend to construct their underlying data models, and will help to ensure interoperability among the various XML-based specifications and among XML software tools in general."

The Infoset is aimed at XML specifications and software in general. It is not its intent to state all the information that anyone could encode in their document. I would say that in particular it is setting a policy that W3C XML specs should not operate as if the formatting of the XML markup was significant. This is not a new issue: I remember it being discussed 3 years ago or so.

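To make the whitespace point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser (my choice for illustration; it is not mentioned in the discussion above). The sample strings and the helper name summarize are assumptions. The sketch parses three serializations of the same element that differ only in the whitespace inside the tag and compares what the parser reports for each.

```python
import xml.etree.ElementTree as ET

# Three serializations that differ only in whitespace inside the markup.
# The sample strings are illustrative, not taken from any specification.
compact = '<x y="z"/>'
spaced = '<x     y = "z"   />'
multiline = '<x\n   y="z"\n/>'


def summarize(source):
    """Return what the parser reports: element name, attributes, child count."""
    element = ET.fromstring(source)
    return element.tag, dict(element.attrib), len(element)


for source in (compact, spaced, multiline):
    print(repr(source), "->", summarize(source))

# The whitespace between the element name and the attribute never reaches
# the application, so all three summaries compare equal.
print(summarize(compact) == summarize(spaced) == summarize(multiline))
```

A tool that needs the original spacing, such as an editor that round-trips formatting, would have to record it outside a general-purpose data model of this kind.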
It is good for XML editors to regenerate edited documents with the original formatting of the markup. That is why it is useful if SAX reports rather than collapses whitespace, and why a DOM implementation for an interactive editor should subclass the W3C DOM to provide this information. That is their infoset, but it is not the one that W3C Working Groups should start from. Perhaps John should consider retitling it "XML Information Set for W3C Specifications" and its scope would be clearer.

To be glib, Simon & I are mutually dizzy: I would have thought that if interoperability and simplicity are both desired together, roping in the information set from niche requirements and questionable uses should be a Good Thing in Simon's book (not a real book).

Rick Jelliffe