Re: Why the Infoset?
"Paul W. Abrahams" wrote:

> Which is the horse and which the cart here? Especially given its ancestry as a
> more civilized form of SGML, XML is seen by the world as a set of textual
> conventions for recording documents. The Infoset is related to an
> implementer's view of the abstract syntax tree. But even then, I believe that
> people were writing XML parsers, and therefore creating abstract syntax trees,
> before the Infoset ever existed.
>
> Looking at it another way, how would the XML world be poorer if the Infoset did
> not exist?

It is far worse than that, I fear. The Infoset is the cuckoo's egg in the XML nest.

The fundamental innovation of XML 1.0 was the concept of well-formedness, which as a radical insight amounts to this: the instance text--that is, content plus markup--is entirely self-sufficient both as syntax and as the basis for derived or elaborated semantics. The inherent bias of SGML is toward a pre-ordained content model. The DTD-based validation which XML inherited from SGML imposes as a first and principal demand on the instance document that it be a proper concrete expression of an established form. I call such a priori expectations 'intent', and the XML family of specifications abounds with often mutually-exclusive and mutually-contradictory attempts to impose such preconceptions. They range from DTD-based validation at the milder end of the spectrum to attempts such as SOAP to force an XML document to mandate specific processing at the time of its use--to become, in effect, an executable.

By contrast, the concept of well-formedness introduced by XML 1.0 permitted that original XML definition to be understood as a specification of syntax rather than of expected semantics. It offered the possibility of XML which, as fundamentally distinct from SGML, might have no expectations of an instance document other than well-made syntax.
That, in turn, offered the possibility that the true content model of an instance document might be uniquely derived at the time and place of its use. The intent of the document creator for the interpretation of the instance document--whether that intent was expressed as a content model in a DTD, or as a schema imposed upon the instance document, or as a stylesheet specifying a pre-ordained transformation, or even a presentation, of that document--might legitimately be ignored, partially ignored, or modified in ways appropriate to the unique local circumstances where the document consumer processed or otherwise made use of that document. This is the closest that we have come in the field of markup to realizing the separation of content (which cannot be more minimally conveyed than as syntax) from presentation (in its larger sense of the elaboration of semantics from that syntax).

This understanding of radically simple well-formed XML leads to other wonderful discoveries as well. For example, just as the XML name promises, the language or markup vocabulary of a document is extensible on the spot, in the instance, through nothing other than the application of markup itself. Since neither a DTD nor any other content model or pre-ordained schema is required for the parsing, and therefore the interpretation, of the resulting instance document, it is not necessary to secure anyone's agreement to the extension of the content model before simply extending the markup vocabulary of the instance document. XML 1.0 is wonderfully silent on how that novel markup is to be understood by a consumer of the document, thereby leaving the question of what the local semantics of the document will be in the circumstances of its use quite properly in the hands of each of its users.

The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied.
At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.

Respectfully,
Walter Perry