
Re: Why the Infoset?

  • From: "W. E. Perry" <wperry@f...>
  • To: XML DEV <xml-dev@l...>
  • Date: Sat, 29 Jul 2000 03:03:14 -0400

"Paul W. Abrahams" wrote:

> Which is the horse and which the cart here?  Especially given its ancestry as a
> more civilized form of SGML, XML is seen by the world as a set of textual
> conventions for recording documents.  The Infoset is related to an
> implementer's view of the abstract syntax tree.  But even then, I believe that
> people were writing XML parsers, and therefore creating abstract syntax trees,
> before the Infoset ever existed.
> Looking at it another way, how would the XML world be poorer if the Infoset did
> not exist?

It is far worse than that, I fear. The Infoset is the cuckoo's egg in the XML nest.
The fundamental innovation of XML 1.0 was the concept of well-formedness, which as
a radical insight amounts to this: the instance text--that is, content plus
markup--is entirely self-sufficient both as syntax and as the basis for derived or
elaborated semantics. The inherent bias of SGML is toward a pre-ordained content
model. The DTD-based validation which XML inherited from SGML imposes as a first
and principal demand on the instance document that it be a proper concrete
expression of an established form. I call such a priori expectations 'intent', and
the XML family of specifications abounds with often mutually exclusive and
mutually contradictory attempts to impose such preconceptions. They range from
DTD-based validation at the milder end of the spectrum to attempts such as SOAP to
force an XML document to mandate specific processing at the time of its use--to
become, in effect, an executable.

By contrast, the concept of well-formedness introduced by XML 1.0 permitted that
original XML definition to be understood as a specification of syntax rather than
of expected semantics. It offered the possibility of XML which, as fundamentally
distinct from SGML, might have no expectations of an instance document other than
well-made syntax. That, in turn, offered the possibility that the true content
model of an instance document might be uniquely derived at the time and place of
its use. The intent of the document creator for the interpretation of the instance
document--whether that intent was expressed as a content model in a DTD, or as a
schema imposed upon the instance document, or as a stylesheet specifying a
pre-ordained transformation, or even a presentation, of that document--might
legitimately be ignored, partially ignored, or modified in ways appropriate to the
unique local circumstances where the document consumer processed or otherwise made
use of that document. This is the closest that we have come in the field of markup
to realizing the separation of content (which cannot be more minimally conveyed
than as syntax) from presentation (in its larger sense of the elaboration of
semantics from that syntax).
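
As a rough illustration of the point above, here is a minimal Python sketch in which two consumers each derive their own semantics from the same well-formed instance, with no DTD or schema involved. The document, its element names, and the functions `billing_view` and `shipping_view` are all hypothetical, invented purely for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; no DTD or schema accompanies it.
INSTANCE = """<order id="A17">
  <item sku="X1" qty="2" price="9.50">Widget</item>
  <item sku="Y2" qty="1" price="20.00">Gadget</item>
  <note>Leave at the back door</note>
</order>"""


def billing_view(text):
    """One consumer: reads the instance only as an amount owed."""
    root = ET.fromstring(text)
    return sum(int(i.get("qty")) * float(i.get("price"))
               for i in root.iter("item"))


def shipping_view(text):
    """Another consumer: reads the same instance as a packing list plus a note."""
    root = ET.fromstring(text)
    items = [(i.text, int(i.get("qty"))) for i in root.iter("item")]
    return items, root.findtext("note")


if __name__ == "__main__":
    print(billing_view(INSTANCE))
    print(shipping_view(INSTANCE))
```

Neither function consults anything beyond the syntax of the instance itself; each simply ignores whatever it has no local use for.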

This understanding of radically simple well-formed XML leads to other wonderful
discoveries as well. For example, just as the XML name promises, the language or
markup vocabulary of a document is extensible on the spot, in the instance, through
nothing other than the application of markup itself. Since neither a DTD nor any other
content model or pre-ordained schema is required for the parsing, and therefore the
interpretation, of the resulting instance document, it is not necessary to secure
anyone's agreement to the extension of the content model before simply extending
the markup vocabulary of the instance document. XML 1.0 is wonderfully silent on
how that novel markup is to be understood by a consumer of the document, thereby
leaving the question of what the local semantics of the document will be in the
circumstances of its use quite properly in the hands of each of its users.
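
A small Python sketch of extension on the spot, under the assumption of a made-up memo vocabulary: the author adds an `<urgency>` element nobody agreed to, a non-validating parser still accepts the result because it is well-formed, and a consumer that has never heard of `<urgency>` carries on regardless. The documents and the helper names `is_well_formed` and `read_body` are hypothetical.

```python
import xml.etree.ElementTree as ET

ORIGINAL = "<memo><to>Paul</to><body>See you Monday.</body></memo>"
# The author extends the vocabulary in the instance itself, with no prior agreement.
EXTENDED = ("<memo><to>Paul</to><urgency level='high'/>"
            "<body>See you Monday.</body></memo>")
# Not well-formed: <to> is never closed before </memo>.
BROKEN = "<memo><to>Paul</memo>"


def is_well_formed(text):
    """Return True if the text parses as well-formed XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def read_body(text):
    """A consumer that knows nothing of <urgency> and simply ignores it."""
    return ET.fromstring(text).findtext("body")


if __name__ == "__main__":
    print(is_well_formed(ORIGINAL), is_well_formed(EXTENDED), is_well_formed(BROKEN))
    print(read_body(EXTENDED))
```

The only demand the parser makes is syntactic; what `<urgency>` means, if anything, is left to whoever chooses to look for it.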

The Infoset is the unfortunate standard to which those in retreat from the radical
and most useful implications of well-formedness have rallied. At its core the
Infoset insists that there is 'more' to XML than the straightforward syntax of
well-formedness. By imposing its canonical semantics the Infoset forecloses the
infinite other semantic outcomes which might be elaborated in particular unique
circumstances from an instance of well-formed XML 1.0 syntax. The question we
should be asking is not whether the Infoset has chosen the correct canonical
semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed
in this way at all.


Walter Perry

