
Re: Why the Infoset?

  • From: Jeff Greif <jgreif@b...>
  • To: xml-dev@x...
  • Date: Thu, 03 Aug 2000 14:17:00 -0700

It seems there is an incontrovertible answer to the question:
1.  "What is the information content of a well-formed XML document?"

From this description of the information content, we can extract subsets
that prove useful in most of the contexts in which XML documents are
used, to answer such questions as:
2.  "What is the useful information content of a well-formed XML document?"
This is the question addressed by the XML Infoset spec that is the subject
of this controversy.
3.  "What is the information content of a well-formed XML document that is
required to be preserved or accessible from a non-validating parser's
representation of an XML document?" or
4.  "What is the information content of a well-formed XML document that is
required to be preserved or accessible from a validating parser's
representation of an XML document?"

This answer to question 1, probably expressible as a grove, includes the
logical content of the document in the form of elements, attributes, and
text, annotated with additional information that enables exact replication
of the original document (e.g. whitespace layout, namespace prefixes,
presence of closing tags). It also contains complete descriptive
references to the pertinent external resources (DTDs, namespaces, schemas).
Presumably the logical items would also be annotated with their
provenance: whether they were specified in a DTD (and if so, in the
external or internal subset), whether they appear in place of ANY element
content markers, whether they resulted from the resolution of entities
(and which entities), etc. The information content also contains the set
of constraints (from DTDs or schemas) to which the document purports to
conform. That information is clearly adequate to determine whether the
document does conform, but the determination may require running a
validating parser, so conformance is perhaps best treated as derived
information, left to the application to determine.
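As a rough illustration of the layering described above, here is a minimal Python sketch that models one element as a record carrying its logical content alongside replication and provenance annotations, plus a projection down to the logical content alone. The class `ElementItem`, its field names, and the function `logical_view` are hypothetical; they are not taken from the Infoset draft or from any grove implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class ElementItem:
    # Logical content: what most applications care about.
    local_name: str
    namespace_uri: Optional[str] = None
    attributes: dict[str, str] = field(default_factory=dict)
    children: list[Union[ElementItem, str]] = field(default_factory=list)

    # Replication annotations: needed only to reproduce the original text exactly.
    prefix: Optional[str] = None      # namespace prefix as written, e.g. "h"
    used_empty_tag: bool = False      # True for <br/>, False for <br></br>

    # Provenance annotations: where the item came from (hypothetical fields).
    defaulted_attributes: set[str] = field(default_factory=set)  # supplied by a DTD default
    from_entity: Optional[str] = None  # entity whose expansion produced this element


def logical_view(item: ElementItem) -> tuple:
    """Project an element onto its logical content, dropping all annotations."""
    kids = tuple(
        child if isinstance(child, str) else logical_view(child)
        for child in item.children
    )
    return (item.namespace_uri, item.local_name,
            tuple(sorted(item.attributes.items())), kids)


# Two serializations that differ only in prefix and empty-tag syntax
# are distinct as full-fidelity items but identical in logical content.
p1 = ElementItem("br", "http://www.w3.org/1999/xhtml", prefix="h", used_empty_tag=True)
p2 = ElementItem("br", "http://www.w3.org/1999/xhtml", prefix=None, used_empty_tag=False)
print(logical_view(p1) == logical_view(p2))  # True
```

The projection is one example of "extracting a subset" from the full answer to #1; other subsets would simply keep a different selection of fields.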

It seems that all the questions above, other than #1, are subject to
dispute; practical considerations, some more important than others to
different disputants, would allow reasonable people to reach different
conclusions about the desired answers to these questions. But I think it
makes sense to be formal and explicit about the answer to #1 in order to
frame the discussions about the others. Whatever formal representation is
chosen, applying the same formalism to each subset would highlight the
differences between them, even if the spec does not use it for
presentation and only refers to it, perhaps non-normatively.
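To make that last point concrete, here is a hypothetical Python sketch in which each question's answer is written as a set of property names drawn from the full answer to #1, so that set differences show exactly what one subset keeps and another drops. The property names and the memberships assigned to `INFOSET`, `NON_VALIDATING`, and `VALIDATING` are illustrative placeholders, not a summary of any spec.

```python
# Hypothetical answer to question 1: every property a full-fidelity model records.
FULL = frozenset({
    "element_names", "attributes", "character_data",
    "namespace_prefixes", "whitespace_in_tags", "empty_tag_syntax",
    "entity_boundaries", "attribute_specified_flag",
    "external_subset_defaults", "element_content_whitespace_flag",
})

# Hypothetical subsets answering questions 2-4 (illustrative memberships only).
INFOSET = frozenset({"element_names", "attributes", "character_data",
                     "namespace_prefixes", "attribute_specified_flag"})
NON_VALIDATING = frozenset({"element_names", "attributes", "character_data"})
VALIDATING = NON_VALIDATING | {"external_subset_defaults",
                               "element_content_whitespace_flag"}

SUBSETS = {
    "Q2 (useful content)": INFOSET,
    "Q3 (non-validating parser)": NON_VALIDATING,
    "Q4 (validating parser)": VALIDATING,
}

for name, subset in SUBSETS.items():
    # Every subset must be drawn from the answer to #1.
    assert subset <= FULL, f"{name} contains properties outside #1"
    print(f"{name} drops: {sorted(FULL - subset)}")

# Differences between two subsets fall out of the same representation.
print("Q4 adds over Q3:", sorted(VALIDATING - NON_VALIDATING))
```

The sketch only shows that a single representation makes such comparisons mechanical; choosing the actual memberships is exactly what remains under dispute.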

Jeff

