Re: Why the Infoset?
It seems there is an incontrovertible answer to the question

1. "What is the information content of a well-formed XML document?"

From that description of the information content we can extract subsets that prove useful in most of the contexts in which XML documents are used, answering such questions as

2. "What is the useful information content of a well-formed XML document?" (This is the question addressed by the XML Infoset spec that is the subject of this controversy.)

3. "What is the information content of a well-formed XML document that is required to be preserved or accessible from a non-validating parser's representation of an XML document?"

4. "What is the information content of a well-formed XML document that is required to be preserved or accessible from a validating parser's representation of an XML document?"

The answer to question 1, probably expressible as a grove, includes the logical content of the document in the form of elements, attributes, and text. That content is annotated with additional information that enables exact replication of the original document (e.g. whitespace layout, namespace prefixes, presence of closing tags), and is accompanied by complete references to the pertinent external resources (DTDs, namespaces, schemas). Presumably the logical items would also be annotated as to their origin: whether they were specified in a DTD, whether they came from the external or the internal subset, whether they appear in place of ANY element content markers, whether they result from the resolution of entities (and which entities), etc.

The information content also contains a set of constraints (from DTDs or schemas) to which the document purports to conform. There is clearly adequate information to determine whether it does conform, but this might not be determinable without running a validating parser, so conformance might best be considered derived information that is left to the application to determine.
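To illustrate the gap between the full content of question 1 and what a parser's representation keeps, here is a minimal sketch using Python's standard xml.etree.ElementTree. The two sample documents, the urn:example namespace, and the helper name logical_view are my own assumptions for illustration. The subset shown is simply what ElementTree happens to retain; it is not the subset chosen by the Infoset spec.

```python
import xml.etree.ElementTree as ET


def logical_view(elem):
    """Return a nested tuple of what ElementTree retains for an element
    (hypothetical helper, used only for comparison)."""
    return (
        elem.tag,                                # expanded name: {namespace}local
        tuple(sorted(elem.attrib.items())),      # attributes, order ignored
        elem.text or "",                         # text before the first child
        tuple(logical_view(child) for child in elem),
        elem.tail or "",                         # text after the end tag
    )


# Same elements and attributes, but different prefix, attribute quoting,
# whitespace inside tags, and empty-element form.
DOC_A = '<a:root xmlns:a="urn:example"><a:item n="1"></a:item></a:root>'
DOC_B = "<b:root xmlns:b='urn:example' ><b:item  n='1' /></b:root>"

view_a = logical_view(ET.fromstring(DOC_A))
view_b = logical_view(ET.fromstring(DOC_B))

same_logical_content = view_a == view_b   # the parser's view cannot tell them apart
same_serialization = DOC_A == DOC_B       # the original character streams differ

print(same_logical_content, same_serialization)
```

Everything that distinguishes the two strings is the kind of annotation that an answer to question 1 would have to carry, and that this particular parser representation discards.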
It seems that all the questions above, other than #1, are subject to dispute. Practical considerations, which different disputants weigh differently, would allow reasonable people to reach different conclusions about the desired answers to these questions. But I think it makes sense to be formal and explicit about the answer to #1 in order to frame the discussions about the others. Whatever formal representation is used, the same formalism would help to highlight the differences between the information content subsets, even if the spec does not use it for presentation and only refers to it, perhaps non-normatively.

Jeff