RE: Arrgh! - FW: Call for unifying and clarifying XML 1.0, DOM, XPATH, a
[Forwarded for Kevin Williams <Kevin.Williams@u...>]

From: Kevin Williams <Kevin.Williams@u...>
To: "'simonstl@s...'" <simonstl@s...>
Subject: RE: Arrgh! - FW: Call for unifying and clarifying XML 1.0, DOM, XPATH, and XML Infoset
Date: Tue, 25 Jan 2000 15:12:40 -0500

Am I missing something on this thread? Here's my understanding: the Infoset is intended to describe all of the information items that, together, make up an XML document. If I may expand on your quote from the Infoset Last Call Working Draft:

"The XML information set does not require or favor a specific interface or class of interfaces. This specification presents the information set as a tree for the sake of clarity and simplicity, but there is no requirement that the XML information set be made available through a tree structure; other types of interfaces, including (but not limited to) event-based and query-based interfaces are also capable of providing information conforming to the information set."

The intent here seems to be to allow non-tree-based processors, such as SAX, to conform to the Infoset specification. In other words: "Here's a pile of things that are in an XML document. You have to have some of them, and you can choose to leave others out. They also have to point to one another somehow. The exact mechanism to be used is not specified in this document." Otherwise, an event-driven parser like SAX could never conform to the specification. I don't think the W3C is attempting to describe a content model in the Infoset specification; rather, it is opening the door for non-tree processors.

As an aside, in data-oriented applications the information in XML documents often is not structured as a tree. For example, in a project I've been working on for the mortgage industry, a <Property> element may play several roles in a loan: it may be the subject property, a current address, a piece of real estate owned by a borrower, and so forth.
To avoid repeating the same information in a document, then, we express the property only once and use ID and IDREF attributes to point to it. In this case a simple tree structure falls short, and we need to "hop" from branch to branch.

The subject of attributes is a trickier one. I think the chief problem is that attributes are not ordered. In a tree, then, an attribute might have a parent and siblings, but no next or previous sibling. Perhaps it would have been better to approach the problem as you propose, with both ordered and unordered children in the model. However, I think that this is precisely the model the Infoset describes, while it avoids the terms "tree" and "node" so as not to alienate the non-tree processors. I think the treatment of attributes is really the core of the problem here: neither the DOM nor XPath treats attributes as "real" nodes. In an application, however, I would expect the roles played by attributes and text elements to be clear and unambiguous. A construct such as "@* | node()" would then be necessary only in more esoteric situations.

I agree that the language in the W3C specifications is at times ambiguous, even obscure. Even so, I feel strongly that imposing a tree structure on every application that uses XML would be the wrong way to go. Suppose we take the Infoset as a basis and assume that the tree-model processing mechanisms (the DOM) and the event-model processing mechanisms (SAX) inherit from it. Then I don't think there's a problem. Certainly, where the DOM and XPath disagree, the disagreement should be resolved, as I would imagine XPath inherits from the DOM model.

Here's a thought: perhaps a specification or two is missing, and it is needed to fill in the gaps. From the Infoset (which is a descriptive model), we need to derive two physical models. The first is the tree-based model, which is very close to what you described in your first post. The second is the event-based model, for SAX et al.
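The split between a tree-based and an event-based model can be sketched with Python's standard library, which happens to ship both kinds of interface over the same parser: xml.dom.minidom (tree) and xml.sax (events). This is an illustrative sketch only. The <Loan>, <Property>, <SubjectProperty>, and <CurrentAddress> elements and the id/ref attributes are hypothetical stand-ins for the mortgage example, not a real schema. There is no DTD, so the id attribute is indexed by hand rather than through a declared ID type. The tree view can "hop" from each reference to the shared Property. The event view sees the same elements in the same order, but it can only record the references as they stream past.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical loan document (names are illustrative only): the Property is
# written once and referenced twice through a hypothetical "ref" attribute.
DOC = b"""<Loan>
  <Property id="P1"><Address>12 Elm St</Address></Property>
  <SubjectProperty ref="P1"/>
  <Borrower><CurrentAddress ref="P1"/></Borrower>
</Loan>"""


def tree_view(data):
    """Tree-based model: load the whole document, then navigate it freely."""
    doc = minidom.parseString(data)
    elements = doc.getElementsByTagName("*")  # every element, in document order
    # No DTD declares "id" as an ID attribute, so build the id index by hand.
    by_id = {e.getAttribute("id"): e for e in elements if e.hasAttribute("id")}
    hops = {}
    for e in elements:
        if e.hasAttribute("ref"):
            # The "hop": leave this branch and jump to the referenced Property.
            target = by_id[e.getAttribute("ref")]
            address = target.getElementsByTagName("Address")[0]
            hops[e.tagName] = address.firstChild.data
    return [e.tagName for e in elements], hops


class EventView(xml.sax.ContentHandler):
    """Event-based model: react to each start tag as it streams past."""

    def __init__(self):
        super().__init__()
        self.names = []  # element names in the order their start tags arrive
        self.refs = []   # (element name, IDREF) pairs, resolved later if at all

    def startElement(self, name, attrs):
        self.names.append(name)
        ref = attrs.get("ref")
        if ref is not None:
            # A streaming handler cannot jump back to the Property; it can
            # only record the reference (or buffer data) for later resolution.
            self.refs.append((name, ref))


names, hops = tree_view(DOC)
handler = EventView()
xml.sax.parseString(DOC, handler)

assert names == handler.names  # both models see the same elements, same order
print(hops)          # {'SubjectProperty': '12 Elm St', 'CurrentAddress': '12 Elm St'}
print(handler.refs)  # [('SubjectProperty', 'P1'), ('CurrentAddress', 'P1')]
```

The event handler never holds the whole document, so it defers resolving the references. A real application would buffer the Property data or make a second pass. Both views still report the same information, each through its own kind of interface.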
There may also be room for a query-based model. We would then have tree technologies such as the DOM and XPath using the tree-based model, while SAX et al. would use the event-based model. That way, the model would be consistent across all of the tree-based technologies. (I'm assuming here that the DOM should be treated as an API rather than a content model, its name notwithstanding!)

Any thoughts?

- Kevin

Kevin Williams
Ultraprise Corporation (www.ultraprise.com)
Co-author, _Professional XML_ (Wrox Press)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Unsubscribe by posting to majordom@i... the message
unsubscribe xml-dev (or)
unsubscribe xml-dev your-subscribed-email@your-subscribed-address
Please note: New list subscriptions now closed in preparation for transfer to OASIS.