Re: What is an XML Document? [Was: Re: canonicalization]
On Mon, Mar 04, 2002 at 10:40:23PM -0500, Elliotte Rusty Harold wrote:
> At 7:44 PM -0500 3/4/02, Daniel Veillard wrote:
> > I would be tempted to tease and ask what is an XML document (would
> >the TAG ever find the answer ;-) . I also note that in use case like
> >the Jabber protocol, you never end-up with a "fully composed document"
> >it only exists once the processing is finished and that it had become
> >useless.
>
> Is this really a problem?

In practice not really.

> According to the XML spec "A data object is
> an XML document if it is well-formed, as defined in this
> specification." That does leave the question of what a data object
> is, but I think a reasonable answer is "a sequence of bytes or a
> sequence of Unicode characters". Pretty clearly the spec does not
> intend that a data object be a traditional OOP object of some kind.

Well, that sequence of bytes may actually become a set of sequences as
soon as one is dealing with external entities. And the distinction the
specification draws between well-formedness-only and validating parsers
shows that what goes by the same name (the main entity resource) may end
up being seen differently by two parsers.

But as I said, in practice it's not such a big deal, because the fact
that some external resources may be missing has to be handled in the
tool chain anyway. And when the requirement is that the set seen must
always be the same, this can usually be implemented either by disabling
any external access or by turning a missing member of the set into an
error.

Still, the Jabber case is an interesting example in my opinion, because
they stretch the usual principle of keeping instances "atomic" and
instead agree to work on a long-lived, "never-ending" document.
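To make the "never-ending" document concrete, here is a minimal sketch,
assuming Python's standard xml.etree.ElementTree.XMLPullParser rather
than any parser discussed in this thread. The element names (stream,
message, presence), the chunk boundaries and the stanzas() helper are
all made up for the illustration. It only shows that each child of the
root can be processed as soon as its end tag arrives, while the root
element itself is never closed, so no complete document ever exists.

```python
from xml.etree.ElementTree import XMLPullParser


def stanzas(chunks):
    """Yield (tag, text) for each complete child of the root element.

    `chunks` is any iterable of string fragments, e.g. data read from a
    socket. The root element is never expected to be closed.
    """
    parser = XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)  # hand over whatever has arrived so far
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of the root just ended
                    yield elem.tag, elem.text


# Hypothetical fragments of a long-lived connection; note that the
# <stream> root is opened but never closed.
chunks = [
    "<stream><message>he",
    "llo</message><presence/>",
    "<message>again</message>",
]
received = list(stanzas(chunks))
```

Nothing here ever calls the parser's close() method, which is the point:
a tool that insists on a full document instance would have nothing to
work on until the connection ended.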
And in such a use case entities don't work (because there isn't even a
DOCTYPE at the start of the connection), while XInclude does (assuming
the parser handles it, of course). It's interesting to look at various
specifications from a Jabber viewpoint: a lot of them actually require
a full document instance and won't work directly in such a context.

> I do wish the spec made that last point explicit, but I do think it
> won't get anybody into trouble and might indeed pull a few developers
> out of the quicksand they've mired themselves in by believing
> things like objects can be XML documents instead of representations of
> an XML document. (To cite a classic OOP example, nobody believes a
> Car object is a car. Why do developers insist on claiming Document
> objects are documents?)

I don't ;-) . Still, there are some properties one would expect to hold
(two readers of the same document see the same sequence of characters)
but which are not guaranteed by a document object. It's just a fact one
needs to be aware of. Both the Infoset and C14N try to accommodate or
forbid those cases, and most developers would do well to understand the
issue better.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@r...        | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/