Re: What is an XML Document? [Was: Re: canonicalization]
On Tue, Mar 05, 2002 at 02:31:30PM -0500, Elliotte Rusty Harold wrote:
> At 1:56 PM -0500 3/5/02, Daniel Veillard wrote:
> >
> >   Well that sequence of bytes may actually become a set of sequences
> >as soon as one is dealing with external entities.
>
> A good point. The way the spec is written though I think it's
> consistent to claim that the document is only the byte/character
> sequence that references the external entities. It does not actually
> include the merged text of the entities. The spec also states that:
>
> [Definition: A textual object is a well-formed XML document if:]
>
> 1. Taken as a whole, it matches the production labeled document.
     ^^^^^^^^^^^^^^^

  This occurs after

----------------
Each XML document has both a logical and a physical structure.
Physically, the document is composed of units called entities. An
entity may refer to other entities to cause their inclusion in the
document. A document begins in a "root" or document entity.
----------------

  For me there is no doubt that the document is the set. Well-formedness
is defined for the set, and a well-formedness error detected when parsing
an external entity affects the whole document.
  Anyway, even if the REC may be ambiguous, from a programmer's viewpoint
the document instance will likely be based on those extra sets, XPath for
example requiring them.

> >   Still the Jabber case is an interesting example in my opinion because
> >they stretch the usual principle of keeping instances "atomic" and instead
> >agree to work on a long lived "never ending" document. And in such a use
> >case entities don't work (because there isn't even a DOCTYPE at the
> >start of the connection), while XInclude does (assuming the parser handles
> >them of course), it's interesting to see various specifications taken from
> >a Jabber view point, a lot of them actually require a full document
> >instance and won't work directly in such a context.
> >
>
> Another good point. However, the BNF grammar and well-formedness
> constraints make it clear that an infinite sequence cannot possibly
> be a well-formed XML document. Thus my definition of data object
> should be revised to say "either a finite sequence of bytes or a
> finite sequence of Unicode characters". I don't know if a Jabber
> document is truly infinite or just indefinitely large. (Looking at
> the spec I think it's just indefinite.)

  Yes, the connection gets closed by an exchange of </stream:stream>,
so it's finite in practice, but the software needs to be built to
incrementally process indefinitely large instances. Very much the
foundation principle of SAX (but a progressive DOM builder can work too
if you discard processed nodes).

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@r...        | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
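
The incremental processing described above (handle each piece as it arrives, then discard it so memory stays bounded) can be sketched in Python. This is only an illustration and not code from the thread: it uses the standard library's `xml.etree.ElementTree.XMLPullParser` rather than libxml, the helper name `stanzas` is hypothetical, and the `<stream>`/`<message>` element names are a simplified stand-in for a Jabber stream with the namespaces left out.

```python
import xml.etree.ElementTree as ET


def stanzas(chunks):
    """Yield (tag, text) for each completed child of the root element.

    `chunks` is any iterable of text fragments, e.g. reads from a socket.
    Each child is detached from the root once it has been handed out, so
    the tree does not grow with the length of the stream.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)  # fragments may split tags anywhere
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    root = elem  # the long-lived outer element
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the root just closed.
                    yield elem.tag, (elem.text or "")
                    root.remove(elem)  # discard the processed node


if __name__ == "__main__":
    # Hypothetical stream, deliberately cut in the middle of tags.
    incoming = [
        "<stream><mess",
        "age>hello</message><pre",
        "sence/>",
        "<message>bye</message>",
        "</stream>",
    ]
    for tag, text in stanzas(incoming):
        print(tag, repr(text))
```

The generator never waits for the closing `</stream>` before producing results, which is the property a "never ending" document needs; a SAX handler would give the same behaviour without building any tree at all.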