Re: What is an XML Document? [Was: Re: canonicalization]


On Tue, Mar 05, 2002 at 02:31:30PM -0500, Elliotte Rusty Harold wrote:
> At 1:56 PM -0500 3/5/02, Daniel Veillard wrote:
> 
> 
> >   Well that sequence of bytes may actually become a set of sequences
> >as soon as one is dealing with external entities.
> 
> A good point. The way the spec is written though I think it's 
> consistent to claim that the document is only the byte/character 
> sequence that references the external entities. It does not actually 
> include the merged text of the  entities. The spec also states that:
> 
> [Definition: A textual object is a well-formed XML document if:]
> 
> 1. Taken as a whole, it matches the production labeled document.
      ^^^^^^^^^^^^^^^

  This definition comes after the following passage in the spec:

----------------
Each XML document has both a logical and a physical structure. Physically,
the document is composed of units called entities. An entity may refer
to other entities to cause their inclusion in the document. A document
begins in a "root" or document entity.
----------------

  For me there is no doubt that the document is the set. Well-formedness
is defined for the set, and a well-formedness error detected when parsing
an external entity affects the whole document.
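A minimal sketch of that last point, using Python's standard xml.parsers.expat bindings. The helper name is_well_formed and the external_entities dict (an in-memory stand-in for a real entity resolver) are assumptions made for illustration; they are not taken from the spec or from any parser discussed in this thread.

```python
import xml.parsers.expat as expat


def is_well_formed(document_entity, external_entities):
    """Parse a document entity together with the external entities it references.

    external_entities is a hypothetical in-memory resolver: a dict mapping
    system identifiers to the bytes of each external parsed entity.
    """
    parser = expat.ParserCreate()

    def resolve(context, base, system_id, public_id):
        # Parse the referenced entity in a child parser that shares the
        # parent's parsing context.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(external_entities[system_id], True)
        return 1  # non-zero asks expat to continue with the parent entity

    parser.ExternalEntityRefHandler = resolve
    try:
        parser.Parse(document_entity, True)
    except expat.ExpatError:
        # An error raised while parsing any entity fails the whole document.
        return False
    return True


# The same document entity is used in both calls; only the external entity differs.
doc = b'<!DOCTYPE doc [<!ENTITY chap SYSTEM "chap.xml">]><doc>&chap;</doc>'
good = is_well_formed(doc, {"chap.xml": b"<p>ok</p>"})
bad = is_well_formed(doc, {"chap.xml": b"<p>unclosed"})
```

The document entity is byte-for-byte identical in both calls, yet the second one is expected to be rejected because of the unclosed element in the external entity. That is the sense in which the error belongs to the set rather than to the referencing byte sequence alone.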

  Anyway, even if the REC may be ambiguous, from a programmer's viewpoint the
document instance will likely be based on those extra sets; XPath,
for example, requires them.

> >   Still the Jabber case is an interesting example in my opinion because
> >they stretch the usual principle of keeping instances "atomic" and instead
> >agree to work on a long lived "never ending" document. And in such use
> >case entities don't work (because there isn't even a DOCTYPE at the
> >start of the connection), while XInclude does (assuming the parser handles
> >them of course). It's interesting to see various specifications taken from
> >a Jabber viewpoint: a lot of them actually require a full document
> >instance and won't work directly in such a context.
> >
> 
> Another good point. However, the BNF grammar and well-formedness 
> constraints make it clear that an infinite sequence cannot possibly 
> be a well-formed XML document. Thus my definition of data object 
> should be revised to say "either a finite sequence of bytes or a 
> finite sequence of Unicode characters". I don't know if a Jabber 
> document is truly infinite or just indefinitely large. (Looking at 
> the spec I think it's just indefinite.)

Yes, the connection gets closed by an exchange of </stream:stream>
so it's finite in practice, but the software needs to be built to
process indefinitely large instances incrementally. Very much the
foundation principle of SAX (but a progressive DOM builder can work
too if you discard processed nodes).
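As a rough illustration of that incremental style, here is a sketch using the feed() interface of Python's standard xml.sax parser. The TopLevelCollector and process_stream names, the sample chunks, and the namespace URI in the usage lines are assumptions made for the example; this is not Jabber's or libxml's actual code.

```python
import xml.sax


class TopLevelCollector(xml.sax.ContentHandler):
    """Records each child of the root element as soon as it is complete."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.completed = []   # names of finished top-level children
        self.closed = False   # True once the root end tag has arrived

    def startElement(self, name, attrs):
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        if self.depth == 1:
            # A direct child of the root just ended; nothing deeper is kept,
            # so memory use does not grow with the length of the stream.
            self.completed.append(name)
        elif self.depth == 0:
            self.closed = True


def process_stream(chunks):
    """Feed arbitrary text chunks to the parser as they arrive."""
    handler = TopLevelCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # events fire before the document is complete
    if handler.closed:
        parser.close()      # only finalize once the root has been closed
    return handler


# Hypothetical traffic; chunk boundaries deliberately fall in the middle of a tag.
result = process_stream([
    '<stream:stream xmlns:stream="http://etherx.jabber.org/streams">',
    '<message><body>hi</bo',
    'dy></message>',
    '<presence/>',
    '</stream:stream>',
])
```

Each top-level element is reported as soon as its end tag is seen, long before the closing </stream:stream>, and close() is skipped while the root is still open because finalizing an unfinished document would be reported as an error.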

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@r...  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
