An approach to let XML 2.n resources hold multiple entities
A couple of times people have suggested that XML should allow multiple top-level elements. Thinking about it, here is one possible approach that might fit in with existing systems with fairly minimal changes.

The idea is that every top-level occurrence of <?xml\w (where \w means word end) in an XML resource signals the end of any previous entity and the start of a new one. So the following would be valid

  <?xml version="1.2"?>
  <x/>
  <?xml version="1.2"?>
  <y/>
  <?xml version="1.2"?>
  <z/>

but not

  <?xml version="1.2"?>
  <x>
  <?xml version="1.2"?>
  </x>

because the second <?xml is not at the top level.

Only the first entity in the resource can have a DOCTYPE declaration; this avoids several complications.

How does this fit in with XPath?
----------------------------------------

At the moment, count(/*) is always 1. I am suggesting redefining / away from being the "document" to being the "resource", and then using indexing to get at the other entities. Two ways of doing this spring to mind:

1) Use existing XPaths, so that in the first example above the address of the y element is

  document("first example")/*[2]

The XPath of the document element is

  document("first example")/*[1]

This has the advantage of not requiring syntax changes to XPath. (The only disadvantage I see is that XPath cannot express which entity leading and trailing comments and PIs come from: I don't think this is a biggy.)

  <?xml version="2.0"?>
  <!DOCTYPE x [
  <!ENTITY next SYSTEM "#xpointer(/*[2])">
  ]>
  <x>&next;</x>
  <?xml version="2.0"?>
  <y/>

2) Use a new axis on XPath, for example

  /entity::*[2] is the y element
  /entity::*[1] is the document element,
  /x is shorthand for /entity::*[1]/x and
  //x is shorthand for /entity::*[1]//x

This has the advantage of introducing parseable entities as first-class components of a document, which may also be usable by XInclude.

  <?xml version="2.0"?>
  <!DOCTYPE x [
  <!ENTITY next SYSTEM "#xpointer(/entity::*[2])">
  ]>
  <x>&next;</x>
  <?xml version="2.0"?>
  <y/>

I am not sure which one I prefer.

How does this fit in with SGML?
--------------------------------------

The top-level production of SGML is

  SGML document = SGML document entity,
      (SGML subdocument entity |
       SGML text entity |
       character data entity |
       specific character data entity |
       non-SGML data entity)*

which models the document as a single stream of data broken into entities, each entity being terminated and separated with an Entity End signal (to the parser).

SGML specifically says in a note on that production that "This International Standard does not constrain the physical organization of the document within the data stream, message handling protocol, file system etc that contains it. In particular, separate entities could occur in the same physical object, a single entity could be divided between multiple objects, and the objects could occur in any order."

Of course, at this top level the use of productions is just a formalism, not something an SGML parser needs to implement.

XML makes the simplification that an entity is addressed by a single URL, which effectively precludes the need for an XML entity manager to handle elements that start in one entity but end in another. But there is nothing I see in SGML that prevents a change in XML to disconnect resource and entity, so that a resource can contain multiple parseable XML entities.

The textual nature of an XML resource is maintained, and an existing tag that is already swallowed as part of entity handling (i.e. <?xml?>) is reused. The use of explicit text is, I think, better than using an invisible control character, such as ^L form feed.

Cheers
Rick Jelliffe
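
For illustration, the splitting rule described at the start of this post can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not part of the proposal: the name split_entities is hypothetical, "word end" is read as "followed by whitespace", and a regular-expression scan stands in for a real parser. It will be fooled by edge cases such as a > inside an attribute value or a ]> inside a quoted string in the internal subset.

```python
import re
from typing import List

# Tokens needed to track element depth. Comments, CDATA sections, PIs and the
# DOCTYPE are matched before tags so that markup inside them is skipped.
_TOKEN = re.compile(
    r"<!--.*?-->"                           # comment
    r"|<!\[CDATA\[.*?\]\]>"                 # CDATA section
    r"|(?P<decl><\?xml\s.*?\?>)"            # <?xml ...?> (not <?xml-stylesheet)
    r"|<\?.*?\?>"                           # any other processing instruction
    r"|<!DOCTYPE[^\[>]*(?:\[.*?\]\s*)?>"    # DOCTYPE, optional internal subset
    r"|(?P<end></[^>]*>)"                   # end tag
    r"|(?P<empty><[^>]*/>)"                 # empty-element tag
    r"|(?P<start><[^>]*>)",                 # start tag
    re.DOTALL,
)


def split_entities(resource: str) -> List[str]:
    """Split a resource into entities at each top-level <?xml declaration.

    Hypothetical helper: raises ValueError if a declaration appears while an
    element is still open, since that is not at the top level.
    """
    starts = []
    depth = 0
    for m in _TOKEN.finditer(resource):
        if m.group("decl"):
            if depth != 0:
                raise ValueError(
                    "<?xml inside an element at offset %d" % m.start()
                )
            starts.append(m.start())
        elif m.group("start"):
            depth += 1
        elif m.group("end"):
            depth -= 1
    # Anything before the first declaration is kept as its own piece.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(resource)]
    pieces = [resource[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [p for p in pieces if p]


if __name__ == "__main__":
    text = (
        '<?xml version="1.2"?> <x/> '
        '<?xml version="1.2"?> <y/> '
        '<?xml version="1.2"?> <z/>'
    )
    for number, entity in enumerate(split_entities(text), start=1):
        print(number, entity)
```

Applied to the first example in this post, the function returns three strings, one per entity, and the second of them holds the y element. Applied to the second example it raises ValueError, because the inner <?xml occurs while x is still open.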