
An approach to let XML 2.n resources hold multiple entities

A couple of times people have suggested that XML should allow multiple top-level 
elements. Thinking about it, here is one possible approach that might fit in with existing
systems with fairly minimal changes.

The idea is that every top-level occurrence of <?xml\w (where \w means word end)
in an XML resource signals the end of any previous entity and the start of a
new one.  So the following would be valid

<?xml version="1.2"?>
<?xml version="1.2"?>
<?xml version="1.2"?>

but not

<?xml version="1.2"?>
<?xml version="1.2"?>

because we are not at the top level. Only the first entity in the resource
can have a DOCTYPE declaration; this avoids several complications.
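A rough sketch of the splitting rule in Python may make it concrete. It is not a real XML parser. The names split_entities and entities_with_misplaced_doctype are made up for illustration, "word end" is read here as "followed by whitespace or ?", so that <?xml-stylesheet does not count, and the x and y elements in the usage note are assumed sample content.

```python
import re

# Tokens that matter for tracking element depth. Simplified: a "]" directly
# followed by ">" inside a literal in the DOCTYPE subset is not handled.
TOKEN = re.compile(
    r"""(?P<comment><!--.*?-->)
      | (?P<cdata><!\[CDATA\[.*?\]\]>)
      | (?P<pi><\?.*?\?>)
      | (?P<doctype><!DOCTYPE[^\[>]*(?:\[.*?\])?\s*>)
      | (?P<end></[^>]*>)
      | (?P<start><(?:"[^"]*"|'[^']*'|[^>"'])*>)
    """,
    re.DOTALL | re.VERBOSE,
)

# Assumption: "<?xml" counts only when followed by whitespace or "?".
XML_DECL = re.compile(r"<\?xml(?=[\s?])")


def split_entities(resource):
    """Split a resource into entities at each top-level <?xml ...?>."""
    starts = []
    depth = 0  # number of currently open elements
    for m in TOKEN.finditer(resource):
        kind = m.lastgroup
        if kind == "pi" and depth == 0 and XML_DECL.match(m.group()):
            starts.append(m.start())
        elif kind == "start" and not m.group().endswith("/>"):
            depth += 1
        elif kind == "end":
            depth -= 1
    # A resource without a leading declaration still starts an entity at 0.
    if not starts or resource[:starts[0]].strip():
        starts.insert(0, 0)
    starts.append(len(resource))
    return [resource[a:b] for a, b in zip(starts, starts[1:])]


def entities_with_misplaced_doctype(entities):
    """Indexes of entities after the first that carry a DOCTYPE."""
    return [
        i
        for i, text in enumerate(entities)
        if i > 0 and any(m.lastgroup == "doctype" for m in TOKEN.finditer(text))
    ]
```

With this sketch, a resource holding <x/> and <y/> after separate declarations splits into two entities, while a declaration nested inside <x>...</x> does not start a new one.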

How does this fit in with XPath?  

At the moment, count(/*) is always 1. I am suggesting redefining /
away from being the "document" to being the "resource", and then
using indexing to get other entities. Two ways for this spring to mind:
1) Use existing XPaths, so that in the first example above the address of
the y element is   document("first example")/*[2]   
The XPath of the document element is document("first example")/*[1]   

This has the advantage of not requiring syntax changes to XPath. (The
only disadvantage I see is that XPath cannot express which entity
leading and trailing comments and PIs come from: I don't think this is
a biggy.)

<?xml version="2.0"?>
  <!ENTITY next SYSTEM "#xpointer(/*[2])">

<?xml version="2.0"?>
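To illustrate option 1, here is a small Python model of what /*[n] would select once / means the resource. It is only a sketch under assumptions: the resource is taken to be already split into entity strings, the function names are hypothetical, the x, y and z elements are sample content, and each declaration is stripped before parsing so the example does not depend on how a present-day parser treats an unfamiliar version number.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration so it can be removed before parsing.
LEADING_DECL = re.compile(r"^\s*<\?xml\s.*?\?>", re.DOTALL)


def top_level_elements(entities):
    """Return the root element of each entity, in resource order."""
    roots = []
    for text in entities:
        body = LEADING_DECL.sub("", text, count=1)
        roots.append(ET.fromstring(body))
    return roots


def select_top_level(entities, position):
    """Model of the proposed /*[position]; 1-based, as in XPath."""
    return top_level_elements(entities)[position - 1]


entities = [
    '<?xml version="1.2"?>\n<x/>\n',
    '<?xml version="1.2"?>\n<y><z/></y>\n',
]
print(len(top_level_elements(entities)))      # models count(/*)
print(select_top_level(entities, 2).tag)      # models /*[2]
```

In this model count(/*) is simply the number of entities in the resource, and existing XPath syntax is enough to address each one.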

2) Use a new axis on XPath, for example
   /entity::*[2] is the y element,
   /entity::*[1] is the document element,
   /x is shorthand for /entity::*[1]/x, and
   //x is shorthand for /entity::*[1]//x

This has the advantage of introducing parseable entities as first-class components
of a document, which may also be usable by XInclude.

<?xml version="2.0"?>
  <!ENTITY next SYSTEM "#xpointer(/entity::*[2])">

<?xml version="2.0"?>
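The shorthand rule in option 2 amounts to a simple rewrite of absolute paths. A minimal sketch in Python follows. The function name expand_shorthand is hypothetical, and it treats paths as plain strings rather than parsing XPath; a bare / is left alone on the assumption that it keeps meaning the resource.

```python
def expand_shorthand(path):
    """Rewrite /x and //x to start from the first entity.

    Relative paths, a bare "/", and paths that already use the
    entity:: axis are returned unchanged.
    """
    if path == "/" or not path.startswith("/") or path.startswith("/entity::"):
        return path
    return "/entity::*[1]" + path


print(expand_shorthand("/x"))             # /entity::*[1]/x
print(expand_shorthand("//x"))            # /entity::*[1]//x
print(expand_shorthand("/entity::*[2]"))  # /entity::*[2]
```

Existing absolute paths keep addressing the first entity, so documents with a single entity behave as before.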

I am not sure which one I prefer.  

How does this fit in with SGML?

The top-level production of SGML is 

[1] SGML document =
  SGML document entity,
 (SGML subdocument entity |
   SGML text entity |
  character data entity |
  specific character data entity |
  non-SGML data entity )*

which models the document as a single stream of data broken into entities,
each entity being terminated and separated by an Entity End signal
(to the parser).

SGML specifically says in a note on that production that "This International Standard
does not constrain the physical organization of the document within
the data stream, message handling protocol, file system etc that contains
it. In particular, separate entities could occur in the same physical object,
 a single entity could be divided between multiple objects, and the objects
could occur in any order."

Of course, at this top level the use of productions is just a formalism,
not something an SGML parser needs to implement.  XML makes the
simplification that an entity is addressed by a single URL, which effectively
precludes the need for an XML entity manager to handle elements that
start in one entity but end in another.

But there is nothing I see in SGML that prevents a change in XML to
disconnect resource and entity, so that a resource can contain
multiple parseable XML entities.  

The textual nature of an XML resource is maintained, and an existing
tag that is already swallowed as part of entity handling (i.e. <?xml?>)
is reused.  The use of explicit text is, I think, better than using an invisible
control character, such as ^L form feed.

Rick Jelliffe

