
Re: Proposed process for DTDs in XML (Implementations)

  • From: Peter Murray-Rust <peter@u...>
  • To: xml-dev@i...
  • Date: Mon, 25 May 1998 14:49:55

Many thanks to all those posting.  I am getting the same sort of critical
mass and focussing as before SAX.

At 14:31 25/05/98 +0200, Ron Bourret wrote:
>
>I might be getting a bit ahead of the game here, so please bear with me --
>these thoughts are in my head now and I'd like to get them down.
>
>Trees vs. Events
>----------------
>It seems like we need to decide early on whether we are interested in
>getting the DTD as events or a tree.  Arguing in favor of events is the
>fact that it is more reasonable to build a tree from events than vice
>versa (less memory usage), so events are the more basic form.  However,
>I also think that what is returned really depends on intended usage.

I suspect that a tree will be the method of choice if it is used for
retrospective exploration (i.e. after the parsing). In that case the tree
will not be ordered. The only reason I can see for events is that they may
help the parser build the DTD in a particular order (?efficiency?).

I *hope* that we shan't get to the stage where memory usage of DTDs is a
problem. I am aware that DOCBOOK takes ca. 3000 lines (but that includes
PEs) - I assume that TEI in all its glory is larger. But even they
shouldn't cause problems compared to document size.

>
>In my limited imagination, events are mostly useful for display -- read
>in the DTD definition-by-definition and display it.  This is a common
>operation with the text in an XML document and is presumably why SAX
>returns events.  Except for displaying a DTD or building a tree, how else
>would DTD events be used?
>
>The two prime uses of DTDs that I can think of are validation and
>exploration.  Both of these require the information to stay in memory and
>be accessed randomly, which (to me) implies a tree, hash table, or similar
>structure.  Are there any common uses of DTDs that require serial access?

The *order* of declaration of elements in a DTD is presumably irrelevant. I
imagine that parsers have to build the DTD in memory anyway.
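
To illustrate the point about events, order and in-memory structures, here is
a minimal Python sketch. It assumes a hypothetical event stream with one
event per <!ELEMENT ...> declaration; the names ElementDecl and build_table
are invented for this example and do not come from SAX or any existing parser.

```python
from collections import namedtuple

# Hypothetical event: one per <!ELEMENT name content_model> declaration.
ElementDecl = namedtuple("ElementDecl", ["name", "content_model"])


def build_table(events):
    """Collect declaration events into a dict keyed by element name."""
    table = {}
    for ev in events:
        table[ev.name] = ev.content_model
    return table


# Events for the two-element DTD quoted later in this message.
events = [ElementDecl("a", "(b)"), ElementDecl("b", "(#PCDATA)")]
table = build_table(events)

# Random access by name, regardless of the order the events arrived in.
assert table == build_table(reversed(events))
print(table["a"])
```

The events are consumed serially, but what validation or exploration then
uses is the table, which gives the same result whichever order the
declarations arrive in.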

AFAIR it was said on this list that the two uses of DTDs were:
	- syntactic/structural validation
	- processing minimisation

I added some other *possible* uses of XTD yesterday, and it would probably
be useful to group these and other suggestions and offer them as questions.

>
>Flat Trees vs. Tree Trees
>-------------------------
>If trees are used, another question is what form the tree takes.  XML-Data
>currently defines a tree that uses XML's hierarchy as a way to group
>information about individual elements.  However, the relation between
>those elements is actually flat.  For example, the following DTD converts
>to the following XML-Data structure:
>
>DTD:
><!DOCTYPE a [
><!ELEMENT a (b)>
><!ELEMENT b (#PCDATA)>
>]>
>
>XML-Data:
><schema id = "a">
>   <elementType id = "a">
>      <element type = "#b"/>
>   </elementType>
>   <elementType id = "b">
>      <string/>
>   </elementType>
></schema>
>
>Notice that the definitions of a and b are at the same level.  That is,
>when I build a DOM tree from this XML, a and b are siblings, not parent
>and child.  When exploring a DTD, the parent-child relationship is far
>nicer -- I move up and down the DOM tree and get the metadata I need at
>each level.  On the other hand, such a tree complicates the DTD (sorry ;)
>for XSchema/XTD/etc. and I'm not sure if representing children with
>multiple parents would even be possible, given the strict nesting
>requirements of XML.  Comments?

In JUMBO1 the *elements* are all children of a root XTD node. Each element
has a number of ATTLIST children, and also a single contentSpec child. The
ATTLIST is very flat (just type, default, etc) but the contentSpec can be
hierarchical. I used the terms in the spec (Choice and Seq) as nodes which
a contentSpec could possess recursively. I'd strongly urge sticking to this
because it makes it easy to extract sub-contentSpecs and trivial to parse.
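
A rough Python sketch of that layout, for illustration only: the class and
function names (Name, Seq, Choice, AttDef, ElementDecl, to_dtd, names_in) are
hypothetical and are not JUMBO1's actual classes. Element declarations would
sit in a flat list under the root, and only the contentSpec nests.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Name:
    """A reference to an element by name, e.g. 'para' or 'para*'."""
    name: str
    occurrence: str = ""  # one of "", "?", "*", "+"


@dataclass
class Seq:
    """A sequence group: (a, b, c)."""
    items: List["Particle"]
    occurrence: str = ""


@dataclass
class Choice:
    """A choice group: (a | b | c)."""
    items: List["Particle"]
    occurrence: str = ""


Particle = Union[Name, Seq, Choice]


@dataclass
class AttDef:
    """One flat attribute definition: name, type, default."""
    name: str
    type: str
    default: str


@dataclass
class ElementDecl:
    """An element: a single contentSpec plus its ATTLIST entries."""
    name: str
    content_spec: Particle
    attlists: List[AttDef] = field(default_factory=list)


def to_dtd(p):
    """Serialise a contentSpec (or any sub-contentSpec) back to DTD syntax."""
    if isinstance(p, Name):
        return p.name + p.occurrence
    sep = ", " if isinstance(p, Seq) else " | "
    return "(" + sep.join(to_dtd(i) for i in p.items) + ")" + p.occurrence


def names_in(p):
    """Collect every element name mentioned anywhere in a contentSpec."""
    if isinstance(p, Name):
        return {p.name}
    return set().union(*(names_in(i) for i in p.items))


# (title, (para | list)*) as nested Seq/Choice nodes.
spec = Seq([Name("title"), Choice([Name("para"), Name("list")], "*")])
section = ElementDecl("section", spec, [AttDef("id", "ID", "#IMPLIED")])
print(to_dtd(section.content_spec))
print(to_dtd(spec.items[1]))  # a sub-contentSpec extracted on its own
```

Because every Seq or Choice is a node in its own right, pulling out a
sub-contentSpec is just a matter of taking the child node.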

I don't see that there is a useful way that a non-flat tree could be built
up - if the tree is attempting to show the children directly (e.g. not
using Choice and Seq) then we get into recursion. This is the sort of
problem that is faced by tools like Earl Hood's (very nice) dtd2html - a
Perl script for producing SGML documentation. He expands content models
fully the first time and then uses ellipses when the elements re-occur at a
lower level.
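
The expand-once-then-ellipsis idea can be sketched in a few lines of Python.
This is my own illustration of the general approach, not dtd2html's actual
algorithm; the function name expand and the sample element names are
assumptions made for the example.

```python
def expand(name, children, seen=None, depth=0):
    """Return indented lines for `name`; an element that has already been
    expanded elsewhere is shown once more with '...' and not descended into."""
    if seen is None:
        seen = set()
    indent = "  " * depth
    if name in seen:
        return [indent + name + " ..."]
    seen.add(name)
    lines = [indent + name]
    for child in children.get(name, []):
        lines.extend(expand(child, children, seen, depth + 1))
    return lines


# Flat table: element name -> names appearing in its content model.
# 'section' and 'emph' are both recursive.
children = {
    "section": ["title", "para", "section"],
    "para": ["emph"],
    "emph": ["emph"],
}

for line in expand("section", children):
    print(line)
```

The flat table stays the underlying structure; the nested view is derived
from it on demand, and the ellipsis is what stops the recursion.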

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences,
domestic net connection
VSMS http://www.nottingham.ac.uk/vsms,
Virtual Hyperglossary http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

