Re: Proposed process for DTDs in XML (Implementations)
Many thanks to all those posting. I am getting the same sort of critical mass and focussing as before SAX.

At 14:31 25/05/98 +0200, Ron Bourret wrote:
>
>I might be getting a bit ahead of the game here, so please bear with me -- these
>thoughts are in my head now and I'd like to get them down.
>
>Trees vs. Events
>----------------
>It seems like we need to decide early on whether we are interested in getting
>the DTD as events or a tree. Arguing in favor of events is the fact that it is
>more reasonable to build a tree from events than vice versa (less memory usage),
>so events are the more basic form. However, I also think that what is returned
>really depends on intended usage.

I suspect that a tree will be the method of choice if it is used for retrospective exploration (i.e. after the parsing). In that case the tree will not be ordered. The only reason I can see for events is that they may help the parser build the DTD in a particular order (?efficiency?). I *hope* that we shan't get to the stage where memory usage of DTDs is a problem. I am aware that DOCBOOK takes ca. 3000 lines (but that includes PEs) - I assume that TEI in all its glory is larger. But even they shouldn't cause problems compared to document size.

>
>In my limited imagination, events are mostly useful for display -- read in the
>DTD definition-by-definition and display it. This is a common operation with
>the text in an XML document and is presumably why SAX returns events. Except
>for displaying a DTD or building a tree, how else would DTD events be used?
>
>The two prime uses of DTDs that I can think of are validation and exploration.
>Both of these require the information to stay in memory and be accessed
>randomly, which (to me) implies a tree, hash table, or similar structure. Are
>there any common uses of DTDs that require serial access?

The *order* of declaration of elements in a DTD is presumably irrelevant.
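
To make the events-versus-table point concrete, here is a minimal sketch of declaration events being collected into a hash table keyed by element name. Everything in it is an assumption for illustration: the handler class `DTDTableBuilder`, its `element_decl` and `attribute_decl` callbacks, and the event tuples are hypothetical names, not part of SAX or of any existing parser.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ElementDecl:
    """One entry in the in-memory table (hypothetical structure)."""
    name: str
    content_model: Optional[str] = None
    attributes: Dict[str, Tuple[str, str]] = field(default_factory=dict)


class DTDTableBuilder:
    """Hypothetical event handler that turns declaration events into a dict."""

    def __init__(self) -> None:
        self.elements: Dict[str, ElementDecl] = {}

    def _get(self, name: str) -> ElementDecl:
        # Create the entry on first mention, whichever declaration arrives first.
        return self.elements.setdefault(name, ElementDecl(name))

    def element_decl(self, name: str, content_model: str) -> None:
        self._get(name).content_model = content_model

    def attribute_decl(self, element_name: str, attr_name: str,
                       attr_type: str, default: str) -> None:
        self._get(element_name).attributes[attr_name] = (attr_type, default)


def build(events):
    """Replay a list of (callback_name, *args) events into a table."""
    builder = DTDTableBuilder()
    for kind, *args in events:
        getattr(builder, kind)(*args)
    return builder.elements


# Assumed event stream for the two-element DTD quoted in this thread,
# plus one made-up attribute declaration.
events = [
    ("element_decl", "a", "(b)"),
    ("element_decl", "b", "(#PCDATA)"),
    ("attribute_decl", "a", "id", "ID", "#IMPLIED"),
]

table = build(events)
reordered = build(list(reversed(events)))

# Random access by name, and the same table regardless of declaration order.
print(table["b"].content_model)
print(table == reordered)
```

Once the table exists, validation and exploration both reduce to lookups by name, which is why the order in which the events arrived stops mattering in this sketch.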
I imagine that parsers have to build the DTD in memory anyway. AFAIR it was said on this list that the two uses of DTDs were:
- syntactic/structural validation
- processing minimisation

I added some other *possible* uses of XTD yesterday and it would probably be useful to group these and other suggestions to offer as questions.

>
>Flat Trees vs. Tree Trees
>-------------------------
>If trees are used, another question is what form the tree takes. XML-Data
>currently defines a tree that uses XML's hierarchy as a way to group information
>about individual elements. However, the relation between those elements is
>actually flat. For example, the following DTD converts to the following
>XML-Data structure:
>
>DTD:
><!DOCTYPE a [
><!ELEMENT a (b)>
><!ELEMENT b (#PCDATA)>
>]>
>
>XML-Data:
><schema id = "a">
>   <elementType id = "a">
>      <element type = "#b"/>
>   </elementType>
>   <elementType id = "b">
>      <string/>
>   </elementType>
></schema>
>
>Notice that the definitions of a and b are at the same level. That is, when I
>build a DOM tree from this XML, a and b are siblings, not parent and child.
>When exploring a DTD, the parent-child relationship is far nicer -- I move up
>and down the DOM tree and get the metadata I need at each level. On the other
>hand, such a tree complicates the DTD (sorry ;) for XSchema/XTD/etc. and I'm not
>sure if representing children with multiple parents would even be possible,
>given the strict nesting requirements of XML. Comments?

In JUMBO1 the *elements* are all children of a root XTD node. Each element has a number of ATTLIST children, and also a single contentSpec child. The ATTLIST is very flat (just type, default, etc) but the contentSpec can be hierarchical. I used the terms in the spec (Choice and Seq) as nodes which a contentSpec could possess recursively. I'd strongly urge sticking to this because it makes it easy to extract sub-contentSpecs and trivial to parse.
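
The layout described above (a flat list of elements under one root, each with flat attribute definitions and a recursive contentSpec built from Choice and Seq) might be sketched roughly as follows. This is an illustration only, not JUMBO1's actual classes: the names `XTD`, `Element`, `AttDef`, `Name`, `PCData`, `render` and `groups` are all assumptions, as is the sample content model `(title, (para | list)+)`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class PCData:
    """Leaf node standing for #PCDATA."""


@dataclass
class Name:
    """Leaf node: a reference to an element type by name."""
    name: str
    occurrence: str = ""          # "", "?", "*" or "+"


@dataclass
class Seq:
    """Sequence group: children separated by ',' in DTD syntax."""
    children: List["Node"]
    occurrence: str = ""


@dataclass
class Choice:
    """Choice group: children separated by '|' in DTD syntax."""
    children: List["Node"]
    occurrence: str = ""


Node = Union[PCData, Name, Seq, Choice]


@dataclass
class AttDef:
    """Flat attribute definition: just name, type and default."""
    name: str
    type: str
    default: str


@dataclass
class Element:
    name: str
    content_spec: Node
    attlist: List[AttDef] = field(default_factory=list)


@dataclass
class XTD:
    """Root node: every element is a direct child, so the tree stays flat."""
    elements: List[Element]


def render(node: Node) -> str:
    """Serialise a contentSpec node back into DTD content-model syntax."""
    if isinstance(node, PCData):
        return "#PCDATA"
    if isinstance(node, Name):
        return node.name + node.occurrence
    sep = ", " if isinstance(node, Seq) else " | "
    return "(" + sep.join(render(c) for c in node.children) + ")" + node.occurrence


def groups(node: Node) -> Iterator[Node]:
    """Yield every Seq/Choice group in a contentSpec, outermost first."""
    if isinstance(node, (Seq, Choice)):
        yield node
        for child in node.children:
            yield from groups(child)


# The two-element DTD quoted above, as a flat XTD.
xtd = XTD([
    Element("a", Seq([Name("b")])),
    Element("b", Choice([PCData()])),
])
print([render(e.content_spec) for e in xtd.elements])

# A made-up nested model, to show extraction of a sub-contentSpec.
model = Seq([Name("title"), Choice([Name("para"), Name("list")], "+")])
print(render(model))
print([render(g) for g in groups(model)])
```

Because element names inside a contentSpec are only references (`Name`), a recursive content model never makes the data structure itself recursive, which is the property argued for in the paragraph above.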
I don't see that there is a useful way that a non-flat tree could be built up - if the tree is attempting to show the children directly (e.g. not using Choice and Seq) then we get into recursion. This is the sort of problem that is faced by tools like Earl Hood's (very nice) dtd2html - a Perl script for producing SGML documentation. He expands content models fully the first time and then uses ellipses when the elements re-occur at a lower level.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)