Re: Integrity in the Hands of the Client
From: Joe Lapp <jlapp@a...>

In this posting I'm going to be a little bold and propose that both the XML and DOM specifications are flawed. The existence of these flaws rides on the assumption that we care to use SGML/XML to create domain models for data where the data evolves over time. I'm also assuming that it is unacceptable for the client objects of a document to maintain the integrity of the document.

I've not been following this thread closely, so I apologize if I get something wrong. I'll also stop first to note that when interconverting data formats we rarely can represent every validity constraint in the new format -- if I dump a DB record to tabbed files I lose referential (and all other) integrity checks, but I may have much better luck moving to a competing vendor's system. When using XML, we may reasonably expect that the richer formalism will give us more control (and for hierarchical data, that expectation is well, if not perfectly, met). We may also expect that some other properties can be preserved (e.g. IDREFs eliminate broken pointers, but don't allow typed references), but some probably won't be.

We need to design the DTD for this document.
Here is our first pass:

    <!DOCTYPE catalog [
    <!ELEMENT catalog (books, authors)>
    <!ELEMENT books (book*)>
    <!ELEMENT authors (author*)>
    <!ELEMENT book (summary)>
    <!ATTLIST book
        title  CDATA #REQUIRED
        author IDREF #REQUIRED>
    <!ELEMENT author (bio)>
    <!ATTLIST author
        id   ID    #REQUIRED
        name CDATA #REQUIRED>
    <!ELEMENT summary (#PCDATA)>
    <!ELEMENT bio (#PCDATA)>
    ]>

To get a better feel for what we've designed, we create a little sample document:

    <catalog>
      <books>
        <book title="The Postman" author="A1">
          <summary>Text goes here.</summary></book>
        <book title="Startide Rising" author="A1">
          <summary>Text goes here.</summary></book>
        <book title="Hitchhiker's Guide to the Galaxy" author="A2">
          <summary>Text goes here.</summary></book>
      </books>
      <authors>
        <author id="A1" name="David Brin"><bio>Text goes here.</bio></author>
        <author id="A2" name="Douglas Adams"><bio>Text goes here.</bio></author>
      </authors>
    </catalog>

This seems to work. It stores information about books and authors, and it is not possible to add a book without associating it with the description of some author. But we can see that it breaks as soon as we add any other kind of element that has an ID. We know that every book will eventually have an ID, because we'll soon want to have an element whose content elements reference the New York Times Bestsellers. Once we do that, nothing prevents an administrator (or the client program he or she is using) from indicating that the author of a book is another book. This DTD will not suffice.

The problem with this is that it uses database-style "joins" on ID values. XML's most powerful constraints are tree constraints, based on containment. For example, the following structure does not have this problem:

    <catalog>
      <authors>
        <author id="A1">
          <name>David Brin</name>
          <bio>whatever</bio>
          <books>
            <book>
              <title>The Postman</title>
              <summary> whatever </summary>
            </book>
            other books go here. If we have more than one author:
            <book coauthors="A2 A3"> ...etc </book>
          </books>
        </author>
      </authors>
    </catalog>

Note that you do have to pick a "by author" or "by book" hierarchy to use this technique. I also moved the title and the author's name into elements: titles frequently contain markup, and names can be complex enough that it's often a good idea to be prepared for the eventual need for markup. Consider Chinese names, where the order of family and personal names differs from that in most European cultures.

It seems that we might have to use links, but let's look at other approaches first. We entertain the idea that an author's books belong to the content of the author. We quickly throw that one out when we realize that a book can have more than one author.

Or take an alternative approach (as I sketched above).

I have not been able to find a way to have the document server force clients to ensure that whenever they add a book, that book is associated with some author. Clients are given the responsibility of maintaining the integrity of the document.

No. Servers that want to impose non-XML integrity constraints (such as you are demanding) must impose those constraints themselves. XML, like traditional databases (which seem to be your starting point), represents some things well, and some things very badly. Attempting to create relational schemas for XML documents produces the same kind of hairy, unnatural specifications, and requires similar extra integrity checks on update, as representing typical document information in a relational database does. Basically, I think that the flaw of not providing what you ask for is in fact no flaw, but an artifact of different tools being targeted to different purposes. There is a difference -- since XML is a data format and _not_ a processing technology the way a database is, it may be useful as a way to represent and transport data that is best _manipulated_ in non-XML ways.
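As an illustration of a server imposing such a constraint itself, here is a minimal sketch in Python (standard library only) of a check that every book's author reference names an <author> element rather than some other element that happens to carry an ID. It is a sketch under stated assumptions, not a definitive implementation: the `id` attribute on <book> and the function name `check_author_refs` are hypothetical, added only to reproduce the "author of a book is another book" failure described above.

```python
import xml.etree.ElementTree as ET

# Sample catalog in the style of the first-pass DTD. The id attribute on
# <book> is an assumption: the original DTD does not declare it yet.
CATALOG = """\
<catalog>
  <books>
    <book id="B1" title="The Postman" author="A1"><summary>Text goes here.</summary></book>
    <book id="B2" title="Startide Rising" author="B1"><summary>Text goes here.</summary></book>
  </books>
  <authors>
    <author id="A1" name="David Brin"><bio>Text goes here.</bio></author>
  </authors>
</catalog>
"""


def check_author_refs(xml_text):
    """Return error strings for book/@author values that do not name an <author>."""
    root = ET.fromstring(xml_text)
    # Map every id in the document to the tag of the element that carries it.
    tag_by_id = {
        el.get("id"): el.tag for el in root.iter() if el.get("id") is not None
    }
    errors = []
    for book in root.iter("book"):
        ref = book.get("author")
        target = tag_by_id.get(ref)
        if target is None:
            # A validating parser would already reject this dangling IDREF.
            errors.append(f"{book.get('title')}: author '{ref}' does not exist")
        elif target != "author":
            # This is the case ID/IDREF validation alone cannot catch.
            errors.append(
                f"{book.get('title')}: author '{ref}' refers to a <{target}>, not an <author>"
            )
    return errors


errors = check_author_refs(CATALOG)
for message in errors:
    print(message)
```

A server could run a check like this on every update and reject the change when the list is non-empty; the document itself stays plain XML.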
You get a rich language of structures for free by using an XML parser, and that may save some time in writing data transporters -- for instance, a DTD for the transport of complete RDB table sets would be easy to write -- but checking those tables for semantic correctness would not be one of the things you get for free.

I think the XML specification as it currently stands is extremely well-suited for describing data that does not change over time, but that it is lacking in specifying how documents are to evolve.

You overstate the case here. It's suited for describing how data whose integrity constraints correspond to XML validity should evolve. These constraints are not theoretically justified, but are pragmatically justified by the fact that people can get useful document management work done using them. It is the same with relational databases -- all those theorems about normal forms and algebra merely show that the system is well defined -- the fact that tables are useful for many kinds of data is still a pragmatic one, and not a theoretical one. The world is still full of things that don't fit the relational model very well. I know that our current data-manipulation savior is OO databases, but once we have experience with them we'll grow to understand the ways in which they fall short of perfection as well.

Nevertheless, future versions of XML might have small improvements that will help cases like this. The provision of multiple ID spaces (the ability to have typed IDs and typed IDREFs) is one that has been suggested a number of times. It would also be very useful in documents, since (for example) only <figures> would have "fignum" attributes, and so the user of "figref" attributes would be prevented from referring instead to a paragraph of random text. Small suggestions like this that also offer a lot of leverage may get considered for XML 1.1.
(Small in the sense that little syntax is required to support it, and little processing beyond that already required for ID/IDREF processing.) To my mind, such suggestions are compelling to the extent that they are useful in _document_ management (as well as general data management), because that really describes the primary focus of XML design. XML may well be useful beyond that area, but I think it should stay away from bidding on the "universal data format of the ages" title, which may well be impossible to ever attain.

-- David
------------------------------------------+----------------------------
David Durand              dgd@c...        | david@d...
Boston University Computer Science        | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/  | http://dynamicDiagrams.com/
                                          | MAPA: mapping for the WWW

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)