Integrity in the Hands of the Client
In this posting I'm going to be a little bold and propose that both the XML and DOM specifications are flawed. The existence of these flaws rests on the assumption that we care to use SGML/XML to create domain models for data that evolves over time. I'm also assuming that it is unacceptable to make the client objects of a document responsible for maintaining the document's integrity.

To make the point convincingly, I need you to bear with me as I explore an example of how we might use XML. I do not directly suggest how to correct the XML specification, but I think I end up implying a few different solutions. The correction to DOM seems more straightforward, so there I make the obvious suggestions.

Suppose we want to create a document that contains information about books and about the authors of those books, and suppose we require that whenever the document has a book, it also has information about the author of the book. The document will reside on a server, and one or more administrators will populate the document from their clients. Other users will be free to browse the document. We need to design the DTD for this document.
Here is our first pass:

  <!DOCTYPE catalog [
    <!ELEMENT catalog (books, authors)>
    <!ELEMENT books (book*)>
    <!ELEMENT authors (author*)>
    <!ELEMENT book (summary)>
    <!ATTLIST book
      title CDATA #REQUIRED
      author IDREF #REQUIRED>
    <!ELEMENT author (bio)>
    <!ATTLIST author
      id ID #REQUIRED
      name CDATA #REQUIRED>
    <!ELEMENT summary (#PCDATA)>
    <!ELEMENT bio (#PCDATA)>
  ]>

To get a better feel for what we've designed, we create a little sample document:

  <catalog>
    <books>
      <book title="The Postman" author="A1">
        <summary>Text goes here.</summary></book>
      <book title="Startide Rising" author="A1">
        <summary>Text goes here.</summary></book>
      <book title="Hitchhiker's Guide to the Galaxy" author="A2">
        <summary>Text goes here.</summary></book>
    </books>
    <authors>
      <author id="A1" name="David Brin"><bio>Text goes here.</bio></author>
      <author id="A2" name="Douglas Adams"><bio>Text goes here.</bio></author>
    </authors>
  </catalog>

This seems to work. It stores information about books and authors, and it is not possible to add a book without associating it with the description of some author. But it breaks as soon as we add any other kind of element that has an ID. We know that every book will eventually have an ID, because we'll soon want an element whose content elements reference the New York Times Bestsellers. Once we do that, nothing prevents an administrator (or the client program he or she is using) from indicating that the author of a book is another book. This DTD will not suffice.

It seems that we might have to use links, but let's look at other approaches first. We entertain the idea that an author's books belong to the content of the author. We quickly throw that one out when we realize that a book can have more than one author. Now we consider having authors belong to the content of a book, but we throw that idea out because authors may author many books.
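
To make the IDREF weakness described above concrete, here is a minimal Python sketch. It assumes a hypothetical variant of the sample catalog in which <book> elements have also been given `id` attributes; the function names are illustrative only. The first check mimics what ID/IDREF validation can guarantee (the reference resolves to some element), and the second is the type check that the DTD cannot express and that a client would have to perform itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: books now carry IDs, and one book names a book as its author.
CATALOG = """
<catalog>
  <books>
    <book id="B1" title="The Postman" author="A1"><summary>Text goes here.</summary></book>
    <book id="B2" title="Startide Rising" author="B1"><summary>Text goes here.</summary></book>
  </books>
  <authors>
    <author id="A1" name="David Brin"><bio>Text goes here.</bio></author>
  </authors>
</catalog>
"""

def idref_resolves(root):
    """Mimic the IDREF check: every author attribute names some existing ID."""
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    return all(book.get("author") in ids for book in root.iter("book"))

def mistyped_author_refs(root):
    """Return titles of books whose author attribute does not point at an <author>."""
    author_ids = {a.get("id") for a in root.iter("author")}
    return [b.get("title") for b in root.iter("book")
            if b.get("author") not in author_ids]

root = ET.fromstring(CATALOG)
print(idref_resolves(root))        # True: "B1" is a valid ID, so IDREF is satisfied
print(mistyped_author_refs(root))  # ['Startide Rising']: its "author" is a book
```

The document passes the ID/IDREF check even though one book's author is another book, which is the situation the DTD above cannot rule out.
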
It is possible to put author information in the content of each book, but then we'd be duplicating the lengthy bio, wasting disk space, and introducing the headache of managing duplicate copies. The same problem arises if we duplicate book information under each of the authors of the book, especially since each book has a lengthy description.

So now we ask whether links can do the job. Links allow us to use URLs and XPointers to reference other elements. For the moment, consider trying to accomplish our task using a single DTD, so that all element IDs have the same scope. In this case, the URL of any link references the document that contains the link, so all of our distinguishing information resides in the XPointers. The ID() location term looks useful, but this term cannot constrain the element type of the element that it references. Using ID() as the first locator term would not be sufficient to distinguish between books and authors.

Suddenly a brilliant idea comes to mind. We'll use a locator term to specify the <authors> element and then follow that with the ID() term to select the particular <author> element by its ID. But this idea has a problem: when the ID() term appears, it must appear as the first locator term.

Another idea comes to mind. We could use the following combination of locator terms:

  CHILD(1,authors)(1,author,id,'A3')

Here 'A3' is the identifier of the author. We cannot match on the author's name, because more than one author may have the same name, whereas IDs are guaranteed to be unique. That seems to work. Something similar could have been accomplished by separating books and authors into different documents and then using the URL portion of the href to specify the document that contains the target element.

However, these link solutions all have one problem: nothing in the link specification allows a link element declaration to constrain the kind of resource to which a link links.
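
The difference between an ID()-style lookup and the path-scoped locator above can be sketched in Python. This uses the path syntax of the standard library's ElementTree as a stand-in for XPointer locator terms, not the XPointer syntax itself; the trimmed catalog, the book IDs, and the helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical catalog (summaries and bios omitted; books given IDs).
CATALOG = """
<catalog>
  <books>
    <book id="B1" title="The Postman" author="A1"/>
  </books>
  <authors>
    <author id="A1" name="David Brin"/>
    <author id="A2" name="Douglas Adams"/>
  </authors>
</catalog>
"""

root = ET.fromstring(CATALOG)

def by_id(root, ident):
    """Rough analogue of ID(): the first element of any type with this id."""
    return root.find(f".//*[@id='{ident}']")

def author_by_id(root, ident):
    """Rough analogue of CHILD(1,authors)(1,author,id,...): only an <author> under <authors>."""
    return root.find(f"authors/author[@id='{ident}']")

print(by_id(root, "B1").tag)                 # book
print(author_by_id(root, "B1"))              # None
print(author_by_id(root, "A2").get("name"))  # Douglas Adams
```

The scoped lookup refuses to resolve a book ID as an author, but only because the client chose to write the locator that way, which is the limitation noted above.
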
WD-XML-LINK-970731 indicates that an href is a URL, and that when the URL references another XML document, XPointer locator terms may be appended to the URL. I do not see any mechanism by which a link element can constrain the kind of element that the link references.

I have not been able to find a way to have the document server force clients to ensure that whenever they add a book, that book is associated with some author. Clients are given the responsibility of maintaining the integrity of the document.

The problem grows more complicated when we also ask that no author exist in the document unless at least one book is associated with that author. A change to the specifications that solves the first problem would not by itself handle this additional requirement. By having constraints operate in both directions, we now require that every change to a document occur within a transaction, so that the document is validated against the DTD only at transaction boundaries. (If every book had to have at least one author and every author had to have at least one book, then when it comes time to add a new book by a new author, the document will not validate against the DTD after we add one and before we add the other.)

The example I have given here may seem trivial. Surely we can find a way to live with books that don't have associated author entries and authors that don't have associated book entries. However, in general, constraints between elements will be important. For example, it would not be acceptable to store an account deduction entry without an associated account entry, or to have an account entry that does not have at least one associated account-owner entry. It seems to me that there are very few domains that can be represented without these kinds of constraints.

I think the solution to this problem resides partly in the XML specification and partly in the document access language.
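
The need to validate only at transaction boundaries can be sketched in Python. Everything here is hypothetical: `Catalog`, `Transaction`, and `IntegrityError` are names invented for illustration, the catalog is a pair of dictionaries rather than an XML document, and the two rules in `validate` stand in for constraints that a DTD cannot currently express.

```python
import copy

class IntegrityError(Exception):
    pass

class Catalog:
    """Hypothetical server-side document: book title -> author id, author id -> name."""
    def __init__(self):
        self.books = {}
        self.authors = {}

    def validate(self):
        # Every book must reference an existing author...
        for title, author_id in self.books.items():
            if author_id not in self.authors:
                raise IntegrityError(f"book {title!r} has no author entry")
        # ...and every author must have at least one book.
        for author_id in self.authors:
            if author_id not in self.books.values():
                raise IntegrityError(f"author {author_id!r} has no book")

class Transaction:
    """Apply edits to a working copy; validate only when the block exits."""
    def __init__(self, catalog):
        self.catalog = catalog

    def __enter__(self):
        self.work = copy.deepcopy(self.catalog)
        return self.work

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.work.validate()  # raises and leaves the real catalog untouched
            self.catalog.books = self.work.books
            self.catalog.authors = self.work.authors
        return False

catalog = Catalog()

with Transaction(catalog) as doc:
    doc.authors["A1"] = "David Brin"   # invalid on its own: no book yet
    doc.books["The Postman"] = "A1"    # valid again by the end of the block

try:
    with Transaction(catalog) as doc:
        doc.authors["A2"] = "Douglas Adams"  # author without a book
except IntegrityError as err:
    print("rejected:", err)

print(sorted(catalog.authors))  # ['A1']
```

The first transaction passes through an invalid intermediate state and still commits, while the second is rejected as a whole, so the server rather than the client decides whether the document remains consistent.
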
A DTD needs to be able to express these kinds of constraints among elements, so that the document server can enforce the constraints. We would then not be relying on the proper behavior of all the clients that wish to add to or modify the document. (Let me know if you need an argument for why clients should not hold this responsibility; I'm assuming we agree on this point.)

The access language also needs to reflect the solution, because in order for a server to implement constraints, all document update operations must be couched in the language of transactions. That is, every document update operation must be associated with a transaction.

The DOM model allows us to manage documents from a client, so long as clients assume part of the responsibility for maintaining object model constraints. However, if we decide that the document server is responsible for maintaining these constraints, then the DOM model as it is currently architected will not suffice, since its document-update operations are not architected around transactions. Moreover, I do not see a way to extend the current DOM design so that it can safely support transactions.

One way to correct DOM is to redesign it so that it submits query/edit objects to the server, where each query/edit object is submitted via a transaction object. Another way to correct DOM is to add a transaction parameter to all document-update method signatures. I don't think of this latter approach as an extension to DOM, since the corrected DOM would not be backwards-compatible with the current DOM.

I think the XML specification as it currently stands is extremely well-suited for describing data that does not change over time, but that it is lacking in specifying how documents are to evolve.

-- Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@a...

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message; (un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message; subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)