Re: XML Database Decision Tree?
Mike has given me the courage to take a bite as well. Thanks, Mike. I had read this article a while back on developerWorks. Here are my two bits, if anyone cares.

Let me start with the objective of the article as stated by Kevin Williams: "My objective in this column is to see when it makes sense to store structured data in these specialized databases." Then go to the conclusion: "While those databases are ideally suited to the management of document-oriented information (unstructured or semi-structured data), they don't make sense for use for structured data. If you need to make structured information accessible as XML, you're better off taking advantage of the XML support provided by relational-database vendors instead."

I agree with Kevin 100% if the purpose is to make existing structured information accessible as XML. Dan Suciu of the University of Washington calls this "XML Publishing" and defines it as "the problem of transforming existing, relational data into XML." His paper is a good way to understand the problem; for those who care to read it, the title is "On Database Theory and XML" by Dan Suciu, University of Washington (www.cs.washington.edu/homes/suciu). He goes on to say that "this is the same as defining an XML view over the relational data, but this view is considerably more complex than relational views," and then shows the problems and issues with this approach.

Another paper that I found useful in this regard is "IBM DB2 XML Extender - An end-to-end solution for storing and retrieving XML documents" by Josephine Cheng and Jane Xu of IBM Silicon Valley. They do not go into all the theoretical issues but essentially describe the approach taken in DB2. I find one of their statements telling: "Also, if retrieval of an entire document is desired, performance is faster because there is no need to compose the XML document from DB2 data." This essentially reinforces Dan's conclusion about the complexity of creating an XML view over relational data.
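To make "XML publishing" concrete, here is a minimal sketch in Python, assuming a hypothetical two-table schema (`author`, `book`) in an in-memory SQLite database. It defines an XML view by joining the tables and nesting each author's books under an `<author>` element in a single pass over sorted rows. The schema, element names, and the `build_db`/`publish` functions are illustrative assumptions; this is not the mechanism DB2, Oracle, or Microsoft actually use.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db() -> sqlite3.Connection:
    """Create a small, hypothetical relational schema with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO author VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
        INSERT INTO book VALUES (10, 1, 'First Book'), (11, 1, 'Second Book'),
                                (12, 2, 'Third Book');
        """
    )
    return conn


def publish(conn: sqlite3.Connection) -> ET.Element:
    """Compose an XML view <authors><author>...<book/>...</author></authors> from rows."""
    root = ET.Element("authors")
    rows = conn.execute(
        """
        SELECT a.id, a.first, a.last, b.title
        FROM author AS a LEFT JOIN book AS b ON b.author_id = a.id
        ORDER BY a.id, b.id
        """
    )
    current_id, author_el = None, None
    for author_id, first, last, title in rows:
        # Rows arrive sorted by author, so a new id starts a new <author> element.
        if author_id != current_id:
            author_el = ET.SubElement(root, "author", id=str(author_id))
            ET.SubElement(author_el, "first").text = first
            ET.SubElement(author_el, "last").text = last
            current_id = author_id
        # The LEFT JOIN yields a NULL title for an author with no books.
        if title is not None:
            ET.SubElement(author_el, "book").text = title
    return root


if __name__ == "__main__":
    print(ET.tostring(publish(build_db()), encoding="unicode"))
```

Note that the join repeats each author's columns on every book row, and the nesting has to be recomputed on every request; that composition work is exactly what the DB2 Extender authors say is avoided when the whole document is stored and retrieved intact.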
From a purely implementation standpoint, such a view is composed either with JOINs or with UNIONs. I believe Microsoft uses the second approach, and there are performance advantages to it. This leads me to conclude that even simple publishing of XML data from standard relational databases is not a trivial problem, and we should be aware of the performance and other issues (the queries that create the view perform more joins than necessary and are typically not minimal).

Now let us examine the issue of storing XML documents in an RDBMS as implemented by the RDBMS vendors. There are two approaches that I have seen (at least in Oracle 9i):

1. Store the XML document in its entirety and use text indexing via the Context engine.
   This approach has the drawbacks that Kevin correctly points out in his article. The performance is also questionable, as is the ability to search across a collection of documents for an Author Firstname and Lastname.

2. Use the DTD or Schema to derive a relational schema.
   This is the approach that most RDBMS vendors want us to use (again, at least from what I know of Oracle and DB2). As Dan points out in his paper, one table is created for "each element type that can occur in a collection position." But if the content model changes even slightly, the underlying relational schema has to change. The example Dan gives is a person element whose content model changes from (name, phone) to (name, phone*): phone now occurs in a collection position, so it needs a table of its own. There are a bunch of other issues; for a full list, look up Dan's paper.

The problem with "Native XML" databases, as I see it, is that the theoretical models and research are way behind practical implementations and use. RDBMSs are based on solid theoretical work done by E.F. Codd. I have seen signs of the academic community picking this problem up and creating the foundation on which XML databases will be built. We at B-Bop are working with one such group, and I would encourage all the academic lurkers on this list to start looking at this problem more closely.
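Dan's (name, phone) to (name, phone*) example can be made concrete with a simplified sketch. It assumes content models are flat sequences of text-only children, that `person` itself occurs in a collection (so it gets a table), and that a repeatable child such as `phone*` gets its own table keyed back to its parent. `Child` and `derive_schema` are hypothetical helpers for illustration, not the mapping Oracle or DB2 actually generate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    """A text-only child in a simplified content model; repeated=True models 'name*'."""
    name: str
    repeated: bool = False


def derive_schema(element: str, content: list[Child]) -> list[str]:
    """Derive CREATE TABLE statements for one element and its text-only children.

    Simplified rule: the element itself is assumed to occur in a collection, so
    it gets a table; single-valued children become columns of that table, and
    each repeatable child gets its own table with a key back to the parent.
    """
    columns = ["id INTEGER PRIMARY KEY"]
    child_tables = []
    for child in content:
        if child.repeated:
            child_tables.append(
                f"CREATE TABLE {child.name} (id INTEGER PRIMARY KEY, "
                f"{element}_id INTEGER REFERENCES {element}(id), value TEXT)"
            )
        else:
            columns.append(f"{child.name} TEXT")
    return [f"CREATE TABLE {element} ({', '.join(columns)})"] + child_tables


if __name__ == "__main__":
    # person (name, phone)
    for stmt in derive_schema("person", [Child("name"), Child("phone")]):
        print(stmt)
    # person (name, phone*): phone moves out of person into its own table
    for stmt in derive_schema("person", [Child("name"), Child("phone", repeated=True)]):
        print(stmt)
```

Running it for both content models shows the problem: adding a single `*` moves `phone` out of the `person` table into a new table, so existing rows, loaders, and any query that reads `person.phone` all have to change.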
Until the theoretical foundations are in place, we will keep arguing the merits and demerits of all the approaches, and users will have to use their judgement in picking one solution over another. Having said that, I would like Mr. Williams to look at all the issues a bit more closely before making blanket statements like "While those databases are ideally suited to the management of document-oriented information (unstructured or semi-structured data), they don't make sense for use for structured data." But I also know that we will have to live with this level of insight for some time to come.

Regards,
Soumitra

"Champion, Mike" wrote:
>
> > Native XML databases; a bad idea for data?
> > http://www-106.ibm.com/developerworks/library/x-xdnat.html?n-x-10181
> >
>
> OK, I'll bite.
>
> 1) Storage: "decomposing the XML document to persist it to a relational
> database is not all that difficult" -- I have a one-word answer: "DocBook?".
> <sneer>
> "Do you really want your information's structure to vary?" Uhh, no ... but
> do I usually get a vote? Simplicity of storage is probably the biggest
> advantage of a native XML DBMS: it is easy to take a chunk of XML and store it
> in any of the native XML DBMSs I've researched; it takes considerable
> analysis and programming in the XML-enabled RDBMSs. This makes native XML
> DBMSs valuable as reliable caching/logging tools for simple XML messages,
> not to mention making it possible to do useful things with complex
> documents.
>
> 2) Retrieval: "some of the native XML platforms require that the entire
> document be returned from the database" Uhh, which ones? Not the ones
> supporting XPath, which is pretty much all of them, AFAIK.
> "many relational database vendors are currently implementing thin XML
> serializer wrappers that enable them to generate XML documents on demand
> from relational data." Right, but I thought we were talking about XML data?
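Mike's first two points (store the XML as-is; retrieve fragments by path rather than whole documents) can be illustrated with a toy. The sketch below is a hypothetical `ToyXmlStore` that keeps each document verbatim in a SQLite table and answers path queries with Python's `xml.etree.ElementTree`, which supports only a limited XPath subset. It is not how any particular native XML DBMS works internally (real products index documents rather than re-parsing them per query); the sample document and all names are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical DocBook-like document whose <section> elements nest recursively.
DOC = """<article>
  <title>Storage Notes</title>
  <section><title>Intro</title>
    <section><title>Background</title><para>Text.</para></section>
  </section>
  <section><title>Design</title></section>
</article>"""


class ToyXmlStore:
    """Toy repository: documents are stored whole and queried by path."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT)")

    def put(self, xml_text: str) -> int:
        # No mapping or decomposition step: the document is stored verbatim.
        ET.fromstring(xml_text)  # reject malformed XML up front
        cur = self.conn.execute("INSERT INTO doc (body) VALUES (?)", (xml_text,))
        return cur.lastrowid

    def query(self, doc_id: int, path: str) -> list[ET.Element]:
        # Return only the matching fragments, not the whole document.
        (body,) = self.conn.execute(
            "SELECT body FROM doc WHERE id = ?", (doc_id,)
        ).fetchone()
        return ET.fromstring(body).findall(path)


if __name__ == "__main__":
    store = ToyXmlStore()
    doc_id = store.put(DOC)
    print([t.text for t in store.query(doc_id, ".//section/title")])  # any depth
    print([t.text for t in store.query(doc_id, "./section/title")])   # top level only
```

The `.//section/title` query reaches section titles at any nesting depth without knowing the depth in advance. If the same sections were shredded into a table where each row points at its parent section, asking for everything nested under a given section would typically need a recursive query such as SQL's `WITH RECURSIVE`.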
>
> 3) Searching: "searching will return only a set of XML documents; the calling
> program must then take further action on those documents, if necessary."
> True. SQL has a far richer set of data manipulation operators than XPath
> (XQuery addresses this to some extent), and is (loosely) based on a general
> theory of data, and XML is much more ad hoc.
> "these features are indispensable when working with traditional documents,
> where context plays a large role in meaning, but far less important when
> working with structured data" Absolutely; if you don't have relatively
> complex document-like XML, the *searching* features of a native XML DBMS
> don't buy you much over storing simple XML in an RDBMS. But if you do have
> "interesting" XML schema, e.g. with recursive elements, simple XPath queries
> can address problems that seriously challenge SQL experts.
>
> 4) Aggregation: "pulling information together and rolling it up (into sums,
> averages, and so on) is quite difficult." He's right. Guilty as charged,
> although he didn't emphasize "joins" as much as I would have. Again, XQuery
> will help, but what we really need is a better theoretical understanding of
> how hierarchical XML data relates to the relational model, sort of an
> "InfoSet Algebra" if you will. XQuery is at least addressing this problem of
> formalizing XML operators, and even if they fail it will be dissertation
> fodder for a new generation of CS theorists :~)

--
Soumitra Sengupta, Ph.D.
Co-Founder and C.T.O.
B-Bop Associates Inc.
Phone: 650-340-2700
Fax  : 650-340-2701
Email: soumitra@b...
http://www.b-bop.com