Re: Normalizing XML [was: XML information modeling best practi
Ronald Bourret wrote:

> This is the messaging view of XML documents. It is probably true when the
> XML document is created from an XML-enabled database.
>
> However, the flexibility of XML means that it is not always true. XML
> documents used to store semi-structured data correspond more closely related
> to rows in a table and the design of the documents corresponds exactly to
> the design of the database. In this case, you could view the XML document
> as a transaction, but could also view it simply as the data and inserting
> it into a database as the transaction.
>
> There are also XML documents that don't fit the transaction view at all:
> XSLT documents, XML-RPC documents, etc.

Actually, they do--by virtue of being, as you say, documents. Put another way, at the level at which they are XML--that is, documents--they can (and at that level must) be processed as text, as lexical entity bodies, regardless of the semantics which might be assigned to their syntax as XSLT, XML-RPC, etc.

This is precisely the same argument we just saw in the 'XInclude where I bloody want to' thread. In that thread Uche asked 'if you construct a source document that uses elements in the namespace and with the name reserved by the XInclude specification, why on earth would you blame a processor for acting on those instructions?' The answer is that you wouldn't blame an *XInclude* processor for doing that, but you might well blame an *XSLT* processor for it. At some level, if a processor of any sort is to operate upon XML--as XML--then it must operate on the text as bare syntax, just as Elliotte expects an XSLT processor to act upon xi:* namespaced nodes without elaborating from that syntax the semantics of the includes.

Likewise here: XML documents--whether their semantics are understood by specialized processors to be XSLT, XML-RPC, or whatever--are, at the level at which they are XML documents, processable by an appropriate database engine as *data* transactions. The database engine appropriate for that is one which handles XML--which is to say marked-up text--as the object of basic CRUD transactions. Such a database engine must follow the markup of the document where it leads, provided only that the document itself is well-formed XML. For example, if that engine is asked to commit a document, then the 'class' of that document is specified by the GI and namespace of its root element, which is to say by its fully qualified type in the markup sense. That database engine may not complain if it is then presented with another document of the same class--which is to say one which presents in its markup the same fully qualified type--but which exhibits an entirely different structure beneath the root element. According to its markup, and therefore as XML, that second document quite simply is another instance of the same class as the first.

The point is that just as relational database engines operate on records, often composed through joins of multiple relations, XML database engines should operate on documents, composed of elements and attributes. It is a premise of the relational concept that the rows of a given relation are structurally identical and, by extension, that complex records composed through joins of those relations are too. That is not only not a premise of XML, but the peculiarly XML concept of simple well-formedness means that there is no expectation that documents declared as of a given class by their root element will therefore exhibit the same, or even similar, structure.
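To put that in concrete terms, here is a minimal sketch (Python's standard ElementTree; the documents, namespace, and function name are invented purely for illustration) of what 'class' amounts to when only the markup is consulted: the fully qualified type of the root element, and nothing about the structure beneath it.

    import xml.etree.ElementTree as ET

    def document_class(xml_text: str) -> str:
        """Return the fully qualified type of the root element (Clark notation)."""
        root = ET.fromstring(xml_text)   # well-formedness is the only requirement
        return root.tag                  # e.g. '{http://example.org/po}purchaseOrder'

    doc_a = """<po:purchaseOrder xmlns:po="http://example.org/po">
                 <po:item sku="123" qty="2"/>
               </po:purchaseOrder>"""

    doc_b = """<po:purchaseOrder xmlns:po="http://example.org/po">
                 <po:comment>Rush order</po:comment>
                 <po:shipTo><po:city>Boston</po:city></po:shipTo>
               </po:purchaseOrder>"""

    # Same class, entirely different structure beneath the root element.
    assert document_class(doc_a) == document_class(doc_b)

Both documents parse to the same class even though their structures share nothing beyond the root element; as XML, each is simply another instance of that class.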
So if what we are talking about here is the database handling of XML as XML, the most important consideration must be the markup. Normalization within a document, or within the elements of that document, is simply alien to the rules by which XML is structured. You really cannot speak of 'XML documents used to store semi-structured data', let alone conclude that such documents 'correspond more closely related to rows in a table and the design of the documents corresponds exactly to the design of the database'. No matter how many such documents of a given class you see, you cannot presume that any future document of that class will exhibit the same structure, because that is a constraint which XML does not impose.

Of course, you could limit your database engine to processing only documents which exhibit a particular structure and content model, but under such constraints it is no longer an XML database engine, only a database engine for a particularly limited class of documents. And you will have to limit your database engine to exactly such constraints--working only on a few carefully predefined document types--if you want that engine to read particular semantics from the syntax of a document and process it in accordance with those semantics, rather than as simple well-formed XML text. I would ask that you please not call that an XML database engine.

A true XML database engine, however, can operate very much as Hugh Chatfield describes. A document (journal?) is submitted for commit (posting?) to the larger database of such documents (ledger?) maintained and manipulated by the database engine. Each of the simple CRUD operations consists primarily of this commit, and there is very little difference among those operations except in how cascading changes resulting from the data transaction must be carried out. The principal effect of the commit is to set the current or most-recent-version value of the elements (in their fully qualified form) present in the document committed. Beyond that, specific semantics must in fact be elaborated by custom processing which recognizes particular syntax. Think of this as database triggers. Where no trigger processes exist for the specific syntax committed, either no processing is done or an error can be raised. That, however, is not an error in the performance of the database engine, which in committing the document has done exactly what it was designed to do. It is an error in the comprehensiveness of the processing provided for the data actually encountered. The traditional solution in such cases would be to segregate the unexpected document, bring a human into the loop, and design appropriate processing for dealing with the new circumstances.

Respectfully,

Walter Perry
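As a rough illustration only of the commit-plus-triggers model described above (again Python; the engine, its method names, and its policy of setting aside unhandled documents are invented here, not anyone's actual implementation): any well-formed document is committed and becomes the current version of its class, further semantics come only from trigger handlers registered against particular root types, and documents with no handler are segregated for a human to examine.

    import xml.etree.ElementTree as ET

    class XmlEngine:
        """Commits any well-formed document; specific semantics only via triggers."""

        def __init__(self):
            self.current = {}      # class (root QName) -> most recently committed text
            self.triggers = {}     # class (root QName) -> handler callable
            self.segregated = []   # documents awaiting human-designed processing

        def register_trigger(self, root_qname, handler):
            self.triggers[root_qname] = handler

        def commit(self, xml_text):
            root = ET.fromstring(xml_text)        # well-formedness is the only precondition
            doc_class = root.tag                  # fully qualified type of the root element
            self.current[doc_class] = xml_text    # the principal effect of the commit
            handler = self.triggers.get(doc_class)
            if handler is not None:
                handler(root)                     # elaborate specific semantics, trigger-style
            else:
                self.segregated.append(xml_text)  # not an engine error: bring a human in
            return doc_class

    # Example use (invented document type):
    engine = XmlEngine()
    engine.register_trigger('{http://example.org/po}purchaseOrder',
                            lambda root: print('purchase order committed'))
    engine.commit('<po:purchaseOrder xmlns:po="http://example.org/po"/>')

Whether an unhandled class is quietly segregated, as sketched here, or raises an error is a policy choice; either way the commit itself has succeeded.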