Data storage, data exchange, data manipulation (was RE: Against the Grain: Pascal commentary about XML and databases)
Well, this issue has been worrying me a lot for two years now, so I'd like to share my thoughts on the subject...

A] Data storage using XML

I don't think a node-labeled tree structure (the XML model is a tree, which is more restricted than a graph) can model all kinds of data easily and efficiently. Likewise, the relational and object models cannot model all kinds of data easily and efficiently. The key words here are "easily" and "efficiently": for any given data, you can find a hierarchical (e.g. XML) representation, a network representation (the node-labeled graph model), a relational representation, an object representation, or a more exotic one (e.g. the Caché model). But depending on your data, one of these models will stand out as the "best" one, in terms of ease of implementation and of efficiency of queries and updates.

So I believe there is a whole set of problems that will benefit from XML databases (which, I believe, are based on the hierarchical database model*; maybe Mike can confirm or refute this). Storing, indexing and querying a set of document-oriented data is a good example. But XML databases aren't (and won't be) a revolution that blasts away all other storage models. We could even say that the XML database model is just a comeback of the hierarchical model that was supposedly "killed" by the relational model back in the 80s. I don't think XML databases are the "next thing".

B] Data exchange using XML

Anyway, whatever database model you choose, you'll have to exchange data between your database and other systems (a business application, another database, etc.). As not everyone uses the same data model, you'll have to find a data model for your data exchange that:
- can capture most of the semantics of your data => it has to have a way to express basic structure.
- is as simple as possible, to reach a wide audience => we should look for the "greatest common divisor" (from which you can build any other model by adding things) rather than the "least common multiple" (from which you can obtain any other model by taking subsets).
- can easily be sent on a wire => the serialized form of the structure has to be easily parseable and standardized.

Surprise, surprise: XML is, AFAIK, the right answer to these needs. Things as simple as CSV files do not let us capture enough semantics, and more complicated solutions such as Java serialized objects or CORBA objects-by-value are overkill.

Here are some other arguments in favor of the hierarchical model:
- you never exchange a whole complex set of data between two different systems. Rather, you exchange subsets or views of the whole database. The hierarchical model should be sufficient to exchange views, even if the underlying model is more complex (a true node-labeled graph, or an object model, for instance).
- when considering the particular need for data extraction for presentation to human beings, the hierarchical model is the most structured model that is still readable (that is, not only by geeks). After all, human beings should be considered as potential systems to exchange data with :). AFAIK, the only ways a computer can exchange data with a human being are serial, and I feel that hierarchically organized text or speech is the highest form of structured, serialized data that we can understand.

C] In-memory data storage and manipulation using XML

This last point on presentation is very important to me, as its consequences finally led me to abandon any attempt at modeling data as full-featured objects when developing presentation layers (I've been in charge of developing a multi-modal presentation layer, for HTML, WAP, i-mode, VoiceXML, etc., for my company).
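The exchange properties listed in B] (a hierarchical view that can be serialized, sent on a wire, and parsed back by the receiver) can be illustrated with a minimal Java sketch. It assumes only the standard JAXP and DOM classes shipped with the JDK (`DocumentBuilderFactory`, `TransformerFactory`); the class `ViewExchange`, its methods, and the `person`/`name`/`id` vocabulary are hypothetical names chosen for illustration, not part of any framework described in this thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: exchanges a small hierarchical "view" as serialized XML.
public class ViewExchange {

    // Builds a view such as <person id="42"><name>Ada</name></person>.
    // Element and attribute names are illustrative only.
    public static Document buildView(String id, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element person = doc.createElement("person");
            person.setAttribute("id", id);
            Element nameEl = doc.createElement("name");
            nameEl.setTextContent(name);
            person.appendChild(nameEl);
            doc.appendChild(person);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Sender side: serialize the DOM tree to a string that can be sent on a wire.
    public static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: parse the string back into a generic DOM tree,
    // with no application-specific class to map it into.
    public static Document parse(String xml) {
        try {
            DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return b.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String wire = serialize(buildView("42", "Ada"));
        System.out.println(wire);
        Element root = parse(wire).getDocumentElement();
        System.out.println(root.getAttribute("id") + " "
                + root.getElementsByTagName("name").item(0).getTextContent());
    }
}
```

Neither side needs an application-specific `Person` class here: the receiver reads the view through the generic DOM interfaces, which is the property the in-memory approach below relies on.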
The current fashion in Java presentation layers (as I saw at JavaOne this year) is to use JavaBeans to exchange data between the other application layers and the presentation layer. I attended some very, very strange sessions where the speaker showed us how he extracted data from an RDBMS, mapped it into objects (possibly using an object-relational mapping tool) and used these objects directly in JSP pages. I attended an even stranger session where the data was acquired by a call to a Web Service (this year's new hype), thus directly in XML format, then mapped into Java objects, then sent to the JSP pages. I've even seen a framework that sent XML data to the JSP pages, with custom taglibs transforming the XML data into custom Collection objects.

To save development time, deployment time, and memory (all those classes modeling data come at a high price), we chose not to model data as full-featured objects, but simply as XML DOM Documents. We directly mapped any external data (from RDBMS, ODBMS, LDAP directories, etc.) into XML, manipulated it using XSLT (or Java code when XSLT is not enough), then applied a final XSLT transformation based on the output device.

So there is a third usage of XML, apart from data exchange and persistent data storage: transitory, in-memory data representation. Of course, it is more a matter of representing data as a node-labeled tree than as serialized XML with tags and all, but the "XML spirit" is there. This usage has a lot of advantages, at least for front-end applications:
- it saves a lot of memory by removing application-specific classes and replacing them with a small set of classes, the DOM. This means that a single application server can handle many more different data types. This is important to us as we designed our presentation layer for Application Service Provider (ASP) usage.
The ASP context means that, to keep costs as low as possible, you run many different applications in the same application server. If each application had its own set of application-specific classes to model data, the application server would be crowded with classes.
- it saves a lot of time and energy thanks to the sheer flexibility of XML. If your data and application code are written in XML, adding data to or removing data from the presentation is much easier than if the data were modeled in application-specific classes: you don't have to modify the application-specific classes, recompile the whole application and redeploy it. All those who have deployed applications using entity EJBs as the object-relational mapping layer know what I'm talking about.
- data exchange is straightforward: just parse the XML document you've been sent, or serialize the Document object, et voilà! No more mapping.

There are of course some disadvantages, but I think it's just a matter of work and time before they disappear:
- Java APIs for XML document manipulation are awkward. Even though new APIs are appearing (e.g. JDOM and dom4j), you can't beat the simplicity of just writing <foo><bar/></foo> to create a foo element containing a bar element. Moreover, there is currently no standard XPath API (though one is being specified by the W3C at http://www.w3.org/TR/2001/WD-DOM-Level-3-XPath-20010618/). To solve this problem we have developed an extensible XML/Java-based language, much in the same spirit as Apache Cocoon XSP pages. This language lets us write <foo><bar/></foo> directly in Java code, as well as XPath expressions, which saves us a considerable amount of time.
- contrary to Java class definitions, XML schemas (or schemata if you prefer :) are quite difficult to read and write. The "difficult to read" issue can be solved by schema documentation tools.
XML Spy, for example, can generate pretty good documentation from a W3C XML Schema (though some current limitations prevent us from using this feature efficiently). The "difficult to write" issue can be tackled with tools, but unfortunately a good editor does not help much if the schema meta-model is inherently complex. This is why we are looking for a simple, readable schema language.
- compile-time checks are not performed. If you call person.setFavoriteColour() on a Person instance, and the Person class does not have this method, you will get a compile-time error. Using Java + DOM, a compiler cannot detect an error when you try to add a "favoriteColour" attribute or child element to a "person" element. As we have developed a custom XML language compiled to Java code, we feel that it is possible to make the compiler schema-aware, thus enabling compile-time checks when the schema of the manipulated documents is known.
- "this is not pure object-oriented programming!" I don't know if it's the same out there, but here in France it matters to a lot of people. My current five-second answer is that presentation layers usually do not require a pure object-oriented model for data. That does not mean the underlying framework of the layer is not object-oriented, far from it!
- "yeah, but then how do I associate behaviour with my data?" My short answer is "why would you want to do that in a presentation layer?". If it's for validation, then either the validation is simple and can be done using schemas, or it is complex and cannot be done in the presentation layer at all, so you have to send the object to another layer, and once again you benefit from the easy serialization/parsing. If it is for another purpose, I reckon the model has a limit: no encapsulation of data.
Until now, we have not found this to be blocking or necessary in the presentation layer (after all, you're there to show the data, not hide it), but we are thinking about the problem.

So there is still a lot of work to be done, but we are already benefiting from this approach. Are there people out there who share these views?

Regards,

-----------------------------------------------------------
Nicolas Lehuen
Responsable R&D - Head of R&D
Ubicco - Multi Access Software Solutions
http://www.ubicco.com/

* see for example http://www.cs.pitt.edu/~chang/156/14hier.html

-----Original Message-----
From: Joshua Allen [mailto:joshuaa@m...]
Sent: Friday, June 29, 2001 01:02
To: Mike.Champion@S...
Cc: xml-dev@l...
Subject: RE: Against the Grain: Pascal commentary about XML and databases

>I keep hoping that there is some middle ground where the rigorous mathematics of the
>relational model and the pragmatic usability of XML can meet and inform one another. In
>private correspondence, Mr. Pascal assured me that a truly mathematical model of XML is
>impossible, but I'm keeping an open mind.

Hehe, this is pretty good reading. The only reason that RDBMS software dominates the market right now is that we are good at solving these problems, and RDBMS design has evolved to disallow users from asking questions that the database isn't good at answering. The fact that we ship databases that only permit things we know how to answer efficiently does NOT imply that we will never be able to answer other questions more efficiently (in fact, RDBMS systems have evolved and gobbled up much of the research on data warehousing to include those techniques in the engines -- witness materialized views and bitmapped indexes). It is quite easy to see a trend in the industry that shows consistent, continual progress at solving hard query problems.
Of course some problems will always be hard (distributed cost-based query optimization is one), but I would point out that research on RDBMS optimizations has tapered off quite a bit, and we have seen major increases in research geared towards semi-structured data in the past decade. So we are simply easing off on some of the traditional RDBMS constraints and beginning to allow things like recursive self-joins, ragged hierarchies, etc., and we are optimizing these things. I mean, we already solved the RDBMS optimization challenge (and remember that back in 1980 there were people predicting that SQL would never fly), and now it is time to move on to the next thing. XML seems like a very appropriate evolutionary step.

As for saying that a truly mathematical model of XML is impossible: XML is simply a node-labeled graph. This is about as pure a discrete mathematics concept as you can get. It is easy to find graph traversal challenges that are NP-hard or need O(n^2) time or worse. So? I think that the areas of discrete mathematics that deal with graphs are currently the most vibrant area of research in the industry. The web itself is one huge graph structure, and research on ways to index the web, optimize routing, etc. all feeds directly into techniques for optimizing XML processing. And it seems that TSPs and NP-optimizations are all the rage these days. XML *is* math, and it's the *cool* math these days. Data processing married with XML is about as real as it gets. But I know this is all a twice-told tale for you, Mike.

Regards,
Joshua

------------------------------------------------------------------
The xml-dev list is sponsored by XML.org, an initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To unsubscribe from this elist send a message with the single word "unsubscribe" in the body to: xml-dev-request@l...