XML-DEV Mailing List Archive
Thinking About Data (was RE: Enlightenment via avoiding the T-word)
> -----Original Message-----
> From: Tim Bray [mailto:tbray@t...]
> Sent: Wednesday, August 29, 2001 2:19 PM
> To: xml-dev@l...
> Subject: Re: Enlightenment via avoiding the T-word
>
> There's scope for a nice general essay here about the
> differences between ways of thinking about data; basic WF XML,
> OOP, and RDBMS represent instructively different thought
> patterns. PSVI and DTDs and SOAP and so on fit into this
> pattern in interesting ways. -Tim

That sounds like an interesting topic ... I think it's important to sort out the various "theories of data" because a) we collectively want to steal the best ideas of the RDBMS and OO people, but haven't managed to do this in a coherent way yet; and b) in the real world, we have to make RDBMS, OO, and XML applications and databases interoperate, and we don't have really great ways to do this yet. I don't have well-formed opinions here (much less valid ones!).

C.J. Date has written on the RM and OO aspects of this subject in his "Third Manifesto" book. I haven't read it, just the article at http://www.dbpd.com/vault/9808date.html The article is called "Back to the Relational Future", so I guess you know where he comes out, but it does provide us a nice starting point.

Here's a strawman idea on how XML, OOP, and RDBMS "think about data" ... and some thoughts about how the PSVI may fit in.

Relational Model -- Quoting from the Date article above (obviously an authoritative source!), though I know he expounds on this elsewhere:

"So a database is, abstractly, just a collection of true propositions. And relational theory supports this view of databases very directly, because tuples in relations (rows in tables, if you prefer) are directly interpretable as such true propositions."

So, the relational model uses relations and tuples as the operands, the operations defined in the relational algebra as the operators, and propositions about the values of tuples in relations as the fundamental unit of analysis.
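To make the "tuples as true propositions" reading concrete, here is a minimal Python sketch. It is an illustration only: the `Employee` and `Department` relations, their attributes, and the helper functions are hypothetical names invented for this example, not anything taken from Date's article. A relation is modeled as a set of tuples, each tuple can be read back as a proposition, and projection and join stand in for the relational algebra operators.

```python
from typing import NamedTuple, Set, Tuple


class Employee(NamedTuple):
    # Hypothetical relation heading: (emp_id, name, dept_id)
    emp_id: int
    name: str
    dept_id: int


class Department(NamedTuple):
    # Hypothetical relation heading: (dept_id, dept_name)
    dept_id: int
    dept_name: str


# A relation is a set of tuples: no ordering, no duplicates.
employees: Set[Employee] = {
    Employee(1, "Alice", 10),
    Employee(2, "Bob", 20),
}
departments: Set[Department] = {
    Department(10, "Research"),
    Department(20, "Sales"),
}


def as_proposition(e: Employee) -> str:
    """Read one tuple as a proposition asserted to be true."""
    return (
        f"Employee {e.emp_id} is named {e.name} "
        f"and works in department {e.dept_id}."
    )


def project_names(rel: Set[Employee]) -> Set[str]:
    """Projection: keep only the 'name' attribute."""
    return {e.name for e in rel}


def join_on_dept(
    emps: Set[Employee], depts: Set[Department]
) -> Set[Tuple[str, str]]:
    """Join on dept_id, then project to (name, dept_name)."""
    return {
        (e.name, d.dept_name)
        for e in emps
        for d in depts
        if e.dept_id == d.dept_id
    }


names = project_names(employees)
staffing = join_on_dept(employees, departments)
```

Note that the operators take relations and return relations (sets), and nothing in them depends on navigating from one record to another.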
Critical to the RM is the notion of "integrity", i.e., ensuring that the discrete propositions in a database provide an internally consistent set of "theorems" about the world the database describes. RM purists don't agonize over types, type inheritance, type operators ... that's all abstracted away in the concept of the "domains" from which the tuples take their values.

OO paradigm -- I'm not sure who, if anyone, is the authoritative source, or if there *is* anything that could be called an "OO model" at the same level of specificity as the RM. (Date loves to rant about this, pick nits with OO gurus such as Stroustrup and the inventors of UML, and generally promote the pure RM as cleanly doing everything that the OO paradigm tries to do in a muddled way ...). My best guess is that the abstractions "class" and "object" are the operands, and there would be a few generic "operators" (such as construction, inheritance, and property accessors) and a whole lot of class-specific operators if someone did formalize an OO model.

I've been wondering if OO purists (if there is such a thing!) *do* think about data; one might think that the whole point of the OO paradigm is to present the object as an abstract operand with well-defined operators, and whatever data the instantiations of the operators operate on is encapsulated away from view. Or, perhaps the OO paradigm encourages us to think about data ONLY in terms of "types" (types == classes???), class hierarchies, and the operations on classes.

Well-Formed XML -- Obviously this is "just" a syntax and nobody has defined anything like an authoritative theory of how XML "thinks about data" ... but there are a lot of free-floating ideas out there. Perhaps it is nothing more than a neutral serialization format for RM tuples, Objects, and unstructured text, and it has no way of "thinking about data" except as syntax.
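The "neutral serialization format" view can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree`. This is only an illustration under stated assumptions: the `employee` element and its child element names are hypothetical, and flattening a row into child elements is just one of many possible mappings. The point it shows is that well-formed XML carries the values as text; the integer types of the original row do not survive the round trip without some outside agreement.

```python
import xml.etree.ElementTree as ET


def row_to_xml(row: dict) -> str:
    """Serialize one row as a well-formed XML element (hypothetical mapping)."""
    root = ET.Element("employee")
    for key, value in row.items():
        child = ET.SubElement(root, key)
        child.text = str(value)  # everything becomes character data
    return ET.tostring(root, encoding="unicode")


def xml_to_row(xml_text: str) -> dict:
    """Parse the element back; with no schema, every value is just a string."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


row = {"emp_id": 1, "name": "Alice", "dept_id": 10}
xml_text = row_to_xml(row)
round_tripped = xml_to_row(xml_text)
```

After the round trip, `round_tripped["emp_id"]` is the string `"1"`, not the integer `1`: the syntax alone says nothing about types.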
Those of us who work with XML databases, however, have to come up with *some* conception of how XML relates to the RM ... and since my "day job" involves a lot of explaining when we think RDBMSs are most appropriate and when XML databases are most appropriate for different system requirements, I've given this a fair amount of thought.

The way I see it, WF XML "thinks of data" not as discrete propositions about some world that must be kept internally consistent, but as "bundles" of inter-related propositions that describe a snapshot of the world. Thus, the XML data model is inherently at a higher level of granularity than the RM. The inter-relationships are hierarchical (remember, this is WF XML, no ID/IDREF relationships defined), meaning that they are much less flexible than those allowed by the RM, but since they're "hard-coded" in element/attribute hierarchies, we don't have to worry about referential integrity constraints -- anything well-formed is internally consistent even if it's inconsistent at some higher semantic level. Thus, the basic operands of some formalization of WF XML would be trees of some sort, not tuples. (Of course one *could* look at the individual components of an XML document as discrete propositions that can flexibly inter-relate rather than being fixed in a hierarchy ... but the RM already defines how to do that, so there would be no value in formalizing an XML-flavored version).

XML "thinks about data" differently from the RM in other ways, notably by specifying that the sequence and embedding of components matters. It thinks about data almost completely differently than the OO paradigm, because there is no conception of type or inheritance, nor any operators other than maybe graph-theoretic structure navigation and manipulation operations. (Interestingly, the relational model "made its bones" off the CODASYL model 25 years ago by showing that these structure navigation operations were unnecessary.
Date's THE DATABASE RELATIONAL MODEL has a very clear discussion of this historical episode that us XML weenies really need to come to grips with somehow).

PSVI XML -- I think it's clear that this is WF XML on OOP steroids (or hallucinogens, if you prefer). I'd guess that C. J. Date and the other relational purists would (if it ever gets on their radar) think of it as the worst of the fuzzy OO world plus the worst of the hierarchical XML world. Perhaps the OO people will think more kindly of it as OO that pays some attention to data serialization and interchange up-front rather than relegating it to CORBA to worry about. Insisting on schemas and types certainly makes XML more OOP-friendly than WF XML is -- e.g., we can use databinding tools to generate classes for handling data and we can access XML elements as instances of a Java/C++ class rather than as a mess of character data.

On the other hand, it appears from what we've seen on this list that you have to buy into the Schema/PSVI/OO types paradigm big time or not at all. Trying to mix the WF XML view and the PSVI (as Sean McGrath noted a few posts back) creates "brain puree" and you start to question your own sanity :~)

I'd been resisting Simon's idea that maybe it's time for the WF people and the PSVI people to go their separate ways, but writing up this e-mail has gotten me thinking that while we can share foundations and tools across the WF/PSVI divide, the two camps seem to have fundamentally different ways of thinking about data.

Anyway, again all this gibberish is just a strawman proposal to try to get more people to think about the underlying "theories of data" and share their brainstorms, headaches, delusions, and hallucinations.

Tim Bray, you brought this up ... what might YOU say in a "nice general essay here about the differences between ways of thinking about data"? ... feel free to smack down my strawmen; that's what I set them up for!