Re: ANN: XML and Databases article
[Ron Bourret:]
> I've read Paul's tutorial and the GroveMinder summary on the Web, so
> let's see if I've got this straight. A grove is basically a
> property set, broken down into classes, each of which has
> properties. There are probably relationships between those
> classes. For example, a grove for XML could have classes for
> elements, attributes, entities, and so on, where the element class
> points to the attribute class. A grove for a relational database
> would have classes for tables, columns, etc., where the table class
> points to the column class.

Pretty close, but not quite on the money.

First of all, a terminological problem: A grove is the set of objects that results from understanding (parsing and processing) some particular logical resource. No grove is made from more than one logical resource (I say "logical" resource because some single resources are distributed in multiple physical containers). However, more than one grove can be made from a single resource. This is because resources have multiple layers.

For example, in the case of XML documents, there is always the XML syntax layer of "understanding". The property set (schema) for this layer is probably strongly reminiscent of the DOM. However, there are one or more vocabularies used in every XML document (there's always at least one because the element types have names, even if there's no DTD). The semantics of these vocabularies may imply "emergent properties" of the information contained in the resource, and there can be a property set for each vocabulary's emergent properties. So preparing a single resource for application-internal exploitation may involve creating groves for each vocabulary.

By giving names to the emergent properties of vocabularies, such property sets can be, in effect, APIs to the semantics of each vocabulary, thus opening the way for vocabulary-specific software engines, and for far more reliable cross-application information interchange than the Web has ever seen.

So, instead of saying,

> A grove for XML could have classes for elements, attributes,
> entities, and so on, where the element class points to the attribute
> class.

... you might better have said any one of the following (this has to be said with extreme precision, so look closely):

| A property set for the XML language could have classes for elements,
| attributes, entity references, and so on, where the element class
| has, as one of its nodal properties, an "attribute specification
| list" property, whose value is a list of "attribute value
| specification" nodes.

or:

| The primary grove form of an XML resource could have nodes
| conforming to the "element", "attribute specification", and "entity"
| classes, and so on, where the "element" class has, as one of its
| properties, an "attribute specification list" property, whose value
| consists of a list of nodes that all must be of the class "attribute
| value specification".

or, in view of the fact that the DTD of an XML resource is part of its grove (when it appears or is referenced by the DOCTYPE declaration in an XML resource):

| The primary grove form of an XML resource could have element type
| definitions, attribute list definitions, entity declarations, and so
| on, where the element type definition class has, as one of its nodal
| properties, an "attribute definition list" property, whose value
| consists of a list of nodes that all must be of the class "attribute
| definition".

The second problem with your summary statement is that "points to" is actually an implementation detail. The standard only says that nodes (objects) in groves have properties, and that some properties can be "nodal" -- that is, the values of such properties can be other nodes (in the same grove and/or in other groves).
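
To make the "nodal property" point concrete, here is a minimal Python sketch. It is not GroveMinder's API or anything defined by the standard; the `Node` and `KeyedNode` classes, the registry, and the "name", "value", and "generic identifier" properties are all assumptions made up for illustration. It shows two ways of storing the value of an "attribute specification list" property, one holding the nodes directly and one holding lookup keys, that are indistinguishable to code reading the property:

```python
class Node:
    """A grove node: a class name plus named property values."""

    def __init__(self, class_name, properties):
        self.class_name = class_name
        self._properties = dict(properties)

    def get(self, name):
        return self._properties[name]


class KeyedNode(Node):
    """Same interface, but nodal values are stored as keys into a registry."""

    def __init__(self, class_name, properties, registry, nodal_names):
        super().__init__(class_name, properties)
        self._registry = registry              # hypothetical key -> node table
        self._nodal_names = set(nodal_names)   # properties whose values are nodes

    def get(self, name):
        value = self._properties[name]
        if name in self._nodal_names:
            # Resolve keys on access; the caller still receives nodes.
            return [self._registry[key] for key in value]
        return value


ATTLIST = "attribute specification list"


def attribute_names(element):
    """Client code sees only 'a list of nodes', never how they are stored."""
    return [att.get("name") for att in element.get(ATTLIST)]


atts = [
    Node("attribute value specification", {"name": "id", "value": "c1"}),
    Node("attribute value specification", {"name": "lang", "value": "en"}),
]

# Storage choice A: the property value holds the nodes themselves.
direct_element = Node("element", {"generic identifier": "chapter", ATTLIST: atts})

# Storage choice B: the property value holds keys that are resolved later.
registry = {"a1": atts[0], "a2": atts[1]}
indirect_element = KeyedNode(
    "element",
    {"generic identifier": "chapter", ATTLIST: ["a1", "a2"]},
    registry,
    [ATTLIST],
)

print(attribute_names(direct_element))    # ['id', 'lang']
print(attribute_names(indirect_element))  # ['id', 'lang']
```

Either storage choice satisfies the statement "the value of this property is a list of nodes", which is all a property set can say.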

The manner in which a node is represented to be a property value in any given implementation is almost certainly going to be via pointing (at least in a von Neumann architecture machine), but it's important to realize that that is an implementation decision, and it's inaccurate to say that "pointing" has anything to do with the grove paradigm. A property set can only say that the value of a property is nodal, and implementations of the grove paradigm must make it appear that the value of such a property is indeed one or more nodes, but how that is made to happen is not part of the standard (nor should it be).

So, instead of saying:

> "where the table class points to the column class"

... it would be much more accurate to say:

| where the "table" class has a property named "columns" whose value
| is a list of "column"-class nodes.

> In this sense, the XML information set has much in common with
> groves, as it is a property set.

Yes, except that it's not yet clear that the XML info set will be expressed using the ISO Property Set DTD -- but this is merely a syntax issue. I agree with David Megginson: I expect it to be readily convertible.

> Similarly, the DOM could be viewed as an API for a grove.

Yes, to a single kind of grove, specifically an XML syntactic grove. (A grove governed by the properties of XML's syntax.)

(Aside: I hope we're not facing a future in which the semantics of certain chosen vocabularies will be directly supported by future versions of the DOM. Such support should "plug into" (and be unpluggable from) the DOM. No vocabulary-specific support should become a required feature of all DOM implementations. For example, making XLink a vocabulary is fine; making the DOM able to support XLink but no other linking vocabularies would be the start of a long nightmare with a bad ending. To do that would significantly reduce the freedom of industries to design their own information architectures, and to evolve them according to their own perceived needs.
It would also destroy the DOM, which must stay simple in order to survive. No API can do everything for everybody, and once you start putting support for DTD-specific (or namespace-specific) semantics into the DOM, where do you stop? I've watched a couple of systems bloat uncontrollably and meet their demise in similar ways, and the stage is perfectly set for the same thing to happen to the DOM.)

> The XML information set is not a grove because ... it is not
> ... expressed in grove notation.

If you replace the word "grove" with "property set" (twice) in the above sentence, you are exactly correct. (There is no such thing as "grove notation". "Grove" is an abstract concept that, when sensibly implemented, makes a grove exactly as human readable as a hex dump of RAM in which there are C structs in no particular order.)

> The DOM is not an API for a grove because it's a bit wishy-washy in
> places -- for example, four characters of PCDATA could be one node
> or four, so it's not built on a rigid enough data model.)

Close enough. I would put the same thought differently: The DOM doesn't have a formalized underlying data model, so the DOM doesn't answer the need for a solid basis on which to express the addresses of the components of XML resources. I'm hoping and believing that after the XML infoset is done we'll have a basis for implementing a powerful version of XPath (or XPointers or whatever the idea of generalized addressing of components of XML resources is being called at that time).

> The nice thing about groves is that all groves, regardless of what
> they are built on, have certain commonalities, such as
> addressability, so you can perform certain common functions with
> them.

Right. All nodes in groves have the same "object model" (I'm using this term in a more formal, scientific sense than the term is used in the phrase "Document Object Model (DOM)".)

The grove object model is: Groves have nodes, nodes conform to classes, and classes have named properties with value constraints. Nodes have named properties, and values for those properties. That's about it; the rest is detail. (It's pretty interesting detail.)

> GroveMinder is generic grove middleware. It has plug-ins, called
> Minders (I think of them as drivers),

Hooray, thank you! I have sometimes called them "notation drivers" only to get the blankest stares imaginable. (I then have asked something lame, like, "Do you know what a device driver is, and why we have them?") But you obviously get the point of Minders: Minders represent plug and play support for individual notations, in a system that makes all content look alike (i.e., conform to the grove object model).

> that can build groves over different property sets. For example,
> there is one Minder for SGML/XML documents and a different Minder
> for relational databases.

Well, actually, there's probably a one-to-one correspondence between property sets and database schemas. In order to address information in terms of its structure, you have to know the structure. In grove-land, the structure is defined by a property set. Different databases have different structures, normally expressed as database schemas.

Making a database look like a grove is very straightforward. The bulk of the work is translating the schema into a property set (which is, after all, a kind of schema). There's a bit of coding involved, too, but the GroveMinder developer kit has tools that make this amazingly easy. (At least the Lockheed-Martin people were amazed, and they said so publicly at XML '98.)

The grove paradigm breaks down the distinction between documents (resources) and databases. Everything, in its addressable form, is a grove, and a grove is a database.
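
As a rough illustration of the schema-to-property-set translation described above, here is a minimal Python sketch. It does not use the GroveMinder developer kit; SQLite (through Python's standard `sqlite3` module) stands in for the database, and the mapping of tables to classes and columns to properties, along with the `property_set_from_schema` and `row_nodes` helpers and the `book` table, are assumptions chosen only for the example:

```python
import sqlite3


def property_set_from_schema(conn):
    """Translate each table into a class whose properties are its columns."""
    classes = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        classes[table] = {col[1]: col[2] for col in columns}
    return classes


def row_nodes(conn, property_set, table):
    """Present each row as a node: a class name plus named property values."""
    names = list(property_set[table])
    select_list = ", ".join(f'"{name}"' for name in names)
    query = f'SELECT {select_list} FROM "{table}" ORDER BY rowid'
    for row in conn.execute(query):
        yield {"class": table, "properties": dict(zip(names, row))}


# A toy database standing in for an existing relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO book (title) VALUES ('Groves')")

pset = property_set_from_schema(conn)
nodes = list(row_nodes(conn, pset, "book"))

print(pset)   # {'book': {'id': 'INTEGER', 'title': 'TEXT'}}
print(nodes)  # [{'class': 'book', 'properties': {'id': 1, 'title': 'Groves'}}]
```

Nothing is copied out of the database here; the "nodes" are only a translating view over rows that stay where they are.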
But a grove is convertible into an interchangeable resource (that is, if the property set is a comprehensive expression of the syntactic features of the notation of an interchangeable resource). Obviously, a resource is also convertible into a grove, given a property set for its notation. Property sets are the bridge between the world of information interchange, and the world in which interchanged information is immediately useful (i.e., the world in which information exists after parsing and common semantic processing of interchangeable resources has been done).

If the resource is *already* a database, there's probably no parsing or processing involved. All that needs to be done is to put a translating layer over it that makes the database look like a grove. Then, the database and all its contents are fully able to participate in the wider world of interchangeable information resources: they can be linked, re-used by reference, have any kind of metadata associated with them, etc. etc.

> (There can actually be different property sets for a "type" of
> data. For example, one property set for XML might include entities
> and another might not, specifying that each entity is replaced by
> its value. A different Minder is needed for each property set.)

Strictly speaking, you're correct: people can disagree about the properties of, say, PostScript as a notation, or they might agree about the properties but not about what the names of the properties should be. Nothing prevents people from writing their own property sets.

In fact, however, the situation is not as chaotic as your example might lead one to believe, because of "grove plans". A "grove plan" is a way of selectively deleting properties from classes, and of deleting classes altogether, as a way of avoiding the overhead of storing and/or processing those properties and classes.
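
A toy Python sketch of the grove-plan idea follows. The two-class "property set", the plan names, and the `view` and `resolve` helpers are all hypothetical, and the sketch deletes whole classes only, not individual properties. It shows the same node list viewed under a comprehensive plan and under a leaner one, and how a position counted under one plan lands on a different node under the other:

```python
# Content of one hypothetical element, as (class, data) pairs in document order.
content = [
    ("element", "title"),
    ("whitespace", "\n  "),
    ("element", "para"),
    ("whitespace", "\n"),
    ("element", "note"),
]

FULL_PLAN = {"element", "whitespace"}  # keeps every class in the toy property set
LEAN_PLAN = {"element"}                # a plan that deletes the whitespace class


def view(nodes, plan):
    """Return only the nodes whose class survives in the given grove plan."""
    return [node for node in nodes if node[0] in plan]


def resolve(nodes, plan, position):
    """Resolve a 1-based position, counting only nodes kept by the plan."""
    return view(nodes, plan)[position - 1]


print(resolve(content, FULL_PLAN, 3))  # ('element', 'para')
print(resolve(content, LEAN_PLAN, 3))  # ('element', 'note')
```

The data is stored once; only the view changes, which is why a position is meaningful only together with the plan it was counted under.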
For example, the property set for SGML is comprehensive, but an application may not need, for example, to store nonsignificant white spaces found in the start tags of SGML elements. The application may therefore use a "grove plan" to delete the properties whose values would be those white space characters.

The addresses of nodes in groves are always expressed with respect to a property set and a grove plan. If it were not so, you wouldn't know whether to count a certain node type or not, when counting nodes to get to a particular node. And it's true that, for example, some people want to count the text that was inserted via an entity reference as a distinct node, while other people don't; this kind of flexibility is needed in order to keep peace in the family, and allow people to do addressing in the way they want to do it.

Property sets are modularizable, so that it's relatively easy to express commonplace grove plans, to establish conformance levels for processing systems, and to understand the rules for interpreting address expressions. A Minder that implements a property set comprehensively can optionally view groves less comprehensively, so as to be able to resolve addresses that were expressed according to lesser grove plans. There doesn't have to be a different Minder for each different grove plan. (And that's where your example might be misleading.)

> One thing GroveMinder can do is store a grove in its own
> database. (Note that this is separate from the database addressed by
> the relational database Minder -- it has a structure designed to
> store groves.) Thus, GroveMinder can store an XML document in a
> database as a grove and is what I, in my article, called a content
> management system. That is, it can store and retrieve an XML
> document as a document.

Sounds right to me. ("...its own database" sounds a bit odd because GroveMinder can use any ODBMS for grove storage.)

> Some questions:
>
> 1) Is it possible to combine groves of different types?
> For example, can I take a grove representing a table in a relational
> database and stuff it into a grove for an XML document, for example
> as the content of an element?

I'm afraid I don't grasp the intent of this question. When such an XML document is exported from its grove as an XML document, what should the document look like?

There's no need (and no way) to stuff something into something else. It is only necessary that the "content" property of the element have, as its value, the node in the database grove that represents the table. The ISO standard SGML Property Set does not allow this; only certain classes of nodes within the same grove are allowed as the value of the "content" property of "element" nodes. However, if you want to change your operative SGML Property Set so that this will be permitted, nothing (other than good sense) prevents you from doing it; the grove paradigm will readily support you in your madness.

I don't know why it would be sensible to regard an RDBMS table as the content of an SGML or XML element. The normal meaning of "content" is elements, character data, and/or other SGML constructs, right there, inside the element. There is no way to write a general purpose grove-to-SGML converter unless the classes of the nodes that can appear in element content are limited and known. (We certainly don't want to dump arbitrary data into the content of an element; this would invite a situation in which the document that is ultimately exported is unparsable.)

> If so, does the table grove retain its table-ness, or is it
> converted to one or more XML elements? Both cases seem reasonable,
> although the latter would presumably require a special converter. If
> the latter case is true, then GroveMinder might also fit what I call
> data transfer middleware, depending on how the conversion is done.

I would suggest that an efficient way to handle this would be to convert the table into node classes that *are* permitted to appear in element content, and then make *those* nodes the value of the content property. If you do it this way, you're necessarily making the decisions that must be made about how the XML document, when exported, will reflect the table data.

You're right that one application of GroveMinder is data transfer middleware. The conversion program is comparatively easy to write, since everything already conforms to the same object model.

> 2) Are groves themselves relevant at a high level in a discussion of
> XML and databases? It strikes me that, like SAX and the DOM, they
> are a useful tool in implementing software that stores/retrieves XML
> documents (or data from those documents) in a database but are not
> directly relevant to the discussion itself. Instead, they are most
> relevant to the user in that they are likely to weigh heavily in the
> feature set exposed by a content management system or (possibly)
> data transfer system.

Good question. I guess that's for the person who's doing the discussing to decide. Since groves can be persistent (e.g., stored in databases), and since XML resources can become groves, it seems to me that groves are relevant. You're right, the real reason they're interesting is their impact on feature sets. But aren't feature sets (and especially tradeoffs between feature sets) what technical discussions are all about?

> 3) This isn't directly related to XML/databases, but what other
> common functionality do all groves have? For example, can I write an
> application that navigates groves, regardless of their source (I
> think the answer is yes)?

Yes. We have a demonstration of that.

> Can I combine groves of different types or convert painlessly --
> that is, without writing any additional code -- from one type to
> another (I think the answer is no -- additional code is needed)?

Probably no, but it really depends on what you mean by "code." You have to decide how instances of nodes of particular classes and in particular contexts will be mapped onto instances of nodes of particular classes in the new context, and you have to express your decisions in a formal, machine processable fashion. Right now, using GroveMinder, you can do that with a Python script, which seems about as quick, intuitive, and flexible a way to do it as any.

I don't know of any transformation specification language with which a similar feat (transforming one kind of grove into another kind of grove) can be done, except possibly DSSSL (which relies on (and was written in terms of) the grove paradigm, by the way). We haven't implemented DSSSL, but it shouldn't be too hard to do that on top of GroveMinder. Would you call a DSSSL transformation specification "code"? (I guess I would.)

> Can I hyperlink from one grove to another (I think the answer is
> yes)?

Yes. The interesting thing here is that traversal can be initiated from any node in any grove, on account of a link in any grove, and traversal can be made to any node in any grove. Neither the traversal initiation point, nor the traversal target, has to be a linking construct. Neither has to "know" anything about the fact that they are actually anchors.

> And so on.

I'll provide you with a copy of the GroveMinder demo, if you like. There are lots of playful possibilities. Some people have even written their own HyTime documents to use with the demo software. It's a challenge for puzzle lovers, because the demo does not report errors in documents.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@t... http://www.techno.com ftp.techno.com
voice: +1 972 231 4098
fax +1 972 994 0087
pager (150 characters max): srn-page@t...
3615 Tanner Lane
Richardson, Texas 75082-2618 USA

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)