Re: Opinions requested
Thank you, Walter, for the erudite response. I am left in a bit of a quandary as to how, or even whether, to respond. This is in large part because, while your post was in response to mine, it is not immediately clear to me whether you are addressing my comments specifically or rather the general theme of this thread. Having the vague impression (though no firm conviction) that it is in response to my claims that you waxed eloquent on the theme of what defines an XML database, I will proceed to provide commentary, and occasionally direct response/rebuttal, to a smattering of your points. My humble apologies, Walter, if I have in any way misconstrued your post.

On Fri, Mar 05, 1999 at 02:22:51AM -0500, W. E. Perry wrote:
> Marcelo Cantos wrote:
>
> > "Jeffrey E. Sussna" wrote:
> >
> > > There is not (AFAIK) yet any such thing as an XDBMS
> >
> > I am continually surprised to hear remarks such as this. SIM _is_
> > an XDBMS (it is also an SGML, MARC, RTF, etc. database with
> > structure and full content query capabilities). As an XDBMS it
> > has weaknesses (it only supports predefined indexes and limited
> > structure querying), but in some ways provides a model that is
> > even richer than XML (it provides structure below element level,
> > and has the concept of fields
>
> In addition to this vision of an XML database, there has been much
> discussion of XML as a front end or a query-and-response framework
> for data stores, but I would argue that such applications of XML
> markup are not an XML database. A true XML database is shaped by the
> essential characteristics of XML itself: it should be freely
> eXtensible; it should be defined and manipulated by Markup; and it
> should be cast in a Document Structure within which Elements
> identify Data Constructs, and Attributes provide Data
> Characterization.
It seems here that I may have provided an incorrect characterisation of what we do, and hence given Walter cause to provide some qualifiers on anyone wishing to define themselves as an XML database. On this point, I must make it quite clear that SIM is _not_ an XML front end to a data store. It is an XML (etc.) document repository.

One additional, crucial point is that SIM _is_ extensible (though I will qualify this presently). It can be defined to accept markup to any degree of strictness or laxity (within the bounds of well-formedness or validity, of course). It can be set up to accept any and all markup and do _something_ intelligent with it. It can also be configured to make stringent demands of its inputs (well in excess of the DTD, both with respect to strictness and complexity of constraints). This quality of SIM renders the product amenable to both of the major application streams of XML: data and documents. It can provide strict data validation as well as extensibility.

Now, by way of qualification, SIM does not provide free-form runtime extensibility (runtime from the administrator's perspective, not ours). Rather, it provides the application developer with the requisite tools to define, at design time, what structures will be supported. For instance, you cannot, with SIM, perform ad hoc queries such as, "find me all sections containing subsections with an attribute of security="public" and at least one paragraph with fewer than four words in it". The semantic complexity of such a query is beyond the scope of our product. However, if one were to know in advance that queries about the minimum paragraph length in public subsections will be commonplace in the particular application one is developing, then SIM could, at design time, be told to create an appropriate index, and then the above query could, indeed, be performed. In short, SIM _is_ extensible, but the extensibility is bound somewhat earlier than runtime.
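To make the example query concrete, here is a minimal sketch of the brute-force way to answer it over a parsed tree. It uses Python's standard xml.etree.ElementTree and has nothing to do with SIM's own interfaces or indexes. The element names (section, subsection, para), the id attribute, and the sample document are assumptions made for illustration, as is the reading that the short paragraph must sit inside the public subsection.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; all element and attribute names are illustrative.
SAMPLE = """
<doc>
  <section id="s1">
    <subsection security="public">
      <para>Too short.</para>
      <para>This paragraph has more than four words.</para>
    </subsection>
  </section>
  <section id="s2">
    <subsection security="private">
      <para>Short.</para>
    </subsection>
  </section>
  <section id="s3">
    <subsection security="public">
      <para>Every paragraph here is comfortably long.</para>
    </subsection>
  </section>
</doc>
"""


def word_count(elem):
    """Count whitespace-separated words in an element's full text content."""
    return len("".join(elem.itertext()).split())


def matching_sections(root, max_words=4):
    """Return sections with a public subsection holding a paragraph
    of fewer than max_words words (assumed reading of the query)."""
    hits = []
    for section in root.iter("section"):
        for sub in section.iter("subsection"):
            if sub.get("security") != "public":
                continue
            if any(word_count(p) < max_words for p in sub.iter("para")):
                hits.append(section)
                break  # one qualifying subsection is enough for this section
    return hits


root = ET.fromstring(SAMPLE)
result_ids = [s.get("id") for s in matching_sections(root)]
print(result_ids)
```

A scan like this walks every section on every query. The alternative described above is to decide at design time that this question matters and maintain an index for it, so that the extensibility is bound before runtime.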
In practice, clients never complain about this quality. In fact, it is usually a benefit rather than a hindrance, for the same reason that compile-time type checking is a good thing to have in a programming language.

I also take issue with Walter's remark that an XML database should be manipulated by and defined through the medium of XML. This sounds analogous to suggesting that relational databases should be defined and manipulated by relations. Now, it is true that relational schemas are, themselves, typically stored as relations (one will, for example, find a ".TABLES" table, a ".FIELDS" table, a ".INDEXES" table, etc. inside a database). However, it seems to me patently absurd to suggest that SQL (whether DML or DDL) be expressed in terms of tuples and relations. Now, while it does not seem likewise absurd to suggest that XML queries and data definition constructs be defined as XML, the truth of such a suggestion is anything but self-evident.

Why should one not use an SQL-like language to define and query XML databases? There may or may not be merit in such an approach, but it seems no more or less appropriate than a query/data definition language cast in XML. Indeed, many of the query language position papers at W3C do not use XML syntax.

Data definition and query languages are meta-constructs. They are not part of the data, but rather operate on the data and structures. This suggests that, while it may be possible to fold the system in on itself by expressing meta-structure as data, it would be unwise to proceed down this path in _a priori_ fashion. (Now, have I completely missed Walter's point here? I'm not sure.)

> Like XML itself, the XML database is fundamentally mismatched to the
> familiar storage and transmission frameworks of filesystem,
> relational table, object serialization or data stream.
> In the first case, any item--document, data table, or
> executable--whether 'text' or binary--which is committed to storage
> in a filesystem is treated as a file: that is, as unitary and
> indivisible within the perspective and capabilities of the
> filesystem. A word processing program may, by opening a document, be
> able to identify and to manipulate as individual elements the
> sentences, paragraphs and chapters of that document. By contrast,
> the filesystem in which that document is stored reads, writes,
> renames, searches for or deletes the document as a whole. In XML
> terms, the filesystem sees the document as a single element--a root.
> Regardless of how many subelements we might mark up within that
> <root>, the filesystem--designed for a generic 'file-like' document,
> is capable of manipulating only one.

One must be careful, here, to discriminate between interfaces and implementations. I basically agree with all of Walter's points in the above paragraph, but would add that many systems store conceptual XML documents as files. Our system uses a highly tuned variable length record manager (unsurprisingly named the VLRM) to store documents and fragments of any size in a highly efficient manner (both in terms of size and speed). Consequently, we store entire documents for the most part. If parsing time starts to weigh heavily due to retrieval of excessively large documents (the entire Australian Tax Legislation, say, or a complete Boeing Aircraft Maintenance Manual), then we fragment the documents to a level where parsing is no longer a bottleneck.

In all of this, however, SIM can always treat the XML as XML. The developer always sees trees, not files or BLOB's. It doesn't matter how it is stored in the background; that is an implementation issue.

The one caveat with our product is that fragmented documents cannot be treated as a conceptual whole without physically rejoining the parts.
This is one thing which OODBMS's do better than us at present, though we are looking at ways to provide that additional level of abstraction (we are also considering the usefulness of doing so, since fragments are more commonly the unit of interest, rather than the entire document).

> In the terms of both filesystem and relational table, an XML
> document is effectively a BLOB, in that its specifically XML
> structure is outside the ability of either to discern or to make any
> use of. Just as, for example, with audio or video content more
> commonly recognized as BLOBs, the filesystem or relational database
> engine is obliged to invoke a particular, content-specific processor
> in order to understand, and then to implement, the structure
> conveyed by markup in every XML document. Yet this need for
> pre-defined, content-specific handlers obviates the benefits of XML
> as a general solution. Indeed, it is not really XML at all if the
> markup possibilities are circumscribed by the need to conform to
> what a pre-defined handler can implement.

I disagree with the last sentence above. Not from the pedagogical perspective (which seems quite evident in Walter's prose, and with which I largely sympathise), but from the pragmatic perspective. Yes, the purist will rightly decry the notion of predefinition of structure in an ostensibly XML-friendly environment, but the end-user comes along and not only accepts, but vociferously demands, that his environment be constrained. The user doesn't want flexibility to store anything; she wants the flexibility only to store what she wants to store.

The serious user of XML does not have a heterogeneous collection of vaguely defined documents with a motley crew of DTD's and well-formed markup. Most users have a well-defined data set for which they want to define efficient structures for storage and retrieval (if they aren't interested in efficiency then their problem isn't particularly interesting -- any tool will do).
In the few cases where they do have arbitrary structure to deal with, more often than not they are only interested in the content and are likely to throw the structure away. After all, what is the use of structure if you don't know, say, whether the prolog element contains an abstract element, or whether "date" attributes refer to creation time, last modification time, or effectivity (or, worse still, whether they are in U.S., Australian or international format)?

In the real world, I suspect that cases where structure is arbitrary but important will be few and far between. This is borne out by the almost complete absence of demand for arbitrary structure querying capability from our clients or potential clients. It just never seems to be an issue.

A qualifier is also in order for the above remarks, lest there be a misunderstanding. XML tools, in general, must be extensible and accept any and all valid and/or well-formed inputs. My comments specifically address the issue of repositories (DBMS's). XML may be extensible, but it, too, expresses the notion of constraint through the concept of DTD's. Databases, likewise, not only can, but should, constrain their inputs, both for simplicity and efficiency. Perhaps this is, after all, what Walter meant when repudiating the idea of predefined handlers.

Cheers,
Marcelo

--
http://www.simdb.com/~marcelo/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)