XML-DEV Mailing List Archive: Re: Access Languages are Tied to Schemas
At 09:26 AM 11/20/97 -0500, Joe Lapp wrote:
>I have been searching for the properties that a repository access
>language must have. Here I present an argument for why an access
>language must be tied to a repository's architecture in the manner
>analogous to how SQL and OQL are tied to database schemas.

Ideally, the logical model exposed by an SGML repository should be the structure of the document itself, not the implementation details used for a particular repository architecture. An SGML DTD defines structures in the same way that the table declarations do for SQL, and in the same way that the class declarations do for object databases that use OQL. This is in keeping with the fundamental idea behind object persistence in object oriented databases: if you use an object oriented database with C++, your C++ class declarations are your schema. In the same way, if you use a repository with SGML or XML, the logical model is declared by the DTD.

>A client must know how to talk to
>the repository in order to get the repository to do anything.
>We'll call the language that the client must speak the "access
>language." The client uses this language to submit requests and
>to understand responses. The server uses this language to make
>sense of requests and to submit responses. Both the client and
>the repository must house knowledge of this access language.

If we're talking traditional databases, that means that both sides must know SQL, or both sides must know OQL, or whatever. Since we are talking SGML or XML repositories, that means that both sides must know SGML or both sides must know XML.

>The access language must convey information in two directions. In
>order for the information to be comprehensible, it must be conveyed
>in recognizable units. Both the client and the repository must
>know how to generate and parse these units. Hence, a standard must
>exist to which both sides conform.
>This standard says what kind of
>information units there are and what they look like.

For an SGML repository, these recognizable units are SGML elements. Of course, for any particular SGML application, there would also be a DTD that defines the schema for the application, and the clients may well have knowledge of this schema. The server might not need to have this knowledge in some cases, as long as it knows how to manage SGML in general. And there may be some clients that do not need this knowledge, either - e.g. a general purpose querying and browsing client should be written to work for any DTD, as should a formatting and printing engine, etc. In order to make general-purpose clients possible, clients must have some way of asking the repository for the schema - either the DTD schema or the structure of a particular document.

>Information units usually have relationships with one another. A
>client often cares about accessing units that have a particular
>relationship with some other unit. For example, a client might
>care to retrieve all liens on a particular property. The access
>language must allow a client to select units according to their
>relationships with other units. In particular, a client must be
>able to identify the relationships of concern.

The relationships among objects often express much of the semantics of any system - "it's not what you know, it's who you know". SGML/XML has two kinds of relationships: containment and links. Queries should be able to handle both. This has proven invaluable in OQL and SQL-3.

>We find we
>also need a standard that says what kinds of relationships there
>are and what kinds of information units participate in them.

But this can be quite general, e.g. the definition of SGML/XML. Again, this is analogous to using C++ or Java to define schemas in object oriented databases.

>It seems that the standard has quite a bit to say.
>It says what
>kinds of information units there are, what kinds of information
>they contain, what kinds of relationships there are, and what
>information units participate in those relationships. What we
>have is an object model.

An object model of the kind you discuss here seems like the object model of a particular application.

>Moreover, in the spirit of object-oriented design, each
>side should harbor some representation of this model. That is,
>both sides have components that share a common architecture.

In the spirit of object oriented systems, metadata is the way one system finds out about another system, unless they belong to the same application, in which case they share class declarations. The same should hold for SGML/XML repositories: programs that are part of the same application may have knowledge of the DTD, but metadata is the way to write general purpose programs, and writing general purpose software as much as possible is usually a big win.

>We normally think of impedance mismatch as occurring
>between an object-oriented application and a relational database,
>but it can also occur between two object-oriented applications.
>One organization may decide that liens are not useful entities in
>themselves and so bottle them up with their associated properties
>(i.e. properties would be aggregates containing liens, and liens
>would not be classes of the schema). Another organization may
>want to store liens separately so that they can select all liens
>that meet a given criterion (i.e. properties would be associated
>with liens, and liens would be classes of the schema). When the
>second organization decides to hook its client up to the first
>organization's database, the client can neither select among
>liens nor properly interpret property objects.

That depends, of course, on how the programs function.
As long as I have access, I can log into anybody's database, browse it, formulate queries to find information, etc., because I use a general-purpose browsing and query facility. If I have programs dependent on the classes defined in a particular schema, then my programs do need to know the schema, e.g. the DTD.

One of the great advantages of architectural forms is that they make it possible to write programs that work only on an agreed-upon abstract representation of the schema, and each individual organization can build on that abstraction to build documents that meet their own needs. This is a real strength of the HL7 Kona proposal for medical record attachments, which would allow parties to interchange information based on a set of well-defined architectural forms, yet allow freedom for each party to implement their own DTDs based on these architectural forms in order to accommodate their own needs. This is, of course, analogous to the "design patterns" approach of object oriented design, which strongly encourages writing programs that use the abstract base classes which define the interfaces rather than writing programs that use the concrete classes that implement them.

Jonathan

________________________________
Jonathan Robie
Email: jonathan@t...
Texcel Research, Inc. ("http://www.texcel.no")

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
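As an illustration of the architectural-forms point in the reply above, here is a minimal Python sketch applied to the lien example from the quoted message. Everything in it is an assumption made for illustration: the `arch` attribute, the form names `property` and `lien`, the element names each organization uses, and the `holder`, `amount`, and `against` attributes are all hypothetical, and none of it is taken from the HL7 Kona proposal. In real SGML practice the architectural form would normally be declared as a fixed attribute in each DTD and resolved by an architecture-aware parser; here it is simply written out in the instances so the example stays self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical convention: each element names the architectural form it
# derives from in an "arch" attribute. This stands in for a fixed attribute
# that a DTD would normally supply.
ARCH_ATTR = "arch"

# Organization A nests liens inside properties.
ORG_A = """
<parcel arch="property" id="p1">
  <encumbrance arch="lien" holder="First Bank" amount="12000"/>
  <encumbrance arch="lien" holder="County" amount="800"/>
</parcel>
"""

# Organization B stores liens separately and links them to properties.
ORG_B = """
<registry>
  <estate arch="property" id="p7"/>
  <claim arch="lien" holder="Credit Union" amount="5000" against="p7"/>
</registry>
"""


def elements_of_form(xml_text, form):
    """Return every element deriving from the given architectural form,
    whatever element type name the local DTD uses for it."""
    root = ET.fromstring(xml_text)
    return [el for el in root.iter() if el.get(ARCH_ATTR) == form]


def lien_holders(xml_text, minimum):
    """Select holders of liens of at least `minimum`, using only the
    shared abstraction and never the organization-specific element names."""
    return [
        el.get("holder")
        for el in elements_of_form(xml_text, "lien")
        if int(el.get("amount")) >= minimum
    ]


if __name__ == "__main__":
    for name, doc in (("A", ORG_A), ("B", ORG_B)):
        print(name, lien_holders(doc, 1000))
```

The same `lien_holders` function works against both organizations' documents even though one nests liens and the other links them, because it depends only on the agreed form and not on either concrete DTD. The sketch also assumes the architecture fixes the attribute names; a fuller treatment would let each DTD rename attributes as well.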