Syntactic and conceptual schemas

I received the following questions privately. However, I think the answers
(vague as they are) might be interesting to the list.

Uwe Speck wrote:

> In the W3C Note of the XML-Data Schema is written that there are
> two types of Schema: syntactic and conceptual.
>
> URL: http://www.w3.org/TR/1998/NOTE-XML-data-0105/
>
> "Schemas define the characteristics of classes of objects. This paper
> describes an XML vocabulary for schemas, that is, for defining and
> documenting object classes. It can be used for classes which as
> strictly syntactic (for example, XML) or those which indicate concepts
> and relations among concepts (as used in relational databases, KR graphs
> and RDF).
> *** The former are called "syntactic schemas;" the latter
> "conceptual schemas." ***"
>
> The terms syntantactic and conceptual are NOT used in the W3C Schema
> Note Part 1 Structures, but there seems to be the same intention:
>
> URL: http://www.w3.org/1999/05/06-xmlschema-1/
>
> Par. 2.2 On schemas, constraints and contributions
>
> "XML Schema: Structures not only reconstructs the DTD constraints of XML 1.0
> using XML instance syntax,
>
> *** it also adds the ability to define new kinds of constraints. ***
>
> For example, although the author of an XML 1.0 DTD may declare
> an element type as containing character data, elements, or mixed content,
> there is no mechanism with which to constrain the contents of elements
> to only character data of a particular form, such as only integers in a
> specified range." ...
>
> So my questions:

Good (and hard) questions. I'll do my best to answer them, but there's no
guarantee my answers are completely correct.

> 1) From my point of view, the paragraphs written above mean
> exactly the same, but use different words. Is that true?

I don't think so, but I'm not really sure. (To give you an idea of my
confusion, I started out saying "maybe", then said "probably", then went
back to "maybe", and ended up with "I don't think so".)

Background
----------

In database theory, there are two different schemas -- the physical schema
and the conceptual schema. The physical schema states how data is actually
stored on the disk. The conceptual schema states how data appears to be
organized from the user's point of view. For example, in a relational
database, the conceptual schema declares how data is organized into tables
and columns, what the data types of each column are, what the primary key /
foreign key relationships between the tables are, and so on.

The query engine accepts commands that use the conceptual schema. For
example, in a relational database, the query engine accepts SQL statements
to select, insert, update, and delete data and so on. The query engine then
submits requests to the storage engine, which knows the physical schema. For
example, the query engine might ask the storage engine for all the rows in
the table named ABC; the storage engine looks at the physical schema and
retrieves the data. Notice that the storage engine is translating here
between the conceptual schema (which uses the concept of table) and the
physical schema (which describes how data is stored on disk).

Note also that the physical schema can be (and usually is) completely
different from the conceptual schema. For example, it would be perfectly
legal to store all data on disk as a sequence of strings:

   [table name][column name][row number][data value]

where row number and data value are stored in string form. To find all data
for a table, the storage engine could search through all of the data and
return only the data for the requested table, which it would change to the
data type of the column according to the conceptual schema. Of course, this
would be horribly inefficient for a large database, but it shows you how
much the conceptual schema and physical schema can differ.
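
To make the string-sequence example concrete, here is a minimal Python
sketch of such a storage engine. Everything in it is hypothetical and made
up for illustration: the table ABC's columns, the sample values, and the
names CONCEPTUAL_SCHEMA, PHYSICAL_STORE, and fetch_table. It is not how any
real database works; it only shows the scan-and-convert step described
above.

```python
# Hypothetical conceptual schema: the data type of each column, per table.
CONCEPTUAL_SCHEMA = {
    "ABC": {"id": int, "name": str, "price": float},
}

# Hypothetical physical storage: a flat sequence of string records of the
# form [table name][column name][row number][data value].
PHYSICAL_STORE = [
    ("ABC", "id", "1", "10"),
    ("ABC", "name", "1", "widget"),
    ("XYZ", "code", "1", "q7"),
    ("ABC", "price", "1", "2.5"),
    ("ABC", "id", "2", "11"),
    ("ABC", "name", "2", "gadget"),
    ("ABC", "price", "2", "4.0"),
]


def fetch_table(table_name):
    """Scan all stored records and rebuild the rows of one table."""
    column_types = CONCEPTUAL_SCHEMA[table_name]
    rows = {}
    for table, column, row_number, value in PHYSICAL_STORE:
        if table != table_name:
            continue  # record belongs to some other table
        # Convert the stored string to the type the conceptual schema declares.
        convert = column_types[column]
        rows.setdefault(int(row_number), {})[column] = convert(value)
    # Return the rows in row-number order.
    return [rows[number] for number in sorted(rows)]


if __name__ == "__main__":
    for row in fetch_table("ABC"):
        print(row)
```

The caller only ever sees typed rows of table ABC; the full scan over
string records stays hidden inside fetch_table, which is the point of
keeping the two schemas separate.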

XML-Data
--------

I believe the authors of XML-Data think of an XML document as physical
storage and the XML specification as defining rules for this storage. That
is, it has rules for where markup, white space, character data, and so on
can appear. In this context, the DTD language is a language for defining
physical schemas. In other words, it states what attributes belong to a
given element type, the content models of element types, and so on.

(To use the terms used by the XML-Data authors, XML defines the syntax for a
class of languages -- that is, the legal structure of strings in the
language. The DTD language is used to define the syntax for a particular XML
language. In other words, a DTD is a "syntactic schema".)

Thus, any schema information that does not directly affect physical storage
is "conceptual" and needs to be interpreted by a layer above the storage
engine (XML processor/parser). A good example of this is data types. All
data in an XML document is stored as a string. Therefore, stating that the
data type of a given element or attribute is integer is a "conceptual"
operation, since conversion to/from strings and type checking are not
performed at the storage (XML processor/parser) level, but at a higher
level. Similarly, such things as the <foreignKey> element type in XML-Data
are conceptual constraints -- that is, constraints that must be enforced at
a level higher than the storage engine.

W3C XML Schemas
---------------

On the other hand, I don't know if the authors of the W3C's XML Schemas see
a difference between the constraints imposed by a DTD and other constraints.
It appears that they view the constraints that can be written in the DTD
language as a subset of the possible constraints.
That is, I think that they think there are a large number of possible
constraints (content models, lists of legal attributes, element type
inheritance, data types, and so on), and that the DTD language supports some
constraints; XML Schemas supports more of these constraints.

Discussion
-----------

One of the problems with XML is that the boundaries between physical and
conceptual are not always clear. Technically, the specification only defines
physical layout -- that is, the syntax of a legal XML document.
Unfortunately, when people start to think about XML, they immediately start
to think in conceptual terms:

* Programmers usually think of an object model in which element types
  roughly correspond to classes and attributes correspond to properties of
  these classes.

* Document authors usually think of a document model in which the physical
  layout corresponds to the conceptual model they have in their head (a book
  has a title, one or more authors, and one or more chapters; a chapter has
  a title and one or more sections; and so on).

In both cases, an XML schema language (including the DTD language) can be
viewed as a conceptual schema language as well as a physical schema
language. For example, element types define physical structures (tags and
legal children) but also can be used to define classes (in the programmer's
case) or document parts such as chapters (in the document author's case).
Similarly, archetypes in the W3C's XML Schemas can be thought of as a
convenient shorthand (similar to parameter entities) for defining element
types, but can also be used to define object inheritance.

(The situation is further complicated by things like entities. In DDML, we
thought of entities as physical constructs and element types, attributes,
and notations as logical constructs. The reason for this was that processors
are not required to inform applications of entity usage.
Because we were only interested in logical (conceptual) constructs, DDML did
not support entity definition. Thus, DDML was primarily a conceptual
language. (Note that the other schema languages do support entity
definition, although there has been strong support for removing these from
the W3C's XML Schemas.))

I think that one reason for the physical/conceptual duality of schema
languages (including the DTD language) is that XML only defines physical
layout and people, who generally think in conceptual terms, want to express
those concepts. Thus, they impose concepts on DTDs and schema languages,
even when those languages were designed to express physical schemas.

An Answer (Finally)
-------------------

So, to answer your question, I don't think that these two paragraphs are
saying the same thing. On the other hand, I don't think they contradict each
other. Instead, I think they are viewing the same question from two
different angles.

XML-Data's separation of syntactic schemas and conceptual schemas is useful
because it makes very clear what XML can do and what it can't do. It also
makes clear the responsibilities of the processor (processing syntactic
schemas) and the application (processing conceptual schemas).

On the other hand, the separation is not entirely relevant to application
writers and document authors. The reason for this is that these people use
the "syntactic" parts of schema languages to express concepts as well as
physical layout. This seems to be the view taken by W3C's XML Schemas.

I hope this helps to clarify, if not completely answer, your question.

> 2) Can we say, that the goal of *** every *** XML-Schema
> language is, to support additional constraints compared to DTDs
> - means every XML-Schema supports something like a
> *** conceptual *** schema-principle! Or are the ** conceptual ***
> Schema of XML-Data something extraordinary?

Yes, I think you can say this.
The only difference between XML-Data and the other schema languages in this
regard is that XML-Data explicitly states what parts of its language apply
at the XML document level and what parts apply at a higher level.

--
Ron Bourret

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)