Reusing schema vocabularies
REUSING SCHEMA VOCABULARIES: THINKING OUT LOUD
================================================

INTRODUCTION
------------

I'm still struggling with trying to figure out why namespaces are needed, exactly what they achieve, what the cost is and why they are the current preferred solution of the XML WG. This is an attempt to clarify my thoughts by writing them down and hopefully having others inspect (and correct) them.

Note that this is not a well-written and polished paper, just a sort of 'textual dump' of my thoughts. The headings are there to organize the dump somewhat and make it easier to read. I'm probably also a little bit too optimistic about architectures, but I haven't got the time to modify those parts now.

I will use the term 'DTD' when I refer to XML schemas as they are defined in XML 1.0, and will use 'schema' to mean 'a DTD or a schema in any XML schema language'. I use 'XML 1.1' to refer to 'XML 1.0 as extended by the namespace WD'.

The namespace WD seems motivated by the need to be able to define different schema vocabularies in a single document, or to be less general: the need to be able to reuse element and attribute names from different DTDs in a single document.

As far as I can see there are currently two ways to achieve this: namespaces and architectures. I'll try to list the advantages and disadvantages of each to see if I can understand why the WG has chosen what it has.

EFFECTS ON OTHER STANDARDS
--------------------------

NAMESPACES

Namespaces, while superficially simple, are really a profound change to the XML data model: one of the most basic concepts (the concept 'name') is changed from a string to a namespace identifier _and_ a string. The reuse of schema vocabularies is enabled by this modified concept of names, allowing processing software to pick out names belonging to a specific schema/namespace and operate on them.
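To make the "namespace identifier _and_ a string" point concrete, here is a minimal sketch in Python. It assumes the xmlns attribute syntax as implemented by Python's standard xml.etree.ElementTree, which may differ from the syntax in the WD discussed here. The example.org URIs, the sample document and the split_name helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace identifiers, chosen only for this example.
ORDERS = "http://example.org/orders"
HTML = "http://example.org/html"

DOC = f"""<order xmlns="{ORDERS}" xmlns:h="{HTML}">
  <title>Order 1</title>
  <h:title>Rendered heading</h:title>
</order>"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' tag into a (namespace, local) pair."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


root = ET.fromstring(DOC)

# Every element name is now a pair, not a plain string.
names = [split_name(element.tag) for element in root.iter()]

# Software can pick out the names that belong to one vocabulary.
html_names = [local for namespace, local in names if namespace == HTML]
```

The two title elements share the same string but are distinct names, because their namespace identifiers differ. A processor written against XML 1.0 names would see only 'title' and 'h:title'.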
This is incompatible with the use of names in XML 1.0, which means that validation and attribute defaulting no longer work as before. In other words: both validating and non-validating parsers are affected, but only in the interpretation of the names used in DTDs. (XML 1.0 documents will work with XML 1.1 parsers, but not vice versa for namespace-using documents.)

To allow validation and attribute defaulting in XML 1.1 the schema syntax will have to change, whether the new syntax is a modified DTD syntax or some entirely new schema language. This means that XML 1.1 documents that use namespaces will not be valid SGML documents.

In XML 1.1 it is conceivable that different schemas can be combined without needing to be rewritten. With the current DTD syntax this will require a liberal use of ANY content models, which very much weakens the benefits of validation and structured editors. A schema language with features for extending the content models of elements from reused schemas is conceivable, but no such schema language is available at present.

This also means that to support XML 1.1 parsers must be modified, as must the DOM and SAX, since they depend on the concept of names, which has changed. (The DOM's getElementsByTagName should be namespace-aware, for instance.)

XSL and CSS2 will also have to take XML 1.1 into account if they are to allow stylesheets written for one schema to be used with a schema that incorporates the first schema. (XSL patterns must then support the new names.)

XPointer will not need to be modified, since XPointers are designed to be tailor-written to the document they address into. Any XML query language will have to be designed for XML 1.1 (which includes XPointer if XPointer is used as a query language, as it can be). [XLink?]

A last problem with namespaces is less technical and more practical: namespace names are awkward to work with, since they have a complex syntax and must be long.
This means that all XML applications that rely on namespaces will be awkward where names are concerned, which is almost everywhere.

XML ARCHITECTURES

XML architectures are superficially complex, but require no changes to the XML data model. They enable the reuse of schema vocabularies by remapping names from the original document to a new 'virtual' document, the architectural document.

This means that XML architectures can be layered on top of current parsers (as XAF and xmlarch.py do), and furthermore that they require no changes to XML 1.0. This means that SGML compatibility is retained.

Furthermore, it means that DOM, SAX, XSL, CSS2 and possible query languages will not have to take architectures into account (beyond allowing users to declare the architecture they wish XSL/CSS2/queries to apply to), since they operate as before, but on an architectural document instead of the original one.

In short, XML architectures do not affect any of the standards currently in use or under design. (As will be seen later the architecture syntax may have to change, but the effects of this change are very likely minor.)

XML architectures do require schemas reused in compound schemas to be rewritten.

MEETING THE NOTE-WEBARCH-EXTLANG REQUIREMENTS
---------------------------------------------

Requirement #1: "It must be possible to introduce a new vocabulary in part of a document in a way that requires changes only locally within the document."

Namespaces meet this requirement by allowing new vocabularies to be introduced on each element.

XML architectures as defined in ISO 10744:1997 A.3 do not meet this requirement. The interesting question is of course: can they be modified to do so? As far as I can see, the answer must be yes. One way to do it might be to allow the declaring PI to appear anywhere in a document, but only to have scope from its declaration until an ending PI is met. Architecture scopes must properly nest within each other (and within elements).
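The scoping rule just proposed can be sketched roughly as follows, using Python's standard xml.sax module. The PI targets 'arch-begin' and 'arch-end' are hypothetical names invented for this sketch. They are not the declaration syntax of ISO 10744 or of any existing tool, and the sketch only tracks scopes, it does not build an architectural document.

```python
import xml.sax


class ScopedArchHandler(xml.sax.ContentHandler):
    """Record which (hypothetical) architecture scopes are open at each element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open elements
        self.scopes = []  # stack of (architecture name, depth where it was opened)
        self.seen = []    # (element name, architectures in scope), in document order

    def startElement(self, name, attrs):
        self.depth += 1
        self.seen.append((name, [arch for arch, _ in self.scopes]))

    def endElement(self, name):
        # A scope opened inside this element must end before the element does.
        if self.scopes and self.scopes[-1][1] == self.depth:
            raise ValueError(f"scope {self.scopes[-1][0]!r} not closed inside <{name}>")
        self.depth -= 1

    def processingInstruction(self, target, data):
        arch = data.strip()
        if target == "arch-begin":    # hypothetical declaring PI
            self.scopes.append((arch, self.depth))
        elif target == "arch-end":    # hypothetical ending PI
            # The ending PI must match the innermost open scope, at the same depth.
            if not self.scopes or self.scopes[-1] != (arch, self.depth):
                raise ValueError(f"scope {arch!r} is not properly nested")
            self.scopes.pop()


DOC = (b"<report><title>T</title>"
       b"<?arch-begin html?><p>x <b>y</b></p><?arch-end html?>"
       b"<note/></report>")

handler = ScopedArchHandler()
xml.sax.parseString(DOC, handler)
```

Only the p and b elements fall inside the html scope here, so the change is local to that part of the document, which is what requirement #1 asks for.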
This modified version of XML architectures meets the first two cases listed in the motivation for requirement #1 in Note-webarch-extlang, but not the third. However, the third is not met by namespaces either and can only be met by a change to the XML 1.0 grammar. Given such a change, both architectures and namespaces would meet the third case.

Requirement #2: "The syntax must unambiguously associate an identifier in a document with the related schema without requiring inspection of that or another schema."

By using URIs as namespace identifiers namespaces meet this requirement.

XML architectures do not meet this requirement as they stand, since the names of two architectures may clash. The modification suggested above enables XML architectures to meet this requirement just as well as namespaces do.

Namespace names may not collide in the namespace documents, but prefixes may. If prefixes collide the inner prefix shadows the outer one. Prefix collisions do not concern applications, since they use namespace names to identify elements and attributes.

XML architecture names may also collide, but can be specified to shadow one another as with prefixes. To enable the unique identification of architectures (even in the case of collisions) architecture declaration PIs can be extended with a namespace attribute that contains an identifying URI.

Requirement #3: "It should be possible to create an original document schema such that one can determine, without access to the extension schema, which uses of extensions to that document can be ignored."

I do not understand this requirement and so cannot comment on it.

SUMMARY
-------

From this discussion I emerge believing that XML architectures are a superior solution to the problem of reusing schema vocabularies. They have far less impact on the XML family of standards than namespaces do and do not require XML to be modified or that SGML compatibility be forsaken for documents that reuse schemas.
The nesting of namespaces is slightly more natural than that of architectures, but since this nesting is only designed for automatically generated documents (and since heavily nested namespaces are more or less unreadable for humans anyway) this does not really matter. The data model of XML architectures is also much simpler than that of namespaces, and XML architectures provide far better control over the data model presented to processors designed for the original schemas.

--Lars M.

xml-dev: A list for W3C XML Developers.
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/