escaping QName interlopers
[Warning: half-baked Sunday AM musings follow. There's definitely something here, but what it is and whether it could sweep the world are another set of questions.]

QNames appear to have emerged as a means of combining existing markup practice with the W3C's fondness for URIs as the defining identifier for the Web. Nearly every aspect of QNames has been questioned in one form or another on this list and elsewhere, but there are still probably some pieces we haven't explored, some of which may in fact be interesting.

It struck me this morning that many of the problems with QNames aren't exactly the fault of QNames. QNames are kind of the result of a collision between a URI truck and the Name compact car, where we end up driving the truck from the driver's seat of the car.

The most obvious fault line is the inability to express QNames as URIs. For example, in some contexts, it might be very convenient to be able to describe this spec element:

    <piece xmlns="http://simonstl.com/ns/vellum" />

as:

    http://simonstl.com/ns/vellum#piece

Unfortunately, there are a number of problems with that approach. It works for this case, but what do I do with:

    <piece xmlns="http://simonstl.com/ns#vellum" />

Creating http://simonstl.com/ns#vellum#piece is an even uglier collision. While namespaces that contain fragment identifiers are apparently rare, creations primarily of the W3C, they do exist and are legal, and therefore pose a barrier to developers who want to describe their vocabularies using URI-based rather than QName-based mechanisms.

A query string approach is another option, but it gets even wilder with namespace URIs that use fragment identifiers:

    http://simonstl.com/ns?name=piece#vellum

Moving beyond the intricacies of URI syntax, there are other problems with this kind of approach. RDDL documents are presently designed to provide resources about vocabularies as a set of names, with schemas and other such niceties.
It's not clear that even my friendliest case of "http://simonstl.com/ns/vellum#piece" is particularly compatible with that approach, though perhaps RDDL could be extended that way.

Given all of these problems, why on earth would we want to be able to treat element and attribute names as URIs?

The simplest reason for doing so is removing the mismatch between QName processing, which is now context-dependent in an ever-growing number of ways, and URI processing. For better or worse, URIs are less context-dependent (or can be made so through absolutization) than QNames. This would let me get rid of a lot of annoying code and spend less time thinking about issues like QNames in attributes.

A more interesting reason for doing this builds on the ever-growing use of namespaces to mix and match vocabularies and the recurring problems of modularization. Pulling pieces out of various schemas or DTDs and reassembling them to fit given projects is a nuisance because schemas and DTDs also describe sets of resources rather than individual components. While the individual components do get described in the end, they are described for use in a particular context. There are few mechanisms for describing which attributes of a given element, for instance, are crucial to its use, and which may be safely pruned when it is reused elsewhere.

There are widely distributed mechanisms which are in fact designed to answer these kinds of relationship questions, though I've admitted in the past that I'm not particularly fond of them and don't find them particularly accessible. RDF and its surrounding toolkits, however, do an excellent job of describing relationships between resources, at least when those resources can be identified by URIs. The current namespace/QName approach only applies a URI to the namespace, making it difficult to apply RDF to smaller pieces.
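The naive QName-to-URI mapping discussed above, and the way it depends on context, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the prefix map are hypothetical, the "#" join is just the friendliest convention from the examples earlier in this message, and http://example.org/other is a made-up second namespace.

```python
def qname_to_uri(qname, in_scope):
    """Naively turn an element QName into a URI.

    in_scope is a hypothetical prefix -> namespace URI map for the element's
    context; the empty string stands for the default namespace.
    """
    # "v:piece" -> ("v", ":", "piece"); "piece" -> ("", "", "piece")
    prefix, _, local = qname.rpartition(":")
    namespace = in_scope[prefix]
    # Assumption: join with "#", as in http://simonstl.com/ns/vellum#piece
    return namespace + "#" + local


def fragment_count(uri):
    """Count '#' characters; more than one means the join collided."""
    return uri.count("#")


# The friendly case: one fragment identifier.
friendly = qname_to_uri("piece", {"": "http://simonstl.com/ns/vellum"})

# The ugly case: the namespace URI already carries a fragment identifier.
ugly = qname_to_uri("piece", {"": "http://simonstl.com/ns#vellum"})

# Context dependence: the same QName means different things under
# different in-scope declarations.
ctx_a = qname_to_uri("v:piece", {"v": "http://simonstl.com/ns/vellum"})
ctx_b = qname_to_uri("v:piece", {"v": "http://example.org/other"})

print(friendly, fragment_count(friendly))
print(ugly, fragment_count(ugly))
print(ctx_a != ctx_b)
```

The lookup in in_scope is exactly the context-dependent step that a plain URI would not need, and the second case shows why simple concatenation cannot be a general answer.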
It struck me this morning that there's a way to apply URIs to individual elements and attributes, though it requires an application of namespaces that varies pretty dramatically from the typical style, and effectively creates something like a DOCTYPE.

I'll start with a simple example:

    <piece xmlns="http://simonstl.com/ns/vellum" >
      <connections>
        <traverse>
          <from href="http://www.w3.org/TR/REC-xml#sec-common-syn" />
          <to href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
        </traverse>
        <traverse>
          <from href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
          <to href="http://www.w3.org/TR/REC-xml#sec-common-syn" />
        </traverse>
      </connections>
    </piece>

Using the approach I've been pondering, this could turn into something like:

    <piece:x xmlns:piece="http://simonstl.com/ns/vellum/piece"
             xmlns:connections="http://simonstl.com/ns/vellum/connections"
             xmlns:traverse="http://simonstl.com/ns/vellum/traverse"
             xmlns:from="http://simonstl.com/ns/vellum/from"
             xmlns:to="http://simonstl.com/ns/vellum/to"
             xmlns:href="http://simonstl.com/ns/vellum/href" >
      <connections:x>
        <traverse:x>
          <from:x href:x="http://www.w3.org/TR/REC-xml#sec-common-syn" />
          <to:x href:x="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
        </traverse:x>
        <traverse:x>
          <from:x href:x="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
          <to:x href:x="http://www.w3.org/TR/REC-xml#sec-common-syn" />
        </traverse:x>
      </connections:x>
    </piece:x>

The second form defines one namespace per element and attribute name, creating a direct mapping between those names and a URI. It then uses a meaningless placeholder local name (x) for all the elements and attributes, since every element and attribute already has a unique identifier in its namespace URI.

Each element can now have its own space defining different levels of processing, mixability, etc., and modularization approaches can reference these descriptions directly rather than having to harvest information from tangled schemas.
It also makes it simpler for modularization approaches to define their own sets of relationships between these components, overriding the claims made by the creators of the original markup if they so choose.

There are lots of problems with this, of course. The root element becomes pretty hefty, and there's no easy way to slap that content in an entity since entities can't be containers. The namespace declarations could be distributed through the document, though that has the amusing side effect of moving an element's real name to its attributes and the real names of attributes to sibling attributes.

Still, I think there's something here worth thinking about. This approach seems to bind XML much more tightly to the Web architecture, exposing more information to Web-oriented tools and potentially removing the layers of obfuscation that grow when namespace-mixing becomes commonplace.

Will it happen? I'm not counting on it. Is it worth a few minutes of thought? I think it is.

--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org