giving xml a colonic: re-opening (or re-opining on) an old argument
Heyo.

Did a release this week (care to look at www.genxdm.org?), so now I'm stirring up trouble. :-)

So, the namespaces in XML discussion. Yes, yes, I'm going to flog that horse again. It's not dead *enough*, you see. Or perhaps I'm setting myself up as a Miracle Max ....

The Namespaces in XML spec is an ongoing problem for XML adoption (and even now, it's worth talking about adoption; part of the reason that further adoption has stalled is the problems with XML Namespaces).

Problem: complexity. Here I want to talk about code complexity. To handle namespaces in XML, you have to keep a "namespace context" hanging around. That's fine for the processor, but it's a real problem for applications, which don't have a good way to access the processor's internals (and shouldn't, either). Changes to what can be done with namespace-prefix mapping make this even more crufty; depending upon which version of XML you're supporting, you can undefine the default prefix mapping, or all prefix mappings. You've got to have a new-on-change context for every element, but not change the context if the element doesn't introduce a change. Even simple documents are weighted down with namespace concrete shoes.

Problem: failure of intuition. Have you ever tried to explain to someone why the element <element name="name" ... /> is the same as the element <xs:element name="name" ... />? And why the attributes are the same? And why the attributes in <type name="name" /> and <xs:type name="name" /> are the same as each other, but not the same as the ones in element? The most graceful explanation is to tell folks that elements define a scope for the naming of their attributes, which therefore don't require prefixes, but it always requires explanation. Ever noted that qnames in content refer to elements, never (hardly ever!) to attributes?

Problem: resistance to change. Despite the problems with namespaces, the code is widespread. It won't be thrown away.
And that's the primary reason to *not* write this email (or for you to stop reading; you really should, you know): the current status quo, however quirky, is widely distributed (and if namespaces in XML aren't properly supported in your favorite language/library, then you probably don't care about the issues that the spec was designed to resolve).

But ... namespaces are important. The introduction of namespaces provides potentially significant power: you can *combine vocabularies* (or call them schemas or DTDs or whatever). Even browsers *could* make use of this, potentially, mapping namespaces to plugins (except that the XML implementation is so incredibly nasty that no browser actually seems to do this). More importantly, perhaps, namespaces provide a means of distribution of authority for authoring schemas for different application areas or industries, which are useful even in mono-namespaced documents. Instead of W3C taking all the "good names" in the global XML namespace, everybody can define their own, and use the names that they think good inside it.

So ... what is a namespace? Well, according to the Namespaces in XML specification, a namespace "is a URI." That's utter horseshit, but it's what the spec says. It doesn't actually have any of the characteristics of a URI, apart from syntax, but the spec really, really wants to pretend that a URI is what it is. So, looking at the rest of the specification, what is it *really*? Well, it's actually got two potentially different definitions: from the point of view of a namespace creator/definer, and from the point of view of a namespace user.

From the point of view of a namespace creator, a namespace is a (reasonably) unique identifier with a low cost of entry, and distributed authority. Now, the writers of the URI specification(s) envisioned, and the writers of particular URI scheme specifications detailed, a means of leveraging the distributed authority of the domain name system for creating unique identifiers.
DNS alone is inadequate, unless one assumes that a domain will create only a single namespace ... but the power (and corresponding complexity) of URIs is probably overkill. You need distributed authority, and a way of distinguishing multiple work products within the aegis of a single recipient of authority. That provides a reasonably stable namespace name.

From the point of view of an end-user, a namespace isn't a URI at all. It's compared for string equality. A namespace is a label, a string, an array of characters. Equality is all that matters. Any single-character variation means a different namespace.

We'll come back to this, but let's move a little further.

Namespaces are of interest because they enable XML vocabularies that mix multiple namespaces. Key examples include XML Schema, XPath, XSLT (and XPath2, XSLT2, XQuery). The key abstraction in vocabulary merging seems to be the QName.

What, exactly, is a QName? Well, it's defined to be a Qualified Name, but ... as with namespaces, that's horseshit in the well water. Whatever it *really* is, "qualified" is prolly not the ideal descriptor. Syntactically, a QName is a combination of NCName:NCName. That's a no-colon-name, a colon, and a no-colon-name. An obvious appellation would be "colon name", but that's apt to give rise to unpleasant jokes (not to mention subject lines), so should be avoided.

It can be described as an abbreviated name. The combination of the expansion of the prefix (via mapping) with the local part of the name generates a 'complete' name. Interestingly, though, the prefix (and even the colon) can be missing, and still <element> is not the same as <element>.

That's the problem with QNames: they have extremely *poor* locality. Worse, they extend that poor locality from themselves to every NCName in XML. You can't know the name of an XML element without looking up its ancestor axis to generate the namespace context.
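
The locality problem can be sketched in a few lines of Python, using the standard library's ElementTree parser. The document and both namespace names here are made up for illustration; the point is only that the two unprefixed <element> tags look identical in the source text but resolve to different names depending on their ancestors.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both namespace names are invented for this sketch.
DOC = """\
<element xmlns="org.example.outer">
  <wrapper xmlns="org.example.inner">
    <element/>
  </wrapper>
</element>"""

root = ET.fromstring(DOC)

# ElementTree reports every tag in expanded form, {namespace}local,
# so the effect of the inherited default namespace becomes visible.
names = [el.tag for el in root.iter()]
for name in names:
    print(name)
```

The inner <element/> carries no namespace information of its own; the parser only knows its name because it tracked the declaration on <wrapper> on the way down.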
So, while Namespaces in XML allows you to embed foreign vocabularies, it makes it a real challenge to *extract* namespace-well-formed fragments from multi-namespace documents.

Now, if you're reading xml-dev (and if you've continued this far, which was a mistake that you should certainly correct as soon as possible; I recommend beer, wine, or the distilled beverage of your choice), you've seen all this rehearsed before, I know. And you've seen proposed solutions. I'm not proposing much of a solution; I'm proposing something more on the order of a profile, for schemas and instances, that leaves the existing code infrastructure alone (for the time being).

How do we make Namespaces in XML less egregious?

Well, first principle: avoid QNames. They demonstrate poor locality, which makes processing portions of a document challenging. So ... where are they used?

They're used in content, typically as references (they compete with IDs and keys as references, mind, but they're also used to describe 'classes' or 'categories' of things). This is anathema, frankly. Any XML dialect that is using QNames in content (which unfortunately includes the most-adopted XML technologies: Schema, XPath, XSLT, XQuery) is broken. It's broken because it fundamentally breaks layering: namespaces (and consequently QNames) are in the XML processor layer, while references are implicitly in the application layer. Exposing namespaces to the application layer universally means that every application, even those that don't care about namespaces, has to cope. But namespace handling is unintuitive, and complex. Ugh.

QNames are also used as attribute names. This is, in fact, a necessary use case, that cannot be worked around. There's an XML namespace, with attributes in it; every processor can handle those. In order to put foreign attributes on an element, you have to prefix them (it's the only way to avoid the potential for name clashes).
Foreign attributes have to be distinguished from native ones; prefixes are the current solution.

QNames are used for elements. This is simply unnecessary (note: there's a conflict between namespaces-in-content and namespaced-elements for Schema and XSLT, at least, but that's a namespaces-in-content problem, in my opinion).

Some proposed best practices for a namespaces-light set of XML schemas and instances: no QNames in content. No prefixed elements (change the mapping of the default prefix: xmlns=). When foreign attributes are used, define the prefix mapping in the same element (xmlns:prefix= wherever prefix:name= appears).

But, if we've lost QNames in content, what do we do for references?

Next principle: if an application needs reference semantics, let it define them itself. The application can then ensure that it never acquires a chunk of XML that lacks the necessary context for resolution.

A simple implementation of this is to use "expanded" (or "jc") names instead of QNames in content. This is the combination of namespace and name into a single unit: {namespace}name (James Clark invented it, so far as I am aware). For vocabularies in which references are uncommon, it's a simple, if somewhat cumbersome solution (XML Schema would groan under the weight, but it makes very extensive use of references by QName).

If expanded names seem too cumbersome for the frequency of reference, then define an *application-level* abbreviation or mapping syntax. Nobody else needs to understand it, after all; it's not going to be built into your XML processor. If it's an application responsibility, define it at the application level.

And a final principle: dump the bogus URIs, and use the simplest namespaces possible.

In namespaces, the 'scheme' portion of a URI provides no information. Drop it. Alternatively: provide a public example of two XML namespaces differentiated by scheme.
That leaves domain + path + query | fragment for common URIs. (It's different for URNs or for the mailto: scheme, and there are other corner cases you can generate, I'm sure, but we're going to focus here on domain-based namespace URIs with some flexibility. Or you can provide examples of other things; and do note that urn.[urn-pattern] is a perfectly reasonable 'domain' replacement that won't collide with DNS.) Query and fragment are not generally used, so let's drop them as well (or you can provide an example of two public XML namespaces differentiated by query or fragment).

So, domain + path. That can be simplified to reverse-domain naming, with extension, if you care to. Certainly easier, and it's commonly encountered in a number of programming languages. It has the further advantage of removing the nasty non-Name characters, so you could do a form of expanded name as namespace:name rather than {namespace}name. The strongest objection to this would be an example of two public XML namespaces differentiated by identical fragments in the machine name and initial portion of the path, such that this change would make the two identical.

Mind you, the above is suggested best practice. No current XML processor will choke on a URI without a scheme, or with extended-reverse-domain naming, and the unique strings compared by users become easier to read and comprehend, with no loss of distinguishing information. But since we're not proposing a change to processors at the moment, the most baroquely ugly and evil URIs remain permissible.

An alternative: define an "xmlns" 'scheme' for URIs (that identify namespaces, not pretending to identify any other sort of resource), with a simplified pattern as above. This is slightly more problematic for the longer term, because a QName has one colon, not two. But such a scheme definition could otherwise restrict itself to characters permitted by the XML Name production.
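
As a rough sketch of what "domain + path, reversed" might look like mechanically, here is one possible conversion in Python. The function name simplify_namespace and the exact rule (reverse the host labels, then append the path segments, all dot-separated) are assumptions made for illustration, not something any spec defines, and the sketch makes no attempt to scrub characters that the Name production forbids.

```python
from urllib.parse import urlsplit


def simplify_namespace(uri: str) -> str:
    """Hypothetical mapping from a URI-style namespace to a dotted name."""
    parts = urlsplit(uri)
    if parts.hostname is None:
        # URNs and other non-domain forms are out of scope for this sketch.
        raise ValueError(f"no domain in namespace: {uri!r}")
    # The scheme, query, and fragment are dropped; only host and path are kept.
    host_labels = [label for label in parts.hostname.split(".") if label]
    path_labels = [segment for segment in parts.path.split("/") if segment]
    # Reverse the domain labels, then append the path segments in order.
    return ".".join(list(reversed(host_labels)) + path_labels)


print(simplify_namespace("http://www.w3.org/2001/XMLSchema"))
print(simplify_namespace("http://www.w3.org/1999/XSL/Transform"))
```

A purely mechanical rule like this one gives org.w3.www.2001.XMLSchema for the Schema namespace; a namespace owner picking a name by hand would presumably choose something tidier.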
Summarizing: we could, at present, start using simplified domain-based non-URIs for namespaces, avoid QNames in content (and replace with application-level mapping as needed), use default-prefix mapping for elements, and only use non-default prefix mapping (xmlns:prefix="namespace") for attributes. This doesn't get us a long way forward, but avoids some problems, and opens the possibility, if enough people decided to do this, that some future cleanups to Namespaces in XML (along these lines, that is) could be implemented.

There are some obvious obstacles.

W3C XML Schema is one. It makes very extensive use of QNames in content, for reference; it commonly binds the default prefix to the target namespace so that those references need not be prefixed, in content, which means that the structure of the schema (the elements) must be prefixed. This is hard to fix, although it would be straightforward enough for a future version of schema to drop the use of QNames and adopt a schema-parser/validator-level mapping instead. But it must be acknowledged that Schema's use of QNames in content is one of the primary obstacles to making any change to the status quo.

XPath (and XSLT and various other things, like XPath2 and XSLT2 and XQuery) are probably easier. XPath currently delegates mapping to its host language, which means that a host language revision or variant could use application-level mapping instead of breaking layers by using XML processor-level mapping. However, a variant of XPath is perfectly feasible, in which the expressions are enhanced with the inclusion of namespaces, using the JC expanded-name form. Instead of

    //xs:schema/xs:annotation//html:p

you would write

    //{org.w3c.xml.ns.schema.2.1}schema/annotation//{org.w3c.html.5.1}p

(with an apology for the version numbers).
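
Something close to this already exists in at least one library: the limited path language in Python's xml.etree.ElementTree accepts {namespace}name steps directly. The sketch below uses it as a stand-in for the idea, under the assumption that a small sample document is representative. It is not the XPath variant proposed here (ElementTree requires the namespace on every step, with no 'descendant' scoping), and the sample document is invented.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XHTML = "http://www.w3.org/1999/xhtml"

# Invented sample: a schema with one XHTML paragraph in its documentation.
DOC = f"""\
<xs:schema xmlns:xs="{XS}">
  <xs:annotation>
    <xs:documentation>
      <p xmlns="{XHTML}">Some documentation.</p>
    </xs:documentation>
  </xs:annotation>
</xs:schema>"""

root = ET.fromstring(DOC)

# No prefix table is passed in: each step carries its namespace with it,
# so the expression means the same thing wherever it is evaluated.
paragraphs = root.findall(f"./{{{XS}}}annotation//{{{XHTML}}}p")
print([p.text for p in paragraphs])
```

The expression is longer than its prefixed equivalent, but it can be copied out of its surroundings without dragging a prefix mapping along.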
XPath could define the context of a namespace declaration (represented by the {namespace} particle prior to a name) as either 'descendant' or 'following-in-expression' (some investigation would show which would be preferred; descendant seems likely, unless there's more use of non-descendant axes than I've encountered). Note that this would not include foreign attributes, which would appear as @{namespace}name.

But what's the point? Sure, I can argue that it's best practice to do some of this stuff, but we've already seen multiple namespace-fixup proposals die in flames.

Eh. :-) I think that these are best practices, and can be adopted by folks now, without checking with other people. If they were adopted, then some of the possible solutions outlined in the final 'problems' section might see some traction (but nobody's gonna bother doing expanded names in *major* application dialects unless they've seen other folks adopting expanded names elsewhere). They probably make your XML cleaner, more understandable, and work with current processors.

And given enough folks adopting something like this set of practices (especially with regard to QNames: remove them from content, use default prefix only for elements, always declare the prefix in the same element that contains foreign attributes), processors could start to consider optimization (or, more bluntly: put the crufty namespace code off in a de-optimized branch that's only invoked when the simple (and faster) best-practice form isn't working).

Given enough adoption of namespace-simplification (to extended-reverse-domain, or something equivalent), then a new set of revisions of core specs might acknowledge that, as well, and might even permit use of un-mapped names (actual "qualified" names in this case: org.w3c.xml.ns.schema.3.0.element, for example). And perhaps even move to a central registry for widely-used vocabularies (org.w3c.xml.ns.schema.42.0 == xs ?).

What, are you *still* reading? It's the weekend!
Go do something fun!

Amy!
--
Amelia A. Lewis    amyzing {at} talsever.com
Merchant, street girl, beggar, yeoman, king or common, man or woman,
only two things make us human--
sorrow and love, sorrow and love ....
    -- The Last Song of Sirit Byar