RE: Suggested guidelines for using local types. (was Re: Enlightenment v
James,

A long response, slow in coming. Not all the pieces are fully fleshed out, but I think I'm getting to the essence of the issue, at least from my perspective.

I agree there seems to be much confusion about what the questions are. However, your formulation still doesn't give the questions I've been addressing, which is, of course, my lead-in to reformulating them. Hopefully that will also answer why I'm particularly concerned with XSDL, even though I don't view it as the center of the Universe. If it turns out that what I thought was an XSDL-specific problem also applies to RELAX NG, then it would be nice to resolve it similarly. I see three questions:

> (a) If the meaning/allowed content of an element is highly
> context-dependent, should the name of the element be namespace qualified or
> not?

(b) If the answer to (a) is yes, then what should the namespace be? (There is a subordinate question here, which you don't answer, of what a namespace is.)

> (c) If an element is declared by an XSD local type, should the name of the
> element be namespace qualified or not _under current circumstances_?

To foreshadow a little bit, I think the answer to (a) is unequivocally yes, which may surprise readers of this thread. However, the subtle question is (b). If the answer to (b) is not "the schema namespace" (in XSDL terms) then, _under current circumstances_ and following the Engineer's Hippocratic Oath (if you have to make a decision, make the decision which is easiest to back out/change later on - don't look for it, I just made it up), the best thing to do is to leave them unqualified because:

1) it distinguishes them from other elements which can currently be appropriately namespaced. This is important information for applications.
2) once the appropriate mechanisms are introduced to put them in the appropriate namespace - whatever that is determined to be - current instances can be updated by adding information, and those parts of applications which dealt with them can be isolated and updated. If they've been put into the schema namespace, then existing information must be changed - instances rewritten and schema elementFormDefault (or form) values changed.

So my basic position, which seems to get lost in the heretical statement that there could be a reason - however tactical - not to put a name in a namespace (they catch colds so easily, you know), is that XSDL support for namespacing local types is incomplete at best. Ultimately, this will need to be resolved (I hope). In the meantime, use the choice (unqualified) which is likely to cause the least pain in moving to the eventual solution.

(Note that even if the final decision is to put locals in the schema namespace, schemas using the defaults can be converted by adding an elementFormDefault="qualified" attribute and instances by adding default namespace attributes to elements. No existing information is touched. Note that for any final decision other than putting all locals in the schema namespace, any existing instance must be touched.)

So to me, the crux is the answer to question (b): what should the namespace be? As you might imagine, I don't think it should be the schema namespace. I'll argue this from two directions. I'll start with textual exegesis of the normative portions of the NS rec and associated RFCs. I know this is a dangerous activity (been slapped for interpreting the (w)rec(k) before, but what the hell). Then I'll appeal to referential transparency and referential opacity, concepts developed by Gottlob Frege, the father of modern logic (there's nothing really complicated here, though).
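A minimal sketch of the instance-side conversion mentioned in the note above, assuming a hypothetical namespace http://example.com/po and hypothetical element names order and item (none of these come from the thread). It only shows how a namespace-aware parser, here Python's xml.etree.ElementTree, reports a local element before and after a default namespace declaration is added; it does not validate anything against a schema.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/po"  # hypothetical target namespace

# Local element left unqualified (what elementFormDefault="unqualified" expects).
unqualified = f'<po:order xmlns:po="{NS}"><item/></po:order>'

# The same instance with one addition: a default namespace declaration.
# Nothing that was already in the document has been rewritten.
qualified = f'<po:order xmlns:po="{NS}" xmlns="{NS}"><item/></po:order>'


def child_names(doc: str) -> list[str]:
    """Return the expanded names of the root element's children."""
    root = ET.fromstring(doc)
    return [child.tag for child in root]


print(child_names(unqualified))  # ['item']
print(child_names(qualified))    # ['{http://example.com/po}item']
```

Going the other way, from qualified to some namespace other than the schema's, would mean changing or removing declarations that are already in the instance, which is the asymmetry the argument relies on.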
Turning to the normative part of the NS rec, we see the following statement: "The combination of the universally managed URI namespace and the document's own namespace produces identifiers that are universally unique". Naturally, I next looked for a definition of identifier. I couldn't find one. But I did find a normative reference to rfc2396, "Uniform Resource Identifiers (URI): Generic Syntax". In there I found the following definition: "Identifier: An identifier is an object that can act as a reference to something that has identity. In the case of URI, the object is a sequence of characters with a restricted syntax."

If you put this all together, it means something like "the combination of the universally managed URI namespace and the document's own namespace produces a universally unique object that can act as a reference to something that has identity." Tim Bray coined uname for this. Let's use it as the term for the universally unique object formed from the combination described above.

The phrase "reference to something that has identity" is not further defined, so now is the time to make reference to the notions of "referential transparency" and "referential opacity". Referential transparency is similar to context-free - something is referentially transparent if it refers to the same "thing" regardless of context. Functional programming languages are generally considered referentially transparent if an expression that equates to a value will always equate to that value - therefore the value can be substituted for the expression. (This is great for optimization.) When dealing with a name, it means that the name means the same thing regardless of where it shows up. It always names (or labels) the same thing. Thousands of years of diplomacy, a hundred years of logic, and 50 years of programming show that referential transparency is a Good Thing.
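To make the uname idea concrete, here is a small sketch that treats a uname as a (namespace URI, local name) pair and expands element names against the namespace bindings in scope. The UName type, the expand function, and the example URI are all hypothetical names chosen for illustration, and the unprefixed case follows the rule for element names only (unprefixed attributes do not pick up the default namespace).

```python
from typing import NamedTuple


class UName(NamedTuple):
    """A universal name: namespace URI plus local name."""
    namespace: str
    local: str


def expand(qname: str, bindings: dict[str, str]) -> UName:
    """Expand an element's qualified name using in-scope prefix bindings.

    The empty-string key stands for the default namespace, if one is declared.
    """
    prefix, _, local = qname.rpartition(":")
    return UName(bindings.get(prefix, ""), local)


# Two documents that spell the name differently (hypothetical URI).
doc_a = {"po": "http://example.com/po"}  # uses a prefix
doc_b = {"": "http://example.com/po"}    # uses the default namespace

same = expand("po:order", doc_a) == expand("order", doc_b)
print(same)  # True
```

The prefix is context-dependent spelling; the pair is what stays the same wherever the name shows up, which is the property referential transparency asks of a name.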
Slightly restating a definition found in [1], referential transparency is "the fundamental property of mathematical functions which enables us to plug together black boxes.... There are a number of intuitive readings of the term, but essentially it means that each [uname] denotes a single [type] which cannot be changed ... by allowing different parts of a [schema] to share the [uname]."

There's now a very interesting split. It revolves around two questions:

1) what is a namespace (and does it have any intrinsic 'meaning' at all)?

2) is a uname "a reference to something that has identity"?

I would argue that Relax NG (following Makoto's work and XDuce, which is really a restatement of Makoto's work from a different angle) and XSDL give opposing answers to these questions, and those opposing answers explain a lot.

Relax NG, following Makoto's strong grammatical view, considers a namespace meaningless from the perspective of the schema - applications may impute meaning later, but from the point of view of Relax, a name is simply a pair of strings, one of which happens to look like a URI, and one of which happens to follow the Name production from XML 1.0. Likewise, the uname refers to nothing beyond itself. Trivially, one can enumerate all unames, and each would be different. The uniqueness business is a trivial property of the enumerability. Unames are just tokens to be manipulated by the grammar.

The common notion of "element type" is meaningless in Relax NG, although one could write schemas and applications that behaved as if such a thing existed. The types of Relax NG are the <define> elements of the core. This has interesting implications for applications, although I'm intensely jealous of Daisuke Okajima and RelaxNGCC. In particular, type information _never_ shows up in an instance - just as looking at an arbitrary character in a string doesn't tell you which non-terminal that occurrence is defined by.
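The grammatical view can be sketched with a toy matcher. This is not RELAX NG itself: it ignores text, attributes, ordering, and namespaces, and the GRAMMAR table and matches function are hypothetical. It borrows the <para>/<footnote> case from the quoted message below to show two named patterns sharing one element name, so that the name in the instance does not say which pattern applies.

```python
import xml.etree.ElementTree as ET

# Pattern name -> (element name, patterns allowed for children, any number of times).
GRAMMAR = {
    "doc": ("doc", ["para"]),
    "para": ("para", ["footnote"]),
    "footnote": ("footnote", ["paraInFootnote"]),
    "paraInFootnote": ("para", []),  # a para that may not hold a footnote
}


def matches(elem: ET.Element, pattern: str) -> bool:
    """True if elem and all of its descendants fit the named pattern."""
    name, child_patterns = GRAMMAR[pattern]
    if elem.tag != name:
        return False
    return all(any(matches(child, p) for p in child_patterns) for child in elem)


ok = ET.fromstring("<doc><para><footnote><para/></footnote></para></doc>")
bad = ET.fromstring(
    "<doc><para><footnote><para><footnote/></para></footnote></para></doc>"
)
print(matches(ok, "doc"))   # True
print(matches(bad, "doc"))  # False

# An isolated <para/> fits two patterns; its name alone does not pick one.
candidates = [p for p in GRAMMAR if matches(ET.fromstring("<para/>"), p)]
print(candidates)  # ['para', 'paraInFootnote']
```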
XSDL, on the other hand, considers namespaces, or at least those that are the value of the targetNamespace attribute, to have a very particular meaning. A namespace names a schema, and a schema creates a set of identifiers and structure definitions. If you don't include local names, then given a valid document, unames in the instance are referentially transparent identifiers. The element uname refers to the unique declaration labeled by the schema with that name. DTDs (at least after parameter entities have been expanded out) have the same property. (Most of the discussion of context is more about what you do with an element once you have it than about what the element is.) One can quibble with the direct statement of this (Tim obviously does), but then one can bring in the mathematical guns - as long as there's a one-to-one mapping from element names to element types, they're identifiers.

Referential transparency has demonstrated its utility for longer than SGML's been around - it should take a good reason to lose any of it, beyond the aesthetics of document appearance or some notion of "best practices" established in an environment in which referential transparency was either assured (DTDs) or unenforceable (Relax NG, well-formedness).

One good use of the XSDL approach is making contracts. Suppose two parties want to come to a common agreement about the form of a PurchaseOrder. I think it is very convenient to be able to directly refer to that agreement as the PurchaseOrder type - and when a PurchaseOrder appears in an instance, the appropriate definition can be retrieved. In the Relax approach, one cannot directly reference the definition from the document - the decisions about the structure of PurchaseOrder are indirectly implemented in <define> elements (as I read the spec). Nor is there any direct way to relate the different "effective" content models of the PurchaseOrder element.
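A rough sketch of the "name retrieves the definition" idea, under loud assumptions: the namespace URI, the DECLARATIONS table, and the type names in it are invented for illustration and stand in for whatever a real schema processor would hold for its global element declarations.

```python
import xml.etree.ElementTree as ET

PO_NS = "http://example.com/po"  # hypothetical target namespace

# Hypothetical stand-in for a schema's global declarations: uname -> type name.
DECLARATIONS = {
    f"{{{PO_NS}}}PurchaseOrder": "PurchaseOrderType",
    f"{{{PO_NS}}}comment": "xsd:string",
}


def declared_types(doc: str) -> list[tuple[str, str]]:
    """Pair each element whose uname has a global declaration with its type."""
    return [
        (elem.tag, DECLARATIONS[elem.tag])
        for elem in ET.fromstring(doc).iter()
        if elem.tag in DECLARATIONS
    ]


doc = (
    f'<po:PurchaseOrder xmlns:po="{PO_NS}">'
    "<po:comment>rush</po:comment><shipTo/>"
    "</po:PurchaseOrder>"
)
for uname, type_name in declared_types(doc):
    print(uname, "->", type_name)
```

The unqualified local element shipTo is skipped: its name alone does not identify a declaration, so resolving it needs the context of its parent, which is the gap in referential transparency discussed above.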
This difference is why I generally see RelaxNG and Schematron as functioning best as local schema languages and the XSDL approach (sic!) as better for "public" schema languages. The best way to be clear about your type system is to wear it on your sleeve. That's why I like naming, extension, etc., to be algorithmic. It has always been my intention to work on type-inference based schema languages after finishing XSDL, but you know how that went....

Which brings up another weakness of the current XSDL, which cannot appropriately namespace its own elements - there is no normal form of a document in which all the pertinent information is available without validating, because certain constructs lack unames - some even lack names. Type information in the case of anonymous types, and correct namespace information in the case of local elements, is information that must be added by the PSVI. Were there to be such a canonical form (in other words, were everything nameable in a referentially transparent way), then it would be possible to safely manipulate schema documents with technology for well-formed XML. All the pertinent (i.e., schema related) metainformation would be directly available in the instance (or inserted by a single pass through a validator) and there'd be no need for most applications to refer to the PSVI.

No one should read any of this as an attempt to "dis" RelaxNG, for which I have a great deal of respect (if not enough free time to truly grok). I don't see any issue with a synthesis in the future, particularly if you've left Makoto's beloved closure properties tractable.

Matthew

> -----Original Message-----
> From: James Clark [mailto:jjc@j...]
> Sent: Wednesday, September 05, 2001 7:45 PM
> To: Fuchs, Matthew; 'Jonathan Borden'; xml-dev@l...
> Subject: RE: Suggested guidelines for using local types. (was Re: Enlightenment via avoiding the T-word)
>
> Not everybody seems to be answering the same question here.
> We can distinguish the questions:
>
> (a) If the meaning/allowed content of an element is highly context-dependent, should the name of the element be namespace qualified or not?
>
> (b) If an element is declared by an XSD local type, should the name of the element be namespace qualified or not?
>
> From my perspective, question (a) is the primary question, and although XSD may be relevant, it's not an XSD-specific question. It's a namespaces question. It arises equally if you are using RELAX NG to define your vocabulary. People who view XSD as the center of the XML universe may view (b) as the primary question.
>
> My answer to (a) would be that it should be namespace qualified. Here's why. I don't see a sharp, binary distinction between context-dependent and context-independent elements; rather I see a continuum of different degrees and kinds of context-dependence. For example,
>
> 1. At the most context-independent end of the spectrum, we have an element like <html> which occurs only as the root element.
>
> 2. Another step down would be something like <h1>, which cannot occur as a root, but has consistent content model and processing wherever it occurs.
>
> 3. Another step down would be something like a <title> element that can appear as the child of a <chapter>, <section> or <subsection>. It has the same content model, but the processing may partly depend on the context.
>
> 4. Another case would be an element subject to SGML exceptions. Say a <para> may occur inside or outside a <footnote>, but inside a <footnote> a <para> may not contain a <footnote>. In a DTD, you would not be able to express the distinction. In RELAX NG, you would use a separate pattern for the content of a <footnote> in a <para>.
>
> 5. Further towards the context-dependent part of the spectrum would be something like <param> in HTML; it is allowed by both <object> and <applet> with a consistent semantic, but it doesn't have any interpretation outside its containing <object> or <applet>.
>
> 6. I guess the most context-dependent would be something like thead/tbody/tfoot, which occur only in a table.
>
> I don't see any point on this continuum where it makes sense to draw a line and say: above this line namespace-qualify, below this line don't namespace-qualify.
>
> I would suggest instead that the question of whether to namespace qualify should be based on the answer to the question: what is the namespace that defines the meaning of this element? If there is such a namespace, then the name of the element should be qualified with that namespace. If there is no such namespace, then the name of the element should not be namespace-qualified.
>
> As for attributes, I would say that the attribute should be namespace qualified if (and arguably only if) the meaning of the attribute is not determined by the namespace of the parent element. This implies that the name of the attribute that extends the attributes of a namespace-qualified element should be namespace qualified. This seems a natural guideline to me. (I think it corresponds to what ##other does in XML Schema for anyAttribute.)
>
> One objection to this is that it is not uniform between elements and attributes. My response would be that this non-uniformity is appropriate given that this is primarily a namespaces issue, and given that the Namespaces Rec does not default namespaces uniformly for elements and attributes.
>
> James