Some random noise on rational type systems for XML
The intrusion of the W3C XML Schema type system into core XPath/XSLT struck me sufficiently to cause me to want to think about type systems and XML again. So here's *that* old permathread again ....

I think one of the worst problems with W3C XML Schema's types is that they do not represent a system. This leads me to ask: is there a universal type system? Answer: apparently not.

Types are an imposition of external categories onto information, in order to make that information more amenable to manipulation. If that's so (you don't have to agree, of course), then there are two criteria for evaluating a type system: completeness and comprehensibility.

To be a little more precise: a type system ought to either fully represent common categories of data, or ought to have mechanisms for extensions that do so: completeness. A type system ought to have a relatively small number of primitives and a relatively small number of rules for creating more; the rules for derivation/extension/restriction ought to be consistent as well: comprehensibility.

Note that I'm leaving "complex types" (structured types) out of this discussion. W3C XML Schema and RNG both do a Pretty Good Job[tm] of establishing a means for creation of structures. I want to focus on "value types", the things that are represented in text and attribute nodes in XML.

The most commonly encountered programming languages these days start from registers, and base types on packing the largest amount of possible information into the smallest number of bits. This is not necessarily the best solution for XML.

First principle: the XML ur-type is "string". Everything in XML can be represented as a string (MUST be representable as a string). It can therefore be manipulated as a string--truncated, concatenated, case-transformed, etc. Possibly not *meaningfully* from the perspective of the data author, but always *possibly*.
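A small illustration of that first principle, as a sketch only: the element name, attribute names, and values are invented for the example, and Python's standard ElementTree parser stands in for any schema-unaware XML parser. Whatever the author meant by the data, it arrives as strings and can be handled as strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the names and values are made up.
doc = ET.fromstring('<item count="42" shipped="2004-01-15">true</item>')

count = doc.get("count")      # attribute value
shipped = doc.get("shipped")  # attribute value
flag = doc.text               # text node

# Without a schema, every value is a plain string.
all_strings = all(isinstance(v, str) for v in (count, shipped, flag))

# String operations are always possible, if not always meaningful.
truncated = shipped[:4]   # keeps only the year digits
joined = count + flag     # concatenates a "number" and a "boolean"
shouted = flag.upper()    # case-transforms a "boolean"
```

The concatenation is nonsense from the data author's point of view, which is the point: possible, not necessarily meaningful.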
Note that "string" is actually a subset of Unicode (which subset depends upon whether you want XML Classic (1.0) or New XML (1.1)).

Take a quick look at W3C XML Schema, and let's let that inform some initial discussion. Throw out the twenty-five derived types; they should never have been normative (only the rules to derive them need to be normative). That still leaves us nineteen. Lessee ... well, eighteen, because we've defined string as the ur-type. Okay, drop another seven, by collapsing all the date types into one conceptual date. Lose another two by making double and float numbers. Combine *binary into a single type (it can have an "encoding" attribute, which allows the addition of things like yEnc, if you're so inclined). Drop Notation. WTF is anyURI doing as a primitive? Clear influence of the Church of the Holy and Universal Thingy-that-Identifies-a-Thingy-with-Identity.

Hmm, that should leave us about six types (all of which are strings):

  boolean
  binary [octet-stream]
  number
  date
  duration

Hmm. We're missing one. Ah, that's it: QName. Question: does XML need a pointer type? Which would, of course, be represented as a string. If so, it might include, for instance, QName, XPath expressions, and URIs. Let's say that there's an abstract pointer, maybe. Six types. Even I can remember that.

Now, there's an interesting thing that happens when you start passing information around and storing it here and there. The SQL people encountered this, and found a solution, which made them heretics in the eyes of the relational true believers. The problem is that whenever you have a thing that has a value, it is often useful to be able to say "don't know" "not specified" "undefined" "null" or "nil".

W3C XML Schema introduces a mechanism for this. So, umm, why? XML already has a way to say nothing. Say nothing. The empty string. No data. Not specified.
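One way to picture the six types and the say-nothing convention, as a rough sketch rather than a proposal: every name below (PARSERS, parse_value, the "encoding" option values) is invented for illustration. "date" handles only plain calendar dates, not the full collapsed date family, and "duration" and "pointer" are passed through as unparsed strings to keep the sketch short.

```python
import base64
from datetime import date
from decimal import Decimal


def parse_boolean(text):
    # Only the two lexical forms named in the discussion are accepted here.
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_binary(text, encoding="base64"):
    # The "encoding" option mirrors the suggested encoding attribute.
    if encoding == "base64":
        return base64.b64decode(text, validate=True)
    if encoding == "hex":
        return bytes.fromhex(text)
    raise ValueError(f"unsupported encoding: {encoding!r}")


# Hypothetical registry: type name -> function from string to value.
PARSERS = {
    "boolean": parse_boolean,
    "binary": parse_binary,
    "number": Decimal,             # one number type; no float/double split
    "date": date.fromisoformat,    # assumption: calendar dates only
    "duration": str,               # left as a string in this sketch
    "pointer": str,                # abstract; QName, XPath, URI would derive
}


def parse_value(type_name, text, **options):
    # The empty string is XML's way of saying nothing: not specified.
    if text == "":
        return None
    return PARSERS[type_name](text, **options)
```

For example, parse_value("boolean", "") gives None, while parse_value("binary", "68656c6c6f", encoding="hex") gives the five bytes of "hello".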
Presumably a schema need only specify "not nullable" to prevent this appearing, but by default, a specification of "true|false" as permitted values for boolean also includes "" (otherwise known as the Pilate option).

Now, who gets to decide what's an Authentic First Class Genuine Type and what's a Shoddy Knockoff? W3C XML Schema's answer is to set up an authoritative agency. Not sure why; it's not the Web Way. Let a Hundred Points of Type blossom! Implementors of validating XML parsers can respond to user demand. "Support the sstl geographic types library!" Or they can design the silly things so that users can plug in validation modules. RNG already has a mechanism for specifying type libraries.

Note also that not all the primitive types have to be actually *usable*. We can define the base octet-stream type to be "abstract", so that it has to have a derivation in order to know how that octet-stream is being represented as a string.

That gets us to the point of wondering about principles for derivation of types. If, after all, we have a generic "number" type, we prolly *do* want some rules (that are small in number and consistent) to specify, either in a type library definition or in a schema instance, that the number MUST have a range that fits into (coincidentally) a sixteen-bit register using ones-complement notation.

Heh. But this is already too long, and besides, I *enjoy* cliff-hangers, so let's just Tune In Next Week for Another Bland Episode ....

Amy!
--
Amelia A. Lewis    amyzing {at} talsever.com
So what is love then? Is it dictated or chosen? Does it sing like the hymns of a thousand years or is it just pop emotion? And if it ever was here and it left does it mean it was never true?
                -- Emily Saliers