Re: What are the characteristics of a good type system for XML
From: "Amelia A Lewis" <amyzing@t...>

> >- A type system should be based on a small number of primitive types (much
> >smaller than those in XML Schema Datatypes), and all other types should be
> >defined in terms of these.
>
> Err. I have said that in the past. I've reconsidered, though. I would say
> that the type system must define the rules for creating and publishing
> primitive types. Then let the authors and users and implementors of XML
> decide which of those are interesting and useful. This also means that
> private agreements can adopt less "universal" types that happen to be well
> suited to their particular domain.

Yes, this is necessary. But I'm not sure it is sufficient.

> >-A type system should be extensible, but more than that, there should be
> >ways to introduce new type systems (for simple types).
>
> Agreed.
>
> >- The ways that types can be extended should not be limited to a few
> >predefined parameters, e.g., the facets in XML Schema. A type system should
> >be able to define its own parameters.
>
> I think we're on the same page. Apart from defining how to define new
> primitive types, the system ought to also define how to define derivation
> and composition algorithms.
>
> >- A type system should provide, for each type it produces, a function to
> >answer true or false if a given string is valid,
>
> Yes.
>
> > a function to translate a
> >string into an instance of the type defined in terms of the primitive types
> >mentioned above,
>
> err, no. I think that's outside the scope of XML.

It's outside the scope of XML parsing and validation, but inside the scope of XML transform and query languages. Atomic types cannot be entirely opaque to such languages. At an absolute minimum, users expect to be able to do arithmetic using values of numeric types as operands.
While I don't agree with their choice or the hard-wired nature of it, I can certainly see why designers of a query language would jump on a type system that gave them the numeric types their users would demand. The date types are scary, but _some_ representation of dates is obviously an important application requirement. If you want to look at another ugly type system, check out any SQL dialect. Same reason. A small number of types are ubiquitous in business and scientific applications; the rest creep in as implementation details or frozen mistakes.

> It might possibly be
> useful, but you can't really predict, sitting inside the XML world, what the
> type system you're mapping onto looks like, or how the transform is going to
> work. Supposing that a library defines a "date" type, how can it reasonably
> define the transformation to an instance of that type in Java, C++, Python,
> Haskell, and Perl as a single function?

That's a good question, but I think it can be answered. (Unfortunately, not concisely. Next time, maybe. ;-) It is only necessary to be as universal as the users of a type demand.

The RNG datatype api is defined in terms of Java. Languages that participate in the CLI can use an existing datatype library directly; in the worst case, the library must be hand-translated to another language. Even then, the api is trivial to translate, as it is defined in terms of strings and boolean tests; all the complexity is in the types. That seems about right.

The RNG api is carefully designed to make no assumptions about types beyond those necessary to perform validation. Thus, it is reasonable to point to it as an exemplar of good api design, but it is not reasonable to assert that it meets the needs of every other application. As noted elsewhere in this thread, not all types can be collated, but validation only requires an equality comparison, so the issue is avoided. Query/transform languages, however, require sorting, so the issue cannot be ducked.
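
The validation-only shape described above (strings in, boolean answers out, plus an equality test) can be sketched roughly as follows. This is a minimal Python sketch, not the actual RNG Java interface: the class `LanguageCode`, the method names `is_valid` and `same_value`, and the lexical rule (two or three ASCII letters, compared case-insensitively) are all hypothetical and chosen only for illustration.

```python
import re


class LanguageCode:
    """Hypothetical datatype: a 2-3 letter code, compared case-insensitively."""

    # Assumed lexical rule, for illustration only.
    _LEXICAL = re.compile(r"[A-Za-z]{2,3}")

    def is_valid(self, text: str) -> bool:
        # Validation question: is this string a legal representation?
        return self._LEXICAL.fullmatch(text.strip()) is not None

    def same_value(self, left: str, right: str) -> bool:
        # Equality on values rather than raw strings; no ordering is defined.
        if not (self.is_valid(left) and self.is_valid(right)):
            raise ValueError("both arguments must be valid representations")
        return left.strip().lower() == right.strip().lower()


if __name__ == "__main__":
    code = LanguageCode()
    print(code.is_valid("en"))            # True
    print(code.is_valid("english"))       # False
    print(code.same_value("EN", " en "))  # True
```

Nothing here says whether one code sorts before another, which is all a validator needs and exactly what a query language finds missing.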
Validation does not need to do arithmetic, either (or at least RNG validation doesn't), but query/transform languages do.

Extending an api for sorting is trivial. One simply needs a boolean test of whether the type is sortable, and another function along the lines you suggest to do the comparison. Extending an api to allow conversion between string and one of int, float, double or boolean is also trivial. These are in the intersection of every language one would bother with, as are arrays of any of these.

Beyond that, an api must and should be opaque. There are two ways to approach it. One can declare that there is _no_ way to convert between string and instance, other than to read the definition and write the code. This is the approach used today. The api for dates begins, "First, write an ISO8601 parser..."

The other is to provide an opaque interface that provides no more than a means of converting between "instance" and string representation, together with a way to determine at runtime whether the interface is available for a given type. Having such interfaces would be more useful than not having any, even if they are not always available for a given type and even if availability varies from language to language. It would open the door just wide enough to allow implementations to slip in; universality would be determined by availability and demand.

Most non-trivial instance types must carry along with them some sort of library that provides a means of introspecting and manipulating them. For an object-oriented language, the instance would be an object and the library a set of classes necessary to use the object; for a non-object language like C, the instance would be a struct and the library a set of functions; and so on.
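
The two extensions described above, a sortability test paired with a comparison and an optional string/instance conversion whose availability is checked at run time, might be sketched like this. All names (`Datatype`, `TokenType`, `IntegerType`, `is_sortable`, `compare`, `converter`) are hypothetical, and Python merely stands in for whatever host language is in use.

```python
from typing import Callable, Optional, Tuple


class Datatype:
    """Hypothetical base: validation only; everything else is opt-in."""

    def is_valid(self, text: str) -> bool:
        raise NotImplementedError

    def is_sortable(self) -> bool:
        # Default: the type defines no ordering.
        return False

    def compare(self, left: str, right: str) -> int:
        raise TypeError("type is not sortable")

    def converter(self) -> Optional[Tuple[Callable, Callable]]:
        # Returns (parse, format) if this implementation ships a conversion,
        # or None if callers must treat values as opaque strings.
        return None


class TokenType(Datatype):
    """Opaque type: validates, but offers neither ordering nor conversion."""

    def is_valid(self, text: str) -> bool:
        return text != "" and not any(ch.isspace() for ch in text)


class IntegerType(Datatype):
    """Sortable type that converts to the host language's int."""

    def is_valid(self, text: str) -> bool:
        body = text.strip()
        if body[:1] in ("+", "-"):
            body = body[1:]
        return body.isascii() and body.isdigit()

    def is_sortable(self) -> bool:
        return True

    def compare(self, left: str, right: str) -> int:
        # Assumes both strings already passed is_valid.
        a, b = int(left), int(right)
        return (a > b) - (a < b)

    def converter(self) -> Optional[Tuple[Callable, Callable]]:
        return (int, str)


if __name__ == "__main__":
    for datatype in (IntegerType(), TokenType()):
        pair = datatype.converter()
        if pair is None:
            print(type(datatype).__name__, "is opaque here")
        else:
            parse, fmt = pair
            print(fmt(parse("20") + 22))
```

A caller asks `converter()` first and falls back to treating the value as a plain string when it returns `None`, so availability can vary by type and by language without changing the interface.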
The library that accompanies an instance type can be outside the domain of XML for a given type as long as people think it should be, but it is well inside the domain of application languages that must use a datatype in a non-trivial way, e.g., to format it for localized presentation or to do whatever arithmetic or algebra the type permits. The "date" types in XQuery/XPath 2.0 are good examples precisely because there is no programming language in use today that provides exactly those types. I would much prefer to have the opaque api and accompanying library, which I might apply to any language, than to have it hard-wired into each implementation of XQuery/XSLT.

Bob

> > and a function to translate an instance of the type to a
> >string.
>
> Err. Well, it *is* a string. In XML.
>
> Also, you've left out sorting.
>
> I would say, so far as functions go:
>
> For the type gronk:
>
> Given a string, the gronk type specification allows you to determine if this
> is a representation of a valid instance of type gronk.
>
> boolean gronk(xmlstring);
>
> Given two strings known to be of type gronk (see preceding function), return
> -1 0 1 to indicate whether the first is smaller than, equal to, or larger
> than the second (an equality function, plus a bit).
>
> [-1,0,1] gronkSort(xmlstring, xmlstring);
>
> Does that help?
>
> Amy!
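
Amy's two functions can be made concrete with a small Python sketch. Since gronk is a made-up type, its lexical rule here (an optionally signed decimal integer) is purely an assumption chosen for illustration; `gronk` and `gronk_sort` mirror her `gronk` and `gronkSort`.

```python
import re

# Assumed lexical form for the imaginary type: optional sign, then digits.
_GRONK = re.compile(r"[+-]?[0-9]+")


def gronk(xmlstring: str) -> bool:
    """True if the string is a valid representation of a gronk."""
    return _GRONK.fullmatch(xmlstring) is not None


def gronk_sort(first: str, second: str) -> int:
    """Return -1, 0, or 1 for two strings already known to be gronks."""
    a, b = int(first), int(second)
    return (a > b) - (a < b)


if __name__ == "__main__":
    values = ["10", "9", "-3", "+9"]
    assert all(gronk(v) for v in values)
    print(gronk_sort("9", "10"))  # -1, although "9" > "10" as plain strings
    print(gronk_sort("9", "+9"))  # 0, equal values with different spellings
```

Even in this tiny case the comparison has to turn both strings into values before it can order them, so some string-to-instance step exists inside the function whether or not the interface exposes it.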