Re: limits of the generic
Hi Jonathan,

> I think that the way we handle integers is consistent with W3C XML
> Schema, which was not designed to define *operations* on types. We
> don't change the definition of integers, we simply define operations
> on them.

Hmm... but W3C XML Schema *does* define *some* operations on types: is
equal and is less/greater than (normatively) and how to add durations
to date/times (in a non-normative appendix). It has to in order to
specify the behaviour of the enumeration and min/maxIn/Exclusive
facets. It also defines how values are matched by regular expressions,
which I guess is an operation.

On the integer front, I guess that I'm confused by things like (in the
XQuery/XPath 2.0 WD):

  It is also possible to construct values of various types by using a
  cast expression. For example:

  * cast as hatsize(9) returns an item whose primitive value is the
    integer 9 and whose type is the user-defined type hatsize, derived
    from xs:integer.

since when I see the term "primitive" I think it means the same as it
does in W3C XML Schema.

>>In W3C XML Schema, durations are covered by xs:duration; in the
>>current WDs for XQuery and XPath 2.0 you have to use either
>>xf:yearMonthDuration or xf:dayTimeDuration to get anything useful
>>done, even when a general duration would be completely unambiguous.
>
> Here I think we are dealing with a bug in W3C XML Schema - and
> perhaps one could argue that durations are less universal than
> integers, especially with respect to time zones. The most basic data
> types may also be the most useful.

I don't understand the reference to time zones with regard to
durations -- as I understand them, durations don't have anything to do
with time zones; the only things that (may) have time zones are the
date/time types. But I think you're right about how the most basic
data types are the most useful.

>>> If I take an integer out of a relational database and give it to a
>>> Java program, I would often like the Java program to know that it
>>> is an integer. Not just for one XML vocabulary, because I want to
>>> write tools that can handle more than one XML vocabulary. What's
>>> wrong with that?
>>
>>What's wrong with that is that you are tightly coupling your
>>database with your Java program. You are not only transferring the
>>data, but also dictating how that data should be interpreted. This
>>means that you tie your XML document into a particular use -- you're
>>basically using XML for *procedural* markup. Of course procedural
>>markup can give you many benefits, but it is not the only way of
>>working.
>
> Hmmm....here's an area where we disagree. I think that a key benefit
> of data types is precisely that they are not procedural. They
> capture a small, abstract set of semantics which give the
> information needed for sensible general processing, like comparing
> two items or building an index.

Ahh... thank you for this. It flipped a switch somewhere in my head
and changed how I view data typing in XML to such an extent that I
can't now reconstruct how I was picturing things, or thought you or
anyone else were picturing things, before.

What you're saying is that data typing in XML is a matter of labelling
a value as "xs:integer", not actually dictating how the recipient of
that value must treat the value. The recipient may or may not use the
label in order to work out how to process the value. So, for example,
a Java application could choose to treat a value labelled with
"xs:integer" as a floating point number.

So we can think of the labelling of values with data types as being
very similar to labelling content with elements. A data type
definition describes how to validate a value in the same way as an
element declaration describes how to validate an element's content.
In a physical XML document, content is labelled by elements
explicitly; values are labelled with data types through an annotation
process which coincides with validation. The two sets of labels kind
of exist side-by-side, giving you two different "views" on the data in
the document.

Sometimes a data type library might define how a processor should
perform particular operations over the values that are labelled by the
data types that it defines. Similarly, the definition of a markup
language (e.g. XSL-FO, XSLT) might include a specification of how
content in that markup language should be presented or otherwise
processed. I'd call these procedural or prescriptive data types/markup
languages. Other data type libraries and markup languages might be
much more declarative -- be much more purely "labels" that can be
interpreted in different ways by different processors at different
times.

With elements, the kind of processing that a procedural markup
language might specify is usually to do with display or presentation.
With data types, the kind of processing is more to do with how to
compare values of the data type or perform common calculations with
the data type.

A data type called "colour", for example, could be described simply as
"a colour represented through an RGB string in the format #RRGGBB",
without saying anything about how two colours could be compared, or
whether you can add or subtract other colours from it. That would be a
declarative data type. On the other hand, a specification of the
"colour" type could additionally say "a colour is less than another
colour if the sum of its red, green and blue components is less than
the sum of the other colour's red, green and blue components". This
would be procedural because the data type's specification is telling
an application how to perform a particular operation with a value of
that type.

Does that make sense as a way of looking at data typing in XML?

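The two views of the "colour" type can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function names (is_valid_colour, components, colour_less_than) are
hypothetical and come from no specification. The first function is the
declarative part (what a colour looks like); the last is the extra
procedural rule about ordering quoted above.

```python
import re

# Declarative part: a colour is an RGB string in the format #RRGGBB.
_RGB = re.compile(r"#([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})")


def is_valid_colour(value):
    """Validate the lexical form only; says nothing about operations."""
    return _RGB.fullmatch(value) is not None


def components(value):
    """Split a valid colour into its (red, green, blue) integer parts."""
    match = _RGB.fullmatch(value)
    if match is None:
        raise ValueError("not a colour in #RRGGBB format: %r" % (value,))
    return tuple(int(part, 16) for part in match.groups())


def colour_less_than(a, b):
    """Procedural part: the ordering rule from the example, where a
    colour is less than another if the sum of its components is less."""
    return sum(components(a)) < sum(components(b))
```

A processor that treats "colour" purely as a label would stop at
is_valid_colour; one that follows the procedural specification would
also be bound to colour_less_than when asked to compare two values.
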
I'd view the W3C XML Schema Datatype library as being a
prescriptive/procedural set of data types, because they do explicitly
specify how they should be processed. You can treat them just as
labels if you want, of course, and use some generic data type
processing to manipulate them. That would be like treating XSL-FO as
just another XML markup language -- displaying a document written in
XSL-FO as a tree rather than in the layout that the XSL-FO specifies.

Extending the analogy, it seems to me that XQuery/XPath 2.0 is like an
XSL-FO processor, in that it's specifically designed to be able to
operate over a particular set of data types.

I'm wondering how far you could get with a processor that took a more
generic view of data types. Perhaps one where the way in which
operations can be performed over values of particular data types was
described in an external specification, a bit like the way you can
define how to display an XML document using CSS. It would need to have
some basic building-blocks that it knows how to manipulate, such as
"number", "string" and "boolean", in the same way that CSS has some
basic building-blocks that it knows how to display, such as "block",
"inline" and "list-item".

I'm thinking of something where a data type library could provide a
formal specification of how to perform operations on a type, which the
processor would then read and use when performing those operations.
For example:

  <dt:datatype name="my:UKDate">
    <!-- a date in the format DD/MM/YYYY -->
    <dt:components>
      <dt:component name="day"   select="substring(., 1, 2)" />
      <dt:component name="month" select="substring(., 4, 2)" />
      <dt:component name="year"  select="substring(., 7, 4)" />
    </dt:components>
    <dt:compare to="d"
                select="if (#year > $d#year) then 1
                        else if (#year = $d#year) then
                          if (#month > $d#month) then 1
                          else if (#month = $d#month) then
                            if (#day > $d#day) then 1
                            else if (#day = $d#day) then 0
                            else -1
                          else -1
                        else -1" />
    ...
  </dt:datatype>

[I've used the notation #component to refer to a component of a data
type. For example #year refers to the 'year' component of the current
value. In a processor that supported generic types, you'd have to have
some generic way of defining structured data types like this. Another
option would be to use '.' as an operator, I don't think that would be
too bad, but it would mean that you'd have to do something like:
". . year" to refer to the year component of the context item, which
is probably more confusing than that other wonderful
operator-which-is-also-a-location-path, *.]

Of course there's always a trade-off to be made between generic
processors and specific processors (gosh, I've even managed to make
this email relevant to the subject line again!) but if data types are
nothing more than labels, are really declarative, then I think that
generic processing is a real option. Certainly one that would be
interesting to explore.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/