Every time I've read XML Schema Part 2: Datatypes, I've been unhappy with the wide variety of compound types that are considered 'primitives' by the specification. Leaving aside the issue of primitive types that could be derived from other types, we've still got compounds like:

* duration
* dateTime
* time
* date
* gYearMonth
* gMonthDay
* QName

There's been prior discussion of internationalization (i18n) problems with the date and time formats, and I think it's fair to say that dates and times are the most contentious area on the data typing side.

I'm wondering if there's a better way to handle these things, especially in contexts (RELAX, TREX, Schematron, Examplotron, no schema at all) where we aren't necessarily using XML Schema anyway.

It seems as if regular expressions could be used not just for validating typed content, but for fragmenting typed molecules into smaller atoms. Instead of binding users to a particular (ISO 8601) date format, this approach would let users provide their own rules for fragmenting date strings into the parts we need for processing - year, month, day, etc.

It would also open up the prospect of treating other compounds - like the CSS style attribute, some of the path information in SVG, and various other places where the principle of one chunk, one string has been violated - as a set of atoms which could themselves be validated and/or transformed and/or typed.

This leads to another kind of post-processing infoset, where the atoms are available as an ordered set of child nodes, but it seems like a promising road.

Simon St.Laurent - Associate Editor, O'Reilly and Associates
XML Elements of Style / XML: A Primer, 2nd Ed.
XHTML: Migrating Toward XML
http://www.simonstl.com - XML essays and books
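
As a rough illustration of the fragmentation idea above, here is a minimal Python sketch. It assumes that a "rule" is simply a regular expression with named groups supplied by the user; the names ISO_DATE, US_DATE and fragment are hypothetical and do not come from any of the specifications mentioned.

```python
import re

# Hypothetical user-supplied rules: each named group is one atom.
ISO_DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
US_DATE = re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})")


def fragment(rule, text):
    """Split text into an ordered list of (name, value) atoms.

    Returns None when the whole string does not match the rule, so the
    same regular expression serves for validation and fragmentation.
    """
    match = rule.fullmatch(text)
    if match is None:
        return None
    # Order the atoms by where they occur in the string, mirroring the
    # "ordered set of child nodes" idea.
    names = sorted(rule.groupindex, key=lambda name: match.start(name))
    return [(name, match.group(name)) for name in names]


iso_atoms = fragment(ISO_DATE, "2001-01-15")
us_atoms = fragment(US_DATE, "1/15/2001")
bad_atoms = fragment(ISO_DATE, "15 January 2001")
```

Both date strings yield year, month and day atoms regardless of the surface format, while the third string fails the rule and produces no atoms. The same pattern could presumably be applied to other compounds, such as a CSS style attribute, given a suitable rule.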