Re: Why XML data typing is hard
James Robertson wrote:
> | > <prop name="state" xml:regexp="[A-Z]+">NY</prop>
> |
> | It's a neat way of doing it, since checking is optional and
> | transparent to non-checking applications.
>
> Wouldn't this be better placed in a DTD?
>
> By adding a fixed, pre-set attribute with the regexp to
> element definitions in the DTD, you can enforce consistency.

Absolutely. Now, how would you specify that some pattern should be used for the content of a particular element (or attribute)?

> Otherwise, can't the user just choose to use this or not,
> on an individual, ad-hoc basis?

Yeah, not such a good thing.

> All that being said, I am of the belief that all of
> this should be placed in application code.

How about a new layer between XML and the application? The layer (a) filters SAX-like 'messages' (function/procedure calls) from the parser to the application and applies patterns to data as necessary, generating new messages, or (b) can be applied to a DOM implementation to check the validity of a document more completely.

I'm guessing here, but perhaps there's something we could specify (in XML of course) that provides this validity information. As a separate document with its own structure, it could be developed on its own track, so that it adds neither more cruft to XML nor checking overhead for applications that don't need it.

> XML isn't a solution to any problem, it is a storage and
> interchange format for applications ...

Well, in a sense, yes. All of the interchange formats that I've dealt with (we call them protocols :-) don't stop at the structure of the message but also specify what is acceptable content.

> Why try to cram the entire world of computing
> science into XML?

I wasn't going to try and do that in this thread... :-)

Ketil Z Malde wrote:
> > No, not specific to a language mapping, that belongs in some API or SAX
> > reference not in XML.
>
> That's what I meant (I think). It would make SAX a whole lot
> more complex, though, if it has to understand e.g. standardised
> dates, and return some kind of date object (or struct) when it
> encounters one.

OK, so keep it out of SAX. The work of translating "1.5" into 1.5 has to get done someplace, is done in a very similar way by lots of applications, and seems ripe for standardization. IMHO, it doesn't seem like a huge leap to go from XML documents that contain just text to ones that contain atomic types (boolean, integer, float for starters).

I assume that date formats have even more variations than numbers, at least until there is agreement on a stardate! So stick with simple things like binding a regexp pattern to content. There will be debates about a date being an atomic type or a structure (I tend to think of them as integers with a really bad number base). There shouldn't be any need for structure parsing because structures will already be described by the XML document being parsed.

> I would have thought it would be simple, but then again,
> I'm culturally biased, and hadn't read the Unicode regexp
> document. Oh horror! :-)

It looks 'hard', but there doesn't seem to be any more real complexity than what has already been solved by some very talented folks. In particular it is written...

> (Regular expression syntax varies widely: the issues discussed
> here would need to be adapted to the syntax of the particular
> implementation.)

This pattern definition/association document (this beast needs a name!) can make all that hand wringing and the "levels of support" go away. No need for funky escape characters, escaped escape characters, misinterpretation of parens, brackets, braces, stars...gag!

Here is a start...

    <set id="letter"> ABCDEFGHIJKLMNOPQRSTUVWXYZ </set>
    <set id="digit"> 0123456789 </set>
    <set id="special"> _$ <!-- $ is a VMS thing --> </set>
    <set id="namechar">
      <set idref="letter"/>
      <set idref="special"/>
    </set>

    <token id="name">
      <set idref="namechar"/>
      <group optional="1" repeatable="1" disjunction="1">
        <set idref="namechar"/>
        <set idref="digit"/>
      </group>
    </token>

    <pattern id="namevalue">
      <token idref="name"/>
      <s ignore="1"/>              <!-- 's' is whitespace -->
      <token idref="AttValue"/>    <!-- from XML spec -->
    </pattern>

BTW, if I'm not using 'id' and 'idref' correctly, please forgive me, I'm still very new at this! I'd be happy to take more discussion off-line if it doesn't belong in xml-dev. In the meantime I'll draft a DTD of this for feedback.

Joel

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
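
For illustration only, here is a rough sketch of how the set/token definitions in the message above might be compiled into ordinary regular expressions, so that a checking layer could apply them to element content. Everything beyond the element and attribute names in the message is an assumption: the `<patterns>` wrapper element, the function names, the reading of `optional`/`repeatable`/`disjunction` as `?`/`+`/`|`, and the requirement that a set is defined before it is referenced. The `<pattern>` element is left out because it refers to `AttValue`, which the message does not define.

```python
import re
import xml.etree.ElementTree as ET

# Definitions from the message, wrapped in a hypothetical <patterns> root
# element so that they form a single well-formed document.
DEFINITIONS = """
<patterns>
  <set id="letter">ABCDEFGHIJKLMNOPQRSTUVWXYZ</set>
  <set id="digit">0123456789</set>
  <set id="special">_$</set>
  <set id="namechar"><set idref="letter"/><set idref="special"/></set>
  <token id="name">
    <set idref="namechar"/>
    <group optional="1" repeatable="1" disjunction="1">
      <set idref="namechar"/>
      <set idref="digit"/>
    </group>
  </token>
</patterns>
"""


def set_chars(elem, sets):
    """Return the literal characters of a <set>, following idref links.

    Assumption: whitespace inside a <set> is layout, not content.
    """
    if "idref" in elem.attrib:
        return sets[elem.get("idref")]  # must already be defined
    chars = "".join((elem.text or "").split())
    for child in elem:
        chars += set_chars(child, sets)
        chars += "".join((child.tail or "").split())
    return chars


def to_regex(elem, sets):
    """Translate a <set>, <group> or <token> element into regex source."""
    if elem.tag == "set":
        return "[" + re.escape(set_chars(elem, sets)) + "]"
    parts = [to_regex(child, sets) for child in elem]
    if elem.tag == "group":
        joiner = "|" if elem.get("disjunction") == "1" else ""
        body = "(?:" + joiner.join(parts) + ")"
        optional = elem.get("optional") == "1"
        repeatable = elem.get("repeatable") == "1"
        if optional and repeatable:
            return body + "*"
        if repeatable:
            return body + "+"
        if optional:
            return body + "?"
        return body
    return "".join(parts)  # <token>: children match in sequence


def compile_definitions(xml_text):
    """Map each top-level id to a compiled regular expression."""
    sets, regexes = {}, {}
    for elem in ET.fromstring(xml_text):
        if elem.tag == "set":
            sets[elem.get("id")] = set_chars(elem, sets)
        regexes[elem.get("id")] = re.compile(to_regex(elem, sets))
    return regexes
```

A checking layer could then call something like `compile_definitions(DEFINITIONS)["name"].fullmatch(text)` on the character data it receives from the parser and report content that does not match.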