At 2001-04-26 01:12, Eric van der Vlist wrote:

>W3C XML Schema has sacrificed flexibility (a quality useful for document
>centric applications often authored by a wide range of tools including
>manual edition that can be seen as a defect by data application) to put
>the emphasis on datatypes (an absolute requirement for data
>applications).
>
>This is best seen in the extremist way W3C XML Schema is forbidding any
>non determinism just to insure that the datatypes in the PSVI are those
>expected by the schema authors.

Er, I guess that's one way to look at it. But the non-determinism rule is one we inherited from XML 1.0 DTDs and SGML DTDs. I suspect the developers of ISO 8879 will be surprised to learn that they are guilty of having sacrificed the needs of document-oriented systems for the sake of supporting data-oriented systems.

Let me be blunt: some members of the WG do think the non-determinism rule is dumb in ISO 8879, and regret that the XML WG took it over into XML (though I certainly remember why, and I don't see how we could have done it differently), and they argued for an XML Schema language without the non-determinism rule. But we failed to persuade the WG that it served no useful purpose. And anyone who faces the choice between a restricted language and an unrestricted language, and is not 100% convinced that the restriction serves no purpose, is likely to say "Well, let's go with the restriction for now, we can relax it later if we want. But if we relax it now, and then discover we were wrong, then it will be too late to add the restriction in later, because it will break things."

I invite people to show that the requirement for determinism (i.e. for LL(1) grammars) can be dropped without hurting anybody, because then maybe we can get consensus on dropping it in some future version.

But interpreting it as an innovation imposed on the document community by dataheads is just wrong: if the non-determinism rule is bad for document processing, it's a self-inflicted wound.

>For example, you can't define a simple and flexible vocabulary where a
>document would have a title, an optional description and any number of
>paragraphs without imposing a relative order to the different elements.

Huh? It's complicated, but it's doable.

   (p*, ((title, p*, desc?) | (desc, p*, title)), p*)

>One of the examples I am often using in my trainings and papers is a
>vocabulary where you can either define an element "inline":
>
><character id="character_Peppermint-Patty">
>  <name>Peppermint Patty</name>
>  <since>Aug. 22, 1966</since>
>  <qualification>bold, brash and tomboyish</qualification>
></character>
>
>or by reference:
>
><character ref="character_Peppermint-Patty"/>

You want #CONREF. If it had worked consistently and interoperably in SGML, it might be in XML. But early on, the parsers were not in agreement on which CONREF examples were valid and which were invalid, and the people I know who cared about interoperability all shunned it.

As a workaround, define 'character' as containing

   (char-ref | (name, since, qualification))

But I assume you already know this way to work around the problem.

>The W3C XML Schema working group has widely used this construct in their
>vocabulary and I find it surprising to see that violations such as:
>
><element name="foo" ref="foo" ... >
>
>cannot be captured by the schema for W3C XML Schema!

Yes, that one really hurts. Maybe in a future version.

-Michael Sperberg-McQueen
 speaking only for himself
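
For anyone who wants to check the (p*, ((title, p*, desc?) | (desc, p*, title)), p*) content model above by brute force, here is a minimal sketch in Python. It assumes a one-letter encoding of the element names (t = title, d = desc, p = p) and transcribes the model into an ordinary regular expression; the function names are made up for illustration. It only compares the set of child sequences the model accepts with the stated intent (exactly one title, at most one description, any number of paragraphs, in any order). It says nothing about whether the model satisfies the determinism rule.

```python
import re
from itertools import product

# Assumed encoding, for illustration only: t = title, d = desc, p = p.
# Direct transcription of (p*, ((title, p*, desc?) | (desc, p*, title)), p*).
CONTENT_MODEL = re.compile(r"p*(?:tp*d?|dp*t)p*")


def accepted(children):
    """True if the encoded child sequence matches the content model."""
    return CONTENT_MODEL.fullmatch(children) is not None


def intended(children):
    """The informal requirement: one title, at most one desc, any p's."""
    return children.count("t") == 1 and children.count("d") <= 1


def disagreements(max_len=6):
    """List every sequence up to max_len where model and intent differ."""
    found = []
    for n in range(max_len + 1):
        for combo in product("ptd", repeat=n):
            seq = "".join(combo)
            if accepted(seq) != intended(seq):
                found.append(seq)
    return found


if __name__ == "__main__":
    # An empty list means the model and the intent agree on every
    # sequence of up to six children.
    print(disagreements())
```

Calling disagreements() with a larger max_len widens the search; the number of candidate sequences grows as 3 to the power of the length, so small bounds are the practical choice.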