RE: Victory has been declared in the schema wars ...
Costello, Roger L. said:

> Rick Jelliffe wrote:
>
> [Assertion] grammars are a bad foundation
> [Assertion] Schemas should be based on paths
> These are two powerful assertions Rick.
>
> Would you explain why grammar-based languages (e.g., DTD, XSD, RNG) are
> a bad foundation? A grammar-based language tells an instance document
> author what tags are allowed, how those tags may be arranged, and the
> datatypes of the data. Isn't that what we want from a schema language?
> Shouldn't that information serve as the foundation for a schema?
>
> Also, would you explain why paths (XPaths) provide a more suitable
> foundation?

OK

> [Assertion] grammars are a bad foundation

1) They encourage or allow implicit structures (repetition groups are an unmarked-up structure) and so are antagonistic to the fight for tagging: explicit, simple, natural-language labels for structure down to a certain grain.

2) They divorce constraints from the business reason for the constraints, and so are antagonistic to the goals of transparency: constraints and capabilities are costs (and opportunities) that need to be justifiable against business requirements.

3) They are too weak to model non-regular patterns, and so are antagonistic to the goal of being able to represent arbitrary data that can be in any arbitrary graph: with grammars* you have to abandon

> [Assertion] Schemas should be based on paths

1) They don't have the problems above.

2) Processing XML very often involves XPath processing or transformations currently. One reason some people find some XML processing pipelines too heavyweight is that when you first validate against a state machine and then use XPath matching to transform the document, you are in effect repeating the pattern matching in two passes (with different technologies) where it could all be done in one pass. (Indeed, it opens the door to local validation modes, where you only validate the elements that are actually used in a transformation.)
3) If the things that grammars can express and paths cannot are, as it seems to me, often an unhelpful set, while the things that paths can express and grammars cannot are a very helpful set, then consolidation should take place in the direction of paths.

4) Path expression implementations seem to be smaller than XML Schema implementations, especially if we are just talking about the streaming subset where you don't include the extended forward axes: descendant, following-sibling and so on.

---------------

What about Schematron and paths for uses beyond simple validation?

As for the idea that you need a grammar to do storage type annotation, it is perfectly feasible to have a pattern like this:

    <sch:rule context="address/postcode" xxx:type="xs:short">
      <sch:assert test="number(.) > 1000">
        The post code should be greater than 1000
      </sch:assert>
    </sch:rule>

Or that you need a grammar to do semantic annotation (there is already a role attribute in Schematron):

    <sch:rule context="address/postcode" role="AustralianPostalCode">
      <sch:assert test="number(.) > 1000">
        The post code should be greater than 1000
      </sch:assert>
    </sch:rule>

Note that using paths does make completeness-checking a function of the schema-creation environment rather than a necessary property of the schema. However, it also means that open or partial schemas are simpler to model.

Note also that it is possible to have declarative labeling of patterns that can allow optimized evaluation.
Use abstract patterns, something like this (sorry, untested):

    <sch:pattern abstract="true" name="ALL">
      <sch:rule context="$parent">
        <sch:assert test="count($child) = count(*)">
          Only the elements in <sch:value-of select="$child"/> are allowed
        </sch:assert>
      </sch:rule>
      <sch:rule context="$child">
        <sch:assert test="not(parent::*[name() = $parent]) or count(../*[name() = current()/name()]) &lt;= 1">
          The following elements can appear at most once
          <sch:value-of select="$child"/>
        </sch:assert>
      </sch:rule>
    </sch:pattern>

    <sch:pattern is-a="ALL">
      <sch:param name="parent" value="xxx:address"/>
      <sch:param name="child" value="number | street | town | state | country"/>
    </sch:pattern>

where the idea is that formulating the abstract patterns for an implementation is an expert task (e.g. for a vendor or consortium), while using the abstract pattern is more a schema developer task.

Cheers
Rick Jelliffe

(Of course, even regular grammars can be made more powerful by making each term contain an XPath and having some kind of axis iteration for each step. I described this in 1999 as "Axis Expressions", and RELAX NG took it up as far as adding attributes to content models; but no one has ever taken it to its logical extreme and allowed any XPath expression as the particle of a content model... perhaps the nearest is Phillipe P's ALS schema language (is that what it is called? Yikes, my memory is so bad today... the interesting French one with explicit if clauses in content models.) Viewed in terms of Axis Expressions, currently we have a choice between powerful grammars over a basic axis (DTD, XSD, RELAX NG just on the child and preceding-sibling::* axes) or a very basic grammar (if we treat the Schematron elements as a kind of grammar that sequences constraints) with very powerful axes.)
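
For readers who want to see what the ALL pattern above is checking without running Schematron, here is a minimal procedural sketch in Python. It is not from the original post and is not how a Schematron processor works internally. The function name `check_all`, the sample documents, and the use of an unprefixed `address` element (instead of `xxx:address`) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def check_all(root, parent, children):
    """Return messages for violations of the ALL pattern under each `parent` element.

    Hypothetical helper: mirrors the two assertions of the abstract pattern,
    using plain element names (no namespace handling).
    """
    allowed = set(children)
    messages = []
    # root.iter(parent) visits the root itself and all descendants named `parent`.
    for elem in root.iter(parent):
        names = [child.tag for child in elem]
        # First assertion: count($child) = count(*), i.e. no unlisted children.
        for name in names:
            if name not in allowed:
                messages.append(f"{parent}: element '{name}' is not allowed")
        # Second assertion: each listed child appears at most once.
        for name in sorted(allowed):
            if names.count(name) > 1:
                messages.append(f"{parent}: element '{name}' appears more than once")
    return messages


# Same parameters as the is-a="ALL" instance, minus the namespace prefix.
ADDRESS_CHILDREN = ["number", "street", "town", "state", "country"]

good = ET.fromstring(
    "<address><number>1</number><street>High St</street><town>Sydney</town></address>"
)
bad = ET.fromstring(
    "<address><street>High St</street><street>Low St</street><zip>2000</zip></address>"
)

print(check_all(good, "address", ADDRESS_CHILDREN))
for message in check_all(bad, "address", ADDRESS_CHILDREN):
    print(message)
```

The point of the sketch is only that both constraints are simple counts over the child axis of one context node, which is why they fit path-based rules without any content-model grammar.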