Re: Are we losing out because of grammars?
First, let me say that I really appreciate this discussion. It is helping me understand schema development on a deeper level. Thank you.

If I may, I'd like to give some feedback to schema language developers as a potential customer. And please don't tell me "it's free". Purchase price is a vanishingly small part of TCO.

References to "DOM" below should be read as "any processing approach that requires a generalized model of the entire XML document to be memory resident".

1) Simple things should be simple. Although subjective, I think grammars win easily by this measure.

2) Scalability is important. DOM dependencies need to be explicit. AFAIK, XPath typically requires DOM. Thus, XML Schema features like "unique", "key" and "keyref" will most likely end up w/ a DOM under the hood. The application designer/developer must be able to make an informed decision about the cost of using schema features.

3) Mr. Jelliffe's real world example is a good one. The solution can be easily implemented by a simple application using XPath+DOM. To my mind, such validation is not a schema language requirement, per se. Note that to apply these rules to some external schema (e.g. DocBook), some additional metadata will be required to decide which DocBook instances the rules apply to and which not. I.e. this processing needs to take place in the context of an application, not a generic process.

Note the use of an aggregate operator "count(//news:who)". This is a job for the DOM - at least indirectly.

Don't get me wrong, the DOM is useful and makes useful features like compound keys and aggregate operations (long enjoyed by SQL developers) tractable. In the SQL world, the DOM-equivalent layer is always proprietary and hidden. But SQL developers know to avoid certain features when performance is important.
For example, any data mart developer knows that - although not logically necessary - summary data should be pre-calculated to get decent, predictable response times and support more simultaneous users.

4) I must respectfully disagree w/ Mr. Bullard when he says,

On Fri, 02 Feb 2001 13:18:15 Len Bullard wrote:
>
>Yes: systems for choosing. If there is only one,
>there is no ambiguity. But is that a good thing?
>I think it an attractive thing to mammal brains
>that strive for closure instinctively and crave
>power and esteem physically, but a bad
>thing for systems that reciprocally evolve environments.
>

I think the mammals' requirements take precedence. Systems will evolve in healthier ways when the people that write and use them don't waste a lot of what I call "organizational bandwidth" discussing arcana like ambiguity resolution algorithms. That discussion needs adequate resolution here on this list - or someplace like it. Please don't pass the buck to the users.

BTW, +1 for "document order" type resolution. I think most people will find it more intuitive than "most restrictive type". I do, anyway.

5) The open/closed schema issue is probably important. However, there have been several discussions on this list lately about schema extensibility that address the issue more directly.

====

If I have gained anything from this discussion, however, it is probably that layering of rules over schema should not be an afterthought. My modest proposal follows. Layer 1 should be approved "yesterday" to allow the world to start using this stuff!

Layer 1: content model + data types

DOM never required. 1 pass validation and data type determination. If you can't resolve it, unambiguously, by the time the end element tag appears, it doesn't belong here. Note, required elements and attributes will, by necessity, be the loosest allowed during the life of a document (just like NOT NULL in SQL).

Ancestor knowledge is ok.
XML Schema supports different definitions for the same element name, based on parent element. I think this feature is overkill, but it is streamable, so ok.

For XML Schema, data types also include some basic constraints: minValue, maxValue, minOccurs, maxOccurs, list vs. scalar. These all look streamable. Are they?

Layer 2: constraints and intra-doc references

DOM may be required. XML Schema Identity constraints. ID/IDREF integrity checking. In theory, key selectors that choose only child elements (as in the example from the XML Schema Primer) do not need DOM support. Implementations will probably vary. Layer 2 is important to allow much more compact documents (e.g. the lookup table in that same example).

Layer 3: processing rules

Anything that looks or acts like an "if-then-else". Aggregate operations. External doc references? Some rules may not require a DOM, but the analysis required to make the determination may cost more. Some kind of "EXPLAIN PLAN" equivalent would be necessary to let the schema designer know what he is in for at runtime. A debugger would be nice, too.

Thanks for reading,
Charles Reitzel

On Fri, 02 Feb 2001, Rick Jelliffe wrote:
>From: James Clark <jjc@j...>
>
>>Whilst I think the approach used by Schematron is an valuable
>>complement to grammar based schemas (obviously I'm personally
>>delighted to see XPath getting used for validation), I really
>>find it very hard to take seriously the idea that the time has
>>come to completely discard grammars in favour of path-based rule
>>systems.
>
>XPath and XSLT are great.
>
>>Let's take a really simple example:
>>
>><!ELEMENT a (b?, c)>
>><!ELEMENT b (#PCDATA)>
>><!ELEMENT c (#PCDATA)>
>
>>or as a TREX pattern:
>>
>><element name="a">
>>  <optional>
>>    <element name="b">
>>      <anyString/>
>>    </element>
>>  </optional>
>>  <element name="c">
>>    <anyString/>
>>  </element>
>></element>
>
>If efficiency and terseness is the criteria, what about:

Efficiency: yes. Terseness: only up to a point.
I would suggest efficiency, clarity and maintainability as criteria.

><pattern>
>  <rule context="a">
>    <assert test=
>      "b[1][following-sibling::c[position()=last()]] or
>c[1][position()=last()]" />
>  </rule>
>  <rule context="b[* or @*] | c[* or @*]">
>    <report test="1=1" >Should be empty.</report>
>  </rule>
></pattern>
>
>This has 5 functioning elements compared to TREX's 6. It only
>requires looking at the first child. (This is an example of
>elaborating each path, which is nasty for larger rules.) But
>it is not particularly the way I'd envision people will use
>Schematron.

Q: Will a Schematron implementation actually look at the rule set and decide whether or not to load a DOM?

>This can be pretty printed to give a very direct list of rules
>about the schema. I note again that the comprehensibility of a
>schematron schema comes not from its paths (though often these
>are simple) but because there is a pretty direct path for making
>everything explicit in simple natural language statements. If
>one element can follow another, we can explain "why".

Can you show an example of such pretty printing?

>But lets try a different example, quid pro quo. This is a real
>one, coming from discussion on how to mark up news stories.
>The client gave the following requirement:
>  "Every news story must have elements to mark up who, what,
>  where, when and how. There must be one and only one of
>  each in every story. They can appear anywhere."
>
>This requirement is very easy to express in words. It is also
>trivially easy to express in Schematron:
>  <pattern>
>    <rule context="/">
>      <assert test="count(//news:who)=1 and
>          count(//news:what)=1 and
>          count(//news:where)=1 and
>          count(//news:when)=1 and
>          count(//news:how)=1"
>      >Every news story must have elements to mark up who,
>      what, where, when and how. There must be one and
>      only one in every story.
>      They can appear anywhere.</assert>
>    </rule>
>  </pattern>
>
>One can take these constraints and add them to any schematron
>schema without change and it will work. (If the other schema is
>closed, then that is an internal inconsistency, which is a
>different matter. However, schematron schemas are open by default.)
>
>Lets say we add this to a schematron schema for full DOCBOOK. The
>addition in Schematron is just a single rule, and it will fit in
>with all the constraints already in place. It seems to me that
>this would cause a grammar-based schema language to explode, if
>it could cope at all: XML Schemas could not cope (if the schema
>had used <or> groups, and we have to assume that there are already
>"<any>" wildcards in place so these new elements are allowed
>anywhere.)
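
To make point 3 concrete, here is a minimal sketch of the "simple application using XPath+DOM" approach to the quoted news-story rule. It is an illustration only, written in Python with the standard library's xml.etree.ElementTree standing in for the DOM. The namespace URI and the check_story name are made up for the example, since the thread never gives the real binding for the "news:" prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the thread does not say what "news:" is bound to.
NEWS_NS = "http://example.org/news"

# The five elements the client's requirement names.
REQUIRED = ("who", "what", "where", "when", "how")


def check_story(xml_text):
    """Return a list of violations of the 'exactly one of each' rule."""
    # fromstring() builds the whole tree in memory, i.e. "DOM" in the
    # sense used in this post.
    root = ET.fromstring(xml_text)
    problems = []
    for name in REQUIRED:
        # iter() visits the root and every descendant with this tag,
        # which plays the role of count(//news:name) in the Schematron rule.
        found = sum(1 for _ in root.iter("{%s}%s" % (NEWS_NS, name)))
        if found != 1:
            problems.append(
                "expected exactly one news:%s, found %d" % (name, found)
            )
    return problems
```

The sketch parses the entire document before it checks anything, which is exactly the cost that point 2 asks to be made explicit. Deciding which documents the rule applies to is left to the calling application, as point 3 argues.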