Re: XPath 1.5? (was RE: typing and markup)
From: "Jonathan Robie" <jonathan.robie@d...>

> Are you still startled? If so, I'm still listening...

Probably. I am a baby bunny transfixed by your sudden headlights :-)

When people talk about Schematron, they often seem to think it is just a matter of evaluating a single XPath expression to a boolean in some way. For example, Examplotron provides an assertion mechanism, but I don't consider it remotely similar to Schematron just because of that. XML Schemas uses simplified XPaths for key-checking, but it is no Schematron. Francis and Eddie's Schematron-embedded-in-XML-Schemas allows some kinds of simple Schematron schemas to be embedded, but even it is certainly not full Schematron. An XPath that returns empty on failure does not, by itself, meet the minimum requirement.

A Schematron schema has four parts:

1) Phases
2) Patterns and rules
3) Assertions (inside rules)
4) Diagnostics

For the phases, there is no impact with XPath. What you are saying is that there are only Pre-Schema-Validation Infosets and Post-Schema-Validation Infosets, and that therefore all my Schematron schemas should be either for the former or for the latter. So phases can provide a way to cope with this, because I could, presumably, make one phase for use on a PreSVI and one for use on the PostSVI. However, what about when someone constructs a document by adding a branch from a PostSVI to a PreSVI, perhaps using some XInclude implementation? Does the whole tree become Pre or Post? A Schematron user could happily make a phase to cope with some BastardSVI, but not if XQuery required only a PreSVI or a PostSVI. Or is it that there can never be a hybrid infoset? Or do we have to re-schema-validate any branch added (including all key/keyref checking)?

For patterns and rules, there is a definite impact. Currently, each pattern would typically be implemented as a separate pass through the document.
Perhaps a really nice XSLT implementation might just use one pass, and evaluate the rules (and assertions) during that single pass, but I doubt it: having a more optimizable XPath might result in better performance in this regard.

But within a pattern, rules are evaluated in lexical order: the first rule whose context matches the current node is the one used. One kind of rule that can be used is a guard rule, where first we test whether some bad case has happened, so that the subsequent assertions are OK. It is an important aspect in the design of rule-based languages to make case statements implicit in various ways (either by lexical ordering, or by assigning priorities). When we test for the bad case, we really want to test it, not have some other system tell us it is impossible.

For example, let's consider the following case:

  <rule context="caseRef[@idref]">
    <assert test="count(//*[@id = current()/@idref]) = 1"
      >A caseRef should reference one element</assert>
    <assert test="//*[@id = current()/@idref]/self::case"
      >A caseRef should only reference a case</assert>
  </rule>

If the query system believes that there is no way there can be an @id attribute on some element, it will not actually test it. For example, if the schema validation failed for a branch, can we expect that an optimizer will be smart enough to say "oh, they still need access to it, I shouldn't optimize away their query in that regard"? Unimaginable.

Within a rule, each assertion is evaluated without regard to the others. However, assertions can just as easily be negative as positive: as well as

  <assert test="x">A <name/> must have an x</assert>

you can have

  <report test="x">A <name/> should not have an x</report>

On the one hand it would be nice if assertions that could be guaranteed to fail were never tested, but for a validation language that completely fails a fundamental principle of validation: you don't accept the judgement of another component when you can test yourself.
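The rule ordering and the assert/report distinction described above can be sketched in a few lines of Python. This is only an illustration, not how any Schematron implementation works: the `Rule` class, `run_pattern`, `targets`, the guard rule for a caseRef with no idref, and the sample document are all hypothetical, and plain Python predicates stand in for the context and test XPaths.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A check is (kind, test, message). An "assert" produces its message when the
# test is false; a "report" produces its message when the test is true.
Check = Tuple[str, Callable[[ET.Element, ET.Element], bool], str]


@dataclass
class Rule:
    context: Callable[[ET.Element], bool]  # stands in for the context XPath
    checks: List[Check]


def run_pattern(root: ET.Element, rules: List[Rule]) -> List[str]:
    """Evaluate one pattern: for each node, only the first matching rule fires."""
    messages = []
    for node in root.iter():
        for rule in rules:
            if not rule.context(node):
                continue
            for kind, test, text in rule.checks:
                result = test(node, root)
                if (kind == "assert" and not result) or (kind == "report" and result):
                    messages.append(f"{node.tag}: {text}")
            break  # lexical ordering: later rules are skipped for this node
    return messages


def targets(node: ET.Element, root: ET.Element) -> List[ET.Element]:
    """Every element whose id equals this node's idref, actually looked up."""
    return [e for e in root.iter() if e.get("id") == node.get("idref")]


rules = [
    # Hypothetical guard rule: catch the bad case first, so the rule below
    # never sees a caseRef without an idref.
    Rule(lambda n: n.tag == "caseRef" and n.get("idref") is None,
         [("report", lambda n, r: True, "A caseRef has no idref")]),
    Rule(lambda n: n.tag == "caseRef",
         [("assert", lambda n, r: len(targets(n, r)) == 1,
           "A caseRef should reference one element"),
          ("assert", lambda n, r: all(t.tag == "case" for t in targets(n, r)),
           "A caseRef should only reference a case")]),
]

doc = ET.fromstring(
    '<doc><case id="a"/><note id="b"/>'
    '<caseRef idref="a"/><caseRef idref="b"/><caseRef idref="zz"/><caseRef/></doc>'
)
messages = run_pattern(doc, rules)
for message in messages:
    print(message)
```

Each test here really is run against the document. Nothing is skipped because some other component claims the bad case cannot occur, which is the property at stake in this thread.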
Suppose, for example, you want to check for errors introduced by schema validation: has an attribute value on a local element been defaulted correctly? It is an utter database-ism to say "well, the schema says it must have defaulted correctly, because no other value can have been loaded." Documents are obviously not like that. The fundamental purpose of a validation language is to detect errors, whatever their cause, not only the errors that remain once you assume that every other link in the chain has worked properly.

Finally, for the diagnostics part of Schematron. This is a really essential part of Schematron: the ability to generate useful messages dynamically, reporting on what has been found. If XPath2 always returns null from a PSVI because a path I am asking for is supposedly impossible, it renders XPath2 useless for diagnostics.

As I said, I completely disagree with the idea that Schematron should only validate constraints that have not been validated by a previous stage: that would mean I can have rules that are never checked, and the results of validating could be quite misleading. The schema says one thing, the implementation tests another.

Second finally, it is entirely possible that a PSVI can have been constructed using one schema, and a query run using a different version of the schema. Unless there is some mechanism for guaranteeing that the same schema has been used (*not* the same namespace, *not* the same resource or URL, but the same constructed schema), general purpose validation tools need to be able to test whether "impossible" things have occurred.

And finally, I do not believe that optimizing a query by default is in fact in accord with the XML Schemas recommendation. As Part 1 of the Schema Spec says, "schema validity is not a binary predicate." For a start, [validity] and [validation-attempted] are properties of nodes. A PSVI does not only include valid elements, it also includes invalid elements.
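To make the "not a binary predicate" point concrete, here is a minimal Python sketch under loud assumptions: the `Item` class is a hypothetical stand-in for a PSVI element information item, not any real API, and the sample tree is invented. The property values follow the ones XML Schema Part 1 defines (valid, invalid, notKnown; full, partial, none).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Item:
    """Hypothetical stand-in for a PSVI element information item."""
    name: str
    validity: str = "notKnown"           # valid | invalid | notKnown
    validation_attempted: str = "none"   # full | partial | none
    children: List["Item"] = field(default_factory=list)


def walk(item: Item) -> Iterator[Item]:
    """Visit every item; nothing is pruned for being 'impossible'."""
    yield item
    for child in item.children:
        yield from walk(child)


def summarize(root: Item) -> Counter:
    """Count items by their (validity, validation_attempted) pair."""
    return Counter((i.validity, i.validation_attempted) for i in walk(root))


def needs_fallback(root: Item) -> List[str]:
    """Names of items an application cannot simply assume to be valid."""
    return [i.name for i in walk(root)
            if i.validity != "valid" or i.validation_attempted != "full"]


# Invented example: one child failed validation, and one was never assessed
# (say, because its schema component could not be retrieved).
root = Item("order", "invalid", "partial", [
    Item("customer", "valid", "full"),
    Item("item", "invalid", "full"),
    Item("extension", "notKnown", "none"),
])

summary = summarize(root)
fallback = needs_fallback(root)
print(summary)
print(fallback)
```

A query layer that silently dropped the `item` and `extension` nodes would leave the application with nothing to report on and nothing to handle generically.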
For XPath2 to make available PSVI augmentations is one thing, but you seem to be requiring that [validation-attempted] is always full and [validity] is always valid. Document-level [validation-attempted] will not be "full" if any component is not retrieved. We may want to validate a PSVI some time after it has been created, when there are no validation messages available, and we want to get a sense of the scope of the problem or of the conformance.

So I think XPath2 implementations will need to report whether they are set to optimize (and so screw things up) and whether they allow this to be changed. I understand that it is reasonable not to overload a DBMS by asking for impossible paths, but a validation tool is interested in "impossible paths". QA is concerned with testing what is, not with accepting what someone else claims is.

Whenever XPath2 is optimized to use Schema information to cull "impossible" paths, the application is prevented from rationally handling errors or assessment failures. For example, if a component of a schema is to be accessed by URL (by a fully conforming implementation), and becomes unavailable, I would like to be able to have a fallback plan, to handle the element (which would be marked [validation-attempted]=none) generically. That element is still in the PSVI. I still need to access the invalid elements in the PSVI to handle them rationally.

Any system that strips out "impossible" elements during schema validation is non-conforming and should be excluded from any consideration in XPath (and XQuery). A PSVI is not constrained to only have valid information items, and all W3C tools which need to access the PSVI must not assume validity. This is particularly true of XPath2.
In contrast, XQuery, if it is not really concerned with XML documents but only with valid PSVIs, should clearly state that, and the Query group should be very careful not to let DBMS assumptions (such as PSVI validity or data/reference integrity) that are good for queries colour what XPath2 and its clients have to deal with.

Cheers
Rick Jelliffe