XML-DEV Mailing List Archive

Re: XPath 1.5? (was RE: typing and markup)
Hi Jonathan,

> I think that optimization of // is a more compelling way to use
> knowledge of complex types. Suppose you have a pattern like this:
>
>   //address

<niggle> I think you mean 'expression' rather than 'pattern'. If you had a pattern like that, the processor could optimise it to 'address', because // never adds any information at the start of a pattern. </niggle>

> Without knowledge of the complex types involved, this requires
> examination of all elements in the document to see if they are
> "address" elements. Looking at the schema for a particular invoice
> document, it is easy to see that the above pattern can only match
> shipping or billing addresses found in customers. The optimizer can
> rewrite the above pattern as follows:
>
>   /customer/billing/address | /customer/shipping/address
>
> In at least some environments, this will be much more efficient to
> execute. Incidentally, the user does not see whether an
> implementation does this rewrite; the user only sees the increase in
> speed. Implementations should feel free to do whatever static
> optimizations they can, but are not required to. Vendors will want
> to make their implementations fast.

Yes, indeed; people writing stylesheets want them to be fast as well. Mostly, though, people who understand XPath understand that using the expression //address is a surefire way to make your stylesheet take ages. Using step-by-step paths makes the stylesheet quicker, and makes it easier for someone else maintaining it to understand which addresses are actually being processed.

Encouraging people to write the "easy path" means that when they come to write a stylesheet for a markup language with no schema, or move to a processor that doesn't support this particular optimisation, they'll create stylesheets that are very slow. I'd rather have processors warn users when they spot expressions like these than have them rewrite them silently, however effectively.
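As a small aside for readers who want to see the equivalence concretely: here is a Python sketch, using the standard library's ElementTree (which supports a limited XPath subset), of a hypothetical invoice document matching the schema described above. The element names and address text are invented for illustration; the point is only that the descendant search and the schema-informed explicit paths select the same nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice document shaped like the schema discussed above.
doc = ET.fromstring("""
<customer>
  <billing><address>12 Billing St</address></billing>
  <shipping><address>34 Shipping Rd</address></shipping>
</customer>
""")

# The descendant search: every element in the tree is examined.
via_descendant = [a.text for a in doc.findall(".//address")]

# The rewritten, schema-informed paths: only the two known locations
# are followed (relative to the customer root element here).
via_explicit = ([a.text for a in doc.findall("./billing/address")] +
                [a.text for a in doc.findall("./shipping/address")])

print(via_descendant)                  # ['12 Billing St', '34 Shipping Rd']
print(via_descendant == via_explicit)  # True
```

On a two-element document the difference is invisible, of course; the argument above is about large documents, where the descendant search visits every node and the explicit paths do not.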
> The user is only affected when it comes to correctness in the use
> of complex types. Let's segue to query for a second. Suppose we
> have the same schema mentioned above, and the user writes the
> following function:
>
>   define function customer-address(element customer $c)
>     returns element address
>   {
>     $c/address
>   }
>
> Static type checking will report that $c/address evaluates to an
> empty sequence, because the address element is always found in a
> billing or shipping element within customer. Static type checking
> is optional, but if the user asks for it, the system tells the user
> what is wrong with this query.
>
> If the user did not do static type checking, this would be
> discovered at run time, not during static analysis.

Right -- which, as we've discussed, is at the same time for most XSLT transformations, so this isn't a particular advantage of static type checking in XSLT's case.

Currently, of course, XSLT processors will happily process such a stylesheet, returning an empty node set if an XPath doesn't locate any nodes, even though, logically, they could consult a DTD for information about what nodes can validly be present.

There's a debate here about whether it's better to produce an unexpected result or to produce an error. XSLT has previously fallen on the side of producing an unexpected result for tests that involve the particular markup language, as opposed to the fixed types of functions or operators. I think that's because if you protect people from one error at the markup-language level, they might think that you're protecting them from every error. For example, if your address type contained a postcode child element that was defined, in the schema, to match the pattern:

  "[A-Z]{2}[1-9] [1-9][A-Z]{2}"

then doing:

  address[starts-with(postcode, 'W1')]

could logically also give you an error. A user might ask why this doesn't raise an error, when other assertions within a schema do.
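Both points above can be demonstrated with another small ElementTree sketch (again with invented element content; the postcode pattern is the one quoted above). The single-step path silently selects nothing because address is nested one level deeper, and the starts-with('W1') predicate can never be true under the schema pattern, since the pattern requires a letter in the second position -- yet neither case raises an error.

```python
import re
import xml.etree.ElementTree as ET

customer = ET.fromstring("""
<customer>
  <billing><address><postcode>AB1 2CD</postcode></address></billing>
</customer>
""")

# $c/address takes a single child step, but address sits under
# billing -- so this silently selects nothing instead of erroring.
print(customer.findall("./address"))  # []

# The schema's postcode pattern: two letters, then a digit, etc.
postcode_pattern = re.compile(r"[A-Z]{2}[1-9] [1-9][A-Z]{2}")

# Any value satisfying starts-with(postcode, 'W1') has the digit '1'
# in its second position, so it can never match the pattern: the
# predicate is statically unsatisfiable, yet no error is reported.
print(bool(postcode_pattern.fullmatch("W1A 1AA")))  # False
print(bool(postcode_pattern.fullmatch("AB1 2CD")))  # True
```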
The "we can tell this stylesheet will produce valid HTML from DocBook" fallacy is just the kind of misconception that arises when you think that static type checking means you know everything about validity. There are several aspects of HTML that can't be validated by grammar-based schema languages, such as the fact that form elements mustn't occur within other form elements at any level (a constraint Schematron can model nicely, of course). And even if, for simple languages, you could guarantee that you produce a valid document, that does not mean the document *makes sense* semantically.

>> Especially as there are lots of *disadvantages*, such as the added
>> complexity in the processors and in the spec to deal with all the
>> different kinds of casting and validating of complex types.
>
> I would like to see more information on the added complexity people
> anticipate in processors. Since static analysis is optional, it
> does not give overhead if omitted. Optimization based on static
> analysis is also optional, and nobody should implement an
> optimization that is not more optimal.

What I'm arguing is that there is an overhead for users and implementers of XPath 2.0 whether or not processors implement optimisations. Implementers have always been free to carry out whatever optimisations they want to, and with that freedom have provided quite a lot (though despite over two years of reasonably competitive development, none that I know of takes the trouble of examining a DTD to provide precisely the kind of information that you claim would save so much time).

In particular, support for the cast, treat, assert, and validate expressions over complex types, which requires support for the technicalities in the XQuery Semantics, is a major implementation effort and an overhead in a running system. As far as I can tell, implementers can't use information from the PSVI (i.e. an existing XML Schema validator) here, but have to write their own validators in order to perform both the static and dynamic checks required to evaluate these expressions. As well as recording the name and type derivation of the top-level types in an XML Schema, they have to resolve the content models so that they have something against which to check the content of the elements that they generate or query. They have to implement the analysis described in the XQuery Semantics so that they can tell whether one type is a subtype of another, and, naturally, be able to validate an element against one of these types, again (I think) using the XQuery Semantics rather than an existing XML Schema processor. That is the added complexity I am concerned about -- but heck, I'm not an implementer; maybe this is child's play.

From the user perspective, we have to *understand* all this stuff so that we can work out what we have to do to make an XPath that isn't working work. Having read it several times, I still find it hard to grasp the difference between 'treat' and 'assert' (though there's a vast improvement over the text in the last version), and I can't imagine things will be that much better for new users.

I'm not against XQuery processors having their own validation model, and from the little I've seen of it, the complex type checking that XQuery provides looks really neat. I just seriously doubt that the extra implementation and user effort is worthwhile for XPath 2.0.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/