|
[XQuery Talk Mailing List Archive Home] [By Date] [By Thread] [By Subject] [By Author] [Recent Entries] [Reply To This Message] Requirements on Optimizers [was ANSWERS to "What's wrong with XQuery" question]Michael Kay mike at saxonica.comSun Jul 25 00:22:27 PDT 2010
> To be more precise, the lower bound should be predictable over > different implementations - "must be at least this good, but if you > can do it better, great. > > It's an important difference, because enforcing the same performance > would inhibit creative optimizations, and that's the last thing I > want. In that respect, the degree of freedom that XQuery spec gives to > the optimizer is a point in favor of the language. It just needs to > mandate a particular reasonable minimal complexity for certain > operations. > > Maybe the best way to do it would be to look at the existing > implementations, and see if there are any points on which they all > agree. E.g. for something like this: > > (for $i in 1 to 1000000 return $i * $i)[5] > > If it turns out that all existing implementations out there guarantee > that at most 5 elements in (1 to 1000000) will actually be evaluated > when computing this - which I strongly suspect to be the case - then > perhaps it should be the minimum mandated by the spec. If some > implementations can optimize this further, and simplify this down to > 5*5, they remain conformant, of course. > > I invite anyone to attempt this, but I think it will be extremely difficult. Anyone who's seen how much trouble we've had with the "errors and optimization" section in the spec would be wary of writing a "performance and optimization" section that actually makes testable assertions that implementations are capable of conforming to. (And if it's not a testable assertion, then it has no place in the language spec.) Statements like "will actually be evaluated" turn out to be rather difficult to test, and they aren't always meaningful in the context of a particular implementation. What do you do, for example, if the 1000000 is actually $N, and [5] is actually [$M], and $N is known statically, and the implementation decides to evaluate (for $i in 1 to $N return $i * $i) at compile time into an array? You then get constant performance (independent of $M or $N) at run-time but a higher cost at compile time and more memory used by the compiled code. Is the spec really going to say whether or not that's an acceptable strategy for implementors to adopt? I agree predictability of performance is a very significant problem - not just across products, but within a single product. Part of the answer, I think, is to make performance less reliant on good optimization. In XSLT, the key() function goes a long way towards this: by giving programmers a tool to control when indexes are built and used, performance of many join constructs becomes much more predictable even though the spec doesn't actually mandate how key() is implemented (no-one would buy a product that implemented it as a serial search). I've always felt that the anathema felt in the database query community towards such constructs is misplaced - alhough it's great when optimizers are good enough that they aren't needed, I've seen programmers tearing their hair out trying to second-guess the optimizer, and in such cases it's not clear we're doing programmers a service. Michael Kay
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|

![Requirements on Optimizers [was ANSWERS to "What's
wrong with XQuery" question]](/images/get_stylus.gif)




