[XQuery Talk Mailing List Archive Home] [By Date] [By Thread] [By Subject] [By Author] [Recent Entries] [Reply To This Message]

Requirements on Optimizers [was ANSWERS to "What's wrong with XQuery" question]

Michael Kay mike at saxonica.com
Sun Jul 25 00:22:27 PDT 2010


  Requirements on Optimizers [was ANSWERS to "What's
	wrong with XQuery" question]
> To be more precise, the lower bound should be predictable over
> different implementations - "must be at least this good, but if you
> can do it better, great.
>
> It's an important difference, because enforcing the same performance
> would inhibit creative optimizations, and that's the last thing I
> want. In that respect, the degree of freedom that XQuery spec gives to
> the optimizer is a point in favor of the language. It just needs to
> mandate a particular reasonable minimal complexity for certain
> operations.
>
> Maybe the best way to do it would be to look at the existing
> implementations, and see if there are any points on which they all
> agree. E.g. for something like this:
>
>     (for $i in 1 to 1000000 return $i * $i)[5]
>
> If it turns out that all existing implementations out there guarantee
> that at most 5 elements in (1 to 1000000) will actually be evaluated
> when computing this - which I strongly suspect to be the case - then
> perhaps it should be the minimum mandated by the spec. If some
> implementations can optimize this further, and simplify this down to
> 5*5, they remain conformant, of course.
>
>    

I invite anyone to attempt this, but I think it will be extremely 
difficult. Anyone who's seen how much trouble we've had with the "errors 
and optimization" section in the spec would be wary of writing a 
"performance and optimization" section that actually makes testable 
assertions that implementations are capable of conforming to. (And if 
it's not a testable assertion, then it has no place in the language spec.)

Statements like "will actually be evaluated" turn out to be rather 
difficult to test, and they aren't always meaningful in the context of a 
particular implementation. What do you do, for example, if the 1000000 
is actually $N, and [5] is actually [$M], and $N is known statically, 
and the implementation decides to evaluate

(for $i in 1 to $N return $i * $i)

at compile time into an array? You then get constant performance 
(independent of $M or $N) at run-time but a higher cost at compile time 
and more memory used by the compiled code. Is the spec really going to 
say whether or not that's an acceptable strategy for implementors to adopt?

I agree predictability of performance is a very significant problem - 
not just across products, but within a single product. Part of the 
answer, I think, is to make performance less reliant on good 
optimization. In XSLT, the key() function goes a long way towards this: 
by giving programmers a tool to control when indexes are built and used, 
performance of many join constructs becomes much more predictable even 
though the spec doesn't actually mandate how key() is implemented 
(no-one would buy a product that implemented it as a serial search). 
I've always felt that the anathema felt in the database query community 
towards such constructs is misplaced - alhough it's great when 
optimizers are good enough that they aren't needed, I've seen 
programmers tearing their hair out trying to second-guess the optimizer, 
and in such cases it's not clear we're doing programmers a service.

Michael Kay




PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2011 All Rights Reserved.