Release of the GCX XQuery Engine
Michael Kay mike at saxonica.com
Mon Feb 5 09:07:54 PST 2007

> As I see it, there are two kinds of "streaming" implementations:
> - pull-based: expressions are evaluated at-need (lazily) when
> and if their results are needed; or
> - push-based: expressions are evaluated eagerly, but
> sub-parts of the results are "pushed" to a "consumer" as they
> are generated, while avoiding creating of complete "reified"
> sequences, if possible.
>
> The "data-base-oriented" implementations seem to be
> pull-based, while more "document-oriented" ones may be
> push-based. At least Qexo falls into the latter category,
> and my impression is Saxon does too.
> My impression from your web-page is that GCX is "push-based",
> so it should be comparable to Qexo and Saxon.

Saxon in fact uses a mixture of pull and push, with some user control over the choice. By default, though, especially when results are being serialized, push seems to work better at present. This is because using push seems to make it easier to avoid constructing the result tree in memory.

However, it's not clear to me that the distinction here is really important. The critical issue, I think, is whether you can process queries without building a tree representation of the *source* document in memory. Saxon currently does that in two limited cases: for the subset of XPath that's used in XML Schema integrity constraints, and for the "serial processing mode" in XSLT (which is applicable only to stylesheets that follow a very stereotyped coding pattern). There's certainly an opportunity to achieve this kind of streaming over a much wider range of queries.

One of the obstacles in practice, which I haven't seen addressed in any of the academic research, is the requirement for stability. This means that if a query reads the same document more than once, it needs to get the same result (identical nodes) each time. In turn this means that if a query does doc($x) and doc($y), you can't safely avoid building the tree unless you can prove that $x and $y will be different URIs (or, perhaps, that the identity of the nodes makes no difference to the outcome).

This is the kind of corner case that makes optimization in practice much harder than it is in academic theory: you're not allowed to do an optimization that benefits 99.99% of queries if it causes incorrect results for the other 0.01%. Subsetting the language in ways that don't affect the conclusions is legitimate; but ignoring parts of the language specification that have a significant bearing on the issue is not.

Michael Kay
http://www.saxonica.com/
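
The stability requirement described above can be illustrated with a small sketch. This is not how Saxon or any other engine implements doc(); it is a Python analogy in which the `SOURCES` dictionary, `doc_unstable`, and `StableDocs` are all hypothetical names invented for the example, and Python's `is` stands in for XQuery's node-identity comparison. It shows why a processor that re-reads (or streams) the source on every call cannot guarantee identical nodes, whereas one that retains the tree per URI can.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory "document store" standing in for URI resolution.
SOURCES = {"a.xml": "<root><item>1</item></root>"}


def doc_unstable(uri):
    """Re-parse the source on every call: each call yields fresh node objects."""
    return ET.parse(io.StringIO(SOURCES[uri])).getroot()


class StableDocs:
    """Retain one tree per URI so repeated calls return the identical nodes."""

    def __init__(self):
        self._cache = {}

    def doc(self, uri):
        if uri not in self._cache:
            self._cache[uri] = doc_unstable(uri)
        return self._cache[uri]


# Two variables that happen to hold the same URI, as with doc($x) and doc($y).
x = y = "a.xml"

# Without a retained tree, the two reads produce distinct root nodes.
unstable_same = doc_unstable(x) is doc_unstable(y)

# With a retained tree, both reads yield the very same node.
docs = StableDocs()
stable_same = docs.doc(x) is docs.doc(y)

print(unstable_same, stable_same)
```

The cached variant is exactly what a streaming processor would like to avoid, since it keeps the whole source tree in memory. Dropping the cache is only safe when the two URIs are provably different or node identity cannot affect the result, which is the corner case the message describes.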






