Subject: RE: Wath is the opposite of the union operator?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 22 Sep 2005 17:03:20 +0100
|
> You can expensively code
>
> A except B =>
>
> A[count(.|B) != count(B)]
>
> A intersect B =>
>
> A[count(.|B) = count(B)]
>
> Hi, Mike,
>
> If you don't mind waxing theoretical for a minute, what makes those
> comparisons expensive?
>
Let's stick to "except", the same applies to "intersect".
It's reasonably easy for an optimizer to do the simple rewrite
let $c := count(B)
return A[count(.|B) != $c]
Saxon certainly does this, other processors might not. If B is a complex
expression with no dependency on the context node, Saxon will also evaluate
B outside the loop, so this reduces to
let $b := B
let $c := count(B)
return A[count(.|$b) != $c]
But that still leaves count(.|$b) inside the loop, to be evaluated once for
each item in A. For Saxon, this means scanning the node-set $b once for each
item in $a, giving overall performance O(count(A)*count(B)). By contrast,
the union, except, and intersect operators in XPath 2.0 are evaluated in
Saxon using a sort-merge strategy, giving performance of the order
O(N*log(N) + M*log(M)) where N is count(A) and M is count(B). In fact, the
sort will in most cases not be necessary since the node-sequence will often
be in document order already, giving performance O(count(A)+count(B)).
The sort-merge approach is also more efficient in memory. When the
expressions A and B deliver nodes in document order, all three operators can
be evaluated without breaking the pipeline: that is, it is not necessary to
allocate memory to hold the values of their operands. Memory allocation (and
de-allocation, through garbage collection) accounts for a significant part
of XSLT/XPath execution cost.
Of course, it's always possible that a different XPath 1.0 processor might
recognize these constructs specially, and rewrite them to give performance
equivalent to the 2.0 operators.
Michael Kay
http://www.saxonica.com/
| Current Thread |
|
Michael Kay - 22 Sep 2005 16:03:48 -0000 <=
|
|