[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: XPath expression to check that there are no inter

Subject: Re: XPath expression to check that there are no intervening elements?
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 20 Jul 2016 15:36:29 -0000
Re:  XPath expression to check that there are no  inter
> On 20 Jul 2016, at 09:36, Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> I did some timings:
> Kay 3 - 32.18ms (ouch!)
> doc ! (let $fsm := map{
> 0: function($x) {if ($x[self::B]) then 1 else 0},
> 1: function($x) {if ($x[self::B]) then 1 else 2},
> 2: function($x) {if ($x[self::B]) then 3 else 2},
> 3: function($x) {3}
> } return
> (fold-left(*, 0, function($state, $node){$fsm($state)($node)}) ne 3))

I decided to do some analysis of this disappointing result.

It turns out that dynamic function calls are quite expensive, and this is
almost entirely due to the very dynamic nature of the type checking that is
done (so-called function coercion).

Running the above in a slightly different test environment I got a baseline
time of 41ms.

Java profiling showed an inordinate amount of time spent computing hashcodes.
It turned out that during the dynamic call of the various "function($x)"
functions, Saxon was checking that the supplied item type element(C) was a
subtype of the required type item()*, which it does by first checking this
pair of types in a cache of known type relationships, and the lookup in the
cache involved an expensive hashcode computation. Avoiding the lookup by
applying the knowledge that everything is a subtype of item()* reduced the
total time to 29ms.

I achieved a further reduction to 22ms by replacing the dynamic function call
$fsm($state) with a static call map:get($fsm, $state).

I then got this down to 13ms by avoiding the inline functions for computing
the state transition, changing the query to

declare namespace map='http://www.w3.org/2005/xpath-functions/map';
doc ! (let $fsm := map{
0: (1,0),
1: (1,2),
2: (3,2),
3: (3,3)}

return (
  fold-left(*, 0, function($state, $node){
             map:get($fsm,$state)[if ($node[self::B]) then 1 else 2]
             }) ne 3))

Even though there is now only one dynamic function call per element in the
input sequence (and this is unavoidable) the cost still seems to be dominated
by dynamic function calling.

Declaring the types of the arguments ($state, $node) made this worse, because
it increases the amount of dynamic type checking that needs to be done. This
is quite different from the usual scenario where declaring types enables the
checking to be done statically.

There is clearly a lot of scope for optimization in this area. It's actually
the first time I've looked seriously at the performance of higher order
functions in a real query.

Michael Kay

Current Thread


Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
First Name
Last Name
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.