[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: Function for determining one XPath as subset of a

Subject: Re: Function for determining one XPath as subset of another
From: "Liam R. E. Quin liam@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 26 Jan 2016 21:35:25 -0000
Re:  Function for determining one XPath as subset of  a
On Tue, 2016-01-26 at 16:15 +0000, Adam Retter
adam.retter@xxxxxxxxxxxxxx wrote:
> Given two simple XPaths, say:
> 
> 1. //w
> 
> 2. /x/y/z/w[@a = 'v']
>B [...]
> I wondered if there might be an algorithm or library that someone
> already had or has written which might be able to give me the answer?

There's quite a bit of literature on optimizing XPath in the context of
databases (e.g. in proc. VLDB, in "XML-Based Data Management and
Multimedia Engineering" and elsewhere) but I don't know of a single
coherent summary.

This may be at least in part because what's useful and practical
depends on the implementation and the situation.

For example, a system with an element occurrence index might solve //w
in O(n_w) time, where n_w is the number of occurrences of w elements in
the corpus being searched; one using a tree structure might do better
with the explicit path.

A path like //a//b//c//d is much harder using tree navigation, but
rewritten as d[ancestor::c[ancestor::b[ancestor::a]]] against an
element index will likely run in time close to O(n_d log D) where D is
the average depth of the tree. A system smart enough to notice that
there are only three "c" elements in the corpus could do better than
O(n_d) though (my text retrieval system did/does that to speed up
multi-word phrase searches, for example).

We encountered each of these implementation strategies during the
development of XQuery. B The cost of building an element index for a
non-database-backed XPath implementation is in some systems amortized
by doing it during parsing, while the input XML is being read; systems
that run XPath against a stored or in-memory tree might not have that
luxury, or might have the even greater luxury of a precomputed index.

I don't know of anything off the shelf, and although open source XQuery
implementations might be a place to look, it's likely that
optimizations are done on an internal representation of the XPath
expression.

> 
> I realise that I can only probably cover a subset of XPath itself,
> but
> it is only the path steps with predicates which I am interested in.
> 
> Ideally I am looking for something in Java.

Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.