[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: Function for determining one XPath as subset of a

Subject: Re: Function for determining one XPath as subset of another
From: "Adam Retter adam.retter@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 21 Feb 2016 19:59:59 -0000
Re:  Function for determining one XPath as subset of  a
Thanks to everyone for their comments and suggestions :-) I am
cross-posting from xquery-talk my results for anyone who is
interested...

Okay so for those that are interested, here is the solution that I
ended up with:

1) I wrote my own XPath 2 parser as I wanted a simplified AST to start
operating from. The project I am working on needs to be under GPL2 and
I could not find a decent Java library that was compatible with that
license - https://github.com/exquery/xpath2-parser/

2) I then wrote some utility functions for descending through the Path
expressions and making subset comparisons -
https://github.com/adamretter/TEI-Completer/blob/master/src/main/java/org/humanistika/oxygen/tei/completer/XPathUtil.java#L79
Yes, this has many limitations and is most likely not complete yet,
but it serves me well for the small subset of XPath path expressions
that I want to support.

3) You can see in my test-cases how I consider one path expression as
a subset of another path expression:
https://github.com/adamretter/TEI-Completer/blob/master/src/test/java/org/humanistika/oxygen/tei/completer/XPathUtilTest.java#L40


On 26 January 2016 at 21:35, Liam R. E. Quin liam@xxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 2016-01-26 at 16:15 +0000, Adam Retter
> adam.retter@xxxxxxxxxxxxxx wrote:
>> Given two simple XPaths, say:
>>
>> 1. //w
>>
>> 2. /x/y/z/w[@a = 'v']
>> [...]
>> I wondered if there might be an algorithm or library that someone
>> already had or has written which might be able to give me the answer?
>
> There's quite a bit of literature on optimizing XPath in the context of
> databases (e.g. in proc. VLDB, in "XML-Based Data Management and
> Multimedia Engineering" and elsewhere) but I don't know of a single
> coherent summary.
>
> This may be at least in part because what's useful and practical
> depends on the implementation and the situation.
>
> For example, a system with an element occurrence index might solve //w
> in O(n_w) time, where n_w is the number of occurrences of w elements in
> the corpus being searched; one using a tree structure might do better
> with the explicit path.
>
> A path like //a//b//c//d is much harder using tree navigation, but
> rewritten as d[ancestor::c[ancestor::b[ancestor::a]]] against an
> element index will likely run in time close to O(n_d log D) where D is
> the average depth of the tree. A system smart enough to notice that
> there are only three "c" elements in the corpus could do better than
> O(n_d) though (my text retrieval system did/does that to speed up
> multi-word phrase searches, for example).
>
> We encountered each of these implementation strategies during the
> development of XQuery.  The cost of building an element index for a
> non-database-backed XPath implementation is in some systems amortized
> by doing it during parsing, while the input XML is being read; systems
> that run XPath against a stored or in-memory tree might not have that
> luxury, or might have the even greater luxury of a precomputed index.
>
> I don't know of anything off the shelf, and although open source XQuery
> implementations might be a place to look, it's likely that
> optimizations are done on an internal representation of the XPath
> expression.
>
>>
>> I realise that I can only probably cover a subset of XPath itself,
>> but
>> it is only the path steps with predicates which I am interested in.
>>
>> Ideally I am looking for something in Java.
>>
> 



-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk

Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.