[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Re: Function for determining one XPath as subset of a
Thanks to everyone for their comments and suggestions :-) I am cross-posting from xquery-talk my results for anyone who is interested... Okay so for those that are interested, here is the solution that I ended up with: 1) I wrote my own XPath 2 parser as I wanted a simplified AST to start operating from. The project I am working on needs to be under GPL2 and I could not find a decent Java library that was compatible with that license - https://github.com/exquery/xpath2-parser/ 2) I then wrote some utility functions for descending through the Path expressions and making subset comparisons - https://github.com/adamretter/TEI-Completer/blob/master/src/main/java/org/humanistika/oxygen/tei/completer/XPathUtil.java#L79 Yes, this has many limitations and is most likely not complete yet, but it serves me well for the small subset of XPath path expressions that I want to support. 3) You can see in my test-cases how I consider one path expression as a subset of another path expression: https://github.com/adamretter/TEI-Completer/blob/master/src/test/java/org/humanistika/oxygen/tei/completer/XPathUtilTest.java#L40 On 26 January 2016 at 21:35, Liam R. E. Quin liam@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote: > On Tue, 2016-01-26 at 16:15 +0000, Adam Retter > adam.retter@xxxxxxxxxxxxxx wrote: >> Given two simple XPaths, say: >> >> 1. //w >> >> 2. /x/y/z/w[@a = 'v'] >> [...] >> I wondered if there might be an algorithm or library that someone >> already had or has written which might be able to give me the answer? > > There's quite a bit of literature on optimizing XPath in the context of > databases (e.g. in proc. VLDB, in "XML-Based Data Management and > Multimedia Engineering" and elsewhere) but I don't know of a single > coherent summary. > > This may be at least in part because what's useful and practical > depends on the implementation and the situation. > > For example, a system with an element occurrence index might solve //w > in O(n_w) time, where n_w is the number of occurrences of w elements in > the corpus being searched; one using a tree structure might do better > with the explicit path. > > A path like //a//b//c//d is much harder using tree navigation, but > rewritten as d[ancestor::c[ancestor::b[ancestor::a]]] against an > element index will likely run in time close to O(n_d log D) where D is > the average depth of the tree. A system smart enough to notice that > there are only three "c" elements in the corpus could do better than > O(n_d) though (my text retrieval system did/does that to speed up > multi-word phrase searches, for example). > > We encountered each of these implementation strategies during the > development of XQuery. The cost of building an element index for a > non-database-backed XPath implementation is in some systems amortized > by doing it during parsing, while the input XML is being read; systems > that run XPath against a stored or in-memory tree might not have that > luxury, or might have the even greater luxury of a precomputed index. > > I don't know of anything off the shelf, and although open source XQuery > implementations might be a place to look, it's likely that > optimizations are done on an internal representation of the XPath > expression. > >> >> I realise that I can only probably cover a subset of XPath itself, >> but >> it is only the path steps with predicates which I am interested in. >> >> Ideally I am looking for something in Java. >> > -- Adam Retter skype: adam.retter tweet: adamretter http://www.adamretter.org.uk
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|