[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Re: I have implemented SAX based XPath Engine
Michael Kay a écrit : >> I have implemented few years ago such a tool, the global >> design is described here: >> http://reflex.gforge.inria.fr/saxPatterns.html > > Thanks, I had seen this article before but it was worth rereading. > > I'm puzzled by the concept of keeping exactly four counters. For an > expression like > > child::p[@a=17][5] you're right, and grouping is not supported either: this one should work: /foo/*/*[last()] but this one doesn't :( (/foo/*/*)[last()] > you need a counter of how many nodes have satisfied the @a=17 predicate. I > would have thought it was better to generalize this so that a counter is > always relative to a predicate, for example p[17] is treated as > child::node()[self::p][17] and the counter is relative to the predicate > [self::p]. great idea ! I will correct my documentation about those unsupported cases > Doing this using threads does look like a recipe for adding performance > overheads. (Saxon's streaming sometimes uses two threads, but never more). I use 2 threads for each filter, and in a pipeline with n filters, there will be n+1 threads I > would have thought the great advantage of using a push model is that you can > push the same events to multiple destinations without the use of multiple > threads. Saxon does this when validating instance documents against XML > Schema identity constraints, where any number of constraints can be active > at any time. I agree with your comment, however, that the ideal would be to > compile some kind of finite state machine. That would work, of course, > regardless of whether the evaluation is in push or pull mode. > > The XSL WG has been doing work on streaming over the last year or so (hence > my interest). There's nothing available to publish yet, but we've made some > good progress and have built up quite a substantial collection of use cases. > We've been trying to define constructs that are amenable to "pure streaming" > without any buffering or lookahead, but it's difficult to draw the > boundaries. Very interesting, I look forward to read all that ! Best regards, Philippe > > Michael Kay > http://www.saxonica.com/ > >> It is designed to filter SAX streams with XPath-based >> patterns, but has also useful filters to process text >> (non-XML) inputs: >> http://reflex.gforge.inria.fr/tutorial-pipelinesAndFilters.html >> >> XPath is rather well supported: position() and last() are >> supported but it doesn't support preceding:: and >> preceding-sibling:: axis except in very few circumstances, >> but I also propose a workaround when filtering a SAX stream >> (juggling with a local DOM subtree when necessary). >> >> To answer to Michael about how predicates are evaluated when >> reading forward is required, the engine uses a lookahead >> buffer and goes on reading until the actual predicate becomes >> evaluable; for that purpose, as explained in the article, the >> engine uses coroutines that are implemented using threads >> (one to evaluate the predicate, the other to hold the >> position in the call stack of the current startElement() >> event); I think that a finite state machine based on a pull >> parser would be much more efficient: although the stuff works >> somewhat well, I have noticed that it runs slowly when I use >> lots of XPath patterns in a pipeline made of lots of filters, >> and it can be an issue when reading GB of XML. I also know >> that things here and there have to be optimized, for example >> instead of evaluating the entire set of XPath patterns on >> each event, I could recognize that a subset is irrelevant for >> a given branch and I should discard them in that branch (but >> currently it doesn't work like that); there are also things >> to do better about partial evaluation specifically when >> comparison operators are involved, for example [count(foo)>9] >> should exit when 10 <foo>s are met rather than when the >> 1000000 specimen are read. I have imagined a strategy where >> the count() function should return something different than a >> number, a numeric object evaluable several times by the >> operator that could fetch more data on demand, until the >> NumberThatIsAtLeast object reach (or not) the expected value. >> Lots of work in sigth. >> >> Of course I will have a look at Santhosh's work :) >> >> -- >> Cordialement, >> >> /// >> (. .) >> --------ooO--(_)--Ooo-------- >> | Philippe Poulard | >> ----------------------------- >> http://reflex.gforge.inria.fr/ >> Have the RefleX ! > -- Cordialement, /// (. .) --------ooO--(_)--Ooo-------- | Philippe Poulard | ----------------------------- http://reflex.gforge.inria.fr/ Have the RefleX !
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] |
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|