Re: We need an XPath API
On Sat, 03 Mar 2001 David Megginson wrote:
>Charles Reitzel writes:
> > Proposal: let's give XPath the SAX treatment.
>
>I'd actually recommend giving XPath the DOM treatment.

Agree. I actually meant "SAX treatment" in the process sense, rather than from an API design POV.

On Sat, 03 Mar 2001 Thomas B. Passin wrote:
>We need some requirements engineering here. Especially,
>what would the API be used for?

There have been a bunch of interesting responses to this idea. Let me briefly respond to see if I have captured the concepts accurately. I've also added some comments about the DOM and SAX helpers. I'll take a deeper pass on the interfaces in the next couple of days, starting w/ a cross-reference between CSS2 selectors and conditions and XPath expressions.

Please let me know if I have misunderstood or otherwise misrepresented anyone's intent.

take it easy,
Charles Reitzel

============================================================
Feedback:

1) Give it the DOM treatment, rather than SAX treatment

On Sat, 03 Mar 2001, David Megginson wrote:
>I'd actually recommend giving XPath the DOM treatment.
>Well, not really DOM, but maybe a cleaner, in-memory tree.
>XPaths (even hairy ones) are extremely small, and the
>same path object is likely to be reused many times, so I
>see no need to force the pain of an event-based interface
>on users (unless someone thinks we're going to
>be seeing gigabyte-long XPath expressions).

I don't see the need for callbacks at all at the expression level. Better just to slurp up a string and be done with it. XPath expressions appear as either attribute or element values, so even dealing with an InputSource seems unnecessary in an early version. This relates to SAC (see below) as well, in that SAC allows for registering handlers, etc.

Rather, I see XPathExpr instances as objects that could be instantiated in SAX event handlers or from a DOM element node and used to look up the datatype in the schema. I.e. these are lightweight, possibly transient objects.
Note this applies only at the expression level and not to DOM or SAX helpers, per se.

2) SAC as prior art from Simon St. Laurent and Robin Berjon

On Sat, 03 Mar 2001, Simon St. Laurent writes:
>Just for prior art, there's a Simple API for CSS. Don't
>know if there'd be any overlap at all, but you never know:
>http://www.xmlhack.com/read.php?item=685
>http://www.w3.org/TR/SAC/

Yes, clearly there is some conceptual overlap. My reading of CSS1 and CSS2 shows no references to XPath, however. So we have two independent W3C XPath-like syntaxes (hmmph). Perhaps the big difference is the HTML legacy baggage. It also seems that CSS2 is not used much compared to XSLT+XPath.

I think SAC should be scanned for condition and selector types. It is probably worthwhile to list CSS2 <-> XPath 1.0 equivalents. Also, the representation of XPath expression parts in my posting was clearly weak. Emulating the SAC Selectors, Conditions and their respective factories looks good. Perhaps we could even use these interfaces directly. I'm wary of unwanted dependencies, however. To be clear, this is for XPath, not CSS.

On Sun, 04 Mar 2001, Robin Berjon wrote:
>I've been down the "XPath OM" path before by hacking an
>interface onto Matt Sergeant's XML::XPath module. It can
>be very useful, but not as useful in some contexts as a
>builder callback style interface. Converting an object
>model into another is often harder than simply handling
>builder events. That's why CSS has SAC and DOM2 interfaces.
>Both are useful, but anyone wishing, say, to build a custom
>selector object to get elements out of his own type of tree
>will probably use SAC.

I think we may have a different use case here for CSS that is unlikely to apply to XPath. When pulling in an entire CSS stylesheet, I can see the sense of the callback approach. But I don't know if anyone will parse documents consisting of XPath expressions only. They are typically found as attribute or element contents within XML.
So parse the document however you will and, when you encounter an XPath expression, construct an XPathExpr object and work with it. Unlike a DOM, these objects should be small enough that there is little, if any, wasted effort. Also, lazy evaluation is always an option.

3) DOM3 issues from Mike Champion

This starts getting interesting. I didn't get a chance to digest all of the issues in detail. But I think a good guiding principle is, perhaps, "This is XPath, not DOM, CSS, et al." (non sequitur aside: anyone know "This is Boston, not L.A."?)

3a) Namespaces

On Sat, 03 Mar 2001 17:58:47, Mike Champion wrote:
>- A disagreement whether to do something minimal (a la
>Microsoft selectNodes) or a fully-functional XPath API.
>Not surprisingly, the minimal solution falls afoul of
>namespaces in all sorts of nasty ways; MS has some
>workarounds, but they are not terribly elegant. A fully
>functional solution requires some mapping between the
>XPath and DOM conceptions of a namespace declaration.

I don't know if this helps, but the QName doesn't need resolving until the XPath expression is actually evaluated. I.e. you can parse the expression, which would probably only include NS prefixes (or not). At evaluation time, the NS URI for the prefix is a moving target. You can't forget about the default NS, either. The exact handling of this evaluation will, of course, be different when looking in a DOM or responding to a SAX callback. The Apache SOAP NSStack idea might be helpful here.

3b) XPath vs. DOM Data Model

>The wretched inconsistency between the DOM data model and
>the XPath data model. DOM "trees" can have CDATA nodes,
>adjacent Text nodes, entity reference nodes (and maybe some
>other rot) that is transparent to XPath. So, an XPath
>expression can point at something that is not neatly aligned
>on DOM Node boundaries ... so what should a NodeList or
>NodeIterator returned by an XPath expression do?
I'd say dish up the XPath data model when returning nodes in a DOM matching an XPath expression. Combine nodes as needed. If the original nodes are needed, get them via DOM calls. I don't understand yet how an XPath expression can point to something "not neatly aligned on DOM Node boundaries". To hazard a guess, is it related to unexpanded external entities? In which case, "you can't get there from here" may be a reasonable answer from the library.

3c) Live NodeLists

>The obvious thing for something like selectNodes to
>return would be a NodeList, but keeping this in synch
>with the XPath expression as the underlying DOM tree is
>edited is non-trivial. NodeIterators are probably a
>better idea, but they are less familiar and less widely
>supported, and still have some "liveness" semantics that
>might be problematic here ... not sure.

Just use iterators. I can't think of an implementation language that doesn't support some form of iterator (Java, C, C++, Perl, JavaScript, Visual Basic, Python?). E.g. a list or vector in most script languages is just a list of references to the live objects.

Iterator staleness is a problem w/ all query result sets. I.e. the database row can get deleted out from under the cursor. A set member can be deleted, leaving a dangling reference in an iterator. There is no perfect solution to this problem, and developers all learn about it after they stub their toes a few times.

One small step further, a Java implementation should use the Java 2 Collections. They are compatible w/ Java 1.1.8 (available in a separate JAR file) and provide much improved synchronization options (including stale iterator detection when used in a synchronized block - aka fail-fast).

4) "Just use SAX" from Eric van der Vlist and Sean McGrath

In theory, you could generate an XML equivalent to the XPath expression and parse that. Question: how do you generate the XML? I think you'll need to parse the XPath first.
Better to define an internal representation of the XPath expression and parse/emit any supported syntax. If that syntax uses XML, then SAX is a great implementation strategy, but not a useful API for XPath expressions by itself. Certainly, an XML syntax is not the highest priority. If it gets at all controversial, better to scrap it. (Ducking pie thrown by Jonathan Robie.)

5) Need XPointer Support

On Sat, 03 Mar 2001 Thomas B. Passin wrote:
>1) Parse and process the XPointer syntax. This would be
>   useful for developers to create XPointer applications
>   and toolkits.
>
>2) Return node-sets. This is more like a query capability,
>   and would be more useful for application writers.
>
>3) Construct XPointer expressions based on some existing
>   tree (fragment).
>
>4) Construct XPointer expressions based on a schema
>   (fragment?)

Doesn't XPointer just use XPath? In which case, the lib should be able to do these things. I guess this starts getting into XSLT- and XPointer-specific extensions to XPath. This probably calls for a couple of SAX-style extension identifier URNs, so an app can say "I need XSLT 1.1 XPath extensions" and the parser can say yes or no.

6) DOM/SAX Wrappers

I have also used Matt Sergeant's XML::XPath module - with excellent results. It's a real nice module. It is also what triggered my original posting. Specifically, I could relate directly to Joe English's comment about mismatched data structures making it tricky to combine modules.

Parsing XPath expressions shouldn't be terribly difficult, but you've got to get it right. It is worth putting in a module by itself. Once you have it, you need to be able to use it in different contexts, such as accessing a DOM or extracting data from a stream.

For applications that build their own objects from an XML document, a SAX-based approach seems best. How to use XPath in this circumstance? My idea was to register a set of XPath expressions that identify objects of interest.
When the XPathFilter encounters elements and attributes that match one of the expressions, it will make the appropriate ContentHandler call. The only difference from a vanilla ContentHandler is that it seems necessary to pass back to the application the expression that was matched, so the app has a clue what to do with it.

For applications that use the DOM, I think XPath makes a highly useful extension to the existing traversal functions. It is, if you will, a baby, unoptimized query language. It is an open question, however, whether the existing DOM traversal functions are sufficient to resolve all XPath expressions.
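
To make the XPathFilter idea in 6) a bit more concrete, here is a rough sketch, in Python only because it is short. Everything in it is hypothetical: the XPathFilter class, its register() method and the on_match callback are illustrative names, not an existing API. It also matches only absolute child-step paths such as /order/item, which is a tiny subset of XPath; a real version would sit on top of a proper expression parser.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class XPathFilter(ContentHandler):
    """Hypothetical SAX filter that reports elements matching registered paths.

    Assumption: only absolute child-step paths ("/a/b/c") are supported.
    """

    def __init__(self, on_match):
        super().__init__()
        self._on_match = on_match   # callback(expr, name, attrs)
        self._exprs = {}            # tuple of steps -> original expression string
        self._stack = []            # names of the currently open elements

    def register(self, expr):
        # Split "/order/item" into ("order", "item") for cheap comparison.
        steps = tuple(expr.strip("/").split("/"))
        self._exprs[steps] = expr

    def startElement(self, name, attrs):
        self._stack.append(name)
        expr = self._exprs.get(tuple(self._stack))
        if expr is not None:
            # Pass the matched expression back so the app knows what it got.
            self._on_match(expr, name, dict(attrs.items()))

    def endElement(self, name):
        self._stack.pop()


matches = []
xpath_filter = XPathFilter(lambda expr, name, attrs: matches.append((expr, name, attrs)))
xpath_filter.register("/order/item")
xpath_filter.register("/order/customer")

doc = (b"<order><customer id='c1'/><item sku='a'/>"
       b"<note><item sku='x'/></note><item sku='b'/></order>")
xml.sax.parseString(doc, xpath_filter)

for expr, name, attrs in matches:
    print(expr, name, attrs)
```

The point of the sketch is the callback signature: the application receives the matched expression along with the element, so one handler can dispatch on it. The item nested under note is not reported, since its path is /order/note/item.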