Re: Question for the XPath and DOM folks
> 7/20/2002 4:28:44 PM, Uche Ogbuji <uche.ogbuji@f...> wrote:
>
> >>
> >> "The XPath model relies on the XML Information Set [XML Information set]
> >> and represents Character Information Items in a single logical text node
> >> where DOM may have multiple fragmented Text nodes due to cdata sections,
> >> entity references, etc. Instead of returning multiple nodes where XPath sees
> >> a single logical text node, only the first non-empty DOM Text or
> >> CDATASection node of any logical XPath text will be returned in the node
> >> set.
> >
> >Yikes! This is a *very* *very* bad job. Luckily that spec is still a WD and
> >I hope they'll fix it before release. If they can't do better than that then
> >they should just leave DOM/XPath interaction to application specifics.
>
> Well, it was a VERY VERY VERY bad job for the "W3C" (if one can think of it
> as a unified entity rather than a collection of working groups made up
> of competitors, loosely coordinated by the staff and director)
> to have created the situation where there are multiple,
> inconsistent data models defined by various XML-related Recommendations.

Agreed. Of course I shudder to think of how the simplicity and elegance of XPath would have been marred by a full-blown DOM model. I think DOM should have done what SAX did. Level one handles the 80/20: elements/attributes/text. Add everything else as optional higher levels. Most E/A/T-based specs have compatible data models, so I suspect there would have been less of a problem.

> More importantly, the W3C has learned from this mistake, and I don't
> think it would happen under the current organization and process.

I suppose this "learning" is what is causing the XQuery/XPath meld? If so, I'm not sure there is a net gain from learning such a lesson if specs influence each other to become *more* complex.

> In the long run my personal (and official corporate, FWIW) position is that the
> data models MUST be reconciled, even at the cost of some backwards incompatibility.
> ("Re-breaking the bone so that it can heal cleanly" is my favorite metaphor here).
> In the short run, it's not at all clear what is to be done. I/we do not want
> to hold DOM Level 3 hostage to this, however, because it could take awhile ....
>
> DOM Level 3 provides basically 2 ways to deal with this: Load-time options to
> create an "InfoSet" view with no CDATA sections and unexpanded entity references,

This is a good start. Basically, it's adding the more modest profile I talk about above, but ex post facto. Not ideal, but the best of a bad pick. It does sharply reduce my objection to XPath/DOM. I as an implementor would probably end up *mandating* this Infoset view on DOMs on which the user chooses to call XPath APIs.

> and the XPath interfaces to allow one to essentially translate between the XPath
> view of a document and the DOM view of a document. The key point is that an
> XPathResult doesn't return "a" node, it returns a way of iterating across the
> DOM view of the nodes corresponding to the XPath view of the nodes.
> That's what the "manually gather" bit means here:

Ah. Mike Olson, who has recently been looking into DOML3/XPath (he's always been our DOM champion) mentioned this, and he was very excited about it. It translates *extremely* well to Python 2.2, where you can end up just returning generator objects from the XPath data model, with *huge* efficiencies to be gained thereby.

> >> Applications using XPath in an environment with fragmented text nodes
> >> must manually gather the text of a single logical text node possibly from
> >> multiple nodes beginning with the first Text node or CDATASection node
> >> returned by the implementation."
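
To make the "manually gather" step and the generator idea concrete, here is a minimal sketch in Python. It is not taken from the DOM Level 3 XPath draft or from 4Suite: it uses the standard library's xml.dom.minidom as a stand-in DOM, the helper names logical_text_nodes and logical_text are hypothetical, and it assumes fragmentation shows up only as adjacent Text/CDATASection siblings (entity reference nodes are not handled). It is written in current Python syntax rather than Python 2.2.

```python
from xml.dom import Node
from xml.dom.minidom import Document

# Node types that together make up one logical XPath text node.
TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def logical_text_nodes(first):
    """Lazily yield `first` and the Text/CDATASection siblings right after it.

    Hypothetical helper: `first` plays the role of the single node an
    XPath implementation would hand back for the logical text node.
    """
    node = first
    while node is not None and node.nodeType in TEXT_TYPES:
        yield node
        node = node.nextSibling


def logical_text(first):
    """Concatenate the character data of the whole run starting at `first`."""
    return "".join(node.data for node in logical_text_nodes(first))


# Build a deliberately fragmented DOM: Text, CDATASection, Text, <br/>, Text.
doc = Document()
p = doc.createElement("p")
doc.appendChild(p)
p.appendChild(doc.createTextNode("one "))
p.appendChild(doc.createCDATASection("<two>"))
p.appendChild(doc.createTextNode(" three"))
p.appendChild(doc.createElement("br"))
p.appendChild(doc.createTextNode("four"))

first = p.firstChild
result = logical_text(first)  # gathers the three fragments before <br/>
```

Because logical_text_nodes is a generator, nothing is walked until the caller iterates, which is roughly the kind of laziness the Python 2.2 remark above points at; the element between the fragments ends the run, so "four" is a separate logical text node.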
>
> Just to make life more interesting, there's a couple more issues to wrestle with:
> how to map the XPath "nodes have a namespace property" view onto the DOM "namespace
> declaration nodes upwards in the tree define the namespace a node is in" view;

Yeah. Fix DOM. It's broken. DOM L2's treatment of namespaces is one of the most egregious examples of design-by-committee that I've ever seen. I can almost reproduce the committee discussion in my head by reading the spec.

Member A says "Damn it! Namespaces are local properties of the node. Period."

Member B says "Damn it! We signed up to handle XML 1.0, not this newfangled namespaces thing. Namespace declarations are just special attributes that can be interpreted with the XMLNS pixie dust semantics by apps, if they so choose."

The committee is deadlocked, so both views, regardless of the fact that they are contradictory, are directly supported. The result is DOM Level 2. Bah!

I must say, Mike. Knowing you, I bet you fought for a simpler resolution than what resulted. This is one example of why I'd like the W3C veil of secrecy abolished. I want to be able to flame whoever led us into this mess. Of course, this is precisely why the veil is never likely to be lifted ;-)

> and
> how to deal with the fact that the XPath data model is in flux.

Again simple. Stick to XPath 1.0.

--
Uche Ogbuji
Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Track chair, XML/Web Services One Boston: http://www.xmlconference.com/
The many heads of XML modeling - http://adtmag.com/article.asp?id=6393
Will XML live up to its promise? - http://www-106.ibm.com/developerworks/xml/library/x-think11.html