Re: Question for the XPath and DOM folks
> Given the following XML in a DOM document
>
> <foo>
> bar
> <![CDATA[
> baz
> ]]>
> quux
> </foo>
>
> and the following XPath
>
> //text()
>
> what should be the resulting DOM nodes and why? I can think of two answers but they both have problems.
>
> PS: Why is http://www.w3.org/TR/2002/WD-DOM-Level-3-XPath-20020712/ returning a 404 when it is linked from http://www.w3.org/DOM/ ?

XPath is defined against a certain model of an XML document. The section that answers your question is 5.7:

"Character data is grouped into text nodes. As much character data as possible is grouped into each text node: a text node never has an immediately following or preceding sibling that is a text node. The string-value of a text node is the character data. A text node always has at least one character of data.

"Each character within a CDATA section is treated as character data. Thus, <![CDATA[<]]> in the source document will treated the same as &lt;. Both will result in a single < character in a text node in the tree. Thus, a CDATA section is treated as if the <![CDATA[ and ]]> were removed and every occurrence of < and & were replaced by &lt; and &amp; respectively."

Therefore, to a conforming XPath processor,

<foo>
bar
<![CDATA[
baz
]]>
quux
</foo>

is precisely the same as

<foo>
bar

baz

quux
</foo>

i.e. one element node with one text node child.

There is actually an open bug against 4XPath right now: it leaks a bit in this respect. For example, in some cases it can return a text node child of an attribute when operating on a DOM (attributes have text node children in DOM, but not in XPath). Your post is a handy reminder for me to fix this bug.

As an illustration, here's what a session with 4XPath does (interactive Python prompt):

>>> DOC = """<foo>
... bar
... <![CDATA[
... baz
... ]]>
... quux
... </foo>"""
>>> from Ft.Xml.Domlette import NonvalidatingReader
>>> doc = NonvalidatingReader.parseString(DOC, "http://dummybaseuri.com")
>>> from Ft.Xml.XPath import Evaluate
>>> result = Evaluate("//text()", contextNode=doc)
>>> print result
[<cText at 0x81ae434>]
>>> print result[0].data

bar

baz

quux

>>>

--
Uche Ogbuji
Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Track chair, XML/Web Services One Boston: http://www.xmlconference.com/
The many heads of XML modeling - http://adtmag.com/article.asp?id=6393
Will XML live up to its promise? - http://www-106.ibm.com/developerworks/xml/library/x-think11.html
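For readers without 4Suite installed, the difference between the DOM view and the merged text-node view described in section 5.7 can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: xml.dom.minidom stands in for "a DOM" and xml.etree.ElementTree stands in for the merged view (it is not a full XPath processor), and the variable names are made up for this example.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Same document as in the question above.
DOC = """<foo>
bar
<![CDATA[
baz
]]>
quux
</foo>"""

# DOM view: minidom keeps the CDATA section as its own node,
# so <foo> has three children (text, CDATA section, text).
dom_children = minidom.parseString(DOC).documentElement.childNodes
dom_kinds = [node.nodeName for node in dom_children]
dom_joined = "".join(node.data for node in dom_children)

# Merged view: ElementTree folds the CDATA content into the
# surrounding character data, much like the XPath data model does.
merged_text = ET.fromstring(DOC).text

print(dom_kinds)
print(repr(merged_text))
print(dom_joined == merged_text)
```

Concatenating the three DOM children gives the same string as the single merged text value, which is the grouping an XPath processor working on a DOM has to perform before answering //text().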