XML Editor
Subject: XPaths change when the 'Indent Tags' button is use
Author: Solomon Fried
Date: 08 Nov 2001 03:11 PM
I have an XML file which, when loaded into Excelon, correctly shows 5 text() nodes at the XPath:
//html/body/table[4]/tr/td[3]/div

However, once the "Indent Tags" option is used, the same XPath shows 7 text() nodes, with 2 empty nodes added at the top.


Any ideas would be appreciated.

P.S. The XML file is, hopefully, attached to this message


Attachment: Applicationout.xml (XML File)

Subject: Re: XPaths change when the 'Indent Tags' button is use
Author: Minollo I.
Date: 08 Nov 2001 03:24 PM
Solomon,
this seems to be the expected behavior; what's happening is that in the
first case you are running the query against something like:

{td width="530"}{div class="cnnCopyright"}{b}blabla{/b}{br/}blabla{br/}...

after you have indented the file, the query is run against:
{td width="530"}
{div class="cnnCopyright"}
{b}blabla{/b}
{br/}blabla{br/}...

....and you get an extra text node (the whitespace between {div
class="cnnCopyright"} and {b}blabla{/b}); the same applies to the whitespace
between the {b} and the {br/} elements.

Indenting your XML document can definitely affect the text nodes returned
by a query.

Minollo
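
The effect described in this reply can be reproduced outside the editor. What follows is only a minimal sketch: it assumes Python's standard xml.dom.minidom as a stand-in for the editor's DOM (it is not an XPath engine), and it uses a made-up fragment modeled on the one quoted above. The trailing text "more" and the helper name text_children are illustrative and do not come from the original file.

```python
from xml.dom import Node, minidom

# Hypothetical fragment modeled on the one quoted in the post.
COMPACT = (
    '<td width="530"><div class="cnnCopyright">'
    '<b>blabla</b><br/>blabla<br/>more</div></td>'
)
# Same content after line breaks have been inserted between some tags.
INDENTED = (
    '<td width="530">\n<div class="cnnCopyright">\n'
    '<b>blabla</b>\n<br/>blabla<br/>more</div>\n</td>'
)


def text_children(xml_source, tag):
    """Return the data of every text node that is a direct child of the first <tag>."""
    doc = minidom.parseString(xml_source)
    element = doc.getElementsByTagName(tag)[0]
    return [n.data for n in element.childNodes if n.nodeType == Node.TEXT_NODE]


compact_nodes = text_children(COMPACT, "div")
indented_nodes = text_children(INDENTED, "div")

print(len(compact_nodes), compact_nodes)
print(len(indented_nodes), indented_nodes)
```

In this sketch the indented variant reports two more text nodes than the compact one, both of them whitespace-only and both ahead of the original text, which mirrors the "2 empty nodes at the top" in the original question.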

Subject: Re: XPaths change when the 'Indent Tags' button is use
Author: Hans-Peter Küchler
Date: 13 Nov 2001 11:46 AM
Hi,

I think the whitespace text nodes are wrong.
Have a look at XSLT 1.0 §3.4:
"3.4 Whitespace Stripping
After the tree for a source document or stylesheet document has been constructed, but before it is otherwise processed by XSLT, some text nodes are stripped. A text node is never stripped unless it contains only whitespace characters. Stripping the text node removes the text node from the tree. The stripping process takes as input a set of element names for which whitespace must be preserved. The stripping process is applied to both stylesheets and source documents..."
So the whitespace text nodes have to be stripped.
Hans-Peter Küchler
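
For readers who want to see what stripping does to a tree, here is a minimal sketch of the operation the quoted section describes: removing text nodes that contain only whitespace. It assumes Python's standard xml.dom.minidom; the sample document and the function names are hypothetical. Note that, as far as I read XSLT 1.0 §3.4, whitespace is preserved for all source elements by default, and stripping only applies to elements named in xsl:strip-space; the sketch behaves like stripping for every element.

```python
from xml.dom import Node, minidom

# The four characters XML treats as white space.
XML_WHITESPACE = " \t\r\n"


def strip_whitespace_text(node):
    """Remove whitespace-only text nodes below `node`, recursively."""
    for child in list(node.childNodes):  # copy: we mutate while iterating
        if child.nodeType == Node.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "":
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_text(child)


def all_text(node):
    """Collect the data of every text node below `node`, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            found.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            found.extend(all_text(child))
    return found


# Hypothetical indented document.
doc = minidom.parseString("<list>\n  <item>one</item>\n  <item>two</item>\n</list>")

before = all_text(doc.documentElement)
strip_whitespace_text(doc.documentElement)
after = all_text(doc.documentElement)

print(len(before), len(after))
```

Text nodes that contain any non-whitespace character are left alone, which matches the rule in the quote that a text node is never stripped unless it contains only whitespace characters.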

Subject: Re: XPaths change when the 'Indent Tags' button is use
Author: Minollo I.
Date: 13 Nov 2001 12:22 PM
The section you mention in the XSLT specs seems to apply specifically to
XSLT and not to how the DOM tree is seen by XPath.

The sentence:
"3 Data Model: The data model used by XSLT is the same as that used by
XPath with the additions described in this section."

....can be interpreted in that direction, in the sense that XSLT uses the
same data model used by XPath with the exceptions listed in the rest of the
section, like 3.4 Whitespace Stripping.

It looks like the main XSLT processors currently available have the same
kind of interpretation of the specs as we do.

If you have an XML document like:
{a}{b}{c}{/c}{d}{/d}{/b}{/a}

....and you run a stylesheet containing the instruction:
{xsl:value-of select="count(/a/b/text())"/}

....the result is "0" in Stylus, Xalan-J and MSXML.

If you indent the XML document, to have...
{a}
{b}
{c}
{/c}
{d}
{/d}
{/b}
{/a}

....the value returned by the same XSLT is "3" in Stylus, Xalan-J and MSXML.

So, even if the specs don't sound that clear in that area, it looks like
the common interpretation is that the whitespace rule applies to the data
model processed by XSLT but not to the data model processed by XPath.

Minollo
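
The two counts quoted in this reply can be checked without an XSLT processor. The sketch below is an approximation, assuming Python's standard xml.dom.minidom: it walks the DOM and counts the text-node children of each {b} under the root {a}, which is what count(/a/b/text()) selects. The function name is illustrative.

```python
from xml.dom import Node, minidom

COMPACT = "<a><b><c></c><d></d></b></a>"
INDENTED = "<a>\n<b>\n<c>\n</c>\n<d>\n</d>\n</b>\n</a>"


def count_text_children_of_b(xml_source):
    """Approximate count(/a/b/text()): text nodes directly under each /a/b."""
    root = minidom.parseString(xml_source).documentElement
    total = 0
    for child in root.childNodes:
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == "b":
            total += sum(
                1 for n in child.childNodes if n.nodeType == Node.TEXT_NODE
            )
    return total


compact_count = count_text_children_of_b(COMPACT)
indented_count = count_text_children_of_b(INDENTED)

print(compact_count, indented_count)  # prints "0 3"
```

The three text nodes in the indented case are the line breaks before {c}, between {/c} and {d}, and after {/d}.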

Subject: Re: XPaths change when the 'Indent Tags' button is use
Author: Hans-Peter Küchler
Date: 14 Nov 2001 10:30 AM
Yes, all processors show the same result, but it is wrong. The expression
{xsl:value-of select="count(//text())"/}
shows seven text nodes for the second example. This cannot be the expected result, because the elements {a} and {b} are certainly not defined with mixed content. So they should contain only element nodes, not text nodes made of whitespace. I think this topic should be discussed with more people (and with the W3C?).
Hans-Peter Küchler
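
To see where the seven text nodes come from, the following sketch breaks the total down by parent element. It again assumes Python's standard xml.dom.minidom rather than a real XPath engine, and the function name is hypothetical.

```python
from collections import Counter
from xml.dom import Node, minidom

INDENTED = "<a>\n<b>\n<c>\n</c>\n<d>\n</d>\n</b>\n</a>"


def text_nodes_per_element(element, counts):
    """Tally text nodes by the tag name of their parent element."""
    for child in element.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            counts[element.tagName] += 1
        elif child.nodeType == Node.ELEMENT_NODE:
            text_nodes_per_element(child, counts)
    return counts


doc = minidom.parseString(INDENTED)
per_element = text_nodes_per_element(doc.documentElement, Counter())
total = sum(per_element.values())

print(dict(per_element), total)
```

With this input the tally is 2 under {a}, 3 under {b}, and 1 each under {c} and {d}, for a total of 7, every one of them a single line break.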

Subject: Re: XPaths change when the 'Indent Tags' button is use
Author: Minollo I.
Date: 14 Nov 2001 11:10 AM
The concept of "mixed content" belongs only to the schema associated with the
document; if your XML document violates the directives of the associated
schema, the XML document is invalid. But that wouldn't affect
the result you get from XPath or XSLT (which basically means that I believe
XPath does not validate and that, in any case, XSLT doesn't
require the use of a validating parser).

Also, I believe that "mixed content" refers to non-whitespace text:
"XML spec, 3.2.1: Definition: An element type has element content when
elements of that type must contain only child elements (no character data),
optionally separated by white space (characters matching the nonterminal S)".
So, in any case the XML document containing whitespace text nodes would be
valid even if the elements are not defined as mixed content; and XPath
would be correct, I believe, to count them in.

Anyway, I agree with you that probably this discussion doesn't belong here,
but it belongs to some public XSLT discussion forum and to people involved
in the W3C work.

Minollo

 