Is XSLT's handling of CDATA sections too aggressive?
Maybe somebody would like to comment on the question below, which I sent to the official XSLT mailing list? The question is related to the XML 1.0 errata and to the general problem of whitespace handling in XML 1.0 without the guiding hand of a DTD. I discovered this problem while using the Alphaworks XSLT processor to process marked-up text.

It's more than a practical issue; there are some important principles involved, I think: we should be able to make XML 1.0 express trees that are not full of whitespace garbage (to be filtered out by extraneous programs, DTDs, whatever...).

Thanks in advance,
/Nils

Excerpts:

Message-ID: <04bb01bf7273$90095740$b2e3cf87@r...>
From: "Nils Klarlund" <klarlund@r...>
To: <xsl-editors@w...>
Cc: "klarlund" <klarlund@r...>
Date: Tue, 8 Feb 2000 15:32:03 -0500
Subject: Is it right to remove whitespace nodes stemming from CDATA sections? (No, I think!)

I believe that the way CDATA sections are treated in XPath/XSLT is not compatible with the latest errata to XML 1.0 (http://www.w3.org/XML/xml-19980210-errata). Moreover, the way CDATA sections are treated makes it impossible to adopt a simple view of XML, namely to remove all whitespace nodes, without a provable loss of expressive power! This radical pruning view is desirable for many applications, especially database applications, but also for document-oriented processing, where the usual semantics, which introduce tons of whitespace nodes, are an aesthetic and practical problem.

The problem with XSLT is that even very explicitly marked whitespace such as <![CDATA[ ]]> is eaten up if it is not in the company of non-whitespace characters. So I can't insert spaces between nodes! In other words, assuming that it is unreasonable for a DTD or application to decide which whitespace nodes are real and which are not, I'm in trouble: I want to prune all whitespace nodes except those that I mark as important.
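To make the trade-off concrete, here is a minimal Python sketch (an illustration, not code from the original message) built on the standard library's xml.dom.minidom, which keeps CDATA sections as separate CDATASection nodes instead of folding them into ordinary text. The names prune_whitespace, text_content and render are hypothetical. With keep_cdata=True it follows the pruning policy described above: drop every whitespace-only text node, but keep whitespace that was explicitly marked with a CDATA section. With keep_cdata=False it treats CDATA like any other text, which is only a rough stand-in for stripping over a data model where CDATA sections have already been merged into text nodes.

```python
from xml.dom import Node, minidom


def is_whitespace_only(text: str) -> bool:
    # Characters matching XML's S production: space, tab, CR, LF.
    return all(ch in " \t\r\n" for ch in text)


def prune_whitespace(node, keep_cdata: bool = True) -> None:
    """Recursively remove whitespace-only character data below node.

    keep_cdata=True: whitespace from CDATA sections survives (the proposal).
    keep_cdata=False: CDATA sections are stripped like plain text.
    """
    for child in list(node.childNodes):
        if child.nodeType == Node.ELEMENT_NODE:
            prune_whitespace(child, keep_cdata)
        elif child.nodeType == Node.TEXT_NODE or (
            child.nodeType == Node.CDATA_SECTION_NODE and not keep_cdata
        ):
            if is_whitespace_only(child.data):
                node.removeChild(child)


def text_content(node) -> str:
    # Concatenate all character data in document order.
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_content(c) for c in node.childNodes)


def render(xml: str, keep_cdata: bool = True) -> str:
    doc = minidom.parseString(xml)
    prune_whitespace(doc.documentElement, keep_cdata)
    return text_content(doc.documentElement)


if __name__ == "__main__":
    plain = "<p><b>a</b> <i>b</i></p>"
    marked = "<p><b>a</b><![CDATA[ ]]><i>b</i></p>"
    print(repr(render(plain)))                     # 'ab'
    print(repr(render(marked)))                    # 'a b'
    print(repr(render(marked, keep_cdata=False)))  # 'ab'
```

Under the first policy the author can still say "put a space here" after aggressive pruning; under the second, the marked space is lost along with the layout whitespace.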
Clearly, as the errata section quoted below indicates, XML 1.0 makes a semantic distinction between ' ' and <![CDATA[ ]]>. Thus, XSLT cannot be used to determine whether some content is "element content". Isn't it a mistake to water XPath down to that point? I suggest that the stripping of whitespace nodes explicitly exclude nodes obtained from, or involving, CDATA sections.

Thanks,
/Nils

From the errata:

Section 3: Change item number 2 of the list of valid cases for the "Element Valid" VC to read:

The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional white space (characters matching the nonterminal S) between the start tag and the first child element, between child elements or between the last child element and the end tag. Note that a CDATA section containing only white space does not match the nonterminal S, and hence cannot appear in these positions.
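The errata rule can be illustrated with a small Python sketch (added here for illustration; it is not from the errata) using xml.dom.minidom, which preserves CDATA sections as distinct nodes. The names only_s_between_children and check are hypothetical. The sketch checks only the white-space part of the rule, not whether the child elements match the content model; comments and processing instructions are skipped because XML 1.0 also permits them between child elements.

```python
from xml.dom import Node, minidom

# Characters matching XML 1.0's S production: space, tab, CR, LF.
XML_S = frozenset(" \t\r\n")

# Node types that may sit between child elements without being character data.
SKIPPED = (Node.ELEMENT_NODE, Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE)


def only_s_between_children(elem) -> bool:
    """Return True if all character data directly inside elem matches S.

    A CDATA section fails the check even when it holds only white space,
    as the errata note says. The content-model match is not checked.
    """
    for child in elem.childNodes:
        if child.nodeType in SKIPPED:
            continue
        if child.nodeType == Node.TEXT_NODE and set(child.data) <= XML_S:
            continue
        # Non-white text, or any CDATA section, is not allowed here.
        return False
    return True


def check(xml: str) -> bool:
    return only_s_between_children(minidom.parseString(xml).documentElement)


if __name__ == "__main__":
    print(check("<list>\n  <item/>\n  <item/>\n</list>"))     # True
    print(check("<list><item/><![CDATA[ ]]><item/></list>"))  # False
    print(check("<list><item/>text<item/></list>"))           # False
```

A parser that keeps this distinction can tell the two cases apart; a data model that folds CDATA sections into plain text nodes cannot, which is the loss of information the message objects to.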