RE: MSXML Whitespace handling
Jeni,

Your reply is well-written and well-researched, and it exposes a simplification I made in my original mail. The fact is that the MS DOM does not parse the input (I was trying to simplify the discussion, but instead caused confusion). Instead, it's the MS XML Parser that actually parses the input XML and makes SAX-like calls to the application to consume. At this level, full whitespace is provided to the "application" that is consuming the stream of events. And it is the MS DOM which consumes these events and builds the in-memory representation, making it an application as defined in the XML 1.0 spec.

Here is the architecture, represented graphically:

     __________________
    |                  |
    |  XSL Processor   |
    |  (Application)   |
    |__________________|
             ^
     ________|_________       ___________________________
    |                  |     |                           |
    |    DOM Cache     |---->| User Application          |
    |__________________|     | (may perform read-only    |
             ^               |  operations on cache      |
     ________|_________      |  concurrently with XSLT)  |
    |                  |     |___________________________|
    |  MS DOM Builder  |
    |  (Application)   |
    |__________________|
             ^
     ________|_________
    |                  |
    |  MS XML Parser   |
    |  (XML Processor) |
    |__________________|
             ^
     ________|_________
    |                  |
    |   XML Document   |
    |__________________|

The user loads the DOM, with code like this:

    dom.load("my-xml.xml");

and if the user did not set:

    preserveWhiteSpace = true;

then there is absolutely no way for MS XSLT to recover the whitespace stripped during load, since the application it depends upon (MS DOM) has already stripped it. The MS XSL processor has not even been instantiated yet. How can it reach back and instruct the DOM to preserve whitespace?

Do you see the problem? Your mail made it sound like MSXSL somehow controls the load. It does not, nor should it be required to. Instead, the user controls the load. If the user allows whitespace to be stripped, then there is absolutely nothing that MS XSLT can do to recover it.
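The layering described here can be illustrated with a toy model. This is only a sketch: the event shapes and the function names (parseEvents, buildCache, textSeenByXslt) are hypothetical stand-ins for the parser, the DOM builder and the XSLT processor, not MSXML APIs. It shows that once the builder drops a whitespace-only text run, the consumer of the cache has nothing left to recover.

```javascript
// Toy model of the pipeline: parser events -> DOM builder -> cache -> XSLT.
// All names here are hypothetical; none of them are MSXML APIs.

// Stand-in for the XML processor: reports every text run, whitespace included.
function parseEvents() {
  return [
    { type: "start", name: "xsl:text" },
    { type: "text", value: " " }, // a single significant space
    { type: "end", name: "xsl:text" },
  ];
}

// Stand-in for the DOM builder: consumes events and builds the cache.
// When preserveWhiteSpace is false, whitespace-only text runs are dropped.
function buildCache(events, preserveWhiteSpace) {
  const root = { name: "#document", children: [] };
  const stack = [root];
  for (const ev of events) {
    const parent = stack[stack.length - 1];
    if (ev.type === "start") {
      const el = { name: ev.name, children: [] };
      parent.children.push(el);
      stack.push(el);
    } else if (ev.type === "end") {
      stack.pop();
    } else if (preserveWhiteSpace || /\S/.test(ev.value)) {
      parent.children.push({ name: "#text", value: ev.value });
    }
  }
  return root;
}

// Stand-in for the XSLT processor: it only ever sees the finished cache.
function textSeenByXslt(cache) {
  const out = [];
  (function walk(node) {
    if (node.name === "#text") out.push(node.value);
    (node.children || []).forEach(walk);
  })(cache);
  return out.join("");
}

const stripped = textSeenByXslt(buildCache(parseEvents(), false));
const preserved = textSeenByXslt(buildCache(parseEvents(), true));
console.log(JSON.stringify(stripped));  // ""
console.log(JSON.stringify(preserved)); // " "
```

In this model the flag is consulted only inside buildCache, which mirrors the claim that the decision is made at load time, before any transformation starts.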
Now, I do see that defaulting to preserveWhiteSpace = false has caused a lot of confusion to XSLT users, but remember that the decision to use this default was made long ago, before the XSLT spec even existed. I'll let the UE guys know that they should prominently discuss preserveWhiteSpace = true in the XSLT docs, so that people know how to get the behavior they want.

Here is a snippet of JScript that shows how to transform a fully preserved cache:

    xml_dom = new ActiveXObject("MSXML2.DOMDocument");
    xsl_dom = new ActiveXObject("MSXML2.DOMDocument");
    // Set before loading; whitespace is stripped during load.
    xml_dom.preserveWhiteSpace = true;
    xsl_dom.preserveWhiteSpace = true;
    // (load both documents here, e.g. xml_dom.load("my-xml.xml"))
    strResult = xml_dom.transformNode(xsl_dom);

The advantage to this architecture is that the user can load both the XML and XSL, change either of them via the DOM API, and then perform the transformation. If XSLT directly loaded the .XML and .XSL, this would not be possible.

~Andy Kimball
MSXSL Dev

-----Original Message-----
From: Jeni Tennison [mailto:jeni@xxxxxxxxxxxxxxxx]
Sent: Tuesday, August 01, 2000 5:04 PM
To: xsl-list@xxxxxxxxxxxxxxxx
Subject: RE: MSXML Whitespace handling

At 13:51 01/08/00 -0700, Andrew Kimball wrote:
>As for mangling by default, that is a beef with the design of the MS DOM,
>not with the conformance of MS XSL. The MS DOM defaults towards performance
>and low memory consumption, while still staying within the XML 1.0 spec. I
>think it was the right decision for the vast majority of users. Users who
>need to preserve whitespace can always set preserveWhiteSpace=true when
>loading the DOM, or use xml:space="preserve" to tag significant whitespace.

As Andy says, it is a beef with the design of the MS DOM rather than MS XSL.

From a standards point of view, it all comes down to whether MS DOM is counted as an XML processor or an XML application. The XML Recommendation states:

"A software module called an XML processor is used to read XML documents and provide access to their content and structure.
It is assumed that an XML processor is doing its work on behalf of another module, called the application. This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application."

Andy said:

"The application responsible for parsing the input XML and building the tree cache is the DOM, not XSLT. Therefore, it is perfectly reasonable to view the DOM as the "application" referred to in the XML 1.0 spec."

It seems the job of MS DOM is to read in (parse) and provide access to the content and structure of the XML document: squarely in the preserve of the 'XML processor' rather than the 'XML application'. (If that's not the case, how does MS DOM *apply* the information in the XML document as a standalone application?) It seems to me that it is MS XSL that actually performs some action as a result of the XML: MS XSL is an XML application, MS DOM is an XML processor.

In the section on Whitespace Processing (2.10) the XML Recommendation states:

"An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content."

Given that MS DOM is an XML processor, it should be passing the whitespace within xsl:text through to MS XSL so that it can deal with it properly.

From a usability point of view, in my experience one of the main uses of xsl:text is to add whitespace in some output. I'm sure that it makes MS DOM quicker and leaner not to worry about whitespace, but it seriously detracts from its utility as an XML processor to be used by an XSLT Processor like MS XSL.

If there was a normative XSLT DTD, and the XSLT DTD specified:

    <!ATTLIST xsl:text xml:space (preserve) #FIXED 'preserve'>

then presumably MS DOM would preserve the whitespace within xsl:text.
As it is, the DTD that is supplied within the XSLT Recommendation is non-normative and I imagine that most XSLT processors decide what to do on the basis of an implicit understanding of the intention behind the definitions given within the XSLT Recommendation rather than relying on an explicit DTD. It is clearly the intention within [http://www.w3.org/TR/xslt#strip] that xsl:text should preserve whitespace; XML applications that deal with XSLT should treat these elements as if they had xml:space="preserve" declared on them.

As a compromise, could MS DOM treat xsl:text as if xml:space="preserve" were defined on it?

Perhaps unfortunately, because it would be nice if a small compromise were all that's needed, the rules governing whether whitespace is significant within XSL elements are more complex than whether an element has xml:space="preserve" or even whether it's an xsl:text element. In XSLT, you can define elements within which whitespace should be preserved using xsl:preserve-space (in combination with xsl:strip-space). If MS XSL is not given sufficient information to process these elements according to the XSLT Recommendation, then these elements are useless when used with it.

A larger compromise would involve MS DOM treating all mixed-content and #PCDATA XSLT elements as if xml:space="preserve" were defined on them. However, for true compliance as an XML processor, to avoid spurious exceptions for XSLT elements, and to enable MS XSL (and, eventually, other XML applications) to perform in a useful and compliant manner, MS DOM should preserve whitespace by default. If MS DOM does not, MS XSL should use a conformant XML processor instead, to enable it to conform to the XSLT Recommendation.
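To make the point about the stripping rules concrete, here is a rough sketch of the per-node test as I read it in the whitespace stripping section of the XSLT Recommendation. The node shape ({ value, parent } for text, { name, attrs, parent } for elements) and the function name isStripped are hypothetical, and plain prefixed names are compared where a real processor would compare expanded names. For a stylesheet the preserving set is assumed to hold just xsl:text; for a source document it would be derived from xsl:strip-space and xsl:preserve-space.

```javascript
// Sketch of the XSLT whitespace stripping test for a single text node.
// Node shapes and names are hypothetical, chosen only for this example.

function isStripped(textNode, preservingNames) {
  // Text with any non-whitespace character is never stripped.
  if (/[^ \t\r\n]/.test(textNode.value)) return false;

  // Parent is in the set of whitespace-preserving element names.
  if (preservingNames.has(textNode.parent.name)) return false;

  // The nearest ancestor carrying xml:space decides: "preserve" keeps the node.
  for (let el = textNode.parent; el; el = el.parent) {
    const space = el.attrs["xml:space"];
    if (space === "preserve") return false;
    if (space === "default") break;
  }
  return true;
}

// Usage: a space inside xsl:text versus indentation inside xsl:template.
const template = { name: "xsl:template", attrs: {}, parent: null };
const text = { name: "xsl:text", attrs: {}, parent: template };
const inText = { value: " ", parent: text };
const inTemplate = { value: "\n  ", parent: template };
const stylesheetNames = new Set(["xsl:text"]);

console.log(isStripped(inText, stylesheetNames));     // false
console.log(isStripped(inTemplate, stylesheetNames)); // true
```

Every branch of this test needs the whitespace-only node to still exist when it runs, which is why the decision cannot be made after the nodes have already been discarded.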
My 10p worth :)

Cheers,

Jeni

Dr Jeni Tennison
Epistemics Ltd * Strelley Hall * Nottingham * NG8 6PE
tel: 0115 906 1301 * fax: 0115 906 1304 * email: jeni.tennison@xxxxxxxxxxxxxxxx

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list