Re: How to spell "No PSVI" in XSLT 2.0 without getting sucked
Hi Evan,

> As much as I resonate with the intent of your approach, I think it's
> the wrong one. While there are examples of abuse (e.g.
> M$tripping-whitespace), XSLT 1.0's deliberate de-coupling of data
> model and serialization was a good and successful decision.
> xsl:output is an optional feature for very good reasons. This design
> has made for a remarkable marriage of flexibility and
> interoperability.

Absolutely. I was imagining that xsl:input would similarly represent a decoupling of parsing issues from transformation issues, and that support for xsl:input would likewise be an optional feature. This would provide interoperability among the high percentage of XSLT processors that offer an interface through which you can simply supply the URI of a source document and have them manage the parsing for you.

It strikes me that Elliotte's argument about whether or not XInclude elements should be processed by an XSLT processor is *precisely* the kind of issue that would be resolved if there were a recommended method of controlling the document-to-node-tree process from within the stylesheet.

> As far as I see it, there are two things that were *crucial* in
> ensuring the success of this approach:
>
> 1. There was always an unambiguous, lossless,
> one-to-one mapping between the data model
> and its serialized form, and
>
> 2. with few exceptions, all information in the
> data model was present in the corresponding
> serialized *instance* document.
>
> The PSVI is the antithesis to both of the above. And this is why
> many of us are worried about how interoperable our stylesheets will
> be in a PSVI-oriented world.

First, I don't think the mapping from the data model to its serialized form is actually one-to-one. Even for a given set of attribute values on xsl:output, the processor is left with choices: should it use UTF-8 or UTF-16? Should it escape characters with decimal or hexadecimal character references? Should it use single or double quotes around attribute values?
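To make the point about serialization concrete, here is a small sketch using Python's standard-library xml.etree.ElementTree (an illustrative choice, not anything an XSLT processor exposes): three byte sequences that differ in attribute quoting, character-reference form and encoding all parse to the same tree. The canonical helper is a hypothetical convenience for comparing trees, not part of any XML API.

```python
import xml.etree.ElementTree as ET

def canonical(elem):
    """Reduce an element tree to nested tuples/lists so two trees can be compared with ==."""
    return (elem.tag, sorted(elem.attrib.items()), elem.text, elem.tail,
            [canonical(child) for child in elem])

# Three byte-level serializations of one tree. They differ only in attribute
# quoting, decimal vs hexadecimal character references, and UTF-8 vs UTF-16.
SERIALIZATIONS = [
    '<p title="caf&#233;">na&#239;ve</p>'.encode("utf-8"),
    "<p title='caf&#xE9;'>na&#xEF;ve</p>".encode("utf-8"),
    '<?xml version="1.0" encoding="UTF-16"?><p title="café">naïve</p>'.encode("utf-16"),
]

trees = [canonical(ET.fromstring(data)) for data in SERIALIZATIONS]
print("distinct serializations:", len(set(SERIALIZATIONS)))
print("identical trees:", all(tree == trees[0] for tree in trees))
```

The choices live in the tree-to-document direction: serializing the parsed tree picks just one of these forms (ElementTree, for instance, always writes double-quoted attributes), and another serializer could legitimately pick a different one.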
Second, you seem to be saying that there's no one-to-one mapping from a document+schema to a node tree. If that is the case then, as with xsl:output, those flexibilities that are judged important should be parameterised. That's already being done through the ignore-whitespace/ignore-comments/ignore-processing-instructions flags. Uche and I were suggesting a process-xinclude flag. There might be others.

Could you expand on your second point, though? I'm not sure how the fact that the node tree doesn't contain the entire PSVI affects the ability of information within the stylesheet to control the creation of the node tree.

> However, I don't think we can dictate a processing chain from the
> stylesheet any more than we could before. Even if we tried, this
> won't buy us interoperability, given the many processing frameworks
> that aren't controlled by "XSLT applications" but nevertheless use
> XSLT processors. If anything, such an attempt will complicate
> interoperability problems in the same way that
> xsl:disable-output-escaping does today.

I disagree. The reason that xsl:disable-output-escaping causes such problems is that it short-circuits the clean distinction between the process of building the node tree and the process of serializing that node tree. I agree that such short-circuiting is really awful, but I'm not suggesting a special feature whereby users can get hold of, e.g., the declaration of an element despite it not being in the node tree. What I'm suggesting is that the node-tree-creation process could, optionally, be controlled from within the stylesheet in the same clean, de-coupled way as (most of) the node-tree-serialisation process. Just as with xsl:output, if a processor is used in a situation where it's passed a node tree directly, then it can ignore xsl:input.
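The idea of parameterising the document-to-node-tree step, with flags along the lines of the ignore-whitespace, ignore-comments, ignore-processing-instructions and process-xinclude options mentioned above, can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the keyword names (keep_comments, keep_pis, strip_whitespace, process_xinclude) are hypothetical and are not options of any real XSLT processor, and the in-memory RESOURCES table stands in for fetching included documents. It relies on TreeBuilder's insert_comments/insert_pis arguments (Python 3.8+) and on xml.etree.ElementInclude.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory resources standing in for documents an xi:include href might name.
RESOURCES = {"chapter.xml": "<chapter>Included text</chapter>"}

def in_memory_loader(href, parse, encoding=None):
    """Loader for ElementInclude.include: resolve hrefs from RESOURCES, not the file system."""
    text = RESOURCES[href]
    return ET.fromstring(text) if parse == "xml" else text

def build_tree(source, *, keep_comments=False, keep_pis=False,
               strip_whitespace=True, process_xinclude=False):
    """Turn a document string into a node tree; each keyword is one hypothetical parsing flag."""
    # TreeBuilder only inserts comment/PI nodes into the tree when asked to.
    builder = ET.TreeBuilder(insert_comments=keep_comments, insert_pis=keep_pis)
    parser = ET.XMLParser(target=builder)
    parser.feed(source)
    root = parser.close()

    if process_xinclude:
        # Replace each xi:include element with the document it points at (modifies root in place).
        ElementInclude.include(root, loader=in_memory_loader)

    if strip_whitespace:
        # Drop whitespace-only text and tails, roughly like xsl:strip-space elements="*".
        for elem in root.iter():
            if isinstance(elem.tag, str) and elem.text is not None and not elem.text.strip():
                elem.text = None
            if elem.tail is not None and not elem.tail.strip():
                elem.tail = None
    return root

SOURCE = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- front matter -->
  <?page-break?>
  <xi:include href="chapter.xml"/>
</book>"""

for flags in ({}, {"process_xinclude": True}, {"keep_comments": True, "keep_pis": True}):
    print(flags, [child.tag for child in build_tree(SOURCE, **flags)])
```

With the defaults, book has a single xi:include child; with process_xinclude=True that child becomes the included chapter element; and keep_comments/keep_pis bring the comment and processing instruction back into the tree. The same document thus yields different node trees depending on parsing options, which is the kind of variation the message argues a stylesheet could optionally pin down.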
But something like xsl:input would provide consistency among the vast majority of XSLT processors, which can be run from the command line given the URL of a source document to transform. We haven't needed that until now because, with the exception of XInclude processing, the set of information that you get out of an XML document is pretty much fixed. Now that there's a major parameter that affects the content of the node tree (namely, which schema you use to validate the document), I think we do need it.

I get the argument that XSLT stylesheets should only be concerned with the transformation part of the process. But given that a major mode of stylesheet processing is document-to-document (rather than tree-to-tree) transformation, and that XSLT already says something about the tree-to-document part of the process, I think it's right that XSLT should say something about the document-to-tree part as well.

So I think I've missed the basis of your argument against an optional, xsl:input-style control over the parsing process. Could you try explaining again?

> Rather, what is needed is a way to dictate what *kinds* of
> information can be present in a source/result tree, based on some
> flag in the stylesheet. In particular, the stylesheet writer should
> have a way to switch between plain vanilla XML/Infoset, and PSVI
> with PSVI-specific information items. In short, this is a data model
> issue more than a processing model issue.
>
> Such a flag would enable a stylesheet to process an XML document as
> vanilla XML, regardless of its processing history. Its processing
> history may or may not include XML Schema validation. In the event
> that it does, the visible PSVI augmentations will be constrained to
> the kinds of information that can occur in the restricted, vanilla
> data model, namely defaulted attributes, etc.
> This implies a straightforward algorithm for interpreting a PSVI as
> an augmented Infoset without the PSVI-specific information items.
> Such an algorithm would be akin to taking the PSVI, serializing the
> instance, and parsing it again without respect to a schema.

When I first read this, I thought you were talking about changing the underlying data model based on a flag in the stylesheet. But I think that what you're saying is that there are different levels of augmentation of the basic XML Infoset that you might be interested in, even when validating against a schema. You think there should be a flag stating that the typing information (i.e. the typed value and type properties on nodes) should be omitted (aside from recording which attributes have the ID type, presumably?).

I'm assuming (perhaps wrongly) that users will be given the options of not validating against a schema at all, using a DTD, or using a schema that they've designed specifically to give them the information they need in the stylesheet. What do you see as the benefits of giving users this option -- of validating against a schema but ignoring some of what that tells you -- as well?

Cheers,

Jeni
---
Jeni Tennison
http://www.jenitennison.com/