Re: XPath 1.5? (was RE: typing and markup)
Hi Jonathan,

> It's clearly true that //address is easier, and requires less
> precise knowledge of the structure of the data. Calling it "the easy
> path" implies that it is not the right way to go, but for data that
> is governed by a DTD or schema, for stylesheets that are compiled, I
> think that the main reason not to use // is that the tools currently
> do not exploit schema and DTD information for optimization.

As I said, I think that step-by-step paths (//author is an absolute
path -- it starts at the document node) are also easier for people to
follow. I spend a lot of time going through other people's
stylesheets, having little knowledge of the markup language that
they're using, and paths that use // can make it very hard to
understand how the document is structured, what that means in terms
of what's generated, and where changes might be required should the
markup language or the desired output change in the future.

> Since DTDs do change, and schemas are combined, I often prefer to
> write queries that do not depend on absolute paths in documents. The
> query "//author" can find authors with a variety of structures found
> at many places in various document structures.

I would have thought that, in a real life version of the example that
we're talking about, it would be just as likely that you wouldn't
want that new address to be displayed as that you would. And that
being the case, you'll generally have to rewrite your paths anyway.

> I'm not so sure. I continue to encounter errors in widely used XSLT
> stylesheets, including the XMLSpec DTD, that result in invalid HTML
> when I write a document with a structure that has apparently not yet
> been tested. This often involves quite straightforward structural
> errors which I believe *could* be caught by static analysis.
>
> Perhaps you don't want this when you run XSLT, it might be more
> useful as a standalone 'lint' utility. This 'lint' utility might
> even be part of the same tool that optimizes your patterns based on
> a schema.

Absolutely. This kind of analysis and assistance to XSLT authors is
great at *authoring time*. It shouldn't have to be embedded in the
XSLT processors, which should instead be lean and mean and focus on
the job of transformation rather than schema analysis and validation.

>>I think that's because if you protect people from one error at the
>>markup language level, they might think that you're protecting them
>>from every error. For example, if your address type contained a
>>postcode child element that was defined, in the schema, to match the
>>pattern:
>>
>>  "[A-Z]{2}[1-9] [1-9][A-Z]{2}"
>>
>>then doing:
>>
>>  address[starts-with(postcode, 'W1')]
>>
>>could logically also give you an error. A user might ask why this
>>doesn't raise an error, when other assertions within a schema do.
>
> This feels like all or nothing thinking to me. We should be clear
> with our users that we don't catch all errors. No query or
> programming language does. But most do catch some errors. Catching
> more errors, rather than fewer, is a good thing. If we want to make
> it plain to the user that no errors will be caught until the
> relevant code is invoked on data that exposes the bug, I think XSLT
> does far too much error checking already.
>
> I am always happy to remove a bug from a program even if there may
> still be another bug.

You're right, it was rhetoric. I'm finding it hard to express why
this static typing stuff makes me feel so unnerved. It comes down to
the fact that I don't want to have to jump through hoops to create
stylesheets if those hoops don't give me a tangible benefit, and I
don't want the processors I use to be jumping through those hoops
either. The argument that I'm hearing is that the benefit of jumping
through the strong typing hoops is the predictability of the
transformation result and the optimisability of the XPath queries.
What I was trying to say above is that the former benefit is not
nearly so great as the designers of XPath seem to think. That doesn't
mean there's no benefit, just as I agree that there is some benefit
in optimising XPath queries. I just don't think that there's
sufficient benefit for the cost that will be incurred in terms of
implementation and user effort.

> I do know of XPath implementations that perform DTD based
> optimization. I don't want to name names, but these are systems that
> use XPath as a standalone language for querying persistent data. I
> don't know whether any XSLT processors do this.

As we've discussed, that's a radically different situation from the
majority of XSLT transformations, or indeed other uses of XPath, such
as XPointer. I expect that XPath implementations that perform
DTD-based optimisation will become XQuery implementations rather than
XPath 2.0 implementations.

>>In particular, support for the cast, treat, assert, and validate
>>expressions over complex types, which require support for the
>>technicalities in the XQuery Semantics, is a major implementation
>>effort and an overhead in a running system.
>
> These *do* add a lot of complexity, and in the context of XSLT, I
> also wonder how much bang for the buck they give us. XQuery clearly
> needs them.
>
> This is, of course, a matter for use cases to sort out ;->

The closest that XPath 2.0 has to use cases is a bunch of
requirements. I can't see anything in those requirements which
indicates that cast, treat, assert or validate is required for XPath
2.0, although there might be technical reasons that I haven't seen.
Part of the point of discussing this is to learn what makes the WGs
think that these are required.

>>As far as I can tell, implementers can't use information from the
>>PSVI (i.e. an existing XML Schema validator) here; but have to write
>>their own validators in order to perform both the static and dynamic
>>checks that are required to evaluate these expressions.
>
> At least some schema validators do make the PSVI information
> available (via regrettably proprietary interfaces), so I don't see
> why this information can't be exploited. Again, it might make more
> sense to use a separate "lint-and-optimizing-rewrite" tool to check
> and optimize a stylesheet rather than do this every time a
> stylesheet is executed.

I (or the XQuery Semantics WD) might be behind the times, but the
current WD indicates that "XML Schema is based on named typing, while
the XQuery type system is based on structural typing." The definition
of a "subtype" in the XQuery Semantics WD is not the same as a
"derived type" in XML Schema. That's why I say that implementers need
to implement this validation themselves rather than reuse the code of
XML Schema validators. I agree that a separate stage of linting and
optimisation would be more useful.

> Would you really suggest using *none* of the type operators, or are
> there some that you think would be worthwhile if they were easy to
> implement? I suspect that any XSLT processor that has access to the
> PSVI would find 'treat' and 'cast' reasonably easy to implement -
> 'cast' requires facet checking, but this amounts to about 10
> relatively simple functions.

Let me see if I can describe what I think each of these expressions
is supposed to do; after all, I might be misinterpreting what they're
supposed to be useful for.

First, "instance of". I think that "instance of" meets the
requirement of being able to select elements based on their type,
which is something listed in the XPath 2.0 Requirements document. So
I think that this is worthwhile.
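
The gap between a "derived type" and a structural "subtype" mentioned above can be illustrated outside XSLT altogether. What follows is only a minimal Python sketch under stated assumptions: the type names (contactType, postalContactType, memoType) are hypothetical, and the structural check is plain equality of child lists, which is far cruder than the rules in the XQuery Semantics WD.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ComplexType:
    """Hypothetical stand-in for a schema complex type (an assumption)."""
    name: str
    children: Tuple[Tuple[str, str], ...]  # (child element name, child type)
    base: Optional["ComplexType"] = None   # type this one is declared to derive from


def derived_by_name(candidate: ComplexType, target: ComplexType) -> bool:
    """Named typing: walk the declared derivation chain, ignoring content."""
    current: Optional[ComplexType] = candidate
    while current is not None:
        if current.name == target.name:
            return True
        current = current.base
    return False


def same_structure(candidate: ComplexType, target: ComplexType) -> bool:
    """Crude structural typing: content models match, names are ignored."""
    return candidate.children == target.children


contact = ComplexType("contactType", (("line", "xs:string"),))
postal = ComplexType(
    "postalContactType",
    (("line", "xs:string"), ("postcode", "xs:string")),
    base=contact,
)
memo = ComplexType("memoType", (("line", "xs:string"),))

print(derived_by_name(postal, contact), same_structure(postal, contact))
print(derived_by_name(memo, contact), same_structure(memo, contact))
```

Under these assumptions the two checks disagree in both directions: postalContactType counts as a contactType by name but not by structure, and memoType the reverse. A validator that only answers the first question cannot simply be reused to answer the second.
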
On the other hand, I think that "instance of" should be based on
named typing rather than structural typing, firstly because I think
that's simpler for implementers (they can look at the PSVI to work
out whether one type is derived from another) and secondly because I
think it would be frustrating for users if they don't have control
over what types count as subtypes of each other. For example, if I
have the following two element declarations:

  <xs:element name="address" type="addressType" />
  <xs:complexType name="addressType">
    <xs:sequence>
      <xs:element name="line" type="xs:string" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="poem" type="poemType" />
  <xs:complexType name="poemType">
    <xs:sequence>
      <xs:element name="line" type="xs:string" />
    </xs:sequence>
  </xs:complexType>

then if I do:

  <xsl:template match="*[. instance of element of type addressType]">
    ...
  </xsl:template>

then I want to select those elements of the addressType, not the
poemType. The structure of addressType and poemType might be the
same, but they do not have the same semantics.

Second, the "cast" operator. From what I can tell, cast is used to
cast one simple type to another simple type. In that way, it's
similar to the XPath 1.0 functions of string(), number() and
boolean(). Now XPath 1.0, and XPath 2.0 in XSLT, have a flexible type
exception policy, so most of the time explicit casting from one type
to another isn't required. It's fairly rare to need to cast in XPath
1.0; the times when it's necessary are:

  - when you want to test whether an element has a string value, as
    opposed to whether the element exists (i.e. test="string(foo)"
    rather than test="foo")

  - when you want to test whether the value of a node is numeric
    (i.e. test="number(foo)" rather than test="foo")

  - when you want to use the numeric value of a node within a
    predicate (i.e. select="foo[number(bar)]" rather than
    select="foo[bar]")

  - when you want to sort a bunch of nodes based on whether they have
    a particular characteristic or not (i.e. in xsl:sort,
    select="boolean(foo)" rather than select="foo")

The first is really a shorthand for test="foo != ''". The last is
only an issue because you can't have data-type="boolean" in XSLT 1.0;
that isn't an issue in XSLT 2.0 because you could use
data-type="xs:boolean". In XPath 2.0, the second should be done with
test="foo instance of element of type xs:decimal" instead, I think.
The third can't be done in any other way.

The question is whether we'll ever need to explicitly convert a value
to other kinds of values. I can think of two potential use cases, but
neither is compelling:

  - to print out the canonical representation of a particular data
    type (but then there should be format-number() and format-date()
    etc. functions for them)

  - to test whether a node is of a particular type (but then there's
    the "instance of" expression for that)

I haven't yet seen a good use case, and unless there is one I think
that cast should be omitted.

Now onto the difficult ones, "treat" and "assert". From what I can
tell, "treat" states that the type of a node or value is a supertype
of a given type, whereas "assert" states that the type of a node or
value is a subtype of a given type. The only benefits that I can see
from these are that the processor might reject certain things during
compilation (and as above I think that this should be done by a
separate tool), and that explicitly casting one complex type to
another enables optimisation etc.

I looked for use cases for these in the XQuery document to try to see
where it might be helpful. There isn't a use case that involves
"assert"; the use case for "treat" seems to be that it prevents the
processor from complaining when you try to access a child node that,
according to the schema, shouldn't be present for a node of a
particular supertype. Since I don't think XSLT processors should be
raising errors in those kinds of situations anyway, I don't see the
point of supporting either of these expressions.
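
Going back to the XPath 1.0 conversion cases listed earlier, the difference between test="foo", test="string(foo)" and test="number(foo)" can be sketched in a few lines. This is a rough Python emulation of the XPath 1.0 conversion rules, not a real XPath engine: a node-set is modelled as a plain list of string-values, and the function names xpath_string, xpath_number and xpath_boolean are made up for the illustration.

```python
import math
import re

# XPath 1.0 Number syntax: optional minus, digits with optional fraction,
# surrounded by optional whitespace. No exponents, no leading '+'.
_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_string(nodes):
    """string(node-set): string-value of the first node, '' if empty."""
    return nodes[0] if nodes else ""


def xpath_number(nodes):
    """number(node-set): convert via string(); anything non-numeric is NaN."""
    text = xpath_string(nodes)
    return float(text) if _NUMBER.fullmatch(text) else math.nan


def xpath_boolean(value):
    """boolean(): the effective truth value used by test="..."."""
    if isinstance(value, bool):
        return value
    if isinstance(value, float):
        return not (value == 0 or math.isnan(value))
    return len(value) > 0  # non-empty string, or non-empty node-set


empty_foo = [""]      # <foo/> exists but has no text
text_foo = ["abc"]    # <foo>abc</foo>
zero_foo = ["0"]      # <foo>0</foo>

print(xpath_boolean(empty_foo))                # test="foo"
print(xpath_boolean(xpath_string(empty_foo)))  # test="string(foo)"
print(xpath_boolean(xpath_number(text_foo)))   # test="number(foo)"
print(xpath_boolean(xpath_number(zero_foo)))   # test="number(foo)" with 0
```

One thing the sketch makes visible: because a number converts to false when it is zero as well as when it is NaN, test="number(foo)" is also false for a foo whose value is 0, so it is not quite a pure "is numeric" check.
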

Finally, "validate", which takes the result of an expression and
validates it, usually in some context. Again there's no use case in
the XQuery Use Cases document, so it's hard to tell how the WGs are
imagining this will be useful. The only thing that I can think of is
that this is a way of adding default values to elements and
attributes that you generate; but then, if you're generating those
nodes, surely you can indicate what type they are when you generate
them rather than taking an extra step to do so? So again, I don't see
any reason for validate to be present in XSLT, but there might be one
that I'm not aware of.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/