Evan Lenz wrote:
> Reinventing the Wheel
I think I should start by responding to the title of this
message. The phrase "reinventing the wheel" usually refers to reinventing
something that already exists because you don't know about it. The editors
of XQuery include a former member of the XSL Working Group who has written a
fair number of stylesheets. They also include one of the inventors of SQL,
one of the inventors of XML-QL, and one of the inventors of XQL, a precursor
of XPath. We considered quite a few syntax approaches, including building on
XSLT, before arriving at the approach we used.
Also, you imply that we are off on a completely different
track than XSLT. In fact, we are working closely together with the XSL
Working Group to define XPath 2.0. This includes not only adding features,
but deriving a new model for XPath that is able to account for XML Schema
types.
> After reviewing the XQuery spec, I'm concluding that the
> overlap between XQuery and XSLT is far too great for the
> W3C to reasonably recommend them both as separate languages.
XQuery and XSLT will share a common expression language,
including path expressions. XSLT is really two languages, an XML-based
language used to write the templates, and XPath, an expression language used
for patterns. Both XQuery and XSLT will use XPath 2.0, and the two Working
Groups are working closely together on this. So the two languages will share
a great deal.
Why have a new language? Three reasons: (1) ease of use for
our use cases, (2) optimizability, (3) strong data typing.
1. Ease of use
XQuery is significantly more straightforward for a lot of
common database queries. To some extent, what is straightforward is a matter
of taste, a realm where logic does not reach, but I think that some of the
reasons are worth stating.
First, simple queries are simpler in XQuery. For instance,
an XPath 2.0 expression that uses the abbreviated syntax is also a valid
query by itself. This is not true of XSLT. Your document http://www.xmlportfolio.com/xquery.html
incorrectly labels XPath expressions as XSLT, but an XSLT processor will not
process your examples unless you place them in a template. Consider a simple
query that looks for all employees in a set of documents:
//emp
This is much easier to read and write than the equivalent
XSLT stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!-- Match the document root and copy every emp element, as //emp does -->
  <xsl:template match="/">
    <xsl:copy-of select="//emp"/>
  </xsl:template>
</xsl:stylesheet>
This difference is also present for some moderately complex
queries. For the following XQuery expression:
/emp[rating = "Poor"]/@mgr->emp/@mgr->emp/name
your document offers the following XSLT fragment as the equivalent:
<xsl:variable name="poorEmpManagers"
select="id(/emp[rating = 'Poor']/@mgr)[self::emp]"/>
I think it would be a fairer comparison if you typed in the
entire stylesheet that you would have to write in XSLT. I didn't test this,
but I think the following is approximately what you would have to
write:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <xsl:variable name="poorEmpManagers"
        select="id(/emp[rating = 'Poor']/@mgr)[self::emp]"/>
    <xsl:copy-of select="id($poorEmpManagers/@mgr)[self::emp]/name"/>
  </xsl:template>
</xsl:stylesheet>
The fact that *any* expression in XQuery is a valid query
makes it easier to write simple queries, without the overhead associated
with a stylesheet. For what it's worth, here's the shortest XQuery
expression that can be executed as a stand-alone query:
1
Also, the keyword-oriented approach of XQuery is more
familiar and comfortable to many programmers. I would rather
write:
FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
  AND $b/year = "1998"
RETURN $b/title
than
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="document('bib.xml')//book">
      <xsl:if test="publisher='Morgan Kaufmann' and year='1998'">
        <xsl:copy-of select="title"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
Note that there's been no great rush to create an XML syntax
for Java, JavaScript, Visual Basic, or other high level programming
languages. Several people have attempted to make XML syntaxes for SQL, but I
have not been impressed by the results.
Conventional Database Functionality
XQuery is more suitable to many of the kinds of queries that
SQL programmers are used to. Joins and the distinct() function account for a
lot of this - no surprise, since XQuery's FLWR expressions are quite similar
to SQL's SELECT/FROM/WHERE. It may make sense, incidentally, to add these to
XSLT as well. This is another reason for XQuery and XSLT to continue to work
together on XPath 2.0.
To a database person, it is somewhat surprising that your
paper does not explicitly mention joins, which are one of the biggest
reasons for FLWR expressions in XQuery. Joins are central to database
functionality, and it is important to express them in a way that allows
optimization based on patterns detected in the expressions. I also notice
that the examples in your paper do not include any examples from Section 3
of the XQuery paper, which shows how conventional SQL-like queries are done.
In your paper, you point out that FLWR expressions have
some syntactic similarity to XSLT's <xsl:for-each>. This is true, but
it misses the purpose of FLWR expressions, which is to provide general
SQL-like functionality for joins and declarative restructuring. A
naive mapping of FLWR expressions to <xsl:for-each> is not likely to
give you an efficient implementation of joins.
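To make the point about join efficiency concrete, here is a minimal Python sketch, not anything taken from XQuery or XSLT themselves. It contrasts a loop-by-loop evaluation (roughly what a naive for-each mapping would do) with a plan that builds an index first. The sample documents, the element names (bib, reviews, entry, title, price), and both function names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are assumptions for illustration.
BIB = ET.fromstring("""
<bib>
  <book><title>Data on the Web</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
</bib>""")

REVIEWS = ET.fromstring("""
<reviews>
  <entry><title>Data on the Web</title><price>34.95</price></entry>
  <entry><title>TCP/IP Illustrated</title><price>65.95</price></entry>
  <entry><title>Unlisted Book</title><price>10.00</price></entry>
</reviews>""")


def nested_loop_join(bib, reviews):
    """Naive plan: for each book, scan every review entry (O(n * m))."""
    result = []
    for book in bib.iter("book"):
        for entry in reviews.iter("entry"):
            if book.findtext("title") == entry.findtext("title"):
                result.append((book.findtext("title"), entry.findtext("price")))
    return result


def hash_join(bib, reviews):
    """Rewritten plan: index reviews by title once, then probe (O(n + m))."""
    prices_by_title = {}
    for entry in reviews.iter("entry"):
        prices_by_title.setdefault(entry.findtext("title"), []).append(
            entry.findtext("price"))
    result = []
    for book in bib.iter("book"):
        title = book.findtext("title")
        for price in prices_by_title.get(title, []):
            result.append((title, price))
    return result


if __name__ == "__main__":
    print(nested_loop_join(BIB, REVIEWS))
    print(hash_join(BIB, REVIEWS))
```

Both functions return the same pairs. The difference is that the second plan is only available to an engine that can recognize the expression as a join, which is easier when the language states the join declaratively.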
You do give an example that combines a join with distinct().
The XQuery looks like this:
FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
RETURN
  <publisher>
    <name> $p/text() </name> ,
    <avgprice> $a </avgprice>
  </publisher>
The equivalent XSLT looks like this:
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each
        select="document('bib.xml')//publisher[not(.=preceding::publisher)]">
      <xsl:variable name="prices"
          select="document('bib.xml')//book[publisher=current()]/price"/>
      <xsl:variable name="avgPrice" select="sum($prices) div count($prices)"/>
      <publisher>
        <name><xsl:value-of select="."/></name>
        <avgprice><xsl:value-of select="$avgPrice"/></avgprice>
      </publisher>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
Again, I find the XQuery solution much easier to read and
write. This is the kind of thing XQuery was designed for. More important, in
designing XQuery we have been thinking about database optimization, and I
expect we will be able to optimize the XQuery version better than its XSLT
equivalent.
2. Optimizability
A query language needs to be optimizable. To
make this possible, we need to be able to discover equivalences so that
queries can be rewritten flexibly based on the performance parameters of
various kinds of access. Both the XQuery language and the XML Query Algebra
are designed to make this possible.
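As a rough illustration of what "discovering equivalences" can buy, the following Python sketch evaluates the average-price-per-publisher query in two ways that produce the same answer. It is only an analogy for the kind of rewrite an optimizer might perform, not a description of the XML Query Algebra. The sample data and both function names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the structure is an assumption for illustration.
BIB = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher><price>30.0</price></book>
  <book><publisher>Addison-Wesley</publisher><price>65.5</price></book>
  <book><publisher>Morgan Kaufmann</publisher><price>50.0</price></book>
</bib>""")


def avg_price_rescan(bib):
    """Direct reading of the query: for each distinct publisher, rescan all books."""
    publishers = []
    for p in bib.iter("publisher"):
        if p.text not in publishers:
            publishers.append(p.text)
    result = []
    for name in publishers:
        prices = [float(b.findtext("price"))
                  for b in bib.iter("book")
                  if b.findtext("publisher") == name]
        result.append((name, sum(prices) / len(prices)))
    return result


def avg_price_grouped(bib):
    """Equivalent plan: one pass over the books, accumulating per publisher."""
    totals = {}  # insertion-ordered, so publishers appear in first-seen order
    for b in bib.iter("book"):
        name = b.findtext("publisher")
        total, count = totals.get(name, (0.0, 0))
        totals[name] = (total + float(b.findtext("price")), count + 1)
    return [(name, total / count) for name, (total, count) in totals.items()]


if __name__ == "__main__":
    print(avg_price_rescan(BIB))
    print(avg_price_grouped(BIB))
```

The first function scans the books once per publisher; the second scans them once in total. An engine can substitute one for the other only if it can establish that the two forms are equivalent.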
3. Strong Typing
XQuery will be a strongly typed language. This typing will
extend to content models - a function whose return type is "paragraph
element" will return a valid paragraph element. This level of strong typing
is very helpful in industrial strength programming environments, and
difficult to achieve with the current XSLT. Much of the effort behind the
Query Algebra, and much of the justification for it, goes toward achieving
strong typing.
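The idea of a function whose result is guaranteed to be a valid "paragraph element" can be sketched in Python, with one important caveat: the check below runs at run time, whereas the goal described above is a guarantee established by the type system. The content model (a para element containing only text, em, and code children) and the names is_valid_para and make_para are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: a para may contain text plus em and code children.
ALLOWED_PARA_CHILDREN = {"em", "code"}


def is_valid_para(elem):
    """Return True if elem is a para whose children all fit the content model."""
    return elem.tag == "para" and all(
        child.tag in ALLOWED_PARA_CHILDREN for child in elem)


def make_para(text, emphasized=None):
    """Build a para element and refuse to return one that violates the model."""
    para = ET.Element("para")
    para.text = text
    if emphasized is not None:
        em = ET.SubElement(para, "em")
        em.text = emphasized
    if not is_valid_para(para):
        raise TypeError("result is not a valid paragraph element")
    return para


if __name__ == "__main__":
    print(ET.tostring(make_para("Hello ", "world"), encoding="unicode"))
    print(is_valid_para(ET.fromstring("<para><table/></para>")))
```

With static typing over content models, a violation like the para containing a table would be reported before the query runs rather than discovered during execution.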
In fact, XSLT may benefit from this work. It would be
helpful to have stronger typing in XSLT as well. For instance, I would like
to be able to check whether a given stylesheet will always produce valid
HTML 4.0 for a given DTD. Several people are investigating this - it is much
too early to say whether it can be achieved.
At any rate, I hope this helps explain why I think XQuery is
worth developing as a language, in addition to XSLT.
Jonathan