Evan Lenz wrote:
> Reinventing the Wheel
I think I should start by responding to the title of this
message. The phrase "reinventing the wheel" usually refers to reinventing
something that already exists because you don't know about it. The editors of
XQuery include a former member of the XSL Working Group who has written a fair
number of stylesheets. They also include one of the inventors of SQL, one of
the inventors of XML-QL, and one of the inventors of XQL, a precursor of
XPath. We considered quite a few syntax approaches, including building on
XSLT, before arriving at the approach we used.
Also, you imply that we are off on a completely different
track than XSLT. In fact, we are working closely together with the XSL Working
Group to define XPath 2.0. This includes not only adding features, but
deriving a new model for XPath that is able to account for XML Schema
types.
> After reviewing the XQuery spec, I'm concluding that the
> overlap between XQuery and XSLT is far too great for the
> W3C to reasonably recommend them both as separate languages.
XQuery and XSLT will share a common expression language,
including path expressions. XSLT is really two languages, an XML-based
language used to write the templates, and XPath, an expression language used
for patterns. Both XQuery and XSLT will use XPath 2.0, and the two Working
Groups are working closely together on this. So the two languages will share a
great deal.
Why have a new language? Four reasons: (1) ease of use for
our use cases, (2) conventional database functionality, (3) optimizability,
(4) strong data typing.
1. Ease of use
XQuery is significantly more straightforward for a lot of
common database queries. To some extent, what is straightforward is a matter
of taste, a realm where logic does not reach, but I think that some of the
reasons are worth stating.
First, simple queries are simpler in XQuery. For instance, an
XPath 2.0 expression that uses the abbreviated syntax is also a valid query by
itself. This is not true of XSLT. Your document http://www.xmlportfolio.com/xquery.html
incorrectly labels XPath expressions as XSLT, but an XSLT processor will not
process your examples unless you place them in a template inside a
stylesheet. Consider a simple query that looks for all employees in a set of
documents:
//emp
This is much easier to read and write than the equivalent XSLT
stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!-- Match the document root and copy every emp element, as //emp does. -->
  <xsl:template match="/">
    <xsl:copy-of select="//emp"/>
  </xsl:template>
</xsl:stylesheet>
This difference is also present for some moderately complex
queries. In your paper, you take the following XQuery expression:
/emp[rating = "Poor"]/@mgr->emp/@mgr->emp/name
and compare it to the following XSLT fragment:
<xsl:variable name="poorEmpManagers"
select="id(/emp[rating = 'Poor']/@mgr)[self::emp]"/>
I think it would be a fairer comparison if you typed in the
entire stylesheet that you would have to write in XSLT. I didn't test this,
but I think the following is approximately what you would have to
write:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <xsl:variable name="poorEmpManagers"
                  select="id(/emp[rating = 'Poor']/@mgr)[self::emp]"/>
    <xsl:copy-of select="id($poorEmpManagers/@mgr)[self::emp]/name"/>
  </xsl:template>
</xsl:stylesheet>
The fact that *any* expression in XQuery is a valid query
makes it easier to write simple queries, without the overhead associated with
a stylesheet. For what it's worth, here's the shortest XQuery expression that
can be executed as a stand-alone query:
1
Also, the keyword-oriented approach of XQuery is more familiar
and comfortable to many programmers. I would rather write:
FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
  AND $b/year = "1998"
RETURN $b/title
than
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="document('bib.xml')//book">
      <xsl:if test="publisher='Morgan Kaufmann' and year='1998'">
        <xsl:copy-of select="title"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
Note that there's been no great rush to create an XML syntax
for Java, JavaScript, Visual Basic, or other high level programming languages.
Several people have attempted to make XML syntaxes for SQL, but I have not
been impressed by the results.
2. Conventional Database Functionality
XQuery is more suitable to many of the kinds of queries that
SQL programmers are used to. Joins and the distinct() function account for a
lot of this - no surprise, since XQuery's FLWR expressions are quite similar
to SQL's SELECT/FROM/WHERE. It may make sense, incidentally, to add these to
XSLT as well, which is another reason for XQuery and XSLT to continue to work
together on XPath 2.0.
To a database person, it is somewhat surprising that your
paper does not explicitly mention joins, which are one of the biggest reasons
for FLWR expressions in XQuery. Joins are central to database functionality,
and it is important to express them in a way that allows optimization based on
patterns detected in the expressions. I also notice that the examples in your
paper do not include any examples from Section 3 of the XQuery paper, which
shows how conventional SQL-like queries are done.
In your paper, you point out that FLWR expressions have
some syntactic similarity to XSLT's <xsl:for-each>. This is true, but
it misses the purpose of FLWR expressions, which is to provide general
SQL-like functionality for joins and declarative restructuring. A naive
mapping of FLWR expressions to <xsl:for-each> is not likely to give you
an efficient implementation of joins.
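To make the point about joins concrete, here is a minimal sketch in Python (not XQuery or XSLT). The sample data and the function names nested_loop_join and hash_join are invented for illustration. It contrasts the nested-loop evaluation you would get from a naive for-each mapping with a hash join that an optimizer could substitute once it recognizes the join pattern:

```python
from collections import defaultdict

# Hypothetical sample data: (title, publisher) and (publisher, city) pairs.
BOOKS = [
    ("Book A", "Publisher 1"),
    ("Book B", "Publisher 2"),
    ("Book C", "Publisher 1"),
]
PUBLISHERS = [
    ("Publisher 1", "City 1"),
    ("Publisher 2", "City 2"),
]


def nested_loop_join(books, publishers):
    """Naive evaluation: rescan every publisher for every book, O(n * m)."""
    result = []
    for title, pub in books:
        for name, city in publishers:
            if pub == name:
                result.append((title, city))
    return result


def hash_join(books, publishers):
    """Same result: index publishers once, then probe per book, O(n + m)."""
    index = defaultdict(list)
    for name, city in publishers:
        index[name].append(city)
    return [(title, city)
            for title, pub in books
            for city in index.get(pub, [])]
```

Both functions return the same pairs in the same order. The second is only available to an implementation that can see the query is a join, rather than an opaque loop nested inside another loop.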
You do give an example that combines a join with distinct().
The XQuery looks like this:
FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")/book[publisher = $p]/price)
RETURN
  <publisher>
    <name> $p/text() </name> ,
    <avgprice> $a </avgprice>
  </publisher>
The equivalent XSLT looks like this:
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each
        select="document('bib.xml')//publisher[not(.=preceding::publisher)]">
      <xsl:variable name="prices"
                    select="document('bib.xml')/book[publisher=current()]/price"/>
      <xsl:variable name="avgPrice" select="sum($prices) div count($prices)"/>
      <publisher>
        <name><xsl:value-of select="."/></name>
        <avgprice><xsl:value-of select="$avgPrice"/></avgprice>
      </publisher>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
Again, I find the XQuery solution much easier to read and
write. This is the kind of thing XQuery was designed for. More important, in
XQuery, we have been thinking of database optimization, and I think we will be
able to figure out how to optimize the XQuery equivalent
better.
3. Optimizability
A query language needs to be optimizable. To make this possible, an
implementation must be able to discover equivalences between expressions, so
that queries can be rewritten flexibly based on the performance parameters of
various kinds of access. Both the XQuery language and the XML Query Algebra
are designed to make this possible.
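As a rough illustration of what such an equivalence looks like, here is a small Python sketch (not XQuery). The data and the names avg_price_as_written and avg_price_rewritten are hypothetical. The first function follows the shape of an average-price-per-publisher query literally, rescanning the books for each distinct publisher; the second is an equivalent single-pass grouping that a query processor might choose instead:

```python
# Hypothetical (publisher, price) pairs standing in for book elements.
BOOKS = [("P1", 30.0), ("P2", 50.0), ("P1", 40.0), ("P2", 70.0)]


def avg_price_as_written(books):
    """Literal evaluation: find distinct publishers, then rescan per publisher."""
    publishers = []
    for pub, _ in books:
        if pub not in publishers:
            publishers.append(pub)
    result = []
    for p in publishers:
        prices = [price for pub, price in books if pub == p]
        result.append((p, sum(prices) / len(prices)))
    return result


def avg_price_rewritten(books):
    """Equivalent rewrite: one pass, accumulating a total and count per publisher."""
    totals = {}  # dicts keep insertion order, so publishers stay in first-seen order
    for pub, price in books:
        total, count = totals.get(pub, (0.0, 0))
        totals[pub] = (total + price, count + 1)
    return [(pub, total / count) for pub, (total, count) in totals.items()]
```

The rewrite is safe only because the two forms are known to return the same answer. That is the kind of equivalence an algebra lets an implementation establish.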
4. Strong Typing
XQuery will be a strongly typed language. This typing will
extend to content models - a function whose return type is "paragraph element"
will return a valid paragraph element. This level of strong typing is very
helpful in industrial-strength programming environments, and difficult to
achieve with the current XSLT. Much of the effort behind the Query Algebra,
and much of the justification for it, lies in achieving strong typing.
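To give a feel for what "a function whose return type is paragraph element" promises, here is a toy Python sketch. It is a runtime check, not the static typing that XQuery aims for. The content model (a p element whose children are limited to a few inline elements) and the names ALLOWED_INLINE, is_valid_paragraph, and make_paragraph are all assumptions made up for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: a paragraph is a <p> element whose child
# elements all come from this set of inline element names.
ALLOWED_INLINE = {"em", "strong", "code"}


def is_valid_paragraph(elem):
    """Return True if elem is a <p> whose children are all allowed inline elements."""
    return elem.tag == "p" and all(child.tag in ALLOWED_INLINE for child in elem)


def make_paragraph(text, emphasized):
    """Build a paragraph and refuse to return anything outside the content model."""
    p = ET.Element("p")
    p.text = text
    em = ET.SubElement(p, "em")
    em.text = emphasized
    if not is_valid_paragraph(p):
        raise TypeError("result is not a valid paragraph element")
    return p
```

A strongly typed language would aim to establish the same guarantee before the function ever runs, by analyzing the function body against the declared return type.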
In fact, XSLT may benefit from this work. It would be helpful
to have stronger typing in XSLT as well. For instance, I would like to be able
to check whether a given stylesheet will always produce valid HTML 4.0 for a
given DTD. Several people are investigating this - it is much too early to say
whether it can be achieved.
At any rate, I hope this helps explain why I think XQuery is
worth developing as a language, in addition to XSLT.
Jonathan