A Conversation with Michael Kay on XML Technologies

Dr. Michael Kay is the author of XSLT Programmer's Reference from Wrox Press, the standard reference work on XSLT. He is also the editor of the W3C's XSLT 2.0 specification, which is currently a Last Call Working Draft, and his Java-based Saxon XSLT processor is one of the most successful and popular XSLT processors in the language's history. The branch of Saxon supporting XSLT 1.0 is currently at release 6.5.3, and regular readers of this column will know that the 7.x branch of Saxon has been implementing more and more support for XSLT 2.0, and has also added support for XQuery. Michael recently formed his own company, Saxonica, and has launched a commercial version of Saxon with support for XML Schema.

Stylus Studio®, the leading XML IDE for XML data integration, has long included out-of-the-box support for XSLT editing, mapping, debugging and profiling with Saxon 6, and recently added built-in XQuery development tools for the Saxon SA 8.6 processor. Ivan Pedruzzi, Stylus Studio®'s Sr. Product Architect, and editor of 'The Stylus Scoop' newsletter, recently had the opportunity to interview Dr. Kay on behalf of the Stylus Studio® developer community. The two chatted about XQuery and XPath 2.0 technologies, XML development tools, recent updates to Dr. Kay's Saxon product line, and other XML topics that are sure to be of interest to you.



Ivan Pedruzzi: Hi, Michael. We appreciate you taking the time to talk with The Stylus Scoop today about upcoming XML technologies. First, by way of introduction, tell us a bit about the exciting XML standards work you are engaged in as a member of the W3C's XML-related working groups.

Michael Kay: It's good to have the reminder that the standards are exciting, because at times the work of producing them is mind-numbingly tedious! We're all desperately keen to get the current round of standards finished, but the W3C process is very demanding in terms of quality control, and we've got over 1000 public comments to work through at the moment. They range from simple typos to fundamental questions addressing the formal semantics of the languages, so it's not an easy job.

But yes, the end product is indeed exciting. We're working on three closely-related languages, XSLT 2.0, XPath 2.0, and XQuery 1.0. The new XSLT and XPath versions have double the capability of their predecessors, and XQuery is completely new, so there's a lot of new processing power arriving for XML.

I'm involved right across the spectrum of the three languages, though as the XSLT 2.0 editor it's sometimes been my job to act as spokesman for the large XSLT user community and chief advocate of backwards compatibility in the languages. (I was taught as an undergraduate that "compatibility means deliberately repeating other people's mistakes", and it's absolutely true in this game. If users can't move forwards easily, they won't move forwards at all.)

For XSLT, I think the excitement is that we get rid of most of the really weird coding workarounds that are needed to solve common problems. We're also making the language much more suitable for up-conversion applications (in other words, creating markup rather than just transforming it), which takes the language into a whole new market. For XQuery, as far as I'm concerned the key benefit is that XQuery does database access, which XSLT was never designed for. However, there's a lot of overlap between the functionality of XSLT and XQuery, and there seem to be many people who want to use XQuery just because they prefer the syntax.

IP: So how is the work on the XQuery specification progressing?

MK: It's taking longer than most of us would like, but we're getting there. If you look at the recent drafts you'll see that the language is now very stable, but there are still quite a few corner cases where the semantics need to be pinned down, which all takes time. The big decisions have all been made and it's now a question of getting the small print right. Probably the most exciting thing about XQuery is the amount of implementation activity. There are implementations from the big database vendors, from specialists such as Stylus Studio®, from open source independents like my own company, Saxonica, and from research outfits. This creates a very stimulating environment: all the implementers compete with each other, all the products are better as a result, and it's the user who benefits. This is important, because there are plenty of standards activities that attract lots of attention but never catch on in practice. But when you get dozens of implementations (as XSLT did when it came out nearly five years ago) then you can have a pretty high level of confidence that something significant is happening. In fact, the relational vendors are starting to talk about XQuery as a long-term replacement for SQL, and that's something that really would be a sea-change in our industry.

IP: As you know, XSLT has become quite a popular programming language and I would say that most of our users are well-versed in the technology. Can you explain the key differences between XQuery and XSLT 2.0 for our users?

MK: One of the things I find interesting is that many of the early users of XQuery are people who are new to XML, certainly users who haven't come from an XSLT and XPath background. Very often, that's because they are database people rather than document people. I hate making that distinction, because I think one of the most important reasons for XML's success is that it brings those two worlds together, but the fact is, people are coming to it from different directions and they do have different expectations. For most database people, XSLT simply isn't something they can relate to as a data manipulation language. Conversely, many XSLT users instinctively resist XQuery: they've become comfortable with the way XSLT does things, they've learnt to appreciate its strengths and to ignore its weaknesses.

That doesn't really answer the question you asked! But it does give one kind of answer, which is that the differences of style are as important as the differences of substance. There are many jobs you can do equally well using either language, but the two solutions appeal to different kinds of user.

To my mind, XQuery is better at doing the traditional database query jobs: finding data, joining data from different sources, aggregating. XSLT 2.0 is still better at processing data with unpredictable structure (the "document" end of the spectrum), and it's better for many transformation tasks. I hope that people will use both languages for what they are best at, rather than fighting any religious wars. I think that once you've learnt either of the languages, you're 80% of the way towards learning the other.

IP: What kind of applications can you envision being ideal candidates for XQuery?

MK: Firstly, if you're actually doing queries against an XML database, or against XML data in a relational database, then there's no competition. XSLT simply won't do that. There have been people who've argued that it could be made to do that, but it wasn't designed for the job and no one is trying to make it fit into that role. Then beyond that there's an interesting range of applications where you could use either language. If the logic is simple then the XQuery code is likely to be much shorter than the XSLT code: 5 lines rather than 20. So I think we'll see XQuery being embedded in Java or C# applications in a way that's never really been convenient with XSLT. It's also better at joining data from multiple sources. But there are other tasks where XSLT is better, especially with the 2.0 version: if you want to produce a copy of a document with all the NOTE elements removed, it's vastly easier in XSLT than in XQuery. I think XSLT is better for big applications: the DocBook stylesheet suite is 84,000 lines of XSLT, and I can't imagine writing that in XQuery.
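
Editor's Note: The NOTE example can be made concrete with a short sketch. This is not code from the interview; it is a minimal illustration that assumes the JDK's built-in JAXP transformer and an XSLT 1.0 stylesheet (an identity template plus one empty template for NOTE), rather than Saxon or XSLT 2.0. The class name NoteStripper and the method stripNotes are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: copies a document while dropping every NOTE element.
public class NoteStripper {

    // Identity transform (copy everything) plus an empty template that
    // matches NOTE and therefore produces no output for it.
    private static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='NOTE'/>"
        + "</xsl:stylesheet>";

    public static String stripNotes(String xml) {
        try {
            // Compile the stylesheet with whichever JAXP processor is configured.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers do not have to handle a checked exception.
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<doc><p>keep</p><NOTE>drop</NOTE><p>keep too</p></doc>";
        System.out.println(stripNotes(input));
    }
}
```

The stylesheet itself is only a handful of lines: one rule copies everything, and a second, more specific rule overrides it for NOTE. The equivalent in XQuery generally needs a hand-written recursive function to rebuild the tree, which is the kind of difference described above.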

IP: Doesn't XSLT 2.0 do everything that XQuery can do?

MK: In a strict computer science sense, yes. That's how I was able to support XQuery using the Saxon XSLT run-time, just by implementing a different parser as a front end. But there are some areas where the XQuery syntax is much more appealing. And actually, the fact that the XQuery language is smaller is one of its strengths. With a database query language, everything revolves around optimization to make maximum use of indexes defined in the database, and the smaller the language is, the more it can be optimized. I think in practice we'll see some applications that can only be done using XQuery, some that are much more easily done in XSLT, and a middle ground where people will use either language based on personal preference. I also think we'll see applications where the two languages are used together: typically, XQuery to grab the data from the database, and XSLT to handle the end-user presentation. The fact that they share the same data model means it should be possible for the two languages to interwork very effectively.

IP: And how does SQL/XML fit into all of this?

MK: SQL/XML is (if you'll forgive the simplification) about how to create an XML view of relational data. XQuery is about how to query data that's either stored in XML form, or pretends to be. So the two things are very complementary.

IP: There has been a flurry of activity in the Saxon community coinciding with a few recent releases — what's the scoop there?

MK: Yes, Saxon has been moving forward relentlessly. It's actually moving forward on three fronts: as an XSLT processor, as an XQuery processor, and now as an XML Schema processor. It now has its own company, Saxonica, to develop and support it, which also means it has to pay its own way in the world. Since the W3C specifications make the distinction between a schema-aware processor and a basic (non-schema-aware) one, I decided to do the same with Saxon, and to have a commercial product with schema support alongside the open-source product without. I think that also fits the market profile: people who need a schema-aware processor are typically using XML in some fairly mission-critical ways.

IP: And is Saxon the only component to fully support XSLT 2.0 and the XQuery 1.0 working draft?

MK: Actually, Saxon is still the only XSLT 2.0 processor in town, quite apart from its XQuery and XML Schema capabilities. I think that the more serious commercial XSLT vendors have been waiting until the specs are finished, while the amateur open-source players have probably been finding it's too much work. Microsoft is another story again: the message from the bloggers seems to be that they never made any money on their XSLT 1.0 products, so they've shifted resources to XQuery in the hope they can do better there. Which is sad, but it leaves the field wide open for other players. Adding XQuery as a front-end to the existing Saxon run-time engine gave Saxon an immense head-start over people developing XQuery from scratch, because 90% of the code is common between the two, and all the optimization work carries over unchanged. The level of user interest in Saxon is fantastic (3000 downloads of Saxon 8.0 in a month), and I'm seeing evidence that there's a corporate market that's interested in the commercial product not because of its extra capabilities, but simply because it is a commercial product.

IP: We heard recently on an XML-DEV thread that you started using Stylus Studio® — what made you come to that decision?

MK: I wouldn't say it was a decision as such. I'm trying out new XML tools all the time, and in most cases I stop using them either because I get irritated with them or because they don't offer anything I can't do with a text editor. (Or because they can't do things that I can do with a text editor.) Somehow Stylus Studio® must have got the balance right because I'm still using it. I don't use all the facilities, I don't imagine anyone does, because it caters to a very wide range of different kinds of user with different needs and different foibles. I've never been a great fan of visual programming myself, I prefer hacking in the angle brackets, but I know there are people who swear by it. One of the things Stylus Studio® has definitely got right is that it doesn't lock you in - I'm very reluctant to commit my work to any tool if I think it will cut me off from using different tools. I use Stylus Studio® mainly for XML editing. I work most of the time in the text view, switching occasionally to the tree view to navigate my way around a big document, or using XPath expressions to follow a cross reference. I don't actually do much stylesheet development myself, and when I do it's because I'm debugging a new Saxon version, so that side of things is less important to me; but I was doing some work recently for a client, and Stylus Studio® made it much easier for me to convince them that XSLT was within their capabilities. Another thing I find really handy is the ability to run a document through different schema processors at the touch of a button. Overall Stylus Studio® is an exceptional XML IDE and the main reason I think is that it allows you to work in your own way, rather than forcing you into a style you're not comfortable with.

IP: Well thanks for that, and obviously we think highly of Saxon, too. As well as pushing the technology forwards at a fairly astounding pace, you seem to have a knack for finding the sweet spot between standards conformance and extensions, between performance and good diagnostics. Where should people go to learn more about Saxon?

MK: It's all at http://www.saxonica.com/. If you're already using another XSLT processor, especially a Java processor, then you'll normally find that you can just plug Saxon into your existing application with no hassle at all. One user I know of did that and got a tenfold performance improvement overnight. But I think that the main reason people are downloading Saxon is because they want to try out XQuery and XSLT 2.0. There are people who think it's too risky to adopt these languages before the specs are signed in blood (and I understand that position), but there are lots of others who are taking the plunge and reaping the benefits.
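
Editor's Note: For Java applications, "plugging in" a different XSLT processor usually goes through the standard JAXP factory mechanism. The following is a minimal sketch under stated assumptions, not Saxon's documented setup procedure: it assumes the Saxon JAXP factory class is named net.sf.saxon.TransformerFactoryImpl (check the documentation for your Saxon version), and the class ProcessorSelector and method chooseFactory are hypothetical names. If the requested class is not on the classpath, the sketch falls back to the JDK default processor.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

// Hypothetical helper: prefer a named XSLT processor, fall back to the default.
public class ProcessorSelector {

    // Assumed factory class name for Saxon; verify against your Saxon release.
    static final String SAXON_FACTORY = "net.sf.saxon.TransformerFactoryImpl";

    public static TransformerFactory chooseFactory(String preferredClassName) {
        try {
            // Ask JAXP for a specific implementation by class name.
            return TransformerFactory.newInstance(preferredClassName, null);
        } catch (TransformerFactoryConfigurationError e) {
            // Preferred processor is not available: use the configured default.
            return TransformerFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        TransformerFactory factory = chooseFactory(SAXON_FACTORY);
        System.out.println("Using XSLT processor: " + factory.getClass().getName());
    }
}
```

Code that already calls TransformerFactory.newInstance() can often be switched without any source change by setting the javax.xml.transform.TransformerFactory system property on the command line, which is one reason a swap of this kind can involve very little hassle.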

IP: So how significant is Saxon's ability to work with XML Schema?

MK: I think it will take a little time before the XQuery user community really learns to appreciate what this can achieve. I've found that it changes the development experience completely. Instead of writing a stylesheet (or query) that produces some kind of output, and then gradually tweaking it so that it produces the right output for all the documents that you've tested it on, you get error messages right up front, even at compile time, telling you that your code is incorrect. That can be pretty startling at first. It's a particular shock when it reports bugs in code that's been in live use for a year or two. In fact, it makes stylesheet development much more like programming in conventional languages like Java: most of your mistakes result in error messages from the compiler, not in garbage output. I think schema-aware XSLT and XQuery is much more suitable for creating robust enterprise-critical applications. It's a big leap from where we are today.

IP: And now for the shameless book plug. You have yet another book coming out this summer on XSLT! This is your ... fourth book published to date? Just how do you manage to do all of this, anyway?

MK: It's the third edition of XSLT Programmer's Reference, and it's being published in two volumes this time, one covering XSLT 2.0 and the other covering XPath 2.0. They should hit the streets together in August. Yes, it was a big project. I started in September last year and it kept me busy most evenings and weekends through the winter. Half a million words, I reckon. But we managed to keep it up to date with the changes in successive W3C drafts; in fact, it's looking likely that when the books appear they will cover some features that W3C hasn't published yet! I've always enjoyed writing: as with developing software, it's a very creative activity, and it's particularly rewarding when you know that you're reaching a large and appreciative audience. The publishers want me to add a third volume on XQuery, but I haven't committed yet - I need a break!


Editor's Note: If you liked this interview, consider subscribing to The Stylus Scoop, our bi-monthly XML developer newsletter!
