Priscilla Walmsley on XQuery and XML Schema Technologies
Priscilla Walmsley has been working closely with XML Schema and XQuery for years.
She was a member of the W3C XML Schema Working Group from 1999 to 2004.
Stylus Studio® is the leading XML IDE for XML data integration, featuring advanced support for XQuery development, including XQuery editing, mapping, debugging, and performance profiling. Ivan Pedruzzi, Stylus Studio®'s Senior Product Architect and editor of The Stylus Scoop newsletter, caught up with Ms. Walmsley at the XML Conference & Exhibition 2004 (XML 2004) last month, where Ms. Walmsley gave a presentation entitled Introduction to XQuery (Session Slides PDF 2.0MB). The two chatted about the XQuery buzz, XML Schema, XQJ technologies, and other hot topics in the XQuery development arena.
Ivan Pedruzzi: Hi, Priscilla. Thanks for taking the time to meet with The Stylus Scoop today. What got you first interested in XQuery?
Priscilla Walmsley: Hi Ivan, it's good to talk to you, too. I was immediately attracted to XQuery because it has an intuitive syntax that I enjoy using and stretching to its limits. Having spent many years using SQL, XQuery feels familiar, yet much more powerful.
I've enjoyed working with XSLT and XPath 1.0 over the years, but for some of the work I've done they felt like an awkward fit. For a transformation scenario where I'm saying "every time you get an x element, do this" it works great. But for applications that involve selecting a subset of an XML document, joining it with other data, and performing calculations or manipulating it in some way, I've sometimes felt like XSLT was making me force a square peg into a round hole.
XQuery embedded in program code is a great way to reduce (and transform) the set of data you're working with rather than tediously traversing the DOM model of an entire document. In the past I've done this with XPath, but XQuery lets me join multiple data sources easily and sort my results, actions that are not part of XPath.
Being a true data-head, I also really like the typing capabilities of XQuery. Some of the advanced functionality of XQuery is more data-oriented, and there are some compelling benefits for using XQuery with XML Schemas.
IP: Can you follow up on that for our readers? To what extent does XQuery provide support for XML Schema? And who cares?
PW: (laughs) You can use XML Schemas with your queries, both to assign types to data and to validate it. This includes both the input XML and your query results. There are a few ways that schemas can improve your queries, the most important being the ability to find errors in your queries that you might not have found otherwise. For example, suppose you misspell the name of an element, or you assume that a child element can only appear once, when the schema says it can be repeating. These errors can be identified by the processor during what's called static analysis, before the query is even run on any particular input.
To use an SQL analogy, you wouldn't want an SQL statement that had a misspelled column name to just come back with nothing instead of raising an error. Without XML schemas, this is exactly what will happen. Some of these errors may become obvious if you notice that your results come back wrong or incomplete. But in all likelihood, your testing process is not going to uncover all of them.
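Editor's Note: The silent failure described here is easy to reproduce with any schema-less path lookup. The following is a minimal sketch in Python rather than XQuery, using the standard library's xml.etree.ElementTree. The catalog document and its element names are invented for illustration. A misspelled element name raises no error; it simply selects nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document, made up for this illustration.
CATALOG = """
<catalog>
  <product><name>Widget</name><price>9.99</price></product>
  <product><name>Gadget</name><price>24.50</price></product>
</catalog>
"""

root = ET.fromstring(CATALOG.strip())

# Correct path: selects both price elements.
prices = root.findall("./product/price")

# Misspelled path ("prise"): no error is raised, the result is just empty.
typos = root.findall("./product/prise")

print(len(prices), "prices found with the correct name")
print(len(typos), "prices found with the misspelled name")
```

A schema-aware processor performing static analysis could reject the second path before the query ever runs, which is the benefit being described.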
Being able to find query errors based on the schema becomes especially handy as the structure of the XML you're querying changes. As we all know, there are continual changes made to Web Services interfaces, XML stored in databases, and even XML documents stored in file systems. If I have a modified schema, I can immediately tell where my queries need to be revised to match it. Of course, managing the many versions of schemas and queries as they evolve can be quite a challenge.
Another benefit of using schemas with queries is that you can query elements differently depending on their type. To use a popular example, say you have address elements that have different types depending on the country, like USAddressType, UKAddressType, etc. You can use an "instance of" expression in XQuery to determine what kind of address it is, then select or format it differently depending on that type. You could do this kind of thing based on element name in XSLT 1.0, but not if you had multiple elements with the same name but different types. XQuery really allows you to take advantage of some of the features of more complex schemas like derived complex types, type substitution, and substitution groups.
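Editor's Note: In XQuery this test would be written with "instance of" against schema types. As a rough analogy only, the sketch below uses Python's standard library, which has no schema awareness, and dispatches on an explicit xsi:type attribute instead. Only the type names USAddressType and UKAddressType come from the interview; the element layout and the formatting rules are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema instance attributes (xsi:type and friends).
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical document: same element name, different declared types.
DOC = f"""
<addresses xmlns:xsi="{XSI}">
  <address xsi:type="USAddressType"><city>Boston</city><zip>02110</zip></address>
  <address xsi:type="UKAddressType"><city>London</city><postcode>SW1A 1AA</postcode></address>
</addresses>
"""


def format_address(addr):
    """Format an address element differently depending on its declared type."""
    declared = addr.get(f"{{{XSI}}}type")
    city = addr.findtext("city")
    if declared == "USAddressType":
        return f"{city} {addr.findtext('zip')} (US)"
    if declared == "UKAddressType":
        return f"{city} {addr.findtext('postcode')} (UK)"
    return f"{city} (unknown type)"


root = ET.fromstring(DOC.strip())
formatted = [format_address(a) for a in root.findall("address")]
print(formatted)
```

Unlike this attribute check, a schema-aware "instance of" test also works when the type is assigned by validation rather than written into the document, and it respects type derivation.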
I should mention that all of these benefits also apply to XSLT 2.0. Whether you use XQuery or XSLT 2.0 is more a matter of style, and possibly performance depending on your particular use case.
IP: Have XML Schema, built-in types, namespaces, XPath, and white space handling made it too difficult to develop an XML query language? Should XQuery have played the rebel and done some of this its own way?
PW: There's no doubt that the XQuery recommendation is large and complex. This does put more of a burden on implementers, since they have to support all the XML Schema built-in data types, and almost 200 built-in functions and operators.
But, in my opinion, the benefits outweigh the costs of implementation. And from a user's perspective, much of this complexity is optional. Query authors don't have to use schemas at all if they don't want to. They don't even have to care about types in most cases, because a lot of type conversions happen automatically. For example, suppose you call the sum function to add up the values of some price elements. If there is no schema, the price elements are untyped, and they are automatically converted to numbers. If you're not using namespaces, you don't have to bother with them in XQuery, except perhaps to prefix some function names.
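Editor's Note: To see what that automatic conversion saves, here is a sketch of the same sum in Python, where the conversion from untyped text to a number has to be spelled out by hand. The order document is invented, and the use of Decimal (to avoid floating-point rounding in the example) is our choice, not something XQuery prescribes.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical input with no schema: every price is just text.
ORDER = "<order><price>9.99</price><price>24.50</price><price>5</price></order>"
root = ET.fromstring(ORDER)

# Each text value must be converted explicitly before it can be summed.
total = sum(Decimal(p.text) for p in root.findall("price"))
print(total)
```

In a schema-less XQuery, calling sum over the price elements needs no such conversion step in the query text.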
It might have been nice if they had published a simpler XQuery 1.0 a while ago, and then added the full type system and schema support in a later version. But then they would have run the risk that retrofitting these features would result in something klugey. I think they went about it the right way.
IP: There was quite a lot of buzz around XQuery technologies in both the technical sessions and the vendor presentations at XML 2004. Was there anything that you found to be particularly innovative or interesting in terms of new technology trends in XQuery?
PW: There are obviously many use cases for XQuery (almost as many as there are for XML!), but one that seems to be gaining increasing interest is its role in integrating hugely disparate data sources. Back in my DBA days, we would look at all that semi-structured and unstructured data stored in Word documents, and stuck in design tools and all those other formats, and shake our heads. We knew there was good information there, but figured it was hopeless to try to capture it or reuse it for anything — getting it would have simply required too much effort. These days, just about anything (even Word documents!) can be represented as XML. And XQuery is a natural fit for integrating this disparate data, not just because it allows you to select from multiple data sources, but also because it is flexible enough to let you do a lot more than just your typical SQL-like joins.
One of the neat ideas in this space was DataDirect XQuery, which lets you query relational and XML sources together using XQuery. From your Java code, you just write a query and execute it using XQJ, which is an XML data access API with JDBC-like syntax, and DataDirect processes the XQuery and takes care of the access to the data sources. I think that most database developers will find this approach to be more familiar and easier to work with, especially compared to existing ways of implementing this kind of work, such as DOM or XSLT programming.
IP: Priscilla, I just want to point out that you can sign up to be a beta tester for the DataDirect XQuery Preview on our Web site. Now, without further commercial interruption...
PW: (laughs) Does this mean I don't get to plug my new book, Definitive XQuery, to be published in 2005?
IP: Of course not!
PW: Thank you, Ivan. Now, where were we?
IP: I was asking you about the XQuery buzz at XML 2004...
PW: Right! Something that really stood out to me was the amount of detailed information and demos from the major relational database vendors on their support for XML and XQuery. The line between the traditionally relational DBMS products and the "native" XML databases is really starting to blur.
IP: Is the addition of native XML storage and XML Query facilities to mainstream relational databases like Microsoft SQL Server and Oracle9i a vindication for native XML databases, or a sign of impending doom?
PW: Well, probably both, though "doom" is a bit overdone. Obviously Microsoft, Oracle and other relational database vendors are responding to market pressure by providing XQuery implementations to some degree or another simply because people want to store and query XML. I don't think anyone expects people to be moving mission-critical business data from relational to XML storage any time in the foreseeable future, but the native XML database vendors like Sleepycat Software or Mark Logic Corporation are finding markets in which native XML storage is the right solution, so they don't have to sell themselves as just providers of generic database management systems for XML. And these are not just niches, either: content management and integration, document indexing and searching, Web services, and relational-to-XML caching are all examples of native XML storage being the right solution. And vendors like Sleepycat or Mark Logic are likely to add value by providing specialized extensions and tools to enhance productivity in those areas.
IP: What do you think of XQuery tools?
PW: Until fairly recently, the only XQuery tools available were simple prototypes. In the past few months several great implementations have been released. I've been impressed by how easy to use the Stylus Studio® user interface is. Lately I've been using Stylus Studio® to write and test the example queries for my latest book. I wish I'd had Stylus when I started writing! It has helped me avoid a lot of trial-and-error debugging. And integrated support for Saxon (www.saxonica.com) has really impressed me too; Saxon is an excellent processor around which to build XQuery. I like that I can use Stylus Studio® and Saxon together. And of course, as I was saying earlier, XQuery is very much related to XML Schema and so the integrated XML Schema Editor is definitely a plus.
IP: Well, thanks. We’ve been very bullish about the promise of XQuery from the jump; our partnership with Saxonica is the most outwardly visible sign of this, of course. But we work hard to bring vision and innovation to XML development tools, and it's gratifying to know we're hitting the right notes.
Now, where can people learn more about XQuery?
PW: I'm currently finishing up work on my book, Definitive XQuery, for Prentice Hall. Like my other book [Definitive XML Schema, Ed.], it is designed to be used as both a tutorial and a reference. It covers the entire XQuery language, including all the overlap with XPath 2.0. The reference part of the book has detailed descriptions and examples of all the built-in functions and types, something that will be useful to both XQuery and XPath 2.0 users. It will probably be out in the third quarter of 2005, depending on when XQuery becomes a Candidate Recommendation.
IP: And you've written an XQuery Function Library?
PW: Yes, as I was writing the book, I came up with a series of illustrative examples which I call "useful functions". What sets them apart from regular examples is that they are likely to be used by readers in their own queries. They range from string functions like substring-after-last and last-index-of, to functions that modify element and attribute nodes, such as add-attribute, change-element-namespace, and so on. These functions are not built into XQuery because clearly there is a benefit to keeping the recommendation smaller. But they would be useful to a lot of query authors. I eventually realized that I had hundreds of ideas for these functions, far too many to put in the book. So, I'm working on putting these functions together into a library that will be available through my company (http://www.datypic.com). The first version should be out in March 2005.
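Editor's Note: The interview names two of the string functions but does not define them, so the following Python sketch rests on assumed semantics. Here substring_after_last returns the text after the final occurrence of a delimiter (or the whole string if the delimiter is absent), and last_index_of returns the 1-based position of the final occurrence (or 0 if absent), 1-based to match XPath string positions. The library's actual definitions may differ.

```python
def substring_after_last(text: str, delim: str) -> str:
    """Return the part of text after the last occurrence of delim.

    Assumed behaviour: if delim is empty or not found, return text unchanged.
    """
    if not delim:
        return text
    # rpartition yields ('', '', text) when delim is absent, so [2] is text.
    return text.rpartition(delim)[2]


def last_index_of(text: str, delim: str) -> int:
    """Return the 1-based position of the last occurrence of delim, or 0."""
    if not delim:
        return 0
    # rfind returns -1 when delim is absent, which maps to 0 here.
    return text.rfind(delim) + 1


print(substring_after_last("reports/2005/q1.xml", "/"))
print(last_index_of("a/b/c", "/"))
```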
IP: Ah, too late for holiday giving!
PW: Well, a good XQuery function library is a gift that can be enjoyed year 'round, Ivan!
IP: (laughs) I've enjoyed chatting with you, Priscilla. I look forward to reading your book, and maybe we can touch base at XML 2005. It should be an exciting year for XML technologies, and it will be fun to look back.
PW: Yes, it will. Cheers!
Editor's Note: If you liked this interview, consider subscribing to The Stylus Scoop, our bi-monthly XML developer newsletter! Also, if you're interested in learning more about XQJ and becoming a Beta Tester for the DataDirect XQuery Technology Preview, visit our Beta tester sign-up page.