Re: The limitations of XPath and navigation for XML datab

From: mike@a...
To: "Michael Kay" <mike@s...>
Date: Fri, 08 Feb 2008 20:46:22 +0000

Play the video

My reply to Michael Kay is in red.

>XPath user navigation for selecting data going down a path does require coding the data types in the order they are retrieved.

Sorry that simply isn't true. You can (and products do) implement a path expression such as A/B[@id='4']/C using a wide variety of different access strategies. You do not have to "go down the path" from the root to the leaves; you can start at the leaves and work up, or start in the middle and work both ways. This expression is not a sequence of step-by-step instructions, it is a request to find C elements that have a B parent whose @id is 4 and whose parent is an A: it is a declarative specification of the characteristics of the elements that you want to retrieve. Suggesting this is procedural is like suggesting that regular expressions are procedural - you've confused the specification and your first instinct at a naive algorithm for implementing it.

I will say I should have used the word ?encountered? instead ?retrieved? in my statement above. A before B, B before C. The user needs to know the structure and specify the navigation in that order. As an external user communicating what I want, A/B[@ID=?4?]/C, says: starting at the As, goto Bs with an ID of ?4?, and then select all Cs with the qualified B parents. How it is performed internally is not the user?s concern as long as the internal processing semantics match the external operational semantics. In SQL: ?SELECT C.vals FROM ABCview WHERE B.id=?4?? is a more nonprocedural having separated structure definition from the data request no longer requires the user to have knowledge of structure enabling their query to be free of the structure. This is navigationless processing. This does not imply that no navigation is occurring internally.

>Multi-leg queries (known as LCA queries) such as selecting data from one leg of a hierarchical structure based on data in another leg of the structure was used in the example at the end of the article. So there was an example demonstrating a multi-leg query. LCA stands for Lowest Common Ancestor and also goes by a couple of less used acronyms.

Sorry, I only got to about page 6 before giving up. Page 6 has a "Conclusion", namely "Even XQuery designed from the ground up to process XML structures requires procedural navigation keeping its processing limited to linear processing for the most part." which I think is so fundamentally wrong (and it's not exactly a pleasure to read either) that I didn't feel it was worth reading on to the Appendix.

The paragraph this is mentioned in is talking about the XML industry has yet to advance to full nonlinear multi-leg hierarchical processing because separate navigations of linear legs can not be practically joined to preserve and process full hierarchical LCA queries in XQuery. XQuery also supports the inner join which flattens hierarchical structures invalidating hierarchical structures for correct hierarchical output.

The ANSI SQL hierarchical prototype used in the example can perform full multi-leg (nonlinear) hierarchical processing by limiting its operation to only defining and operating on hierarchical structures to produce fully accurate hierarchically processed multi-leg queries. The standard ANSI SQL engine is performing this full hierarchical processing. ANSI SQL?s Cartesian product operation supports full LCA processing when performing hierarchically. The SQL hierarchical prototype is using a standard ANSI SQL commercially available processor as its hierarchical processor engine.

I believe that XPath 2.0 is relationally complete and that it can therefore perform any query that SQL can perform. In fact it can do more, because it can do some (not all) recursive queries, which are beyond the capability of SQL without extensions, and which are very important in processing many kinds of hierarchical data. For queries of this complexity, however, XQuery is a much more suitable candidate than XPath. I agree that XPath 1.0 was not able to perform arbitrary joins - but that's history.

With SQL-92, one sided (hierarchical) outer joins with separate ON clause joins at each join point, they can link back on the hierarchical structure being modeled to define multi-leg structures. SQL?s syntax models the hierarchical structure; SQL?s semantics processes it hierarchically. I do not beleive XPath 2.0 or XQuery can match this natural full multi-leg hierarchical data modeling and processing capability

SQL-99 does support recursive joins.

The main reason XQuery is more suitable is that it allows you to write functions, which are the equivalent of views in SQL (though again they are more powerful because they can be recursive, which is needed for hierarchical data).

The ANSI SQL hierarchical prototype uses standard SQL views to define full nonlinear multi-leg hierarchical structures. The prototype treats them as hierarchical structure meta data and can adapt processing automatically to each specific query. This allows it to hierarchically optimize the query request based on the query selection specification eliminating overhead for global views by dynamically removing unneeded hierarchical paths from processing. Hierarchical optimization can only be performed on hierarchically structured data. This is performed totally before standard SQL optimization processing is performed. These hierarchical views can be easily dynamically combined into hierarchical structures also using the standard one sided outer join. SQL-99 supported recursive structures may be able to be adapted to full hierarchical recursive structures.

I haven't attempted to code your example in XQuery because I don't think I have fully understood the query specification. That's partly because I haven't written any SQL for about 25 years and I find it very hard to remember its arcane syntax and semantics; also you seem to be using SQL extensions that I haven't come across before.

With SQL?s arcane syntax, the SQL ON (join) clause is similar to XPath?s navigation and data filtering/qualification processing, while the WHERE clause operates hierarchically across the total nonlinear multi-leg structure being processed. So there is a big difference between SQL?s ON and WHERE clauses, especially for hierarchical processing use. Multi-leg queries require LCA processing as described above to operate automatically in ANSI SQL. The more hierarchical paths involved in the query, the more compounded the LCA processing becomes with multiple LCAs that can also be nested. This is too complex and error prone for XQuery to perform practically. As some proof of this, google ?XQuery LCA? and you will locate a number of academic projects that attempted to add LCA processing on top of XQuery and why they felt the need to.

Regards,

/Mike (David)

Follow-Ups:
- Re: The limitations of XPath and navigation for XML database processing
  - From: Jim Melton <jim.melton@a...>

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Subscribe in XML format

RSS 2.0
Atom 0.3

Stylus Studio has published XML-DEV in RSS and ATOM formats, enabling users to easily subcribe to the list from their preferred news reader application.

Stylus Studio Sponsored Links are added links designed to provide related and additional information to the visitors of this website. they were not included by the author in the initial post. To view the content without the Sponsor Links please click here.

XML Editor - Download a 15 Day Free Trial Now >

See What's New in Stylus Studio >

Buy Stylus Studio - XML Editor - Now >