[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

RE: The limitations of XPath and navigation for XML database

  • From: "Michael Kay" <mike@s...>
  • To: "'Jim Melton'" <jim.melton@a...>,<mike@a...>
  • Date: Fri, 8 Feb 2008 23:58:00 -0000

RE:  The limitations of XPath and navigation for XML   database
Yes, sorry, I was using the the term "SQL" as a synonym for relational, which is of course incorrect. I was thinking of "relational" in the sense of Ted Codd, and in that sense a lot of things that have been added over the years (like recursion) are indeed extensions to the relational model. And I was using "relationally complete" in the sense of Ted Codd's original papers - first order predicate calculus, no recursion. Sorry that my carelessness caused confusion.
 
Michael Kay


From: Jim Melton [mailto:jim.melton@a...]
Sent: 08 February 2008 22:27
To: mike@a...
Cc: Michael Kay; xml-dev@l...
Subject: Re: The limitations of XPath and navigation for XML database processing

I just have to leap into this discussion, since SQL is being bandied about ;^)  (As readers will either know already or see in my signature block at the end of this message, I have a very close relationship with SQL that has lasted more than two decades.)

First, I think that I agree with Mike Kay on one important point.  He said "I believe that XPath 2.0 is relationally complete and that it can therefore perform any query that SQL can perform."  XPath 2.0, along with XQuery 1.0, was designed to be a complete expression language and to be computationally complete.  I'm not quite certain what Mike means by "relationally complete", but I assume that he means that the language can perform operations equivalent to those defined by the relational model.

Second, I have to disagree with Mike Kay on another very important point.  He went on to say "In fact it can do more, because it can do some (not all) recursive queries, which are beyond the capability of SQL without extensions, and which are very important in processing many kinds of hierarchical data."  This statement is simply not true at all.  Standard SQL has had recursive querying capabilities -- meaning *without extensions* -- since SQL:1999 was published (which Mike David mentioned).  A number of SQL products have that capability built in (although some use non-standard syntax that accomplishes the goal).  Of course, SQL:2003 also has that feature, and the soon-to-be-published SQL:2008 will have it as well.

Finally, Mike Kay's statement that "...I haven't written any SQL for about 25 years" caught my eye.  If my arithmetic is correct, that would be about 1983, long before SQL-86 was published, and about the time that commercial products really began to appear.  Things have changed a lot since then, Mike ;^)  Seriously, I hope that your use of the word "extensions" doesn't imply that you think that all SQL features added since SQL-86 are "extensions".

Enjoying the thread,
   Jim


At 2/8/2008 01:46 PM, mike@a... wrote:

 My reply to Michael Kay is in red.

 

>XPath user navigation for selecting data going down a path does require coding the data types in the order they are retrieved.

 

Sorry that simply isn't true. You can (and products do) implement a path expression such as A/B[@id='4']/C using a wide variety of different access strategies. You do not have to "go down the path" from the root to the leaves; you can start at the leaves and work up, or start in the middle and work both ways. This expression is not a sequence of step-by-step instructions, it is a request to find C elements that have a B parent whose @id is 4 and whose parent is an A: it is a declarative specification of the characteristics of the elements that you want to retrieve. Suggesting this is procedural is like suggesting that regular expressions are procedural - you've confused the specification and your first instinct at a naive algorithm for implementing it.

 

I will say I should have used the word ?encountered? instead ?retrieved? in my statement above. A before B, B before C. The user needs to know the structure and specify the navigation in that order. As an external user communicating what I want, A/B[@ID=?4?]/C, says: starting at the As, goto Bs with an ID of ?4?, and then select all Cs with the qualified B parents. How it is performed internally is not the user?s concern as long as the internal processing semantics match the external operational semantics. In SQL: ?SELECT C.vals FROM ABCview WHERE B.id=?4??  is a more nonprocedural having separated structure definition from the data request no longer requires the user to have knowledge of structure enabling their query to be free of  the structure. This is navigationless processing. This does not imply that no navigation is occurring internally.

 

>Multi-leg queries (known as LCA queries) such as selecting data from one leg of a hierarchical structure based on data in another leg of the structure was used in the example at the end of the article. So there was an example demonstrating a multi-leg query. LCA stands for Lowest Common Ancestor and also goes by a couple of less used acronyms.

 

Sorry, I only got to about page 6 before giving up. Page 6 has a "Conclusion", namely "Even XQuery designed from the ground up to process XML structures requires procedural navigation keeping its processing limited to linear processing for the most part." which I think is so fundamentally wrong (and it's not exactly a pleasure to read either) that I didn't feel it was worth reading on to the Appendix.

 

The paragraph this is mentioned in is talking about the XML industry has yet to advance to full nonlinear multi-leg hierarchical processing because separate navigations of linear legs can not be practically joined to preserve and process full hierarchical LCA queries in XQuery. XQuery also supports the inner join which flattens hierarchical structures invalidating hierarchical structures for correct hierarchical output.

 

The ANSI SQL hierarchical prototype used in the example can perform full multi-leg (nonlinear) hierarchical processing by limiting its operation to only defining and operating on hierarchical structures to produce fully accurate hierarchically processed multi-leg queries. The standard ANSI SQL engine is performing this full hierarchical processing. ANSI SQL?s Cartesian product operation supports full LCA processing when performing hierarchically. The SQL hierarchical prototype is using a standard ANSI SQL commercially available processor as its hierarchical processor engine.

 

I believe that XPath 2.0 is relationally complete and that it can therefore perform any query that SQL can perform. In fact it can do more, because it can do some (not all) recursive queries, which are beyond the capability of SQL without extensions, and which are very important in processing many kinds of hierarchical data. For queries of this complexity, however, XQuery is a much more suitable candidate than XPath. I agree that XPath 1.0 was not able to perform arbitrary joins - but that's history.

 

With SQL-92, one sided (hierarchical) outer joins with separate ON clause joins at each join point, they can link back on the hierarchical structure being modeled to define multi-leg structures. SQL?s syntax models the hierarchical structure; SQL?s semantics processes it hierarchically. I do not beleive XPath 2.0 or XQuery can match this natural full multi-leg hierarchical data modeling and processing capability

SQL-99 does support recursive joins.

 

The main reason XQuery is more suitable is that it allows you to write functions, which are the equivalent of views in SQL (though again they are more powerful because they can be recursive, which is needed for hierarchical data).

 

The ANSI SQL hierarchical prototype uses standard SQL views to define full nonlinear multi-leg hierarchical structures. The prototype treats them as hierarchical structure meta data and can adapt processing automatically to each specific query. This allows it to hierarchically optimize the query request based on the query selection specification eliminating overhead for global views by dynamically removing unneeded hierarchical paths from processing. Hierarchical optimization can only be performed on hierarchically structured data. This is performed totally before standard SQL optimization processing is performed. These hierarchical views can be easily dynamically combined into hierarchical structures also using the standard one sided outer join. SQL-99 supported recursive structures may be able to be adapted to full hierarchical recursive structures.

 

I haven't attempted to code your example in XQuery because I don't think I have fully understood the query specification. That's partly because I haven't written any SQL for about 25 years and I find it very hard to remember its arcane syntax and semantics; also you seem to be using SQL extensions that I haven't come across before.

 

With SQL?s arcane syntax, the SQL ON (join) clause is similar to XPath?s navigation and data filtering/qualification processing, while the WHERE clause operates hierarchically across the total nonlinear multi-leg structure being processed. So there is a big difference between SQL?s ON and WHERE clauses, especially for hierarchical processing use. Multi-leg queries require LCA processing as described above to operate automatically in ANSI SQL. The more hierarchical paths involved in the query, the more compounded the LCA processing becomes with multiple LCAs that can also be nested. This is too complex and error prone for XQuery to perform practically. As some proof of this, google ?XQuery LCA? and you will locate a number of academic projects that attempted to add LCA processing on top of XQuery and why they felt the need to.

 

Regards,

 
               /Mike (David)

 

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
  Co-Chair, W3C XML Query WG; XQX (etc.) editor    Fax : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
========================================================================
=  Facts are facts.   But any opinions expressed are the opinions      =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
========================================================================



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]


PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
 

Stylus Studio has published XML-DEV in RSS and ATOM formats, enabling users to easily subcribe to the list from their preferred news reader application.


Stylus Studio Sponsored Links are added links designed to provide related and additional information to the visitors of this website. they were not included by the author in the initial post. To view the content without the Sponsor Links please click here.

Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.