[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: The use of XML syntax in XML Query


Re:  The use of XML syntax in XML Query

Dare Obasanjo wrote:

> I'm still waiting for someone with a theoretical background to debunk the
> statements in the paper "On Database Theory and XML"[0] that it is is an
> unsolvable problem to guarantee that one can create a query that for any
> given XML input will generate output that conforms to a specified XML schema.
>
> [0] http://www.cs.washington.edu/homes/suciu/files/_F2066943700.ps

That paper has an interesting example.  Paraphrasing some,
given the input DTD:

	<!ELEMENT root (elm*)>

the query:

	element result {
	    for $x in //elm RETURN element a { $x/text() },
	    for $x in //elm RETURN element b { $x/text() },
	    for $x in //elm RETURN element c { $x/text() }
	}

and the output DTD:

	<!ELEMENT result (
		( (a,a)*, (b,b)*, (c,c)* )
	    |   ( a,(a,a)*, b,(b,b)*, c,(c,c)* ) ) >

(which says that the number of a's, b's, and c's all have
the same parity, i.e., either an even number of each or an
odd number of each), then the query will always produce a
valid output given a valid input.  However, no type system
based on regular grammars will be able to deduce this fact.
The best a type inferencing algorithm will be able to come
up with given the input DTD and query is:

	<!ELEMENT result (a*, b*, c*)>

which is a proper superset of the output DTD.  The *real*
language generated by the query is

	element result { a*n, b*n, c*n : n >= 0 }

(where "a*n" means, roughly, <element name="a" minOccurs="n"
maxOccurs="n"/>), which is a proper subset of the output DTD
(so the query is type-safe) but this language is not expressible
in a regular grammar.

Basically this means that any type inferencing algorithm
for XQuery (or an XQuery-like language) based on DTDs (or a
DTD-like type system) cannot be complete: it will always
reject some type-safe queries.

That doesn't mean that a *sound* type inferencing algorithm
isn't possible, i.e., one that produces no false positives
but may answer "no", "maybe", or "I don't know" on some
correct inputs.  Personally I don't think this limitation
is much of a problem in practice: ML and Haskell have the
same property and their type systems are still tremendously
useful.

The astute reader will note that the example output DTD is not
deterministic.  It's not clear whether requiring deterministic
content models helps the problem any; my guess is that it doesn't.

As for Suciu's claim that "When one adds joins, typechecking
becomes undecidable" [p. 5]:  I'm not quite up to debunking it,
but it sounds bogus to me; regular languages are closed under
cartesian product and homomorphisms, which is basically what
a join gives you.


--Joe English

  jenglish@f...

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
 

Stylus Studio has published XML-DEV in RSS and ATOM formats, enabling users to easily subcribe to the list from their preferred news reader application.


Stylus Studio Sponsored Links are added links designed to provide related and additional information to the visitors of this website. they were not included by the author in the initial post. To view the content without the Sponsor Links please click here.

Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.