A Couple of Questions - OOXML and SQL
Jim Melton jim.melton at acm.org
Thu Apr 3 10:17:28 PST 2008
Scott,

Actually, that is a very good (and, IMHO, important) question. I'm glad you asked it in this forum.

First, I have to admit that I don't know anything specific about OOXML. I have a general clue what it is, of course, but don't know details about the structure, schemas, etc. IF OOXML happened to contain only formatting information (which a typical word processing document would), then XQuery wouldn't be any help locating, say, all red parts or all employees whose birthdays are this month. You'd be stuck looking for <heading1> or <emphasis> information. On the other hand, if (as I suspect it does) OOXML permits document authors to define their own elements to represent semantic concepts such as "part", "color", "employee", and "birthday", then XQuery would provide one good way of looking for such data in OOXML documents. Somebody with more specific information about OOXML will have to pursue this question for you.

Now, to respond to your question about query algebras...

As you obviously know, SQL is based on the relational model, which is in turn based on set theory. This is possible in large part because of the way SQL (and relational) data is structured. And I used the word "structured" very carefully to imply two things: "...the way the data is logically organized..." *and* "...the fact that the data is highly structured in nature...". It's the latter aspect that makes set theory work so well with SQL/relational data.

In the SQL/relational models (they're not quite the same thing, as you also know), data is represented as "things" that we call "tuples" or "rows", each of which has one or more "attributes" or "columns", each of which contains a piece of data (in SQL's case, that piece of data is permitted to be the special flag that we call "the null value" to indicate that the data is absent, irrelevant, unknown, not applicable, etc.). We refer to data with this characteristic as "structured" or "regular".
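
A minimal sketch of that regularity, assuming Python's built-in sqlite3 module and a hypothetical `project` table (the table, its columns, and its rows are invented for illustration). Every row carries the same three columns, an unknown owner is stored as the null value, and a single set-oriented statement updates all matching rows at once.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (name TEXT, status TEXT, owner TEXT)")
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?)",
    [
        ("alpha", "red", "Ann"),
        ("beta", "red", None),  # owner unknown: stored as the null value
        ("gamma", "amber", "Cy"),
    ],
)

# One set-oriented statement touches every matching row; no explicit loop.
updated = conn.execute(
    "UPDATE project SET status = 'green' WHERE status = 'red'"
).rowcount

rows = conn.execute(
    "SELECT name, status, owner FROM project ORDER BY name"
).fetchall()

# Each row is a tuple with the same three columns, even where a value is null.
widths = {len(row) for row in rows}
print(updated, widths)
for row in rows:
    print(row)
```

The null value shows up in Python as `None`; the column itself never disappears from the row, which is the sense of "regular" used above.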
By contrast, data represented in XML is often much less regular. In some kinds of XML, the data might well be very regular and thus highly structured (with, in fact, more "structure" than is normally feasible in the relational model). But XML doesn't require that. Consider the case of a book or newspaper or contract, in which characteristics such as boldface, italics, underlining, etc. are sprinkled (effectively) at random throughout paragraphs. That sort of data is very difficult to represent in a relational world because of its unpredictability.

Consider another form of data in which values captured from physical sensors are represented. Every second, data is gathered, put into an XML format, and transmitted to a consumer. But sensors (all physical objects) are unreliable and sometimes one or more sensors do not respond to the inquiry, or some of the data from a single sensor is garbled but the rest valid. It is necessary to be able to transmit partial information to the consumers. That, in turn, means that the structure of the XML data is (allowed to be) irregular, or less structured.

When data becomes semi-structured (as in the sensor example) or unstructured (consider a 4 year old child playing with a word processor), it's difficult to apply mathematical principles, especially set theory, to such data.

That doesn't mean that all is lost. But, first, allow me to clarify the SQL situation a bit more.

Yes, SQL can be transformed into an algebra, and most products do this to a large degree for execution. But SQL itself is more nearly (not completely, though!) a calculus than it is an algebra. SQL, like XQuery, is mostly a declarative language in which query authors state the intent of the query instead of the algorithm for finding the answers. Thus, in essence, an SQL compiler transforms a calculus-like language into an algebra for execution (well, for further compilation into executable code).
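
As a rough illustration of that split between stated intent and chosen execution strategy, here is a small sketch assuming Python's built-in sqlite3 module and a hypothetical `part` table. SQLite's `EXPLAIN QUERY PLAN` output is used only as a stand-in for the internal form a compiler produces; it is not a formal algebra, and its wording varies between SQLite versions.

```python
import sqlite3

# Hypothetical table, index, and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, color TEXT)")
conn.execute("CREATE INDEX part_color ON part (color)")
conn.executemany(
    "INSERT INTO part (color) VALUES (?)",
    [("red",), ("blue",), ("red",)],
)

query = "SELECT id FROM part WHERE color = 'red'"

# The query text says only what is wanted...
result = sorted(row[0] for row in conn.execute(query))

# ...while the plan reports the access strategy the engine picked for it.
# The human-readable description is the last column of each plan row.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(result)
for step in plan:
    print(step)
```

The query never mentions the index; whether and how it is used is decided when the statement is compiled.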
It is the optimizer in an SQL system that guides that transformation to make queries reasonably efficient.

XQuery is not substantially different from 10,000 meters. It is a declarative, calculus-like language that is often (usually? always?) translated into a sort of algebraic form. And optimizers are responsible for making the result of that transformation efficient.

But that's where the similarity breaks down. There is not, as far as I am aware (full disclosure: I do not read a lot of research papers, so I could be 'way out of date here), a well-defined, rigorous algebra associated with XML data, the XPath/XQuery Data Model, or XQuery. I would not be surprised if there never was, but I wouldn't be stunned if there will be, either.

Hope this helps,
   Jim

>Date: Wed, 2 Apr 2008 22:29:39 -0700
>From: "Tsao, Scott" <http://x-query.com/mailman/listinfo/talk>
>Subject: A Couple of Questions - OOXML and SQL
>To: <http://x-query.com/mailman/listinfo/talk>
>
>During a recent XQuery Overview presentation, there were a couple of
>questions raised which I am searching for answers:
>
> 1. Office Open XML (OOXML) is a file format used by the Microsoft
>Office 2007 applications. Can XQuery be used to get meaningful
>information from an OOXML document, or would it only return items based
>on formatting aspects (all heading 1s, or all list items).
>
> 2. SQL is based in part on Set theory from Mathematics, and Set
>algebra. It allows set operations "update all red projects to green."
>Does XQuery support set algebra? For example, SQL join is a set
>operation that has inner, outer, Cartesian forms.
>http://en.wikipedia.org/wiki/Algebra_of_sets
>
>Do you have answers to those questions? If you do, please do share!
>
>
>Thanks,
>
>Scott Tsao
>Associate Technical Fellow
>The Boeing Company

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
  Co-Chair, W3C XML Query WG; XQX (etc.) editor   Fax : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA  Personal email: jim at melton dot name
========================================================================
=  Facts are facts.   But any opinions expressed are the opinions      =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
========================================================================
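
The sensor example in the message above can be sketched roughly as follows, assuming Python's built-in xml.etree.ElementTree module. The feed, its element names, and its attribute names are all invented for illustration; the point is only that readings with missing fields are still well-formed XML, and that a path expression can select on whatever structure is present.

```python
import xml.etree.ElementTree as ET

# Hypothetical sensor feed; element and attribute names are invented.
feed = """
<readings>
  <reading sensor="a" time="10:00:00"><temp>21.5</temp><humidity>40</humidity></reading>
  <reading sensor="b" time="10:00:00"><temp>19.0</temp></reading>
  <reading sensor="c" time="10:00:00"/>
</readings>
"""

root = ET.fromstring(feed)


def to_record(reading):
    """Return one reading as a dict holding only the fields that arrived."""
    record = {"sensor": reading.get("sensor")}
    for child in reading:
        record[child.tag] = float(child.text)
    return record


records = [to_record(r) for r in root.findall("reading")]
for record in records:
    print(record)

# Path expression: sensors that did report a temperature.
with_temp = [r.get("sensor") for r in root.findall("reading[temp]")]
print(with_temp)
```

Unlike a relational row, a record here simply lacks the keys for values that never arrived, so the three records have three different shapes.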






