
XQuery and id()/idref(); Controlling the children of nodes in the result sequence

Maik Stührenberg maik.stuehrenberg at uni-bielefeld.de
Wed Apr 23 12:23:41 PDT 2008


Hello,

I'm new to the list and have tried to find the answer to my questions in 
several places (including the list archive). I apologize if I haven't 
searched thoroughly enough and the answer has already been given.

Here's my problem:

We use a standoff annotation format for storing multiple annotated text 
files. The text is used to define a:span elements, which delimit the 
annotated stretches of text by start and end positions (see the example 
below).
The annotation itself is stored separately, as children of the a:data 
element. In principle, anything is allowed underneath a:data (in the 
underlying XSD 'a.xsd', a:data is a wrapper for elements from a 
different namespace). There won't be any text nodes, though, only 
elements containing other elements, or empty elements. So I have no 
advance knowledge of the hierarchy of the children of a:data.
The connection between the annotation and the annotated text is made by 
the a:span attribute (declared as xs:IDREF in the XSD), which refers to 
the xml:id of an a:span element.

<a:collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.example.org/a a.xsd"
   xmlns="http://www.example.org/a" xmlns:a="http://www.example.org/a">
   <a:entry xml:id="c1" type="text">
     <a:spans>
       <a:span xml:id="seg1" start="0" end="20"/>
       <a:span xml:id="seg2" start="0" end="20"/>
       <a:span xml:id="to1" start="0" end="4"/>
       <a:span xml:id="to2" start="5" end="8"/>
     </a:spans>
     <a:data xmlns:b="http://www.example.org/b"
       xsi:schemaLocation="http://www.example.org/b b.xsd">
       <b:text a:span="seg1">
         <b:para a:span="seg1"/>
       </b:text>
     </a:data>
     <a:data xmlns:c="http://www.example.org/c"
       xsi:schemaLocation="http://www.example.org/c c.xsd">
       <c:sentence id="w35" a:span="seg2">
         <c:word a:span="to1" id="w36"/>
         <c:word a:span="to2" id="w37"/>
         <!-- ... -->
       </c:sentence>
     </a:data>
   </a:entry>
</a:collection>

When I try to gather all annotations that correspond to a specific 
a:span element with the following XQuery, I receive the output below.

declare namespace a="http://www.example.org/a";
declare namespace b="http://www.example.org/b";
declare namespace c="http://www.example.org/c";
element resultset
{
let $d := doc('instance.xml')
for $s in $d/a:collection/a:entry/a:spans/a:span
return
   <result span="{$s/@xml:id}" start="{$s/@start}" end="{$s/@end}">
     { $d/a:collection/a:entry/a:data//*[@a:span = $s/@xml:id] }
   </result>
}

<resultset>
   <result start="0" end="20" span="seg1">
     <b:text xmlns:b="http://www.example.org/b"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a"
        a:span="seg1">
       <b:para a:span="seg1"/>
     </b:text>
     <b:para xmlns:b="http://www.example.org/b"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a"
        a:span="seg1"/>
   </result>
   <result start="0" end="20" span="seg2">
     <c:sentence xmlns:c="http://www.example.org/c"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a"
        id="w35"
        a:span="seg2">
       <c:word a:span="to1" id="w36"/>
       <c:word a:span="to2" id="w37"/>
       <!-- ... -->
      </c:sentence>
    </result>
   <result start="0" end="4" span="to1">
     <c:word xmlns:c="http://www.example.org/c"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a" a:span="to1"
        id="w36"/>
   </result>
   <result start="5" end="8" span="to2">
     <c:word xmlns:c="http://www.example.org/c"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a" a:span="to2"
        id="w37"/>
   </result>
</resultset>

Several things are not ideal here:
- Is there any way to suppress the namespace declarations on each 
element? More specifically: what do I have to change so that all 
namespaces are declared once (and only once), on the resultset element?

- The biggest issue is that the b:para element is output twice: once as 
a child of the b:text element (which is fine) and once on its own. A 
related problem shows up with the c:word elements: they should not be 
included as children of the c:sentence element, because they refer to 
different spans; they should appear only as children of their respective 
result elements.

- The third question concerns the fn:idref function. My first versions 
of the query used idref() to select all nodes underneath a:data that 
refer to a given span, but I didn't get any output, although all XSD 
files are available (I use Saxon-SA 9). What has to be changed in the 
query to use the idref function?
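
To make the second point concrete, the behaviour I am after might be 
sketched roughly as follows. This is only an untested sketch, not a 
known solution: local:same-span-copy is a helper name made up for 
illustration, and it assumes that a matching element should be emitted 
only if its parent does not refer to the same span, and that inside it 
only children referring to the same span should be kept. Comments and 
other non-element children are dropped by this sketch.

declare namespace a="http://www.example.org/a";

(: Hypothetical helper: copy $e with its attributes, keeping only
   those child elements that refer to the same span as $e. :)
declare function local:same-span-copy($e as element()) as element()
{
   element { node-name($e) }
   {
     $e/@*,
     for $c in $e/*[@a:span = $e/@a:span]
     return local:same-span-copy($c)
   }
};

element resultset
{
let $d := doc('instance.xml')
for $s in $d/a:collection/a:entry/a:spans/a:span
return
   <result span="{$s/@xml:id}" start="{$s/@start}" end="{$s/@end}">
     {
       (: top-most matches only: skip elements whose parent
          already refers to the same span :)
       for $e in $d/a:collection/a:entry/a:data//*[@a:span = $s/@xml:id]
                   [not(parent::*/@a:span = $s/@xml:id)]
       return local:same-span-copy($e)
     }
   </result>
}

With the instance above, I would expect this to give b:text (containing 
b:para) for seg1, an empty c:sentence for seg2, and the single c:word 
elements for to1 and to2, but I am not sure whether this is the 
idiomatic way to do it.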

Again I apologize for asking three questions in my first post to the list.

Kind regards,

Maik Stührenberg



