
Re: Comparing documents: what of P is a subset of D?

Subject: Re: Comparing documents: what of P is a subset of D?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Thu, 27 Feb 2014 11:25:53 +0100
<cca><!-- a D XML -->
  <rela _ix='0' fa='0' fb='1'>
     <fc _ix='1' fc_fa='X1' fc_fb='1'/>
     <fc _ix='2' fc_fa='X2' fc_fb='2'/>
  </rela>
  <rela _ix='1' fa='10' fb='11'>
     <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
     <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1' fc_fa='A1' fc_fb='51'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52'/>
  </rela>
  <relb>...</relb>
  <relc>...</relc>
</cca>

<cca><!-- a P XML -->
  <rela _ix='1' fa='10'>
     <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1'                 fc_fb='51' fc_fc='123'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
  </rela>
</cca>

Expected output:

/cca/rela(1)/fa   10
/cca/rela(1)/fc(1)/fc_fa   Y1
/cca/rela(5)/fa   50
/cca/rela(5)/fb   51
/cca/rela(5)/fc(1)/fc_fb   51
/cca/rela(5)/fc(2)/fc_fa   A2
/cca/rela(5)/fc(2)/fc_fb   52

Note that parentheses enclose values of @_ix.
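
A minimal sketch of one way to compute such a listing, written in Python rather than XSLT 2.0 purely for illustration. It flattens each document into (path, value) pairs, folding @_ix into the path step, and keeps the pairs of P that also occur in D. The names scalar_items and common_items are hypothetical, and treating leaf elements with text as scalar fields is an assumption about the data model, not something the sample data exercises.

```python
import xml.etree.ElementTree as ET

D_XML = """<cca><!-- a D XML -->
  <rela _ix='0' fa='0' fb='1'>
     <fc _ix='1' fc_fa='X1' fc_fb='1'/>
     <fc _ix='2' fc_fa='X2' fc_fb='2'/>
  </rela>
  <rela _ix='1' fa='10' fb='11'>
     <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
     <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1' fc_fa='A1' fc_fb='51'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52'/>
  </rela>
  <relb>...</relb>
  <relc>...</relc>
</cca>"""

P_XML = """<cca><!-- a P XML -->
  <rela _ix='1' fa='10'>
     <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1'                 fc_fb='51' fc_fc='123'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
  </rela>
</cca>"""


def scalar_items(elem, prefix=""):
    """Yield (path, value) for every scalar at or below elem, in document order."""
    ix = elem.get("_ix")
    # Array elements are addressed by @_ix, not by position.
    step = elem.tag if ix is None else f"{elem.tag}({ix})"
    path = f"{prefix}/{step}"
    for name, value in elem.attrib.items():
        if name != "_ix":  # @_ix is part of the path, not a data item
            yield f"{path}/{name}", value
    children = list(elem)
    # Assumption: a childless element with non-blank text is a scalar field.
    if not children and elem.text and elem.text.strip():
        yield path, elem.text.strip()
    for child in children:
        yield from scalar_items(child, path)


def common_items(p_xml, d_xml):
    """Return the (path, value) pairs of P that also occur in D, in P's order."""
    d_items = set(scalar_items(ET.fromstring(d_xml)))
    return [item for item in scalar_items(ET.fromstring(p_xml)) if item in d_items]


if __name__ == "__main__":
    for path, value in common_items(P_XML, D_XML):
        print(f"{path}   {value}")
```

For the two sample documents this prints seven lines; the value 51 of rela(5) is reported under the path /cca/rela(5)/fb, since that is the attribute carrying it in both D and P. Nothing in the sketch depends on the element or attribute names, so it should not need adapting per data set, as long as @_ix is the only indexing convention.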

-W

On 27/02/2014, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> It would be easier to understand the problem with some example data.
>
> Michael Kay
> Saxonica
>
> On 27 Feb 2014, at 08:05, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
>
>> The data model for a set of similarly (but not identically) built XML
>> documents is: a collection of arrays of records, which may contain
>> (recursively) arrays, records and scalars. (The terms "array" and
>> "record" are used in their "classic" meaning as, e.g., in Pascal.)
>> Document structures are fairly stable, but they do change over time.
>> Array elements are identified (indexed) by @_ix, not by position.
>> Record fields can be elements or attributes (when they are scalar).
>> Order is undefined, since XPaths plus @_ix values pinpoint each node.
>>
>> One XML document D contains a full population for such a data set
>> (O(1MB)). A second XML document P contains "patches", i.e., each node
>> appearing in P is expected to be in D as well.
>>
>> If S(P) is the sequence of nodes (annotated with their XPaths) in P
>> and S(D) the one with nodes from D, how can I determine S(P) intersect
>> S(D) (except all @_ix, whose values are bound to be identical)? Of
>> course, I don't want the common set of *data items* - I want the XML
>> paths of those common data items.
>>
>> A solution (in XSLT 2.0) should not need individual adaptation for each
>> kind of data set.
>>
>> I'm confident that I can create text files for D and P containing one
>> line <path> <value> for each node and run diff (after sort).
>>
>> Any better ideas?
>>
>> Cheers
>> Wolfgang
