Re: Comparing documents: what of P is a subset of D?


Subject: Re: Comparing documents: what of P is a subset of D?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Thu, 27 Feb 2014 11:25:53 +0100

<cca><!-- a D XML -->
  <rela _ix='0' fa='0' fb='1'>
     <fc _ix='1' fc_fa='X1' fc_fb='1'/>
     <fc _ix='2' fc_fa='X2' fc_fb='2'/>
  </rela>
  <rela _ix='1' fa='10' fb='11'>
     <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
     <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1' fc_fa='A1' fc_fb='51'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52'/>
  </rela>
  <relb>...</relb>
  <relc>...</relc>
</cca>

<cca><!-- a P XML -->
  <rela _ix='1' fa='10'>
     <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1'                 fc_fb='51' fc_fc='123'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
  </rela>
</cca>

Expected output:

/cca/rela(1)/fa   10
/cca/rela(1)/fc(1)/fc_fa   Y1
/cca/rela(5)/fa   50
/cca/rela(5)/fb   51
/cca/rela(5)/fc(1)/fc_fb   51
/cca/rela(5)/fc(2)/fc_fa   A2
/cca/rela(5)/fc(2)/fc_fb   52

Note that the parentheses enclose the value of @_ix, not a position.
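To check the expected output above, here is a minimal sketch in Python (not the XSLT 2.0 solution the question asks for). It flattens each document into (path, value) pairs and intersects the two sets. It assumes that every repeated element carries @_ix, that scalars are attributes or the text of childless elements, and that comments can be ignored. The names `flatten` and `common_paths` are hypothetical, and the elided `relb`/`relc` elements are left out of the sample data.

```python
import xml.etree.ElementTree as ET

def flatten(elem, prefix=""):
    """Yield (path, value) for each scalar under elem, skipping @_ix itself."""
    ix = elem.get("_ix")
    # Array members are identified by @_ix, written in parentheses as in the post;
    # an element without @_ix (e.g. the root) gets a bare name step.
    step = f"{elem.tag}({ix})" if ix is not None else elem.tag
    path = f"{prefix}/{step}"
    for name, value in elem.attrib.items():
        if name != "_ix":
            yield f"{path}/{name}", value
    # Treat the non-blank text of a childless element as a scalar field, too.
    text = (elem.text or "").strip()
    if text and len(elem) == 0:
        yield path, text
    for child in elem:
        yield from flatten(child, path)

def common_paths(d_xml, p_xml):
    """Return the sorted (path, value) pairs of P that also occur in D."""
    d = set(flatten(ET.fromstring(d_xml)))  # the default parser drops comments
    p = set(flatten(ET.fromstring(p_xml)))
    return sorted(p & d)

D_XML = """<cca><!-- a D XML -->
  <rela _ix='0' fa='0' fb='1'>
     <fc _ix='1' fc_fa='X1' fc_fb='1'/>
     <fc _ix='2' fc_fa='X2' fc_fb='2'/>
  </rela>
  <rela _ix='1' fa='10' fb='11'>
     <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
     <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1' fc_fa='A1' fc_fb='51'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52'/>
  </rela>
</cca>"""

P_XML = """<cca><!-- a P XML -->
  <rela _ix='1' fa='10'>
     <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1' fc_fb='51' fc_fc='123'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
  </rela>
</cca>"""

for path, value in common_paths(D_XML, P_XML):
    print(f"{path}   {value}")
```

P's `fc_fb='99'` and the `fc_fc` attributes have no matching (path, value) pair in D, so they drop out of the intersection. The second `rela(5)` line comes out as `/cca/rela(5)/fb   51`, since 51 is the value of `fb`. Sorting is plain string order, so an index such as 10 would sort before 5; that affects only the listing, not which pairs are common.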

-W

On 27/02/2014, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> It would be easier to understand the problem with some example data.
>
> Michael Kay
> Saxonica
>
> On 27 Feb 2014, at 08:05, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
>
>> The data model for a set of similarly (but not identically) built XML
>> documents is: a collection of arrays of records, which may contain
>> (recursively) arrays, records and scalars. (The terms "array" and
>> "record" are used in their "classic" meaning as, e.g., in Pascal.)
>> Document structures are fairly stable, but they do change over time.
>> Array elements are identified (indexed) by @_ix, not by position.
>> Record fields can be elements or attributes (when they are scalar).
>> Order is undefined, since XPaths plus @_ix values pinpoint each node.
>>
>> One XML document D contains a full population for such a data set
>> (O(1MB)). A second XML document P contains "patches", i.e., each node
>> appearing in P is expected to be in D as well.
>>
>> If S(P) is the sequence of nodes (annotated with their XPaths) in P
>> and S(D) the one with nodes from D, how can I determine S(P) intersect
>> S(D) (except all @_ix, whose values are bound to be identical)? Of
>> course, I don't want the common set of *data items* - I want the XML
>> paths of those common data items.
>>
>> A solution (in XSLT 2.0) should not need individual adaptation for each
>> kind of data set.
>>
>> I'm confident that I can create text files for D and P containing one
>> line <path> <value> for each node and run diff (after sort).
>>
>> Any better ideas?
>>
>> Cheers
>> Wolfgang
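For the line-file approach in the quoted message, note that `comm -12` (rather than `diff`) prints exactly the lines common to two sorted files. Once both files are sorted, one merge-style pass finds the common lines in linear time. Below is a minimal Python sketch of that merge step. It assumes each line already holds `<path> <value>` and that both lists were sorted with the same ordering (for the shell tools, keep the locale consistent, e.g. `LC_ALL=C`). The function name `common_sorted_lines` is hypothetical.

```python
def common_sorted_lines(a, b):
    """Walk two sorted line lists in step and return the lines present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same line in both inputs: keep it and advance both cursors.
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot occur later in b, so skip it
        else:
            j += 1  # b[j] cannot occur later in a, so skip it
    return out

p_lines = sorted([
    "/cca/rela(1)/fa 10",
    "/cca/rela(1)/fc(1)/fc_fb 99",
])
d_lines = sorted([
    "/cca/rela(1)/fa 10",
    "/cca/rela(1)/fb 11",
    "/cca/rela(1)/fc(1)/fc_fb 11",
])
print(common_sorted_lines(p_lines, d_lines))  # ['/cca/rela(1)/fa 10']
```

Because each matched pair consumes one line from each side, duplicate lines are counted as a multiset, which is also how `comm` treats repeated lines.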

