Re: comparing nodesets to each other


Subject: Re: comparing nodesets to each other
From: "Aron Bock" <aronbock@xxxxxxxxxxx>
Date: Mon, 11 Apr 2005 17:12:29 +0000

Kai,

IMO the general problem of finding the differences between any two XML documents is, shall we say, challenging. What helps in such an operation is being extremely precise about what constitutes a difference, and being able to formulate precedence rules for the comparison operations. An earlier respondent illustrated the need for this with an example that "added" a node in the second document. It's very likely *you* have a good idea of what you're after, but with these types of problems you'll get the most help if you can express your "rules for comparison" in [formal] written form.

Consider the following documents:

doc1.xml
=======
<doc>
<chapter n="1"/>
<chapter n="2"/>
</doc>

doc2.xml
=======
<doc>
<chapter n="1"/>
<chapter n="2">
 <para n="1"/>
</chapter>
</doc>

What *exactly* would you like in your final output? Do you want to see only the node <para n="1"/>? Do you want to see <para n="1"/> and all its parent nodes? You see where this is going? It helps to be precise.

Also, while writing a "general" differencing algorithm would be worthwhile, it's probably not simple. To start, you'll have better luck if you constrain the problem to your domain. One way to do this is to identify the least granular level that matters for your purposes: perhaps a node or "level" below which identifying differences is superfluous. In the example above, you could say:

--<chapter> nodes are compared by their "n" attribute
--if there are any differences between two <chapter> nodes or any of their descendants, the entire <chapter> node is considered "changed", and the version from doc2.xml is output
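A minimal sketch of those two rules, written in Python with the standard-library xml.etree.ElementTree module rather than XSLT. The helper names (chapters_by_key, canonical, changed_chapters) are hypothetical, and the notion of "difference" is an assumption: two subtrees are equal if they have the same tags, the same attributes (in any order), the same non-whitespace text and the same children in the same order. Chapters that exist in only one document fall outside these two rules and are simply ignored here.

```python
import xml.etree.ElementTree as ET

# Sample inputs mirroring doc1.xml and doc2.xml above.
DOC1 = """<doc>
<chapter n="1"/>
<chapter n="2"/>
</doc>"""

DOC2 = """<doc>
<chapter n="1"/>
<chapter n="2">
 <para n="1"/>
</chapter>
</doc>"""


def chapters_by_key(root):
    """Index every <chapter> child of the root by its "n" attribute."""
    return {ch.get("n"): ch for ch in root.findall("chapter")}


def canonical(elem):
    """Comparable form of a subtree: tag, sorted attributes, stripped text,
    and the canonical forms of all children (in document order).
    Tail text (mixed content) is ignored in this sketch."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(canonical(child) for child in elem),
    )


def changed_chapters(doc_a, doc_b):
    """Return doc_b's serialized <chapter> for every "n" present in both
    documents whose subtree differs anywhere, descendants included."""
    a = chapters_by_key(ET.fromstring(doc_a))
    b = chapters_by_key(ET.fromstring(doc_b))
    changed = []
    for n, ch_b in b.items():
        ch_a = a.get(n)
        if ch_a is not None and canonical(ch_a) != canonical(ch_b):
            # The whole chapter counts as changed; output doc2's version.
            changed.append(ET.tostring(ch_b, encoding="unicode").strip())
    return changed


for chapter in changed_chapters(DOC1, DOC2):
    print(chapter)
```

Comparing whole chapter subtrees keeps the logic simple: once a chapter is known to differ, there is no need to work out *where* inside it the difference is.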

I've done this type of "constrained" comparison with success.

Here's another approach to consider: preprocess each XML document into a "standard" format, then use a textual diff tool. The idea is that you apply an XSL transform to doc1.xml so that <chapter> nodes are sequential, their descendants are ordered in a specific way, etc. Do the same with doc2.xml. Then use a diff tool (e.g. Beyond Compare, from http://www.scootersoftware.com/ ) to check the differences. Note that this method is sensitive to line breaks (a difference in line breaking shows up as a change), so it's not trivial to implement.
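A rough sketch of the normalize-then-diff idea, using a Python function in place of the XSL transform and Python's difflib in place of an external diff tool. Assumptions not taken from the post: sibling order is not significant, so children are sorted; each element is written on its own line with fixed indentation, which removes the line-break sensitivity; attribute values are written unescaped because the output is only a diff listing, not XML. The names normalized_lines and xml_diff are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET

DOC1 = """<doc>
<chapter n="1"/>
<chapter n="2"/>
</doc>"""

DOC2 = """<doc>
<chapter n="1"/>
<chapter n="2">
 <para n="1"/>
</chapter>
</doc>"""


def normalized_lines(elem, depth=0):
    """One line per element: indentation for depth, sorted attributes,
    stripped text. Child subtrees are sorted so sibling order and the
    source's line breaks and indentation no longer matter."""
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
    lines = ["  " * depth + f"<{elem.tag}{attrs}>" + (elem.text or "").strip()]
    for child_lines in sorted(normalized_lines(c, depth + 1) for c in elem):
        lines.extend(child_lines)
    return lines


def xml_diff(xml_a, xml_b):
    """Unified diff of the two normalized listings (empty list if equal)."""
    a = normalized_lines(ET.fromstring(xml_a))
    b = normalized_lines(ET.fromstring(xml_b))
    return list(difflib.unified_diff(a, b, "doc1.xml", "doc2.xml", lineterm=""))


print("\n".join(xml_diff(DOC1, DOC2)))
```

Because every element is re-emitted on its own line, two documents that differ only in whitespace or line breaking produce an empty diff.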

Regards

--A

From: "Kai Hackemesser" <kaha@xxxxxx>
Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re:  comparing nodesets to each other
Date: Mon, 11 Apr 2005 18:18:47 +0200 (MEST)

Hello, David,

Thanks for the response. The errors you mentioned have already happened;
that's why I'm currently at a loss as to how to solve this.

Here is a simplified version of the structure of the recipe:

<object>
  <relation>
    <Attribute Type="string" Name="FindNumber">
      <Value><![CDATA[0005]]></Value>
    </Attribute>
    <Attribute Type="float" Name="...
    <object>
      <Attribute Type="string" Name="PartNumber">
        <Value><![CDATA[Part1]]></Value>
      </Attribute>
    </object>
  </relation>
  <relation>
    <Attribute Type="string" Name="FindNumber">
      <Value><![CDATA[0010]]></Value>
    </Attribute>
    <Attribute Type="float" Name="...
    <object>
      <Attribute Type="string" Name="PartNumber">
        <Value><![CDATA[Part2]]></Value>
      </Attribute>
    </object>
  </relation>
  <relation>
    <Attribute Type="string" Name="FindNumber">
      <Value><![CDATA[0015]]></Value>
    </Attribute>
    <Attribute Type="float" Name="...
    <object>
      <Attribute Type="string" Name="PartNumber">
        <Value><![CDATA[Part3]]></Value>
      </Attribute>
    </object>
  </relation>
</object>

needs to be compared against a similar structure:
<object>
  <relation>
    <Attribute Type="string" Name="FindNumber">
      <Value><![CDATA[0005]]></Value>
    </Attribute>
    <Attribute Type="float" Name="...
    <object>
      <Attribute Type="string" Name="PartNumber">
        <Value><![CDATA[Part1]]></Value>
      </Attribute>
    </object>
  </relation>
  <relation>
    <Attribute Type="string" Name="FindNumber">
      <Value><![CDATA[0015]]></Value>
    </Attribute>
    <Attribute Type="float" Name="...
    <object>
      <Attribute Type="string" Name="PartNumber">
        <Value><![CDATA[Part3b]]></Value>
      </Attribute>
    </object>
  </relation>
</object>

(Each object or relation node actually has more than one Attribute node.)

So I need to extract all differences: attribute changes, missing nodes, altered nodes and added nodes. To identify a node I use the FindNumber Attribute node of each relation node. A missing node is one whose FindNumber Attribute value is missing from nodelist 'b'. An added node is one whose FindNumber Attribute value is missing from nodelist 'a'. An altered node is one whose FindNumber Attribute value is present in both nodelists, but whose Attribute nodes or object/Attribute nodes differ. I think a simple text compare would be enough to test for alteration.
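A minimal sketch of these rules in Python (standard-library xml.etree.ElementTree), assuming each top-level <relation> is keyed by the Value of its FindNumber Attribute. The relation() helper, the trimmed sample data (the elided float attributes are left out) and the names index_by_find_number, canonical and compare are all hypothetical. Instead of a raw text compare, "altered" is tested on a normalized form of each <relation> subtree, so that whitespace or attribute order alone does not count as a change; any changed value, attribute, or added or removed child does.

```python
import xml.etree.ElementTree as ET


def relation(find_number, part_number):
    """Build one <relation> in the shape shown above (float attributes omitted)."""
    return (
        "<relation>"
        f'<Attribute Type="string" Name="FindNumber"><Value><![CDATA[{find_number}]]></Value></Attribute>'
        f'<object><Attribute Type="string" Name="PartNumber"><Value><![CDATA[{part_number}]]></Value></Attribute></object>'
        "</relation>"
    )


# Nodelist 'a' and nodelist 'b' from the example above.
OLD = "<object>" + relation("0005", "Part1") + relation("0010", "Part2") + relation("0015", "Part3") + "</object>"
NEW = "<object>" + relation("0005", "Part1") + relation("0015", "Part3b") + "</object>"


def canonical(elem):
    """Whitespace- and attribute-order-insensitive form of a subtree."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(canonical(child) for child in elem),
    )


def index_by_find_number(root):
    """Map FindNumber value -> <relation> element for each top-level relation."""
    return {
        rel.findtext("Attribute[@Name='FindNumber']/Value", default="").strip(): rel
        for rel in root.findall("relation")
    }


def compare(xml_a, xml_b):
    a = index_by_find_number(ET.fromstring(xml_a))
    b = index_by_find_number(ET.fromstring(xml_b))
    return {
        "missing": sorted(k for k in a if k not in b),  # only in 'a'
        "added": sorted(k for k in b if k not in a),    # only in 'b'
        "altered": sorted(k for k in a if k in b and canonical(a[k]) != canonical(b[k])),
    }


print(compare(OLD, NEW))  # → {'missing': ['0010'], 'added': [], 'altered': ['0015']}
```

Keying on FindNumber first and only then comparing subtrees is what turns "0015 changed from Part3 to Part3b" into one alteration rather than a deletion plus an insertion.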

Regards,
Kai


Current Thread
comparing nodesets to each other
Kai Hackemesser - 11 Apr 2005 15:14:41 -0000
David Carlisle - 11 Apr 2005 15:35:49 -0000
Kai Hackemesser - 11 Apr 2005 16:19:10 -0000
Aron Bock - 11 Apr 2005 17:12:52 -0000 <=
Kai Hackemesser - 11 Apr 2005 19:46:48 -0000
Aron Bock - 12 Apr 2005 03:57:02 -0000
