RE: XML diff tool / algorithm

  • From: "David Lee" <dlee@calldei.com>
  • To: "'Johannes.Lichtenberger'" <Johannes.Lichtenberger@uni-konstanz.de>, <xml-dev@l...>
  • Date: Tue, 28 Jun 2011 06:37:20 -0400

This is an interesting use case.  I have not personally seen anything that can handle what you're asking for 'ideally'
(although I've used several diff tools).
In your case you need a tradeoff between an ideal representation and time and space.

I suggest a compromise, depending on how your DB handles fragmentation.  A lot of real-world XML, especially the big documents, is really a large list or heterogeneous collection that can be split efficiently at the root+1 level (the children of the root element).
If you split documents at this level (or some customizable level) and then do a binary compare or an MD5 of each fragment (or of however your nodes are represented), you may find an efficient algorithm that performs reasonably in time and space for a useful set of cases.
Finding inserts and deletes may be harder and might require multiple passes.  But this approach turns the diff into a linear diff instead of a hierarchical one, and there are many known, published algorithms for linear diffs.  Certainly not perfect, but it may be vastly better than nothing.
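
A minimal sketch of the idea in Python, for illustration only.  The function names (fragment_hashes, fragment_diff) are made up for this example, and it assumes the fragments are the direct children of the root element, that serializing each one with ElementTree is a good enough fingerprint (it is not canonical XML, so equivalent fragments that are written differently will hash differently), and that difflib's sequence matcher is an acceptable stand-in for the linear diff algorithm.

```python
import difflib
import hashlib
import io
import xml.etree.ElementTree as ET


def fragment_hashes(source):
    """Return one MD5 hex digest per child of the root element (root+1 level)."""
    digests = []
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # a direct child of the root has just closed
                elem.tail = None  # ignore whitespace between fragments
                digests.append(hashlib.md5(ET.tostring(elem)).hexdigest())
                elem.clear()  # drop the fragment's content to limit memory use
    return digests


def fragment_diff(old_source, new_source):
    """Linear diff over the fragment hashes; returns only the non-equal opcodes.

    Each opcode is (tag, i1, i2, j1, j2): fragments i1..i2 of the old
    document are deleted, inserted or replaced by fragments j1..j2 of the new.
    """
    old = fragment_hashes(old_source)
    new = fragment_hashes(new_source)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


if __name__ == "__main__":
    old_doc = io.BytesIO(b"<r><a>1</a><b>2</b><c>3</c></r>")
    new_doc = io.BytesIO(b"<r><a>1</a><c>3</c><d>4</d></r>")
    for op in fragment_diff(old_doc, new_doc):
        print(op)
```

A fragment whose content changed shows up as a "replace" of the whole fragment, so the resulting edit script is coarse rather than minimal; a second pass could run a finer-grained tree diff inside only the replaced fragments.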



----------------------------------------
David A. Lee
dlee@calldei.com
http://www.xmlsh.org


-----Original Message-----
From: Johannes.Lichtenberger [mailto:Johannes.Lichtenberger@uni-konstanz.de] 
Sent: Tuesday, June 28, 2011 6:25 AM
To: xml-dev@lists.xml.org
Subject: Re:  XML diff tool / algorithm


On 06/28/2011 11:49 AM, Michael Kay wrote:
> On 27/06/2011 23:42, Johannes.Lichtenberger wrote:
>> Hello,
>>
>> does someone use an XML-diff tool, which can handle large XML instances
>> up to several GBs and more?
>
> DeltaXML specializes in this area.
>
> www.deltaxml.com

Hello Michael,

that's one of the things I found very early, but I forgot to
mention that I'm searching for free software or just an algorithm which can
be used in conjunction with our open source XML database system.

It should be used for importing the data, to update the stored data to
new revisions, since updates can be handled efficiently based on the
used algorithm in the database system (Full, Incremental, Differential
and another one, which is going to be explained in a paper soon). For
the incremental and differential approach the found changes have to be
minimal or at least smaller than inserting a full dump.

I haven't found anything which fulfills my criteria: minimal edit
scripts (which in our case means minimal updates to the storage), handling
large instances, and running in reasonable time (maybe O(n^2) only in the
worst case).

regards,
Johannes
