[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] RE: XML diff tool / algorithm
This is an interesting use case. I have not personally seen anything that can handle what you're asking 'ideally' (although I've used several diff tools). In your case you need a tradeoff between ideal representation and time and space. I suggest maybe a compromise, depending on how your DB handles fragmentation. A lot of real-world XML, especially the big ones, are really large lists or heterogeneous collections that can be efficiently split at the root+1 level nodes. If you split documents at this level (or some customizable level) then do a binary compare, or MD5 of each fragment (or however your nodes are represented) you may find an efficient algorithm which performs reasonably in time & space for a useful set of cases. Now finding inserts/deletes may be harder. - might require multiple passes. But this approach would turn the diff into a linear instead of hierarchical diff, which has many known/published algorithms. Certainly not perfect but may be vastly better then nothing. ---------------------------------------- David A. Lee dlee@calldei.com http://www.xmlsh.org -----Original Message----- From: Johannes.Lichtenberger [mailto:Johannes.Lichtenberger@uni-konstanz.de] Sent: Tuesday, June 28, 2011 6:25 AM To: xml-dev@lists.xml.org Subject: Re: XML diff tool / algorithm On 06/28/2011 11:49 AM, Michael Kay wrote: > On 27/06/2011 23:42, Johannes.Lichtenberger wrote: >> Hello, >> >> does someone use an XML-diff tool, which can handle large XML instances >> up to several GBs and more? > > DeltaXML specializes in this area. > > www.deltaxml.com Hello Michael, that's one of the things I've found very early, but I've forgotten to meantion that I'm searching free software or just an algorithm which can be used in conjunction with our open source XML database system. It should be used for importing the data, to update the stored data to new revisions, since updates can be handled efficiently based on the used algorithm in the database system (Full, Incremental, Differential and another one, which is going to be explained in a paper soon). For the incremental and differential approach the found changes have to be minimal or at least smaller than inserting a full dump. I haven't found something which fulfills my criterias (minimal edit scripts, which in our case means minimal updates to the storage, handle large instances and is running in reasonable times (maybe only in the worst case O(n^2)). regards, Johannes _______________________________________________________________________ XML-DEV is a publicly archived, unmoderated list hosted by OASIS to support XML implementation and development. To minimize spam in the archives, you must subscribe before posting. [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/ Or unsubscribe: xml-dev-unsubscribe@lists.xml.org subscribe: xml-dev-subscribe@lists.xml.org List archive: http://lists.xml.org/archives/xml-dev/ List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] |
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|