XML-DEV Mailing List Archive: Re: hashing
md5sum is a cryptographic hash using the MD5 algorithm. It's not fast, but it will do what you want. It's available on Linux, in Cygwin, and probably other ways. In a reasonable command shell, where Unix commands are available along with md5sum,

    md5sum *.xml | sort

will put the duplicate files on neighboring lines.

Jeff

----- Original Message -----
From: "Eric Hanson" <eric@a...>
To: <xml-dev@l...>
Sent: Thursday, April 29, 2004 12:58 PM
Subject: hashing

> I have a large collection of XML documents, and want to find and
> group any duplicates. The obvious but slow way of doing this is
> to just compare them all to each other. Is there a better
> approach?
>
> Particularly, is there any APIs or standards for "hashing" a
> document so that duplicates could be identified in a similar way
> to what you'd do with a hash table?
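A minimal sketch of the pipeline Jeff describes, assuming GNU coreutils md5sum and a POSIX shell (the file names and contents below are made up for illustration):

```shell
#!/bin/sh
# Sketch: hash every .xml file, then sort so identical hashes
# (i.e. byte-identical files) land on neighboring lines.
# Assumes GNU coreutils md5sum; files here are illustrative only.
dir=$(mktemp -d)
printf '<doc>same</doc>\n'  > "$dir/a.xml"
printf '<doc>same</doc>\n'  > "$dir/b.xml"
printf '<doc>other</doc>\n' > "$dir/c.xml"

# md5sum emits "<hash>  <file>"; sorting groups duplicates together.
(cd "$dir" && md5sum *.xml | sort)

# To print only the hashes that occur more than once (duplicate groups):
(cd "$dir" && md5sum *.xml | awk '{print $1}' | sort | uniq -d)

rm -rf "$dir"
```

Note that this detects byte-for-byte duplicates only: two XML documents that differ in whitespace, attribute order, or encoding will hash differently even if they are logically equivalent. Catching those would require canonicalizing each document first (e.g. with Canonical XML) before hashing.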