XML-DEV Mailing List Archive: Re: hashing
If you're concerned about byte-for-byte identity, hashing each file is okay; if you're concerned about semantic identity (e.g., the order of attributes doesn't matter), then use standard XML canonicalization or something similar (but it won't be as good :)

Here's a portable Python script that compares all files named on the command line:

    ; cat x.py
    import sys, sha
    from xml.dom.ext.reader import PyExpat
    from xml.dom.ext.c14n import Canonicalize

    hashes = {}
    # sys.argv[0] is the script itself, so start at index 1
    for f in sys.argv[1:]:
        o = sha.sha()
        if 1:  # change to 0 to compare canonical forms instead
            # simple hash of the raw contents (binary mode, for portability)
            o.update(open(f, 'rb').read())
        else:
            # sha(c14n(doc)): parse, canonicalize, then hash
            r = PyExpat.Reader()
            dom = r.fromStream(open(f))
            o.update(Canonicalize(dom))
        h = o.digest()
        other = hashes.get(h, None)
        if other:
            print 'duplicate', f, other
        else:
            hashes[h] = f
    ;

--
Rich Salz                  Chief Security Architect
DataPower Technology       http://www.datapower.com
XS40 XML Security Gateway  http://www.datapower.com/products/xs40.html
XML Security Overview      http://www.datapower.com/xmldev/xmlsecurity.html
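The script above relies on Python 2's `sha` module and PyXML's `xml.dom.ext` packages, neither of which ships with current Python. A minimal sketch of the same two comparison modes on Python 3.8+, assuming only the standard library: `hashlib` for the digest and `xml.etree.ElementTree.canonicalize`, which emits C14N 2.0 and may differ in details from the canonicalizer PyXML used. The function names and the `--c14n` flag are hypothetical, not part of the original script.

```python
import hashlib
import sys
import xml.etree.ElementTree as ET


def digest_bytes(path):
    """SHA-1 of the raw file contents (byte-for-byte identity)."""
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.digest()


def digest_c14n(path):
    """SHA-1 of the C14N 2.0 form of the document (attribute order ignored)."""
    text = ET.canonicalize(from_file=path)  # returns a str when out is None
    return hashlib.sha1(text.encode("utf-8")).digest()


def find_duplicates(paths, canonical=False):
    """Return (duplicate, first_seen) pairs for files whose digests match."""
    digest = digest_c14n if canonical else digest_bytes
    seen = {}
    dups = []
    for path in paths:
        d = digest(path)
        if d in seen:
            dups.append((path, seen[d]))
        else:
            seen[d] = path
    return dups


if __name__ == "__main__":
    # Hypothetical "--c14n" flag selects the canonical comparison mode.
    args = sys.argv[1:]
    canonical = "--c14n" in args
    files = [a for a in args if a != "--c14n"]
    for dup, first in find_duplicates(files, canonical):
        print("duplicate", dup, first)
```

For example, `python3 dups.py --c14n a.xml b.xml` would report `b.xml` as a duplicate of `a.xml` if the two files differ only in attribute order, while the default byte mode would not.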







