Re: is there a way to hash an element?

Play the video

Subject: Re: is there a way to hash an element?
From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 13 Jun 2016 01:17:24 -0000

On Sat, Jun 11, 2016 at 05:21:09PM -0000, Dimitre Novatchev dnovatchev@xxxxxxxxx scripsit:
Hi Dimitre --
> Actually, I believe that calling deep-equal() can be more efficient
> than comparing hashes.
> 
> The reason is simple: deep-equal() most probably returns false at the
> first possible moment -- for example, noticing that an element has
> different attributes than its counterpart.
> 
> On the other side, with hashing,  the hashes for the two whole
> subtrees have to be calculated and only after that they can be
> compared.
> 
> To summarize, with the exception of the case when the two subtrees are
> equal, deep-equal may perform faster than generating and comparing
> hashes on the subtrees.

I've got one input document with ~5000 trees that are mappable to XSD
schema definitions; about half are complexTypes.  Many are structurally
the same but have different names. (All ~5000 have unique names.)

The idea is to group them by structural sameness; deep-equal, even very
efficiently implemented deep-equal, gives me n^2 as I have to go through
the whole tree for each element and ask "are you like me?" pairwise.
Some of the equivalent structures will have a lot of matches -- hundreds
-- where I can't expect deep-equal to fail quickly and thus efficiently.

Going through and decorating every element with its hash value
(@hash="something") and then using for-each-group on the lot on the
basis of the hash gives me 2n.  Even if it's a very naive hash
implementation, I'd expect 2n to beat n^2 performance.

Am I missing something?

(I'll certainly keep deep-equal in mind if the hash approach has
unacceptable performance.)

-- Graydon

Current Thread

Re: is there a way to hash an element?, (continued)

<- Previous	Index	Next ->
Re: is there a way to hash an, Dimitre Novatchev dn	Thread	Re: is there a way to hash an, Dimitre Novatchev dn
Re: Generating an index of te, Charles Muller acmul	Date	Re: is there a way to hash an, Dimitre Novatchev dn
	Month

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Subscribe in XML format

RSS 2.0
Atom 0.3

XML Editor - Download a 15 Day Free Trial Now >

See What's New in Stylus Studio >

Buy Stylus Studio - XML Editor - Now >