[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: is there a way to hash an element?

Subject: Re: is there a way to hash an element?
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 13 Jun 2016 07:04:00 -0000
Re:  is there a way to hash an element?
I agree with you, computing a hash and adding it as an attribute to the top
level element, then grouping on the hash looks like a good strategy.

I may have missed something in this thread, but I don't recall seeing a
specification of your matching rules that is sufficiently precise to enable
one to write a hash algorithm. We need to see a definition: "two elements A
and B are considered to be the same if and only if they satisfy the following
conditions: ....".

Michael Kay
Saxonica

> On 13 Jun 2016, at 02:17, Graydon graydon@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Jun 11, 2016 at 05:21:09PM -0000, Dimitre Novatchev
dnovatchev@xxxxxxxxx scripsit:
> Hi Dimitre --
>> Actually, I believe that calling deep-equal() can be more efficient
>> than comparing hashes.
>>
>> The reason is simple: deep-equal() most probably returns false at the
>> first possible moment -- for example, noticing that an element has
>> different attributes than its counterpart.
>>
>> On the other side, with hashing,  the hashes for the two whole
>> subtrees have to be calculated and only after that they can be
>> compared.
>>
>> To summarize, with the exception of the case when the two subtrees are
>> equal, deep-equal may perform faster than generating and comparing
>> hashes on the subtrees.
>
> I've got one input document with ~5000 trees that are mappable to XSD
> schema definitions; about half are complexTypes.  Many are structurally
> the same but have different names. (All ~5000 have unique names.)
>
> The idea is to group them by structural sameness; deep-equal, even very
> efficiently implemented deep-equal, gives me n^2 as I have to go through
> the whole tree for each element and ask "are you like me?" pairwise.
> Some of the equivalent structures will have a lot of matches -- hundreds
> -- where I can't expect deep-equal to fail quickly and thus efficiently.
>
> Going through and decorating every element with its hash value
> (@hash="something") and then using for-each-group on the lot on the
> basis of the hash gives me 2n.  Even if it's a very naive hash
> implementation, I'd expect 2n to beat n^2 performance.
>
> Am I missing something?
>
> (I'll certainly keep deep-equal in mind if the hash approach has
> unacceptable performance.)
>
> -- Graydon

Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.