Re: is there a way to hash an element?

Subject: Re: is there a way to hash an element?
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 10 Jun 2016 09:30:24 -0000

If you have the opportunity to use XSLT 3.0, this might be a good use for
accumulators; these visit every node in the tree and compute a value based on
the previous value and the content of the node; in your case the value of the
accumulator could be the running hash. With 2.0 you could achieve a similar
effect using apply-templates with sibling recursion.
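
For the 3.0 route, a minimal accumulator sketch might look like the following.
It is only an illustration under several assumptions: the messages are taken to
be children of a single wrapper element, the f:mix function and its namespace
are hypothetical helpers, and the arithmetic is a toy polynomial hash rather
than anything comparable to sha256sum.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:hash"
    exclude-result-prefixes="xs f">

  <!-- Make the accumulator applicable to the principal input document. -->
  <xsl:mode use-accumulators="hash"/>

  <!-- Hypothetical helper: fold the codepoints of $s into the running value $h.
       A toy polynomial hash for illustration, not a cryptographic digest. -->
  <xsl:function name="f:mix" as="xs:integer">
    <xsl:param name="h" as="xs:integer"/>
    <xsl:param name="s" as="xs:string"/>
    <xsl:iterate select="string-to-codepoints($s)">
      <xsl:param name="acc" as="xs:integer" select="$h"/>
      <xsl:on-completion select="$acc"/>
      <xsl:next-iteration>
        <xsl:with-param name="acc" select="($acc * 31 + .) mod 1000000007"/>
      </xsl:next-iteration>
    </xsl:iterate>
  </xsl:function>

  <xsl:accumulator name="hash" as="xs:integer" initial-value="0">
    <!-- Assumption: messages are children of a wrapper element. Restart at
         each one, ignoring its own attributes, which are expected to differ. -->
    <xsl:accumulator-rule match="/*/*" select="0"/>
    <!-- Descendant start: mix in the name, then add the attribute hashes
         (addition is commutative, so attribute order does not matter). -->
    <xsl:accumulator-rule match="/*/*//*"
        select="f:mix($value, name()) + sum(@* ! f:mix(0, name() || '=' || .))"/>
    <!-- Descendant end: mix in a marker so that nesting affects the hash. -->
    <xsl:accumulator-rule match="/*/*//*" phase="end"
        select="f:mix($value, '/')"/>
    <!-- Non-whitespace text. -->
    <xsl:accumulator-rule match="/*/*//text()[normalize-space()]"
        select="f:mix($value, normalize-space(.))"/>
  </xsl:accumulator>

  <!-- Group messages by the value reached after visiting their descendants. -->
  <xsl:template match="/*">
    <groups>
      <xsl:for-each-group select="*" group-by="accumulator-after('hash')">
        <group hash="{current-grouping-key()}" size="{count(current-group())}"/>
      </xsl:for-each-group>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

Calling accumulator-after('hash') on a message element gives the value reached
once all of its descendants have been visited, which is why it can serve as the
grouping key. With a hash this weak, members of a group should still be
compared (for example with deep-equal) before being treated as duplicates. The
template that follows is the 2.0 alternative.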

Something like this:

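<xsl:template match="*" mode="hash" as="xs:integer">
  <xsl:param name="h" as="xs:integer"/>
  <!-- Innermost call runs first: thread $h through the first child, then the
       next sibling, and finally fold in this element's own local hash.
       A leaf or last sibling selects nothing, so a real stylesheet needs a
       fallback that passes $h through unchanged. -->
  <xsl:apply-templates select="." mode="local-hash">
    <xsl:with-param name="h">
      <xsl:apply-templates select="following-sibling::*[1]" mode="hash">
        <xsl:with-param name="h">
          <xsl:apply-templates select="*[1]" mode="hash">
            <xsl:with-param name="h" select="$h"/>
          </xsl:apply-templates>
        </xsl:with-param>
      </xsl:apply-templates>
    </xsl:with-param>
  </xsl:apply-templates>
</xsl:template>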

and then in mode local-hash, you can define rules for individual elements that
compute a hash for that particular element based on its attributes and text
content; each template takes the old hash value and updates it as necessary.
To combine two hash values you can use addition or, if you prefer, an XOR
function, which you can get from the EXPath binary library.
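
As a rough sketch of what one such local-hash rule could look like in 2.0: the
checksum arithmetic, the modulus, and the small driver template are all
assumptions made for illustration, and a real rule would be tailored to the
elements in question.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Driver for trying the rule on its own: hash only the document element. -->
  <xsl:template match="/">
    <xsl:apply-templates select="*" mode="local-hash">
      <xsl:with-param name="h" select="0"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="*" mode="local-hash" as="xs:integer">
    <xsl:param name="h" as="xs:integer"/>
    <!-- Strings describing this element alone: its name, its attributes
         sorted by name so that their order cannot change the result, and
         its own text children. -->
    <xsl:variable name="parts" as="xs:string*">
      <xsl:sequence select="name()"/>
      <xsl:for-each select="@*">
        <xsl:sort select="name()"/>
        <xsl:sequence select="concat(name(), '=', .)"/>
      </xsl:for-each>
      <xsl:sequence select="text()/normalize-space(.)"/>
    </xsl:variable>
    <xsl:variable name="cps" as="xs:integer*"
        select="string-to-codepoints(string-join($parts, '|'))"/>
    <!-- Toy position-weighted checksum, purely illustrative. -->
    <xsl:variable name="local" as="xs:integer"
        select="xs:integer(sum(for $i in 1 to count($cps)
                               return $i * $cps[$i]) mod 1000000007)"/>
    <!-- Combine with the incoming hash by addition. -->
    <xsl:sequence select="$h + $local"/>
  </xsl:template>

</xsl:stylesheet>
```

Because addition is commutative, the combined value does not depend on the
order in which elements are visited; if sibling order should matter, an
order-sensitive combination (for instance multiplying $h by a constant before
adding) would be one option.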

Michael Kay
Saxonica

> On 9 Jun 2016, at 23:09, Graydon graydon@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello all --
>
> So I've got about half a gigabyte of XML messages describing various
> health care actions.  Many of these are structural duplicates of each
> other; the top elements differ by their attribute values, but the
> structure and values of the descendant elements are the same.  The amount
> of duplication varies from none to thousands.
>
> I've got an apparently useful heuristic based on descendant attribute
> values, but would -- it is health care data -- really like to have a
> more robust way to group the elements into sets of equivalent top-level
> names by their structural sameness.  (I can't hand-check the whole data
> set.)
>
> So I find myself wanting an equivalent of sha256sum for elements so I
> could generate a grouping key from the descendant elements and their
> associated attributes as a unit.
>
> Is there such a thing?  Equivalent approaches?
>
> Thanks!
> Graydon
