Re: Fw: Question on duplicate node elimination

Subject: Re: Fw: Question on duplicate node elimination
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Fri, 27 Aug 2010 14:59:56 +0200

Michael,

> ... Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult
> either.

Yesterday's solution [1], based on the id() function, worked well.


After thinking it over again, though, the single-file solution below,
which applies the key() function twice to eliminate duplicates, is much
better:
* does not need any separately created structure (like idcopy in [1])
* is really short, just a few lines (not counting comments)
* works on ALL major browsers (IE support via David Carlisle's trick [4])
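
Why the second key() call removes duplicates: key() returns a node-set, and
a node-set cannot contain the same node twice, so any number of <id>
elements carrying the same generate-id() value map back to a single source
node. A minimal sketch to check this, assuming a processor with
exslt:node-set() support such as xsltproc (the variable names dup and ids
are made up for this illustration, and the IE script shim is left out):

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  exclude-result-prefixes="exslt">
  <xsl:output method="text"/>

  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>

  <xsl:template match="/">
    <!-- hypothetical test data: every element's id, emitted twice on purpose -->
    <xsl:variable name="dup">
      <xsl:for-each select="//*">
        <id><xsl:value-of select="generate-id()"/></id>
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="ids" select="exslt:node-set($dup)/id"/>

    <!-- number of <id> elements, duplicates included -->
    <xsl:value-of select="concat('id elements: ', count($ids), '&#10;')"/>
    <!-- key() looks the ids up in the source document; the result is a set -->
    <xsl:value-of select="concat('distinct nodes: ',
                                 count(key('nodes-by-genid', $ids)), '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against ancestor3.xml, which has seven elements, this should report
14 id elements and 7 distinct nodes.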

Below are:
* execution by xsltproc
* listing of dupelim3.xsl [2]
* listing of ancestor3.xml [3] (open that in a browser)


$ xsltproc dupelim3.xsl ancestor3.xml
<html><pre><h2>Duplicate node elimination by applying key() function
twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)


ids(//*)
a      id2619817
+-b    id2619788
! +-c  id2619830
! +-c  id2619802
+-b    id2619245
! +-c  id2619317
! +-c  id2619321
<hr>
ids(//c):
<id>id2619830</id><id>id2619802</id><id>id2619317</id><id>id2619321</id>
<hr>
nodes="ids(//c)"<br>ids($nodes/ancestor::*):
<id>id2619817</id><id>id2619788</id><id>id2619245</id>
</pre></html>
$
$ cat dupelim3.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="exslt msxsl"
>
  <xsl:output method="html"/>

  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>


  <xsl:template match="/">

    <!--
         initial node-set sample, represented by <id> nodes
    -->
    <xsl:variable name="nodes">
      <xsl:for-each select="//c">
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


    <!--
         do ancestor location step
    -->
    <xsl:variable name="result">
      <!--
           application of "ancestor::*" on $nodes;
           $aux might contain duplicate id nodes
      -->
      <xsl:variable name="aux">
        <!--
             use key() function to determine real nodes
        -->
        <xsl:for-each select="key('nodes-by-genid',exslt:node-set($nodes)/id)">
          <!--
              location step on each real node
          -->
          <xsl:for-each select="ancestor::*">
            <!--
                generate <id>s for new nodes
            -->
            <id><xsl:value-of select="generate-id()"/></id>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:variable>

      <!--
           use key() function for duplicate elimination
      -->
      <xsl:for-each select="key('nodes-by-genid',exslt:node-set($aux)/id)">
        <!--
            generate <id>s, now for unique new nodes
        -->
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


<html><pre>
    <h2>Duplicate node elimination by applying key() function twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)

    <!-- node name vs genid output -->
    <xsl:text>&#10;ids(//*)</xsl:text>
    <xsl:for-each select="//*">
      <xsl:value-of select=
        "concat('&#10;',substring('! +-',5-2*count(ancestor::*)),name(),
         substring('    ',1+2*count(ancestor::*)),'  ',generate-id())"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- for verification -->
    <xsl:text>ids(//c): </xsl:text>
    <xsl:copy-of select="$nodes"/>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- output of result -->
    <xsl:text>nodes="ids(//c)"</xsl:text><br/>
    <xsl:text>ids($nodes/ancestor::*): </xsl:text>
    <xsl:copy-of select="$result"/>

    <xsl:text>&#10;</xsl:text>
</pre></html>

  </xsl:template>


<!--
  from http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html
-->
<msxsl:script language="JScript" implements-prefix="exslt">
  this['node-set'] = function (x) {
    return x;
  }
</msxsl:script>

</xsl:stylesheet>
$
$ cat ancestor3.xml
<?xml-stylesheet href="dupelim3.xsl" type="text/xsl"?>
<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>

$
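
The msxsl:script block at the end of dupelim3.xsl lets the same
exslt:node-set() call work in IE. For comparison, a commonly used
alternative (not the technique used above) is to test with
function-available() and branch between exslt:node-set() and MSXML's
msxsl:node-set(). A minimal sketch; the variable name rtf and its two
sample <id> elements are made up for this illustration:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="exslt msxsl">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- hypothetical result tree fragment with two <id> children -->
    <xsl:variable name="rtf">
      <id>x</id><id>y</id>
    </xsl:variable>

    <xsl:choose>
      <!-- processors that implement the EXSLT common module -->
      <xsl:when test="function-available('exslt:node-set')">
        <xsl:value-of select="count(exslt:node-set($rtf)/id)"/>
      </xsl:when>
      <!-- MSXML-based processors -->
      <xsl:when test="function-available('msxsl:node-set')">
        <xsl:value-of select="count(msxsl:node-set($rtf)/id)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message terminate="yes">no node-set() extension available</xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The drawback is that every place that needs a node-set has to repeat the
xsl:choose, which is why the script shim keeps dupelim3.xsl so much shorter.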


[1] http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00291.html
[2] http://stamm-wilbrandt.de/en/xsl-list/ancestor3.xml
[3] http://stamm-wilbrandt.de/en/xsl-list/dupelim3.xml
[4] http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html


Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Dirk Wittkopp
Registered office: Boeblingen
Registration court: Amtsgericht Stuttgart, HRB 243294



From:       Michael Kay <mike@xxxxxxxxxxxx>
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       08/24/2010 02:17 PM
Subject:    Re: Fw:  Question on duplicate node elimination



  I haven't understood your logic in any detail, but I wonder if it
suggests an alternative approach to the problem: namely, avoid creating
RTFs entirely, at least for intermediate results. Instead, whenever you
are evaluating an operation that returns a node-set, represent that
node-set as a string containing the generate-id values of the nodes in
the node-set, space-separated. Elimination of duplicates then reduces to
an operation on strings: not trivial, but not especially difficult either.

Michael Kay
Saxonica
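
The string representation suggested above can be sketched in plain XSLT 1.0
without any node-set() extension. This is only an illustration under
assumptions: the template name dedupe-ids and its parameters ids and seen
are hypothetical, and the result keeps first-occurrence order rather than
document order.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- hypothetical helper: drop repeated tokens from a space-separated id list -->
  <xsl:template name="dedupe-ids">
    <xsl:param name="ids"/>                <!-- tokens still to process -->
    <xsl:param name="seen" select="' '"/>  <!-- tokens kept so far, space on both sides -->
    <xsl:variable name="rest" select="normalize-space($ids)"/>
    <xsl:choose>
      <xsl:when test="$rest = ''">
        <!-- nothing left: emit the kept tokens -->
        <xsl:value-of select="normalize-space($seen)"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- first token of the remaining list -->
        <xsl:variable name="head"
                      select="substring-before(concat($rest, ' '), ' ')"/>
        <xsl:call-template name="dedupe-ids">
          <xsl:with-param name="ids" select="substring-after($rest, ' ')"/>
          <xsl:with-param name="seen">
            <xsl:choose>
              <!-- already kept: leave the list unchanged -->
              <xsl:when test="contains($seen, concat(' ', $head, ' '))">
                <xsl:value-of select="$seen"/>
              </xsl:when>
              <!-- new token: append it -->
              <xsl:otherwise>
                <xsl:value-of select="concat($seen, $head, ' ')"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:with-param>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <!-- ids of ancestor::* for every c, collected once per c (with duplicates) -->
    <xsl:variable name="aux">
      <xsl:for-each select="//c">
        <xsl:for-each select="ancestor::*">
          <xsl:value-of select="concat(generate-id(), ' ')"/>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:variable>
    <xsl:call-template name="dedupe-ids">
      <xsl:with-param name="ids" select="string($aux)"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For ancestor3.xml the collected string holds eight ids and the template
should print three: one for a and one for each b. The recursion depth grows
with the number of tokens, so this suits small node-sets only.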
