Re: Fw: Question on duplicate node elimination

Subject: Re: Fw: Question on duplicate node elimination
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Fri, 27 Aug 2010 14:59:56 +0200
Michael,

> ... Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult
> either.

Yesterday's solution [1], based on the id() function, worked well.


On reflection, though, the single-file solution below, based on applying
the key() function twice for duplicate elimination, is much better:
* it does not need any separately created structure (like idcopy in [1])
* it is really short, just a few lines (not counting comments)
* it works on ALL major browsers (IE support via David Carlisle's trick [4])
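
The reason key() can eliminate duplicates: in XSLT 1.0, when the second
argument of key() is a node-set, the result is the union of the lookups
for each node's string value, and a node-set never contains the same
node twice. Below is a minimal sketch that isolates this property. It is
not part of dupelim3.xsl; the variable name $ids is illustrative, and
exslt:node-set() is assumed to be available (as it is in xsltproc).

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  exclude-result-prefixes="exslt">

  <xsl:output method="text"/>

  <!-- same key as in dupelim3.xsl: look up any node by its generated id -->
  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>

  <xsl:template match="/">
    <!-- illustrative RTF: the id of the document element, listed twice -->
    <xsl:variable name="ids">
      <id><xsl:value-of select="generate-id(/*)"/></id>
      <id><xsl:value-of select="generate-id(/*)"/></id>
    </xsl:variable>

    <!-- key() returns a node-set, so the duplicate id collapses
         to a single real node -->
    <xsl:value-of
      select="count(key('nodes-by-genid', exslt:node-set($ids)/id))"/>
  </xsl:template>

</xsl:stylesheet>
```

Run against any XML document, this should print 1 even though two <id>
elements were passed to key().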

Below are
* execution by xsltproc
* listing of dupelim3.xsl [3]
* listing of ancestor3.xml [2] (open that in a browser)


$ xsltproc dupelim3.xsl ancestor3.xml
<html><pre><h2>Duplicate node elimination by applying key() function twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)


ids(//*)
a      id2619817
+-b    id2619788
! +-c  id2619830
! +-c  id2619802
+-b    id2619245
! +-c  id2619317
! +-c  id2619321
<hr>
ids(//c):
<id>id2619830</id><id>id2619802</id><id>id2619317</id><id>id2619321</id>
<hr>
nodes="ids(//c)"<br>ids($nodes/ancestor::*):
<id>id2619817</id><id>id2619788</id><id>id2619245</id>
</pre></html>
$
$ cat dupelim3.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="exslt msxsl"
>
  <xsl:output method="html"/>

  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>


  <xsl:template match="/">

    <!--
         initial node-set sample, represented by <id> nodes
    -->
    <xsl:variable name="nodes">
      <xsl:for-each select="//c">
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


    <!--
         do ancestor location step
    -->
    <xsl:variable name="result">
      <!--
           application of "ancestor::*" on $nodes;
           $aux might contain duplicate id nodes
      -->
      <xsl:variable name="aux">
        <!--
             use key() function to determine real nodes
        -->
        <xsl:for-each select="key('nodes-by-genid',exslt:node-set($nodes)/id)">
          <!--
              location step on each real node
          -->
          <xsl:for-each select="ancestor::*">
            <!--
                generate <id>s for new nodes
            -->
            <id><xsl:value-of select="generate-id()"/></id>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:variable>

      <!--
           use key() function for duplicate elimination
      -->
      <xsl:for-each select="key('nodes-by-genid',exslt:node-set($aux)/id)">
        <!--
            generate <id>s, now for unique new nodes
        -->
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


<html><pre>
    <h2>Duplicate node elimination by applying key() function twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)

    <!-- node name vs genid output -->
    <xsl:text>&#10;ids(//*)</xsl:text>
    <xsl:for-each select="//*">
      <xsl:value-of select=
        "concat('&#10;',substring('! +-',5-2*count(ancestor::*)),name(),
         substring('    ',1+2*count(ancestor::*)),'  ',generate-id())"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- for verification -->
    <xsl:text>ids(//c): </xsl:text>
    <xsl:copy-of select="$nodes"/>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- output of result -->
    <xsl:text>nodes="ids(//c)"</xsl:text><br/>
    <xsl:text>ids($nodes/ancestor::*): </xsl:text>
    <xsl:copy-of select="$result"/>

    <xsl:text>&#10;</xsl:text>
</pre></html>

  </xsl:template>


<!--
  from http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html
-->
<msxsl:script language="JScript" implements-prefix="exslt">
 this['node-set'] =  function (x) {
  return x;
  }
</msxsl:script>

</xsl:stylesheet>
$
$ cat ancestor3.xml
<?xml-stylesheet href="dupelim3.xsl" type="text/xsl"?>
<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>

$


[1] http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00291.html
[2] http://stamm-wilbrandt.de/en/xsl-list/ancestor3.xml
[3] http://stamm-wilbrandt.de/en/xsl-list/dupelim3.xml
[4] http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html


Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Dirk Wittkopp
Registered office: Boeblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294



From:       Michael Kay <mike@xxxxxxxxxxxx>
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       08/24/2010 02:17 PM
Subject:    Re: Fw:  Question on duplicate node elimination



  I haven't understood your logic in any detail, but I wonder if it
suggests an alternative approach to the problem: namely, avoid creating
RTFs entirely, at least for intermediate results. Instead, whenever you
are evaluating an operation that returns a node-set, represent that
node-set as a string containing the generate-id values of the nodes in
the node-set, space-separated. Elimination of duplicates then reduces to
an operation on strings: not trivial, but not especially difficult either.

Michael Kay
Saxonica
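
The string-based representation suggested above could look roughly like
this in XSLT 1.0. This is only a sketch under stated assumptions: the
template name dedupe-ids, its parameters and the sample id string are
made up for illustration and do not come from either message. Note that
it keeps ids in first-occurrence order, not document order.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- hypothetical helper: removes repeated tokens from a
       space-separated list of generate-id() values -->
  <xsl:template name="dedupe-ids">
    <!-- tokens still to be processed -->
    <xsl:param name="ids"/>
    <!-- tokens already visited, each wrapped in spaces -->
    <xsl:param name="seen" select="' '"/>

    <xsl:variable name="rest" select="normalize-space($ids)"/>
    <xsl:if test="$rest != ''">
      <xsl:variable name="head"
        select="substring-before(concat($rest, ' '), ' ')"/>
      <xsl:variable name="tail" select="substring-after($rest, ' ')"/>

      <!-- emit the token only if it was not visited before -->
      <xsl:if test="not(contains($seen, concat(' ', $head, ' ')))">
        <xsl:value-of select="concat($head, ' ')"/>
      </xsl:if>

      <!-- recurse on the remaining tokens -->
      <xsl:call-template name="dedupe-ids">
        <xsl:with-param name="ids" select="$tail"/>
        <xsl:with-param name="seen" select="concat($seen, $head, ' ')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <!-- sample input string, purely illustrative -->
    <xsl:variable name="unique">
      <xsl:call-template name="dedupe-ids">
        <xsl:with-param name="ids" select="'id3 id1 id3 id2 id1'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="normalize-space($unique)"/>
  </xsl:template>

</xsl:stylesheet>
```

For the sample string this should yield "id3 id1 id2". The recursion depth
grows with the number of ids, so for large node-sets a key()-based
approach may be the more practical choice.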
