Finding unique nodes in a non-sibling nodeset

Subject: Finding unique nodes in a non-sibling nodeset
From: Mike Berrow <mberrow@xxxxxxxxxxx>
Date: Sat, 29 Jun 2002 10:04:38 -0700

In a code generation transform that I am working on, I frequently encounter
situations where I need to eliminate duplicate expressions or event calls.
The nodes with the commonality to be detected are often scattered around
different parts of a large (preprocessed) reference document that is loaded
with a document() call.

Previously, I had eliminated duplicates with something of the form
 $list[not(@key1=preceding-sibling::*/@key1)]
or
 $list[not(@key1=preceding::*/@key1)]
... if I wanted to look back through the whole document.
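
As a minimal sketch of the sibling-only form, assume a hypothetical input in which every candidate is an `entry` child of a single `list` element (both names are illustrative, not taken from the real reference document):

```xml
<!-- Hypothetical input:
     <list><entry key1="AA"/><entry key1="XX"/><entry key1="AA"/></list> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/list">
    <!-- Keep an entry only if no earlier sibling carries the same @key1 -->
    <xsl:for-each select="entry[not(@key1 = preceding-sibling::entry/@key1)]">
      <xsl:value-of select="@key1"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against that input the sketch should print AA and XX, one per line. It only works because all candidates share one parent.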

In this situation, however:

[A] The nodes to be duplicate-trimmed are selected out of the reference
    document in very specific contextual ways (e.g. deep inside
    xsl:template / xsl:for-each usages).
[B] They are not all sibling nodes.
[C] The preceding axis can't be used, since it looks at the whole
    preceding area of the document, not just my carefully selected nodes.
[D] The definition of duplication requires multiple node attributes,
    i.e. it needs a composite key.

Even if [D] were not true, the "preceding-sibling" axis approach would not
work because of [B] and the "preceding" axis approach would not work
because of [C].

I eventually hit on a way to solve this (since I use Saxon) using
saxon:tokenize. But I always wondered if there was a non-extension
way to do it.

What I did was build an aggregate string with delimiters from the nodes
in the set in question (in a variable called "$list"), like so ...

  <xsl:variable name="aggregate">
    <xsl:for-each select="$list">
      <xsl:value-of select="concat(@key1,'/',@key2)" />
      <xsl:if test="not(position()=last())"><xsl:text>#</xsl:text></xsl:if>
    </xsl:for-each>
  </xsl:variable>

Then use tokenize to get a node set ...

 <xsl:variable name="list4" select="saxon:tokenize($aggregate,'#')"/>

And eliminate the duplicates the standard (?) way with

 <xsl:variable name="list4NoDups" select="$list4[not(.=preceding-sibling::*)]"/>

I'm then able to process the node subset I was trying to get since I have the
keys embedded in the strings in the resultant node-set.
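
To illustrate how the embedded keys can be pulled back out of each token, here is a small sketch. It assumes a hypothetical input of `t` elements holding the de-duplicated strings (standing in for the tokenized node-set), and it assumes that '/' never occurs inside @key1 and that '#' occurs in neither key:

```xml
<!-- Hypothetical input:
     <tokens><t>XX/CC</t><t>XX/BB</t><t>YY/BB</t></tokens> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/tokens">
    <xsl:for-each select="t">
      <!-- Split the composite string at the first '/' -->
      <xsl:variable name="k1" select="substring-before(., '/')"/>
      <xsl:variable name="k2" select="substring-after(., '/')"/>
      <xsl:value-of select="concat('key1=', $k1, ' key2=', $k2)"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

In the real transform, the two variables could then be used in a predicate such as $list[@key1 = $k1 and @key2 = $k2] to get back to the original nodes.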

All was well until my colleague decided to try out Saxon 7.1, which (it turns out)
changes the behavior of tokenize(). In that version, the node-set comes back in
such a way that you can't use the "preceding-sibling" axis on it.

There are features in Saxon 7.1 that we are very interested in, so I needed
to try to find a different technique.

It turns out that the following has exactly the desired effect (in one line!!)

  <xsl:variable name="listNoDups"
                select="saxon:distinct($list, saxon:expression('concat(@key1,@key2)'))"/>

and I could have done that all along.
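
One caveat on the composite key itself: concat(@key1,@key2) with no separator can conflate distinct pairs, for example ('AB','C') and ('A','BC'). With the two-letter keys in the test data it makes no difference, but a separator that cannot occur in either attribute is the safer choice. The following sketch, which needs no particular input, shows the collision using plain XPath string comparison:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Without a separator both sides collapse to 'ABC': prints true -->
    <xsl:value-of select="concat('AB','C') = concat('A','BC')"/>
    <xsl:text>&#xA;</xsl:text>
    <!-- With a separator the pairs stay distinct: prints false -->
    <xsl:value-of select="concat('AB','/','C') = concat('A','/','BC')"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```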

However, I still wondered if there was a way of doing this without extensions.
So I put the problem to my good friend Chris Maden (yes, *the* Chris Maden)
... but not in as much detail as I have given here.

Chris said "Muenchian Keys!!"

I hadn't yet used that technique anywhere (but heard it mentioned a lot)
so decided to give it a whirl.

Well, it does solve the problem, but with a restriction that makes it
unusable for me.

I set up my key like so:
  <xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/>

Then used:
  <xsl:variable name="uniqueKey1Key2forFlavour"
        select="$list[generate-id()=generate-id(key('Key1Key2',concat(@key1,@key2)))]"/>

Which does the trick, but I can't use it since xsl:key is a top-level element
and I have situation [A] to deal with.
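
To make the restriction concrete, here is a sketch of what happens when the key's match pattern does not mirror the selection. It is meant to run against the attached input.xml; the key name AllFacts is made up for the illustration, and the key deliberately indexes every fact rather than only the sour ones:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical key: the match is NOT restricted to sour items -->
  <xsl:key name="AllFacts" match="fact" use="concat(@key1,@key2)"/>

  <xsl:template match="/document">
    <xsl:variable name="list" select="item[@flavour='sour']/fact"/>
    <!-- generate-id() of a node-set uses its first node in document order,
         so a sour fact is dropped whenever a sweet fact with the same
         keys appears earlier in the document -->
    <xsl:for-each select="$list[generate-id() =
                          generate-id(key('AllFacts', concat(@key1,@key2)))]">
      <xsl:value-of select="concat(@key1,'/',@key2)"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the test data this should print only XX/BB and YY/BB: XX/CC is lost because its first occurrence is in a sweet item. The key therefore has to be declared up front with exactly the same selection logic as $list, which is what makes it awkward for context-dependent selections.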

So, my questions are ...
 [1] Is there a non-extension, non-xsl:key way of doing this?
 [2] If not, is there a better way than the saxon:distinct approach?

Thanks for bearing with me :-)

I have attached my current test data, test transform and output since
it may help to clarify what I'm trying to do.

-- Mike Berrow

==========  input.xml  ==============
<document>
  <item flavour="sweet" >
    <fact key1="AA" key2="BB" val="11"/>
    <fact key1="XX" key2="CC" val="22"/>
    <fact key1="AA" key2="BB" val="33"/>
  </item>
  <item flavour="sour" >
    <fact key1="XX" key2="CC" val="11"/>
    <fact key1="XX" key2="BB" val="33"/>
    <fact key1="YY" key2="BB" val="22"/>
  </item>
  <item flavour="sweet" >
    <fact key1="XX" key2="CC" val="33"/>
    <fact key1="XX" key2="BB" val="22"/>
    <fact key1="AA" key2="BB" val="11"/>
  </item>
  <item flavour="sour" >
    <fact key1="YY" key2="BB" val="33"/>
    <fact key1="XX" key2="CC" val="11"/>
    <fact key1="YY" key2="BB" val="22"/>
  </item>
</document>


==========  dupElim.xsl  ==============
<?xml version="1.0"?>
<xsl:stylesheet
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:saxon="http://icl.com/saxon"
            version="1.0">

<!-- Finding unique nodes in a non-sibling nodeset... by Mike Berrow -->
<xsl:output method="xml"/>
<xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/>

<xsl:template match="document">
  <!-- Select nodes of interest -->
  <xsl:variable name="list" select="item[@flavour='sour']/fact"/>

  <!-- Single value, attempt 1 -->
  <xsl:comment>For $list[not(@key1=preceding-sibling::*/@key1)]</xsl:comment>
  <xsl:text>&#xA;&#x9;</xsl:text><xsl:comment>We get ...</xsl:comment>
  <xsl:variable name="list1NoDups" select="$list[not(@key1=preceding-sibling::*/@key1)]"/>
  <xsl:for-each select="$list1NoDups">
    <xsl:text>&#xA;&#x9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#xA;&#x9;</xsl:text>
  <xsl:comment>Not desired: 'preceding-sibling' can't see 'preceding cousin'</xsl:comment><xsl:text>&#xA;&#xA;</xsl:text>

  <!-- Single value, attempt 2 -->
  <xsl:comment>For $list[not(@key1=preceding::*/@key1)]</xsl:comment>
  <xsl:text>&#xA;&#x9;</xsl:text><xsl:comment>We get ...</xsl:comment>
  <xsl:variable name="list2NoDups" select="$list[not(@key1=preceding::*/@key1)]"/>
  <xsl:for-each select="$list2NoDups">
    <xsl:text>&#xA;&#x9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#xA;&#x9;</xsl:text>
  <xsl:comment>Not desired: 'preceding' looks at the whole doc</xsl:comment><xsl:text>&#xA;&#xA;</xsl:text>

  <!-- Try Multi-value -->
  <xsl:comment>For $list[not(concat(@key1,@key2)=concat(preceding::*/@key1,preceding::*/@key2))]</xsl:comment>
  <xsl:text>&#xA;&#x9;</xsl:text><xsl:comment>We get ...</xsl:comment>
  <xsl:variable name="list3NoDups" select="$list[not(concat(@key1,@key2)=concat(preceding::*/@key1,preceding::*/@key2))]"/>
  <xsl:for-each select="$list3NoDups">
    <xsl:text>&#xA;&#x9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#xA;&#x9;</xsl:text>
  <xsl:comment>Not desired: result of a naive composite key attempt</xsl:comment><xsl:text>&#xA;&#xA;</xsl:text>

  <!-- Multi-value using saxon:tokenize -->
  <xsl:comment>Using aggregation, saxon:tokenize then 'not(.=preceding-sibling::*)'</xsl:comment>
  <xsl:variable name="aggregate">
    <xsl:for-each select="$list">
      <xsl:value-of select="concat(@key1,'/',@key2)" />
      <xsl:if test="not(position()=last())"><xsl:text>#</xsl:text></xsl:if>
    </xsl:for-each>
  </xsl:variable>
  <xsl:variable name="list4" select="saxon:tokenize($aggregate,'#')"/>
  <xsl:variable name="list4NoDups" select="$list4[not(.=preceding-sibling::*)]"/>
  <xsl:for-each select="$list4NoDups">
    <xsl:text>&#xA;&#x9;</xsl:text>
    <xsl:value-of select="." />
  </xsl:for-each>
  <xsl:text>&#xA;&#x9;</xsl:text>
  <xsl:comment>Which is the desired result</xsl:comment><xsl:text>&#xA;&#xA;</xsl:text>

  <!-- Multi-value using saxon:distinct -->
  <xsl:comment>saxon:distinct($list, saxon:expression('concat(@key1,@key2)')</xsl:comment>
  <xsl:for-each select="saxon:distinct($list, saxon:expression('concat(@key1,@key2)'))">
    <xsl:text>&#xA;&#x9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#xA;&#x9;</xsl:text>
  <xsl:comment>Which is tighter code than using tokenize</xsl:comment><xsl:text>&#xA;&#xA;</xsl:text>

  <!-- Multi-value using Muenchian -->
  <xsl:comment>Using <xsl:text>&lt;xsl:key name="Key1Key2" match="item[@flavour='sour']/fact"
use="concat(@key1,@key2)"/&gt;</xsl:text>
    and select="$list[generate-id(.)=generate-id(key('Key1Key2',concat(@key1,@key2)))]"</xsl:comment>
  <xsl:variable name="uniqueKey1Key2forFlavour"
        select="$list[generate-id()=generate-id(key('Key1Key2',concat(@key1,@key2)))]"/>
  <xsl:for-each select="$uniqueKey1Key2forFlavour">
    <xsl:text>&#xA;&#x9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#xA;&#x9;</xsl:text>
  <xsl:comment>Which is the Muenchian approach, but since xsl:key is a top level element, this
      will not help when nodesets need to be calculated in specific, non-whole-document
contexts</xsl:comment><xsl:text>&#xA;&#xA;</xsl:text>

</xsl:template>

</xsl:stylesheet>


==========  minSet.xml  ==============
<?xml version="1.0" encoding="utf-8"?>
<!--For $list[not(@key1=preceding-sibling::*/@key1)]-->
 <!--We get ...-->
 XX/CC
 YY/BB
 YY/BB
 XX/CC
 <!--Not desired: 'preceding-sibling' can't see 'preceding cousin'-->

<!--For $list[not(@key1=preceding::*/@key1)]-->
 <!--We get ...-->
 YY/BB
 <!--Not desired: 'preceding' looks at the whole doc-->

<!--For $list[not(concat(@key1,@key2)=concat(preceding::*/@key1,preceding::*/@key2))]-->
 <!--We get ...-->
 XX/CC
 XX/BB
 YY/BB
 YY/BB
 XX/CC
 YY/BB
 <!--Not desired: result of a naive composite key attempt-->

<!--Using aggregation, saxon:tokenize then 'not(.=preceding-sibling::*)'-->
 XX/CC
 XX/BB
 YY/BB
 <!--Which is the desired result-->

<!--saxon:distinct($list, saxon:expression('concat(@key1,@key2)')-->
 XX/CC
 XX/BB
 YY/BB
 <!--Which is tighter code than using tokenize-->

<!--Using <xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/>
  and select="$list[generate-id(.)=generate-id(key('Key1Key2',concat(@key1,@key2)))]"-->
 XX/CC
 XX/BB
 YY/BB
 <!--Which is the Muenchian approach, but since xsl:key is a top level element, this
   will not help when nodesets need to be calculated in specific, non-whole-document contexts-->




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
