Spotting "cousin marriages" in a tree

Play the video

Subject: Spotting "cousin marriages" in a tree
From: Phil Endecott <spam_from_xslt_list@xxxxxxxxxxxx>
Date: Wed, 28 Jul 2004 22:18:50 +0100

Dear XSLT experts,

I have a flat input like this:

<things>
  <thing id="1"> <child idref="2"/> <child idref="3"/> </thing>
  <thing id="2"/>
  <thing id="3"/>
</things>

I want to match up the ids with the idrefs to build a tree that looks like this:

<tree id="1">
  <tree id="2"/>
  <tree id="3"/>
</tree>

I could do it like this:

<xsl:template match="thing">
  <tree>
    <xsl:for-each select="child">
      <xsl:apply-templates
           select="/things/thing[@id=current()/@idref]"/>
    </xsl:for-each>
  </tree>
</xsl:template>

But that is O(n^2), so I'd instead do it like this:

<xsl:key name="things-by-id" match="thing" use="@id"/>
<xsl:template match="thing">
  <tree>
    <xsl:for-each select="child">
      <xsl:apply-templates select="key('things-by-id',@idref)"/>
    </xsl:for-each>
  </tree>
</xsl:template>

which is O(n log n).

This is all fine. But very rarely my input may not describe a pure tree; it may contain "cousin marriages" or more "incestuous" relationships like this:

<things>
  <thing id="1"> <child idref="2"/> <child idref="3"/> </thing>
  <thing id="2"> <child idref="4"/> </thing>
  <thing id="3"> <child idref="4"/> </thing>
  <thing id="4"/>
</things>

If you apply either of the above templates to this, thing 4 appears in the output twice:

<tree id="1">
  <tree id="2">
    <tree id="4"/>
  </tree>
  <tree id="3">
    <tree id="4">
  </tree>
</tree>

which is very bad in my application. What I need is something like this:

<tree id="1">
  <tree id="2">
    <tree id="4"/>
  </tree>
  <tree id="3">
    <error/>
  </tree>
</tree>

I can't afford for this to become an O(n^2) operation.

The obvious thought is that I need to check at the point of recursion whether the thing I am about to process has already been output. But there is no "preceding-in-the-output" axis or other way I can think of to note which things have already been output.

An alternative is to apply a test to the other input elements to see if other things have the same thing as this one as a child, but that fails because my input may legitimately contain redundant nodes like this:

<things>
  <thing id="1"> <child idref="2"/> <child idref="3"/> </thing>
  <thing id="2"/>
  <thing id="3"/>
  <thing id="9"> <child idref="2"/> </thing>
</things>

In this case I want the same output as the very first example, i.e. thing 9 doesn't appear in the output at all.

I can do a multi-pass process using exsl:node-set(), generating the tree with the duplicates in it and then deleting them if they are duplicates, i.e. something like this:

<xsl:template match="/">
  <xsl:variable name="n">
    <xsl:apply-templates select="things/thing[1]"/>  (as above)
  </xsl:variable>
  <xsl:apply-templates select="exsl:node-set($n)" mode="remove-dupes"/>
</xsl:template>

<xsl:template match="tree" mode="remove-dupes">
  <xsl:choose>
    <xsl:when test="preceding::tree[@id=current/@id]">
      <error/>
    </xsl:when>
    <xsl:otherwise>
      <tree id="{@id}">
        <xsl:apply-templates mode="remove-dupes"/>
      </tree>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

There are two problems with this. First, generating the intermediate result could run forever if the input is seriously malformed (e.g. a loop, child refers to parent); checking as it is generated would avoid this. Second, the test "preceding::tree[@id=current/@id]" is O(n^2). Can I use a key to avoid this? How can a key be applied to an exsl:node-set() result?

XSLT often needs some lateral thinking to come up with a good solution. I hope that someone who's read this far has a trick to solve this for me.

Thanks in advance!

--Phil.

Current Thread
Spotting "cousin marriages" in a tree Phil Endecott - Wed, 28 Jul 2004 22:18:50 +0100 <= Michael Kay - Wed, 28 Jul 2004 23:57:33 +0100 Jeni Tennison - Thu, 29 Jul 2004 10:30:14 +0100 <Possible follow-ups> Phil Endecott - Thu, 29 Jul 2004 12:43:37 +0100 David Carlisle - Thu, 29 Jul 2004 12:51:18 +0100

<- Previous	Index	Next ->
Re: Yet Another Sorting Probl, Mukul Gandhi	Thread	RE: Spotting "cousin marriage, Michael Kay
Re: And operator usage in XSL, J.Pietschmann	Date	Re: And operator usage in XSL, john farrow
	Month

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Subscribe in XML format

RSS 2.0
Atom 0.3

XML Editor - Download a 15 Day Free Trial Now >

See What's New in Stylus Studio >

Buy Stylus Studio - XML Editor - Now >