|
[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Spotting "cousin marriages" in a tree
Dear XSLT experts,
I have a flat input like this: <things> <thing id="1"> <child idref="2"/> <child idref="3"/> </thing> <thing id="2"/> <thing id="3"/> </things> I want to match up the ids with the idrefs to build a tree that looks like this: <tree id="1"> <tree id="2"/> <tree id="3"/> </tree> I could do it like this: <xsl:template match="thing">
<tree>
<xsl:for-each select="child">
<xsl:apply-templates
select="/things/thing[@id=current()/@idref]"/>
</xsl:for-each>
</tree>
</xsl:template>But that is O(n^2), so I'd instead do it like this: <xsl:key name="things-by-id" match="thing" use="@id"/>
<xsl:template match="thing">
<tree>
<xsl:for-each select="child">
<xsl:apply-templates select="key('things-by-id',@idref)"/>
</xsl:for-each>
</tree>
</xsl:template>which is O(n log n). This is all fine. But very rarely my input may not describe a pure tree; it may contain "cousin marriages" or more "incestuous" relationships like this: <things> <thing id="1"> <child idref="2"/> <child idref="3"/> </thing> <thing id="2"> <child idref="4"/> </thing> <thing id="3"> <child idref="4"/> </thing> <thing id="4"/> </things> If you apply either of the above templates to this, thing 4 appears in the output twice: <tree id="1">
<tree id="2">
<tree id="4"/>
</tree>
<tree id="3">
<tree id="4">
</tree>
</tree>which is very bad in my application. What I need is something like this: <tree id="1">
<tree id="2">
<tree id="4"/>
</tree>
<tree id="3">
<error/>
</tree>
</tree>I can't afford for this to become an O(n^2) operation. The obvious thought is that I need to check at the point of recursion whether the thing I am about to process has already been output. But there is no "preceding-in-the-output" axis or other way I can think of to note which things have already been output. An alternative is to apply a test to the other input elements to see if other things have the same thing as this one as a child, but that fails because my input may legitimately contain redundant nodes like this: <things> <thing id="1"> <child idref="2"/> <child idref="3"/> </thing> <thing id="2"/> <thing id="3"/> <thing id="9"> <child idref="2"/> </thing> </things> In this case I want the same output as the very first example, i.e. thing 9 doesn't appear in the output at all. I can do a multi-pass process using exsl:node-set(), generating the tree with the duplicates in it and then deleting them if they are duplicates, i.e. something like this: <xsl:template match="/">
<xsl:variable name="n">
<xsl:apply-templates select="things/thing[1]"/> (as above)
</xsl:variable>
<xsl:apply-templates select="exsl:node-set($n)" mode="remove-dupes"/>
</xsl:template><xsl:template match="tree" mode="remove-dupes">
<xsl:choose>
<xsl:when test="preceding::tree[@id=current/@id]">
<error/>
</xsl:when>
<xsl:otherwise>
<tree id="{@id}">
<xsl:apply-templates mode="remove-dupes"/>
</tree>
</xsl:otherwise>
</xsl:choose>
</xsl:template>There are two problems with this. First, generating the intermediate result could run forever if the input is seriously malformed (e.g. a loop, child refers to parent); checking as it is generated would avoid this. Second, the test "preceding::tree[@id=current/@id]" is O(n^2). Can I use a key to avoid this? How can a key be applied to an exsl:node-set() result? XSLT often needs some lateral thinking to come up with a good solution. I hope that someone who's read this far has a trick to solve this for me. Thanks in advance! --Phil.
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|

Cart








