[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Spotting "cousin marriages" in a tree
Dear XSLT experts,
I have a flat input like this: <things> <thing id="1"> <child idref="2"/> <child idref="3"/> </thing> <thing id="2"/> <thing id="3"/> </things> I want to match up the ids with the idrefs to build a tree that looks like this: <tree id="1"> <tree id="2"/> <tree id="3"/> </tree> I could do it like this: <xsl:template match="thing"> <tree> <xsl:for-each select="child"> <xsl:apply-templates select="/things/thing[@id=current()/@idref]"/> </xsl:for-each> </tree> </xsl:template> But that is O(n^2), so I'd instead do it like this: <xsl:key name="things-by-id" match="thing" use="@id"/> <xsl:template match="thing"> <tree> <xsl:for-each select="child"> <xsl:apply-templates select="key('things-by-id',@idref)"/> </xsl:for-each> </tree> </xsl:template> which is O(n log n). This is all fine. But very rarely my input may not describe a pure tree; it may contain "cousin marriages" or more "incestuous" relationships like this: <things> <thing id="1"> <child idref="2"/> <child idref="3"/> </thing> <thing id="2"> <child idref="4"/> </thing> <thing id="3"> <child idref="4"/> </thing> <thing id="4"/> </things> If you apply either of the above templates to this, thing 4 appears in the output twice: <tree id="1"> <tree id="2"> <tree id="4"/> </tree> <tree id="3"> <tree id="4"> </tree> </tree> which is very bad in my application. What I need is something like this: <tree id="1"> <tree id="2"> <tree id="4"/> </tree> <tree id="3"> <error/> </tree> </tree> I can't afford for this to become an O(n^2) operation. The obvious thought is that I need to check at the point of recursion whether the thing I am about to process has already been output. But there is no "preceding-in-the-output" axis or other way I can think of to note which things have already been output. An alternative is to apply a test to the other input elements to see if other things have the same thing as this one as a child, but that fails because my input may legitimately contain redundant nodes like this: <things> <thing id="1"> <child idref="2"/> <child idref="3"/> </thing> <thing id="2"/> <thing id="3"/> <thing id="9"> <child idref="2"/> </thing> </things> In this case I want the same output as the very first example, i.e. thing 9 doesn't appear in the output at all. I can do a multi-pass process using exsl:node-set(), generating the tree with the duplicates in it and then deleting them if they are duplicates, i.e. something like this: <xsl:template match="/"> <xsl:variable name="n"> <xsl:apply-templates select="things/thing[1]"/> (as above) </xsl:variable> <xsl:apply-templates select="exsl:node-set($n)" mode="remove-dupes"/> </xsl:template> <xsl:template match="tree" mode="remove-dupes"> <xsl:choose> <xsl:when test="preceding::tree[@id=current/@id]"> <error/> </xsl:when> <xsl:otherwise> <tree id="{@id}"> <xsl:apply-templates mode="remove-dupes"/> </tree> </xsl:otherwise> </xsl:choose> </xsl:template> There are two problems with this. First, generating the intermediate result could run forever if the input is seriously malformed (e.g. a loop, child refers to parent); checking as it is generated would avoid this. Second, the test "preceding::tree[@id=current/@id]" is O(n^2). Can I use a key to avoid this? How can a key be applied to an exsl:node-set() result? XSLT often needs some lateral thinking to come up with a good solution. I hope that someone who's read this far has a trick to solve this for me. Thanks in advance! --Phil.
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|