Re: Merging Two Nodesets .. can it be done?

Subject: Re: Merging Two Nodesets .. can it be done?
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 15 Aug 2007 01:50:38 +0200

Wasiq Shaikh wrote:

I'm using XPath 1.0/Xalan-J 2.7 and yes, I can use the node-set function.

I just realized you don't need the node-set function at all.

The ordering of elements does not matter but the concept of the merge should be consistent, meaning all "like" elements are joined.

Yes, nodes must be on the same level to be comparable (mergeable?). Nodes on any other level with the same likeness should not be merged since they are part of a different node generation. For example:
<X>
  <Z>
     <Z>...
The parent and child <Z> elements should not be merged.

I'm afraid I still don't really understand what you are after. But from your original input/output example, it seems that you want nodes to be merged when they have the same sequence of element names on their ancestor-or-self axis. I.e.:

   <merge-a>
       <no-merge-z />
       <merge-b />
       <no-merge-y />
   </merge-a>
   <merge-a>
       <merge-b />
       <merge-c>
           <no-merge-x />
       </merge-c>
   </merge-a>
   <merge-a>
       <merge-c>
           <no-merge-w />
       </merge-c>
       <merge-b />
       <merge-c />
   </merge-a>

In the above, the nodes a, b and c will be merged because their paths are the same. The nodes w, x, y and z are not merged (i.e., they stay as they are) because each has a distinct path (one could argue that these are merged too, from a set of one node to a set of one node).

So, in other words, all nodes selected by any given XPath X/Y/Z should end up as a single node in the result. I leave the details to you; I assume you want additional rules for any given node, such as content or other properties, to decide whether a node is distinct or not (see below for what I understood to be the correct merge of the above input).

The algorithm you mention is what I was thinking about doing. I know it's quite simple, however, the amount of work the processor needs to do in comparing each and every similar node is expensive. Joining two nodes is fine, but what happens if you have tens, hundreds, or thousands of similar nodes to merge? Then each child of those many nodes needs to be compared and merged as well, and so on and so forth...

Forget about my algorithm, it was based on a not-so-good understanding of your specifications.

I know it can be done in XSL, but can it handle such a process? Or is this the work for procedural programming like Java?

Quite easy in XSLT 1.0, very easy in XSLT 2.0. Of course, you can always attempt such a task in another language, but be aware, you probably have to tree-walk everything yourself then.
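
To give an idea of what "tree-walking everything yourself" could look like, here is a minimal sketch in Python using only the standard library. It is not part of the original exchange: the function name `merge_same_path` and the wrapping `<root>` element are assumptions made for illustration. It merges elements that share the same path of element names and, for simplicity, ignores text and attributes.

```python
import xml.etree.ElementTree as ET

# Sample input from this thread, wrapped in a hypothetical <root> element
# so that it forms a well-formed document.
SAMPLE = """
<root>
   <merge-a>
       <no-merge-z />
       <merge-b />
       <no-merge-y />
   </merge-a>
   <merge-a>
       <merge-b />
       <merge-c>
           <no-merge-x />
       </merge-c>
   </merge-a>
   <merge-a>
       <merge-c>
           <no-merge-w />
       </merge-c>
       <merge-b />
       <merge-c />
   </merge-a>
</root>
"""


def merge_same_path(elements):
    """Merge a list of same-named elements whose parents were already merged.

    Text content and attributes are deliberately ignored in this sketch.
    """
    merged = ET.Element(elements[0].tag)
    # Collect the children of all elements being merged, grouped by name.
    # A dict keeps insertion order, so first-seen order is preserved.
    groups = {}
    for element in elements:
        for child in element:
            groups.setdefault(child.tag, []).append(child)
    # Children with the same name now share the same path: merge them too.
    for group in groups.values():
        merged.append(merge_same_path(group))
    return merged


result = merge_same_path([ET.fromstring(SAMPLE)])
print(ET.tostring(result, encoding="unicode"))
```

Each element is visited once, so the cost grows with the size of the document rather than with the number of pairwise comparisons.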

Ok, here's the trick. You may have heard or read about dedupping (there's been a nice discussion last year on what term should be used) and I think that your problem is essentially nothing more than dedupping based on a certain set of rules that define the uniqueness of a node. The tricky bit is that "duplicate nodes" are not always on the same level, which makes the process slightly harder.

I usually don't attempt XSLT 1.0 anymore when it comes to keys and the like. In XSLT 2.0 the solution would be sooooo much easier to implement and understand (if you can persuade your team to upgrade, it will save you some headaches in the future). Anyway, here it is; I call it "Dedupping based on the node's XPath":


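   <!-- Index every element by its own name plus the names of up to four
        ancestors. The '/' separators keep "ab" + "c" distinct from
        "a" + "bc". Paths deeper than five levels are compared on their
        last five names only. -->
   <xsl:key match="*" name="ancestors"
       use="concat(
           name(), '/',
           name(ancestor-or-self::*[2]), '/',
           name(ancestor-or-self::*[3]), '/',
           name(ancestor-or-self::*[4]), '/',
           name(ancestor-or-self::*[5]))" />


   <xsl:template match="*">
       <!-- Must be built exactly like the key value above. -->
       <xsl:variable name="ancestors"
           select="concat(
                   name(), '/',
                   name(ancestor-or-self::*[2]), '/',
                   name(ancestor-or-self::*[3]), '/',
                   name(ancestor-or-self::*[4]), '/',
                   name(ancestor-or-self::*[5]))" />

       <!-- Only the first element with a given path produces output; it
            receives the children of all elements sharing that path. -->
       <xsl:if test="generate-id(key('ancestors', $ancestors)[1]) = generate-id(current())">
           <xsl:copy>
               <xsl:apply-templates select="key('ancestors', $ancestors)/*" />
           </xsl:copy>
       </xsl:if>
   </xsl:template>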

As you can see, it really isn't that hard (only a bit annoying, especially because of the duplicated logic). If you know something about node identity and how you can find two nodes that are identical inside an XML document using XSLT 1.0, the above should read quite easily. Apart from the "normal" dedupping code (inside the xsl:if), the core of the piece is of course the key and the $ancestors variable, which are used to find all nodes that have the same XPath.

The result of applying the above code to the above input document is as follows (note that the order of input is preserved automatically):

  <merge-a>
     <no-merge-z/>
     <merge-b/>
     <no-merge-y/>
     <merge-c>
        <no-merge-x/>
        <no-merge-w/>
     </merge-c>
  </merge-a>

And now, to impress your bosses and ask them for an upgrade, here's the same code in XSLT 2.0 (and done in a tenth of the time). Note the function that makes it completely generic instead of static (XSLT 1.0 can be made generic too, but that requires a lot more effort):

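<!-- The s prefix must be bound to a namespace of your own choosing on
     xsl:stylesheet; it is only used for the function. -->
<xsl:key match="*" name="ancestors" use="s:ancestor(.)" />

<!-- The first element with a given path is copied, together with the
     children of all elements sharing that path. -->
<xsl:template match="*[key('ancestors', s:ancestor(.))[1] is current()]">
    <xsl:copy>
        <xsl:apply-templates select="key('ancestors', s:ancestor(.))/*" />
    </xsl:copy>
</xsl:template>

<!-- Returns the names on the ancestor-or-self axis as one
     space-separated string. -->
<xsl:function name="s:ancestor">
    <xsl:param name="node" />
    <xsl:value-of select="for $i in $node/ancestor-or-self::* return name($i)" />
</xsl:function>

<!-- Every other element is a later duplicate and is dropped. -->
<xsl:template match="*" />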

And yes, it can be done without a key (both in XSLT 1.0 and 2.0) but with large documents that will require quite some reverse lookup that will cost a lot of processor cycles. Anyway, I hope you enjoy the solution (and even more if it is of some use for you).
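
The cost difference between the keyed and the key-less variant can be illustrated outside XSLT as well. The following is only a rough Python analogy, not a description of how any XSLT processor implements xsl:key; the names `index_by_path` and `is_first_with_path` are hypothetical. The index is built in a single pass, after which the "am I the first node with this path?" test is a dictionary lookup instead of a search through the document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_by_path(root):
    """Map each path of element names to its elements, in document order."""
    index = defaultdict(list)

    def walk(element, parent_path):
        path = parent_path + "/" + element.tag
        index[path].append(element)
        for child in element:
            walk(child, path)

    walk(root, "")
    return index


def is_first_with_path(index, path, element):
    """Same idea as comparing generate-id() of the first key hit with the current node."""
    return index[path][0] is element


doc = ET.fromstring("<root><a><b/></a><a><b/><c/></a></root>")
index = index_by_path(doc)
for path, elements in index.items():
    print(path, len(elements))
```

Without such an index, every element would have to scan the document for earlier elements with the same path, which is where the extra processor cycles go.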

Cheers,
-- Abel Braaksma
