[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Solution: Efficient grouping without using the Muenchi

Subject: Solution: Efficient grouping without using the Muenchian method or key() function
From: "Rapido" <juicer@xxxxxxxxx>
Date: Thu, 20 Mar 2003 20:31:54 +0100
efficient grouping algorithm
Hi all,

Although I'm relativily new to XSLT I found it very hard to get grouping
functionality in place WITHOUT using the Muenchian method.
I needed this because I want to group tree fragments instead of the document
tree.
The key() function however only works on the document tree, so that's a
problem.
With the help of Dimitre Novatchev excellent articles I managed to work out
a solution.
Because I'm new to XSLT any comments about style and implementation are very
welcom!

Cheers,
RvD

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:java="http://xml.apache.org/xslt/java"
xmlns:xalan="http://xml.apache.org/xalan" version="1.0"
exclude-result-prefixes="java xalan">

        <!--
        Provides two grouping templates

        - group-on-key
        - group-on-item

        Author: Robbert van Dalen (juicer@xxxxxxxxx)

        NOTE 1:
        This stylesheet relies on the :NODESET extension because the
algorithm
uses multiple passes and tree-fragments to compute the result. Otherwise
it's a pure XSLT implementation.

        NOTE 2:
        Uses the XALAN :NODESET extension. Not tested are XSLT processors
other
then XALAN.

        RATIONALE:
        Grouping with XSLT can be done efficiently by the MUENCHIAN GROUPING
technique using the key() function to identify nodes within the same
group.
        The key() function however requires the nodeset to be a DOCUMENT
nodeset.
        The main problem with MUENCHIAN GROUPING is then that it cannot
group
tree-fragments.
        Use this functionality if you want to group tree-fragments:

        ALGORITHM OUTLINE:
        The algorithm has three stages.

        1) SORTING the nodes for a given key.
        2) Computing the RANGEs of nodes which have the same key.
        3) SELECTING the sorted nodes for each RANGE.


        So if we want to group to following nodeset:
        <c a="1"/> <c a="3"/> <c a="2"/> <c a="1"> <c a="3"/> <c a="2"/> <c
a="1"/>

        Step 1:
        <c a="1"/><c a="1"/><c a="1"/><c a="2"/><c a="2"/><c a="3"/><c
a="3"/>

        Step 2: (with step 2 as input)
        <r s="1" e="3"/> <r s="4" e="5"/> <r s="6" e="7"/>
        Note that the number of ranges equals the number of groups

        Step 3: (with step 1 and 2 as input)
        <g> <c a="1"/> <c a="1"/> <c a="1"/> </g>
        <g> <c a="2"/> <c a="2"/> </g>
        <g> <c a="3"/> <c a="3"/> </g>

        All this can be achieved by searching, selecting and filtering nodes
with
XPATH expressions. XPATH however, does not provide efficient means to
select a RANGE of nodes within a nodeset.
        For instance, walking a list of N nodes can be implemented using
recursion. But without tail-recursion elimination the recursion depth
will be to deep for most XSLT implementations.

        Matters get worse when the number of groups is more or less equal to
the
number of nodes. Naieve implementations can easily take O(N^2) time,
which is not acceptable.
        This implementation has a time complexity of O(log(N)*N) which is
fine
because sorting usually also takes O(log(N)*N) (depending on the XSLT's
implementation of xsl:sort).
        To achieve this speed, step 3 is implemented by partitioning the
nodeset
into a binary tree. This binary tree is then used to select nodes within a
range.

        Binary partitioning is an example of a DiVide and Conquer (DVC)
algorithm
neccesary to reduce recursion depth. After partitioning, the binary tree
can be reused to implement other DVC algorithms more easily (because
complex recursive algorithms can now be reimplemented by just walking the
binary tree).
        -->

        <!--
        Template: group-on-key
        Use this template to group <nodes> which share a common attribute
<key>
        The result will be sub-sets of <nodes> surrounded by <group/> tags
        -->

        <xsl:template name="group-on-key">
                <xsl:param name="nodes"/>
                <xsl:param name="key"/>

                <xsl:variable name="items">
                        <xsl:for-each select="$nodes">
                                <item>
                                        <key>
                                                <xsl:value-of
select="./@*[name() = $key]"/>
                                        </key>
                                        <value>
                                                <xsl:copy-of select="."/>
                                        </value>
                                </item>
                        </xsl:for-each>
                </xsl:variable>

                <xsl:variable name="grouped-items">
                        <xsl:call-template name="group-on-item">
                                <xsl:with-param name="nodes"
select="xalan:nodeset($items)/*"/>
                                <xsl:with-param name="key" select="$key"/>
                        </xsl:call-template>
                </xsl:variable>

                <xsl:for-each select="xalan:nodeset($grouped-items)/*">
                        <group>
                                <xsl:for-each select="./*">
                                        <xsl:copy-of select="./value/*[1]"/>
                                </xsl:for-each>
                        </group>
                </xsl:for-each>

        </xsl:template>

        <!--
        Template: group-on-item
        Use this template to group <nodes> which share a common structure.
        You can build this structure yourself if you want to group on
something else
        then keys.

        The structure is the <item> structure and has the following layout
        <item>
                <key>
                        aKey (can be anything, preferrably a string)
                </key>
                <value>
                        aValue (can be anything, probably a node(set))
                </value>
        </item>

        Items will we grouped on the string value of <key>
        The result will be sub-sets of items surrounded by <group/> tags
        -->

        <xsl:template name="group-on-item">
                <xsl:param name="nodes"/>

                <!-- Step 1 -->
                <xsl:variable name="sorted">
                        <xsl:for-each select="$nodes">
                                <xsl:sort select="./key[1]/"/>
                                <xsl:copy-of select="."/>
                        </xsl:for-each>
                </xsl:variable>

                <xsl:variable name="sorted-tree"
select="xalan:nodeset($sorted)/*"/>

                <!-- Step 2.1 -->
                <xsl:variable name="pivots">
                        <xsl:call-template name="pivots">
                                <xsl:with-param name="nodes"
select="$sorted-tree"/>
                        </xsl:call-template>
                </xsl:variable>

                <!-- Step 2.2 -->
                <xsl:variable name="ranges">
                        <xsl:call-template name="ranges">
                                <xsl:with-param name="pivots"
select="xalan:nodeset($pivots)/*"/>
                        </xsl:call-template>
                </xsl:variable>

                <!-- Step 3.1 -->
                <xsl:variable name="partition">
                        <xsl:call-template name="partition">
                                <xsl:with-param name="node"
select="$sorted-tree[1]"/>
                        </xsl:call-template>
                </xsl:variable>

                <xsl:variable name="root-partition"
select="xalan:nodeset($partition)/*[1]"/>

                <!-- Step 3.2 -->
                <xsl:for-each select="xalan:nodeset($ranges)/range">
                        <xsl:variable name="s" select="./start"/>
                        <xsl:variable name="e" select="./end"/>

                        <group>
                                <xsl:call-template
name="range-in-partition">
                                        <xsl:with-param name="s"
select="$s"/>
                                        <xsl:with-param name="e"
select="$e"/>
                                        <xsl:with-param name="p"
select="$root-partition"/>
                                </xsl:call-template>
                        </group>
                </xsl:for-each>

        </xsl:template>

        <xsl:template name="pivots">
                <xsl:param name="nodes"/>
                <xsl:param name="key"/>

                <xsl:for-each select="$nodes">
                        <xsl:if test="not(string(./key[1]/) =
string(./following-sibling::*[1]/key[1]/))">
                                <pivot>
                                        <xsl:value-of select="position()"/>
                                </pivot>
                        </xsl:if>
                </xsl:for-each>
        </xsl:template>

        <xsl:template name="ranges">
                <xsl:param name="pivots"/>

                <xsl:for-each select="$pivots">
                        <xsl:variable name="p"
select="./preceding-sibling::*[1]"/>
                        <range>
                                <start>
                                        <xsl:choose>
                                                <xsl:when test="$p">
                                                        <xsl:value-of
select="$p + 1"/>
                                                </xsl:when>
                                                <xsl:otherwise>
                                                        <xsl:value-of
select="1"/>
                                                </xsl:otherwise>
                                        </xsl:choose>
                                </start>
                                <end>
                                        <xsl:value-of select="string(.)"/>
                                </end>
                        </range>
                </xsl:for-each>
        </xsl:template>


        <!--
        Template: partition
        Partition a nodeset into a binary tree

        This DVC algorithm has a target time complexity of O(log(N)*N) and a
'copy' complexity of O(N) (only O(N) nodes are copied).

        Reducing the number of nodes by a half for each step is the crux of
the
algorithm. One way of splitting the nodes is by using the expressions:
'($nodes/*[position() &lt; $mid)' and '($nodes/*[position() &gt;= $mid)'
with $mid is 'count($nodes) div 2'. However this will introduce a 'copy'
complexity of O(log(N)*N).
        This implementation uses the 'following-sibling::*[I]' expression to
get
around this problem. One exception is that this operation must take O(I)
time instead of O(N) time to get the I'th sibling in a list of N nodes.
This, however depends on XSLT's implementation of 'following-sibling::*[I]'.
The upside is that if it takes O(1) time then the overall time complexity
will be O(N).
        -->

        <xsl:template name="partition">
                <xsl:param name="node"/>
                <xsl:param name="s"
select="(count($node/preceding-sibling::*)) + 1"/>
                <xsl:param name="e"
select="(count($node/following-sibling::*)) + $s"/>

                <xsl:if test="$node">
                        <xsl:element name="r">
                                <xsl:element name="s">
                                        <xsl:value-of select="$s"/>
                                </xsl:element>
                                <xsl:element name="e">
                                        <xsl:value-of select="$e"/>
                                </xsl:element>
                                <xsl:choose>
                                        <xsl:when test="$s = $e">
                                                <xsl:element name="leaf">
                                                        <xsl:copy-of
select="$node"/>
                                                </xsl:element>
                                        </xsl:when>
                                        <xsl:otherwise>
                                                <xsl:variable name="w"
select="floor(($e - $s + 1) div 2)"/>
                                                <xsl:variable name="m"
select="$s + $w"/>
                                                <xsl:call-template
name="partition">
                                                        <xsl:with-param
name="node" select="$node"/>
                                                        <xsl:with-param
name="s" select="$s"/>
                                                        <xsl:with-param
name="e" select="$m - 1"/>
                                                </xsl:call-template>
                                                <xsl:call-template
name="partition">
                                                        <xsl:with-param
name="node" select="$node/following-sibling::*[$w]"/>
                                                        <xsl:with-param
name="s" select="$m"/>
                                                        <xsl:with-param
name="e" select="$e"/>
                                                </xsl:call-template>
                                        </xsl:otherwise>
                                </xsl:choose>
                        </xsl:element>
                </xsl:if>
        </xsl:template>

        <!--
        Template: range-in-partition
        Selects a RANGE in a binary tree

        XSLT isn't really hmaking life easier but trying to do this in a
DVC style directly without the help of a binary tree is even harder.
        -->

        <xsl:template name="range-in-partition">
                <xsl:param name="p"/>
                <xsl:param name="s" select="$p/s/"/>
                <xsl:param name="e" select="$p/e/"/>

                <xsl:variable name="ps" select="number($p/s/)"/>
                <xsl:variable name="pe" select="number($p/e/)"/>

                <xsl:if test="$s &lt;= $pe and $e &gt;= $ps">
                        <xsl:if test="$ps = $pe">
                                <xsl:copy-of select="$p/leaf/*[1]"/>
                        </xsl:if>
                        <xsl:choose>
                                <xsl:when test="$s &gt; $ps">
                                        <xsl:variable name="s1"
select="$s"/>
                                        <xsl:choose>
                                                <xsl:when test="$e &lt;
$pe">
                                                        <xsl:variable
name="e1" select="$e"/>
                                                        <xsl:for-each
select="$p/*">

<xsl:call-template name="range-in-partition">

<xsl:with-param name="s" select="$s1"/>

<xsl:with-param name="e" select="$e1"/>

<xsl:with-param name="p" select="."/>

</xsl:call-template>
                                                        </xsl:for-each>
                                                </xsl:when>
                                                <xsl:otherwise>
                                                        <xsl:variable
name="e1" select="$pe"/>
                                                        <xsl:for-each
select="$p/*">

<xsl:call-template name="range-in-partition">

<xsl:with-param name="s" select="$s1"/>

<xsl:with-param name="e" select="$e1"/>

<xsl:with-param name="p" select="."/>

</xsl:call-template>
                                                        </xsl:for-each>
                                                </xsl:otherwise>
                                        </xsl:choose>
                                </xsl:when>
                                <xsl:otherwise>
                                        <xsl:variable name="s1"
select="$ps"/>
                                        <xsl:choose>
                                                <xsl:when test="$e &lt;
$pe">
                                                        <xsl:variable
name="e1" select="$e"/>
                                                        <xsl:for-each
select="$p/*">

<xsl:call-template name="range-in-partition">

<xsl:with-param name="s" select="$s1"/>

<xsl:with-param name="e" select="$e1"/>

<xsl:with-param name="p" select="."/>

</xsl:call-template>
                                                        </xsl:for-each>
                                                </xsl:when>
                                                <xsl:otherwise>
                                                        <xsl:variable
name="e1" select="$pe"/>
                                                        <xsl:for-each
select="$p/*">

<xsl:call-template name="range-in-partition">

<xsl:with-param name="s" select="$s1"/>

<xsl:with-param name="e" select="$e1"/>

<xsl:with-param name="p" select="."/>

</xsl:call-template>
                                                        </xsl:for-each>
                                                </xsl:otherwise>
                                        </xsl:choose>
                                </xsl:otherwise>
                        </xsl:choose>
                </xsl:if>
        </xsl:template>

</xsl:stylesheet>





 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.