[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: Re: Re: Re: Unbounded element grouping/concatenati

Subject: Re: Re: Re: Re: Unbounded element grouping/concatenation
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Fri, 12 Dec 2003 11:50:33 +0100
xsl next record
"Gupta, Raman K [CI]" <raman.k.gupta@xxxxxxxxxxxxx> wrote in message
news:C0CB45C72DDE2A4FA32370AB65CAEC8F059673@xxxxxxxxxxxxxxxxxxxxxxxxxx

[Nice timing results omitted]

> So all of the methods except recursion exhibit exponential
> behavior with an increase in continuation records except
> recursion.
>
> So if there is a way to do solve this problem recursively
> that would avoid the stack overflow errors (2.5.2 doesn't
> even do you the courtesy of throwing an exception -- it
> just dies in the middle of the transformation), that would
> still be by far the best solution.
>

Hi Raman,

I have both bad and good news for you.

The Bad News:
              The recursive algorithm isn't the fastest.

The Good News:
              I am enclosing here the code implementing the fastest (that I
know of) algorithm, anf it is non-recursive.


The idea is to get a string with the positions of all "record" nodes with
type="normal" among all "record nodes", then for a record with type="normal"
find its position in the string and also the position of the next record
with type="normal" The difference in these two positions - 1 is the number
of intermediate record nodes with type="continuation".

Here's the XSLT code:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 >

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:variable name="vposArray">
    <xsl:value-of select="'|'"/>
    <xsl:for-each select="/*/record">
      <xsl:if test="@type = 'normal'">
        <xsl:value-of select="concat(position(), '|')"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="@* | node()" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="records">
    <records>
      <xsl:apply-templates select="record"/>
    </records>
  </xsl:template>

  <xsl:template match="record">
    <xsl:choose>
      <xsl:when test="not(@type='normal')">
        <xsl:call-template name="identity"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="vPos" select="position()"/>
        <xsl:variable name="vposNext"
        select="substring-before(
                          substring-after($vposArray,
                                          concat('|',
                                                 position(),
                                                 '|'
                                                 )
                                          ),
                          '|'
                                )"/>
         <xsl:variable name="vNumNested"
           select="$vposNext - position() - 1"/>
         <xsl:copy>
           <xsl:copy-of select="@* | node()"/>
           <xsl:if test="$vNumNested > 0">
             <xsl:copy-of select=
               "following-sibling::record
                           [position() &lt;= $vNumNested]"/>
           </xsl:if>
         </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="record[not(@type='normal')]"/>
</xsl:stylesheet>


I also measured the time it takes with the three different algorithms
(recursive, with keys, non-recursive) with two different sorce xml
documents -- one having 1000 consecutive record nodes with
type="continuation", the other with 2000 such nodes. They look like this:

<records>
   <record n="1" type="normal"/>
   <record n="2" type="normal"/>
   <record n="3" type="continuation"/>
   <record n="4" type="continuation"/>
   <record n="5" type="continuation"/>
   <record n="6" type="continuation"/>
   <record n="7" type="continuation"/>
 ..............................................................
   <record n="1000" type="continuation"/>
   <record n="1001" type="continuation"/>
   <record n="1002" type="continuation"/>
   <record n="1003" type="normal"/>
   <record n="1004" type="normal"/>
</records>

and

<records>
   <record n="1" type="normal"/>
   <record n="2" type="normal"/>
   <record n="3" type="continuation"/>
   <record n="4" type="continuation"/>
   <record n="5" type="continuation"/>
   <record n="6" type="continuation"/>
   <record n="7" type="continuation"/>
 ..............................................................
   <record n="2000" type="continuation"/>
   <record n="2001" type="continuation"/>
   <record n="2002" type="continuation"/>
   <record n="2003" type="normal"/>
   <record n="2004" type="normal"/>
</records>


I tested the following XSLT processors: MSXML4, Saxon6.5.3, JD1.5.2,
XalanJ2.4.1, XalanC1.5

I also wanted to test a forth algorithm using generate-id(). It took MSXML4
233157 milliseconds on the 1000 nodes document and I had to kill it after 30
minutes with the 2000 nodes document. I did not attempt to test this with
the rest of the XSLT processors.

Here are the results:


Method                    MSXML4       Saxon               JD
XalanJ            XalanC
=============================================================
Recursive:
    1000 nodes                     37     Stack Ovfl.        150
Stack Ovfl.               170
    2000 nodes                   519     Stack Ovfl.   Stack Ovfl.   Stack
Ovfl.               621

Keys:
    1000 nodes                     44            1783           791
4677              1472
    2000 nodes                 2211            6890         2654
17466              5818

Non-Recursive:
    1000 nodes                    5.9                50           100
1152                   10
    2000 nodes                12.09              101           120
1342                   40



The non-recursive algorithm exhibits linear behaviour with MSXML4 and Saxon
and sub-linear! one with JD, XalanJ and XalanC.

The recursive and key-based algorithm exhibit quadratic behaviour.

XalanJ is clearly not in the same company as the rest of the tested
processors.


I could try still speeding up the non-recursive algorithm, by using a faster
search than linear to find the position of a record node in the string with
positions -- this will require that all positions must have the same (some
maximum) length. Or I could record the positions in a node-set, for which
binary search is straight-forward.

In case you are still not satisfied with the speed of the non-recursive
algorithm, just let me know :o)

Hope this helped.


=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL









 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.