Grouping problem with large files in .Net

Subject: Grouping problem with large files in .Net
From: "Frederik Willaert" <f.w@xxxxxxxxxxx>
Date: Mon, 7 Jun 2004 01:46:35 +0200 (Romance Daylight Time)
Hi,
 
I have a problem with grouping large record-style XML documents using the
.Net XslTransform class.
 
My source document has the following structure:
 
<REPORT>
    <ROW>
        <CUSTOMER>XXX</CUSTOMER>
        <ACCOUNT>YYY</ACCOUNT>
        <HOURNUMBER>1</HOURNUMBER>
        <VALUE1>...</VALUE1>
        <VALUE2>...</VALUE2>
        <VALUE3>...</VALUE3>
        <!-- ... -->
    </ROW>
    <ROW>
            <!-- ... -->
    </ROW>
    <!-- ... -->
</REPORT>
 
 
The stylesheet I'm executing is the following:
 
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="rows-by-customer" match="/REPORT/ROW" use="CUSTOMER"/>
<xsl:key name="rows-by-customer-and-account" match="/REPORT/ROW" use="concat(CUSTOMER,'+',ACCOUNT)"/>
<xsl:template match="/REPORT">
    <Report>
        <xsl:for-each select="ROW[generate-id() = generate-id(key('rows-by-customer', CUSTOMER)[1])]">
            <xsl:variable name="customer" select="CUSTOMER" />
            <Customer Name="{$customer}">
                <xsl:for-each select="key('rows-by-customer', $customer)[generate-id() =
                        generate-id(key('rows-by-customer-and-account', concat(CUSTOMER,'+',ACCOUNT))[1])]">
                    <xsl:variable name="account" select="ACCOUNT" />
                    <Account Name="{$account}">
                        <xsl:for-each select="key('rows-by-customer-and-account', concat(CUSTOMER,'+',$account))">
                            <xsl:copy-of select="." />
                        </xsl:for-each>
                    </Account>
                </xsl:for-each>
            </Customer>
        </xsl:for-each>
    </Report>
</xsl:template>
</xsl:stylesheet>
 
This performs a two-level grouping: by Customer, then by Account.
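
As a sketch of the intended result (derived from the stylesheet above, with element contents elided), the output nests each copied ROW under its Account, which in turn sits under its Customer:

```xml
<Report>
    <Customer Name="XXX">
        <Account Name="YYY">
            <ROW>
                <CUSTOMER>XXX</CUSTOMER>
                <ACCOUNT>YYY</ACCOUNT>
                <HOURNUMBER>1</HOURNUMBER>
                <!-- ... -->
            </ROW>
            <!-- further ROWs for the same customer and account -->
        </Account>
        <!-- further Account groups for this customer -->
    </Customer>
    <!-- further Customer groups -->
</Report>
```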
 
The source document can contain several tens of thousands of rows.
 
 
=> When performing this transformation using MSXML, performance is very
acceptable: under 1 minute for a file with 60000 records.
=> However, the same transformation using .Net (1.1) XslTransform seems to
take forever - I haven't been able to have it processed completely so far...
Unfortunately, .Net is the intended platform.
 
==> Am I doing something wrong, is this a known problem, and/or can
something be done about this?
 
Remarks:
- I have also tried with the count(. | key('rows-by-customer', CUSTOMER)[1])
= 1 approach, same problem.
- I've found a document on MSDN mentioning that the xsl:key implementation
had a performance problem. However, this seems to apply to .Net v1.0 (?)
- Following recommendations, I'm using XPathDocument for the input file, and
a stream for the output - or would there be better options?
- I've included the source code for the transformation, and the timings of
several transformations (using MSXSL and XslTransform) below.
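
For reference, the count()-based variant from the first remark only changes the predicates of the two grouping loops. The outer predicate is the one quoted in that remark; the inner one is written here by analogy and is an assumption about how it would look, not a copy of the tested stylesheet:

```xml
<!-- First row of each customer: the union of the current row and the
     first row in its key group contains exactly one node. -->
<xsl:for-each select="ROW[count(. | key('rows-by-customer', CUSTOMER)[1]) = 1]">
    <xsl:variable name="customer" select="CUSTOMER" />
    <!-- First row of each customer/account pair (assumed by analogy). -->
    <xsl:for-each select="key('rows-by-customer', $customer)[count(. | key('rows-by-customer-and-account', concat(CUSTOMER,'+',ACCOUNT))[1]) = 1]">
        <!-- ... same body as in the generate-id() version ... -->
    </xsl:for-each>
</xsl:for-each>
```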
 
Any help would be greatly appreciated...
 
Thanks in advance,
Frederik
 
*****************
C# code to do transformation:
 
string folder = @"D:\Test\grouping\";
string inputUri = folder + "FlatInput.xml";
string stylesheet1uri = folder + "FlatInput2Grouped.xslt";
 
string outputUri = folder + "groupedOutput_XslTransform.xml";
 
// A timestamp is taken after each phase so the phases can be timed separately.
DateTime beforeStart = DateTime.Now;
DateTime afterLoadingInput, afterLoadingStylesheet, afterTransform;
using (FileStream output = new FileStream(outputUri, FileMode.Create,
    FileAccess.Write, FileShare.Read))
{
    // Load the source document into a read-only XPathDocument.
    XPathDocument inputDocument = new XPathDocument(inputUri);
    afterLoadingInput = DateTime.Now;

    XslTransform transform = new XslTransform();

    // Load the stylesheet, passing this assembly's evidence.
    transform.Load(
        new XPathDocument(stylesheet1uri),
        null,
        this.GetType().Assembly.Evidence);
    afterLoadingStylesheet = DateTime.Now;

    // Run the transformation and write the result to the output stream.
    transform.Transform(inputDocument, null, output, null);
    afterTransform = DateTime.Now;
}
 
******************
Timings:
 
MSXSL:
 
groupedOutput_verysmall_msxsl.xml (approx. 48 records)
---------------------------------
Source document load time: 27.68 milliseconds
Stylesheet document load time: 1.810 milliseconds
Stylesheet compile time: 1.266 milliseconds
Stylesheet execution time: 6.178 milliseconds
 
groupedOutput_small_msxsl.xml (144 records)
-----------------------------
Source document load time: 45.77 milliseconds
Stylesheet document load time: 2.145 milliseconds
Stylesheet compile time: 1.297 milliseconds
Stylesheet execution time: 48.66 milliseconds
 
groupedOutput_medium_msxsl.xml (approx. 10000 records)
------------------------------
Source document load time: 1507 milliseconds
Stylesheet document load time: 11.85 milliseconds
Stylesheet compile time: .648 milliseconds
Stylesheet execution time: 1634 milliseconds
 
groupedOutput_msxsl.xml (approx. 60000 records, 30MB file size)
-----------------------
Source document load time: 11276 milliseconds
Stylesheet document load time: 3.053 milliseconds
Stylesheet compile time: .652 milliseconds
Stylesheet execution time: 40403 milliseconds
 
============
 
XSLTRANSFORM:
(timings of second transformation, to exclude JIT compilation time)
 
groupedOutput_verysmall_XslTransform.xml (48 records)
----------------------------------------
Source document load time: 30 milliseconds
Stylesheet document load time: 10 milliseconds
Stylesheet execution time: 130 milliseconds
 
groupedOutput_small_XslTransform.xml (144 records)
------------------------------------
Source document load time: 50 milliseconds
Stylesheet document load time: 10 milliseconds
Stylesheet execution time: 270 milliseconds
 
groupedOutput_medium_XslTransform.xml (approx. 10000 records)
-------------------------------------
[SEVERAL HOURS]
 
groupedOutput_XslTransform.xml (approx. 60000 records, 30MB file size)
------------------------------
[FOREVER ?]
