
RE: Grouping problem with large files in .Net

Subject: RE: Grouping problem with large files in .Net
From: Pieter Reint Siegers Kort <pieter.siegers@xxxxxxxxxxx>
Date: Mon, 7 Jun 2004 18:49:32 -0500
Hi Frederik,

I've not used XslTransform yet for the type of transformation you're doing
so I cannot comment on that.

If the problems with XslTransform are of this magnitude (and your results look
quite severe), then you could revert to using MSXML 4.0 as a COM object in
your .NET application.

I realize that this option may not be the best, but it could serve you as a
temporary workaround.
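
To make the suggestion a bit more concrete, here is a rough, untested sketch
of driving MSXML 4.0 from C# through late-bound COM calls, so no interop
assembly is needed. It assumes MSXML 4.0 is installed and registered under the
ProgID "Msxml2.DOMDocument.4.0"; the class name MsxmlFallback and the
three command-line arguments (source, stylesheet, output path) are made up for
illustration.

```csharp
using System;
using System.Reflection;

class MsxmlFallback
{
    // Assumed ProgID of the MSXML 4.0 DOM; requires MSXML 4.0 on the machine.
    const string DomProgId = "Msxml2.DOMDocument.4.0";

    // Late-bound method call on a COM object (goes through IDispatch).
    static object Call(object target, string method, params object[] args)
    {
        return target.GetType().InvokeMember(
            method, BindingFlags.InvokeMethod, null, target, args);
    }

    // Late-bound property set on a COM object.
    static void Set(object target, string property, object value)
    {
        target.GetType().InvokeMember(
            property, BindingFlags.SetProperty, null, target, new object[] { value });
    }

    static object CreateDom()
    {
        // Throws if the ProgID is not registered.
        return Activator.CreateInstance(Type.GetTypeFromProgID(DomProgId, true));
    }

    static object LoadDom(string path)
    {
        object dom = CreateDom();
        Set(dom, "async", false);              // load synchronously
        if (!(bool)Call(dom, "load", path))    // load() returns false on failure
            throw new ApplicationException("MSXML could not load " + path);
        return dom;
    }

    [STAThread]
    static void Main(string[] args)
    {
        object source = LoadDom(args[0]);      // e.g. FlatInput.xml
        object stylesheet = LoadDom(args[1]);  // e.g. FlatInput2Grouped.xslt
        object result = CreateDom();

        // Transform into a second DOM, then let MSXML write the file.
        Call(source, "transformNodeToObject", stylesheet, result);
        Call(result, "save", args[2]);
    }
}
```

Going through a result DOM keeps the whole output in memory, so for a 30MB
input this is only a stopgap; it is meant to show the shape of the workaround,
not a tuned solution.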

HTH,
<prs/>

-----Original Message-----
From: Frederik Willaert [mailto:f.w@xxxxxxxxxxx] 
Sent: Sunday, June 06, 2004 6:47 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject:  Grouping problem with large files in .Net

Hi,

 

I have a problem with grouping large record-style XML documents using the
.Net XslTransform class.

 

My source document has the following structure:

 

<REPORT>

    <ROW>

        <CUSTOMER>XXX</CUSTOMER>

        <ACCOUNT>YYY</ACCOUNT>

        <HOURNUMBER>1</HOURNUMBER>

        <VALUE1>...</VALUE1>

        <VALUE2>...</VALUE2>

        <VALUE3>...</VALUE3>

        <!-- ... -->

    </ROW>

    <ROW>

            <!-- ... -->

    </ROW>

    <!-- ... -->

</REPORT>

 

 

The stylesheet I'm executing is the following:

 

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:key name="rows-by-customer" match="/REPORT/ROW" use="CUSTOMER"/>

<xsl:key name="rows-by-customer-and-account" match="/REPORT/ROW" use="concat(CUSTOMER,'+',ACCOUNT)"/>

<xsl:template match="/REPORT">

    <Report>

        <xsl:for-each select="ROW[generate-id() = generate-id(key('rows-by-customer', CUSTOMER)[1])]">

            <xsl:variable name="customer" select="CUSTOMER" />

            <Customer Name="{$customer}">

                <xsl:for-each select="key('rows-by-customer', $customer)[generate-id() = generate-id(key('rows-by-customer-and-account', concat(CUSTOMER,'+',ACCOUNT))[1])]">

                    <xsl:variable name="account" select="ACCOUNT" />

                    <Account Name="{$account}">

                        <xsl:for-each select="key('rows-by-customer-and-account', concat(CUSTOMER,'+',$account))">

                            <xsl:copy-of select="." />

                        </xsl:for-each>

                    </Account>

                </xsl:for-each>

            </Customer>

        </xsl:for-each>

    </Report>

</xsl:template>

</xsl:stylesheet>

 

This performs a two-level grouping: by Customer, then by Account.
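
As a sanity check on what the stylesheet is meant to compute, the same
two-level grouping can be sketched in plain C#. This is only an illustration:
the Row class, the GroupingSketch class and the sample values are hypothetical
and not part of my actual code. It builds its indexes in a single pass over
the rows, keyed the same way as the two xsl:key declarations; it says nothing
about how MSXML or XslTransform implement keys internally.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for one <ROW> element.
public class Row
{
    public readonly string Customer;
    public readonly string Account;
    public readonly int HourNumber;

    public Row(string customer, string account, int hourNumber)
    {
        Customer = customer;
        Account = account;
        HourNumber = hourNumber;
    }
}

public static class GroupingSketch
{
    // Groups rows by customer, then by account, keeping first-seen order
    // at both levels (the order the Muenchian stylesheet produces).
    public static string Group(IEnumerable<Row> rows)
    {
        var customers = new List<string>();
        var accountsByCustomer = new Dictionary<string, List<string>>();
        // Same composite key as concat(CUSTOMER,'+',ACCOUNT) in the stylesheet.
        var rowsByCustomerAndAccount = new Dictionary<string, List<Row>>();

        foreach (Row row in rows)
        {
            if (!accountsByCustomer.TryGetValue(row.Customer, out List<string> accounts))
            {
                accounts = new List<string>();
                accountsByCustomer[row.Customer] = accounts;
                customers.Add(row.Customer);      // first row of a new customer
            }

            string key = row.Customer + "+" + row.Account;
            if (!rowsByCustomerAndAccount.TryGetValue(key, out List<Row> group))
            {
                group = new List<Row>();
                rowsByCustomerAndAccount[key] = group;
                accounts.Add(row.Account);        // first row of a new account
            }
            group.Add(row);
        }

        var sb = new StringBuilder();
        foreach (string customer in customers)
        {
            sb.Append("Customer ").Append(customer).Append('\n');
            foreach (string account in accountsByCustomer[customer])
            {
                sb.Append("  Account ").Append(account).Append(':');
                foreach (Row row in rowsByCustomerAndAccount[customer + "+" + account])
                    sb.Append(' ').Append(row.HourNumber);
                sb.Append('\n');
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var rows = new[]
        {
            new Row("A", "X", 1), new Row("B", "Y", 1),
            new Row("A", "X", 2), new Row("A", "Z", 1)
        };
        Console.Write(Group(rows));
    }
}
```

For the four sample rows this lists customer A with accounts X (hours 1 and 2)
and Z (hour 1), followed by customer B with account Y (hour 1). Note that the
'+' separator in the composite key is only safe as long as customer and
account values never contain a '+' themselves; the same holds for the
stylesheet.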

 

The source document can contain several tens of thousands of rows.

 

 

=> When performing this transformation using MSXML, performance is very
acceptable: under 1 minute for a file with 60000 records.

=> However, the same transformation using .Net (1.1) XslTransform seems to
take forever - I haven't been able to have it processed completely so far...
Unfortunately, .Net is the intended platform.

 

==> Am I doing something wrong, is this a known problem, and/or can
something be done about this?

 

Remarks:

- I have also tried with the count(. | key('rows-by-customer', CUSTOMER)[1])
= 1 approach, same problem.

- I've found a document on MSDN mentioning that the xsl:key implementation
had a performance problem. However, this seems to apply to .Net v1.0 (?)

- Following recommendations, I'm using XPathDocument for the input file, and
a stream for the output - or would there be better options?

- I've included the source code for the transformation, and the timings of
several transformations (using MSXSL and XslTransform) below.

 

Any help would be greatly appreciated...

 

Thanks in advance,

Frederik

 

*****************

C# code to do transformation:

 

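// Needs: using System; using System.IO; using System.Xml.XPath; using System.Xml.Xsl;

// (fragment from inside an instance method, hence the this.GetType() call below)

string folder = @"D:\Test\grouping\";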

string inputUri = folder + "FlatInput.xml";

string stylesheet1uri = folder + "FlatInput2Grouped.xslt";

 

string outputUri = folder + "groupedOutput_XslTransform.xml";

 

DateTime beforeStart = DateTime.Now;

DateTime afterLoadingInput, afterLoadingStylesheet, afterTransform;

using(FileStream output = new FileStream(outputUri, FileMode.Create, FileAccess.Write, FileShare.Read))

{

XPathDocument inputDocument = new XPathDocument(inputUri);

afterLoadingInput = DateTime.Now;

 

XslTransform transform = new XslTransform();

 

transform.Load(

new XPathDocument(stylesheet1uri), 

null,

this.GetType().Assembly.Evidence);

afterLoadingStylesheet = DateTime.Now;

 

transform.Transform(inputDocument,null,output,null);

afterTransform = DateTime.Now;

}

 

******************

Timings:

 

MSXSL:

 

groupedOutput_verysmall_msxsl.xml (approx. 48 records)

---------------------------------

Source document load time: 27.68 milliseconds

Stylesheet document load time: 1.810 milliseconds

Stylesheet compile time: 1.266 milliseconds

Stylesheet execution time: 6.178 milliseconds

 

groupedOutput_small_msxsl.xml (144 records)

-----------------------------

Source document load time: 45.77 milliseconds

Stylesheet document load time: 2.145 milliseconds

Stylesheet compile time: 1.297 milliseconds

Stylesheet execution time: 48.66 milliseconds

 

groupedOutput_medium_msxsl.xml (approx. 10000 records)

------------------------------

Source document load time: 1507 milliseconds

Stylesheet document load time: 11.85 milliseconds

Stylesheet compile time: .648 milliseconds

Stylesheet execution time: 1634 milliseconds

 

groupedOutput_msxsl.xml (approx. 60000 records, 30MB file size)

-----------------------

Source document load time: 11276 milliseconds

Stylesheet document load time: 3.053 milliseconds

Stylesheet compile time: .652 milliseconds

Stylesheet execution time: 40403 milliseconds

 

============

 

XSLTRANSFORM:

(timings of second transformation, to exclude JIT compilation time)

 

groupedOutput_verysmall_XslTransform.xml (48 records)

----------------------------------------

Source document load time: 30 milliseconds

Stylesheet document load time: 10 milliseconds

Stylesheet execution time: 130 milliseconds

 

groupedOutput_small_XslTransform.xml (144 records)

------------------------------------

Source document load time: 50 milliseconds

Stylesheet document load time: 10 milliseconds

Stylesheet execution time: 270 milliseconds

 

groupedOutput_medium_XslTransform.xml (approx. 10000 records)

-------------------------------------

[SEVERAL HOURS]

 

groupedOutput_XslTransform.xml (approx. 60000 records, 30MB file size)

------------------------------

[FOREVER ?]

--+------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx>
--+--
