
Re: [xslt performance for big xml files]

Subject: Re: [xslt performance for big xml files]
From: Aditya Sakhuja <aditya.sakhuja@xxxxxxxxx>
Date: Sun, 26 Apr 2009 11:18:04 -0700
Thank you very much for the input! My experiments since then have
shown some encouraging results.

1> Splitting my 30 MB file into 60 pieces gave a massive speedup. The
overall transformation now runs in under 3 minutes.
2> I did some code-level optimization (the standard ones, avoiding
non-essential computation) and got a massive improvement here too. I
avoided call-template wherever possible, replaced my loops with
apply-templates, and eliminated the use of // entirely.
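
Much of the gain from dropping repeated scans such as // tends to come from replacing a nested search with an indexed lookup, which is the idea behind the keys mentioned in the quoted reply. The following is only a sketch of that idea in Python, not the stylesheet discussed in this thread; the element and attribute names (customer, order, id, ref) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real document structure is not shown in the thread.
DOC = """<data>
  <customer id="c1"><name>Ann</name></customer>
  <customer id="c2"><name>Bob</name></customer>
  <order ref="c2"/><order ref="c1"/><order ref="c2"/>
</data>"""


def names_by_scan(root):
    """Quadratic: every order triggers a fresh scan over all customers."""
    out = []
    for order in root.iter("order"):
        for cust in root.iter("customer"):
            if cust.get("id") == order.get("ref"):
                out.append(cust.findtext("name"))
                break
    return out


def names_by_index(root):
    """Linear: build the index once (the role a key plays), then look up."""
    index = {c.get("id"): c.findtext("name") for c in root.iter("customer")}
    return [index[o.get("ref")] for o in root.iter("order")]


root = ET.fromstring(DOC)
```

Both functions return the same list; only the second keeps the work proportional to the document size as the number of orders and customers grows.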

There has been a lot of performance gain so far. I plan to do some more
code-level optimization in the coming hours.

By the way, I am doing the split and merge with custom PHP functions.
Is there a more elegant way of doing this?
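
One possible shape for the split and merge step, sketched in Python with the standard library rather than the PHP functions mentioned above. It assumes the records are the direct children of the root element and that each record can be transformed independently of the others; if the stylesheet needs to see the whole document, splitting this way would change the result. The function names and the items/item sample are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def split_records(root, n_parts):
    """Return up to n_parts new roots, each holding a contiguous slice
    of the original root's children (copied, so the input is untouched)."""
    children = list(root)
    size = max(1, -(-len(children) // n_parts))  # ceiling division
    parts = []
    for start in range(0, len(children), size):
        part = ET.Element(root.tag, root.attrib)
        part.extend([copy.deepcopy(c) for c in children[start:start + size]])
        parts.append(part)
    return parts


def merge_records(parts, tag):
    """Concatenate the children of each part, in order, under a new root."""
    merged = ET.Element(tag)
    for part in parts:
        merged.extend(list(part))
    return merged


# Small demonstration on hypothetical data: 5 records split into 2 parts.
root = ET.fromstring(
    "<items>" + "".join(f"<item n='{i}'/>" for i in range(5)) + "</items>"
)
parts = split_records(root, 2)
merged = merge_records(parts, "items")
```

In a real run each part would be written to its own file with ET.ElementTree(part).write(...), transformed separately, and the transformed outputs merged in the same order.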

Thanks,
Aditya

On Sat, Apr 25, 2009 at 1:50 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>
>
> It's possible to write slow programs in any language, but high-level
> declarative languages like XSLT and SQL make it easier!
>
> I would think there's something in your code that makes the performance
> quadratic (or worse) with respect to source document size. This is nothing
> to do with XSLT, it's to do with your own code. To test this theory, try
> plotting how the stylesheet performance varies as you increase the document
> size: 1Mb, 2Mb, 4Mb, 8Mb.
>
> It's possible that a processor that optimizes joins (Saxon-SA is the only
> one I know) would get rid of the quadratic behaviour. On the other hand, it
> might not - without seeing your code, all is guesswork. The usual solution
> to quadratic behaviour, however, is to optimize your code "by hand" using
> keys.
>
> I would be very surprised if your transformation can't be done in under a
> minute by some appropriate tweaking. 30Mb is not big these days. The fact
> that you don't have a memory problem means that streaming isn't going to
> help.
>
> You might get a tenfold improvement just by running the same code under a
> different processor (or you might not), but you're looking for a factor of
> 1000 improvement, and unless you hit lucky with the optimizer, that will
> only come from improving your own code.
>
> Michael Kay
> http://www.saxonica.com/
>
>>
>> I am looking for some tips on performance tuning for my xslt
>> which processes a 30 MB file on average. I am having
>> serious issues getting the processing completed in under 24 hrs.
>> The transformation is completely CPU bound (memory usage is hardly
>> 3-4%). CPU utilization remains around 99% throughout.
>>
>> My specific question here is whether these ideas would help
>> reduce processing time:
>>
>> 1> Splitting the big xml file into multiple files and then feeding
>> them to the xsltproc processor. Does that sound like the right way
>> to reduce the overall processing time?
>> 2> I have done my testing using xsltproc (libxml2). Would the Saxon
>> processor be an advantage here?
>> 3> Is xslt processing a poor fit for large xml files?
>> Should I look at stream-based processing instead,
>> if xslt does not scale?
>>
>> I am performing experiments in parallel, but wanted to get
>> feedback from people more experienced with xslt.
>>
>> Thanks in advance,
>>
>> --
>> -Aditya
>
>



--
-Aditya
