
Re: Rick Jelliffe's article on XSLT 1.0 performance

  • From: Michael Kay <mike@saxonica.com>
  • To: Rick Jelliffe <rjelliffe@allette.com.au>
  • Date: Sun, 12 Feb 2017 09:29:02 +0000

> 
> I agree that benchmarking the usual expected cases is just as important as benchmarking edge cases prone to blowouts. But I am not so sure why you think we cannot add the numbers up.

Basically, there are some workloads where compile-time matters (because you're compiling the stylesheet once per transformation), and there are others where it doesn't (the typical case being where you're doing server-side transformation and the cost of compiling the stylesheet is amortized over thousands of transformations). Historically, the amortized server-side case is the workload we have optimized for (and we're now realising that might have been a mistake). To provide figures that can be extrapolated to those very different kinds of workload, you really need more than one number.

> I think it shows that for large documents, the cost of XML parsing is utterly dwarfed by the costs of the in-memory data structures and algorithms used for processing. 

That's not what I'm seeing. For a minimal parse of a 119MB input file (Apache Xerces, SAX parse to a do-nothing ContentHandler), run 10 times to assess VM warmup time, I'm seeing these timings:

Time: 735ms
Time: 555ms
Time: 494ms
Time: 517ms
Time: 486ms
Time: 503ms
Time: 508ms
Time: 497ms
Time: 504ms
Time: 486ms
Average of last 8: 499ms

(In future, I'll just give the "average of last 8")
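A measurement loop of this kind might look roughly like the following sketch. It is not the harness used for the figures above: the class and method names are hypothetical, it uses whichever SAX parser JAXP selects (which may not be Apache Xerces itself), and it holds the input in memory, so disk I/O is excluded from the timings.

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness: parse the same document repeatedly and time each run.
final class ParseTiming {

    /** Parses the document 'runs' times and returns each elapsed time in ms. */
    static long[] timeParses(byte[] xml, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            SAXParser parser = factory.newSAXParser();
            long start = System.nanoTime();
            // DefaultHandler ignores every event, so only parsing is measured.
            parser.parse(new InputSource(new ByteArrayInputStream(xml)),
                         new DefaultHandler());
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    /** Averages the last 'count' entries, discarding the earlier warm-up runs. */
    static double averageOfLast(long[] millis, int count) {
        long sum = 0;
        for (int i = millis.length - count; i < millis.length; i++) {
            sum += millis[i];
        }
        return (double) sum / count;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the input file (an assumption of this sketch).
        byte[] xml = Files.readAllBytes(Paths.get(args[0]));
        long[] millis = timeParses(xml, 10);
        for (long ms : millis) {
            System.out.println("Time: " + ms + "ms");
        }
        System.out.println("Average of last 8: "
                + Math.round(averageOfLast(millis, 8)) + "ms");
    }
}
```

Discarding the first two runs keeps JIT warm-up out of the average, which is why only the last 8 are counted.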

If I now do a JAXP identity transform on the same input, from SAXSource to SAXResult (no XSLT involved, but using Saxon-HE as the identity transformer), so that I now have parsing cost plus serialization cost (the serialized output is in memory and is discarded), the timings become:

Average of last 8: 1155ms

suggesting that the serialization cost is about 650ms.
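One possible way to wire up such a parse-plus-serialize pipeline with plain JAXP is sketched below. The wiring is a guess, not the actual test code: it assumes the SAXResult wraps a serializing TransformerHandler that writes into an in-memory buffer, it runs on whichever TransformerFactory JAXP selects (Saxon-HE only if it is on the classpath), and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;

// Hypothetical sketch: identity transform from a SAXSource to a SAXResult.
final class IdentityPipeline {

    /** Parses the input, copies it unchanged, and returns the serialized bytes. */
    static byte[] parseAndSerialize(byte[] xml) throws Exception {
        // Assumes the selected factory supports the SAX extensions.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();

        // A TransformerHandler created without a stylesheet turns SAX events
        // into serialized output on the Result it is given.
        TransformerHandler serializer = factory.newTransformerHandler();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializer.setResult(new StreamResult(out));

        // newTransformer() with no arguments yields the identity transformer.
        Transformer identity = factory.newTransformer();
        identity.transform(
                new SAXSource(new InputSource(new ByteArrayInputStream(xml))),
                new SAXResult(serializer));
        return out.toByteArray(); // a benchmark would simply discard this
    }
}
```

Timing calls to `parseAndSerialize` in a repeated loop and subtracting the parse-only average would then give an estimate of the serialization cost, on the assumption that the two costs simply add up.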

Now let's measure parsing plus tree-building, by doing Saxon's configuration.buildDocument() on the same source. This gives timings:

Average of last 8: 989ms

So the picture we are getting is: parsing 500ms, tree-building 500ms, serialization 650ms (total 1650ms)

If I now substitute the JAXP identity transformer with a stylesheet identity.xsl containing the classic identity transformation rule, and with stylesheet compilation out of the measurement loop, I'm seeing transformation times of:

Average of last 8: 2514ms

Which suggests: parsing 500ms, tree-building 500ms, transformation 850ms, serialization 650ms

That is, the actual transformation cost is about one-third of the total.
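The subtraction behind this breakdown can be written out as a small worked sketch. It uses the unrounded averages quoted above and assumes the stage costs are simply additive; the class and constant names are hypothetical.

```java
// Derives per-stage costs from the measured averages by subtraction.
final class CostBreakdown {
    // "Average of last 8" figures quoted in this message, in milliseconds.
    static final int PARSE = 499;                 // SAX parse only
    static final int PARSE_AND_SERIALIZE = 1155;  // JAXP identity transform
    static final int PARSE_AND_BUILD = 989;       // buildDocument()
    static final int IDENTITY_XSLT_TOTAL = 2514;  // identity.xsl end to end

    static int serialization() {
        return PARSE_AND_SERIALIZE - PARSE;       // 656, quoted as "about 650"
    }

    static int treeBuilding() {
        return PARSE_AND_BUILD - PARSE;           // 490, quoted as "500"
    }

    static int transformation() {
        // Whatever remains once the other three stages are subtracted: 869.
        return IDENTITY_XSLT_TOTAL - PARSE - treeBuilding() - serialization();
    }

    static double transformationShare() {
        return (double) transformation() / IDENTITY_XSLT_TOTAL;
    }

    public static void main(String[] args) {
        System.out.printf("serialization %dms, tree-building %dms, "
                + "transformation %dms (%.0f%% of total)%n",
                serialization(), treeBuilding(), transformation(),
                100 * transformationShare());
    }
}
```

The unrounded transformation figure (869ms) differs slightly from the 850ms quoted above because the other stages were rounded first; either way the share is close to one-third.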

If we replace the identity transform by a minimal transformation that just does <xsl:template match="/"/>, I get timings:

Average of last 8: 953ms

which is essentially the same as the parsing plus tree building time, so the transformation adds no cost.

If instead I make the root template compute count(//*), the cost increases only to 1027ms, so it is still hardly more than the parsing and tree-building cost.
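For illustration, a stylesheet of that shape can be compiled once and then run repeatedly, keeping compilation out of the measurement loop. This is a sketch under stated assumptions rather than the test that was actually run: the stylesheet text, the text output method, and the class and method names are all hypothetical, and it uses whichever XSLT processor JAXP selects.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: root template that only computes count(//*).
final class CountElements {

    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count(//*)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Compiles the stylesheet once; the Templates object is reusable. */
    static Templates compile() throws Exception {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
    }

    /** Runs one transformation using the already compiled stylesheet. */
    static String run(Templates compiled, String xml) throws Exception {
        StringWriter out = new StringWriter();
        compiled.newTransformer().transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates compiled = compile(); // outside the timed loop
        String xml = "<a><b/><c><d/></c></a>";
        for (int i = 0; i < 10; i++) {
            long start = System.nanoTime();
            String count = run(compiled, xml);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("count=" + count + " Time: " + ms + "ms");
        }
    }
}
```

Because the template visits every element but writes almost nothing, a run like this mostly exercises parsing, tree building, and one traversal of the tree.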

I think that for many simple "recursive descent" stylesheets that process each input node once and don't do a great deal of computation, the costs are not very different from these. Certainly, the parsing cost is not "dwarfed".

The compilation time for the identity stylesheet is being reported as 3ms, which is negligible for this scenario (large source document, tiny stylesheet) - but when you switch to a scenario with a 10KB document transformed through the DocBook stylesheets, it's the compilation cost that dominates entirely.

> 
> If Saxon is relatively weak at compilation time, why did you drop the pre-compiled stylesheet capability?

Two reasons: (a) it didn't work reliably enough, and (b) hardly anyone used it. In fact it's replaced in current releases by the stylesheet export capability. But one of the lessons we have learned is that people won't sit down and spend a couple of hours working out how to improve the performance of their workload unless things are really chronically bad. Moreover, when they compare products against each other, they won't usually adjust the way they run their tests so that each product performs at its best.

Michael Kay
Saxonica

