
Re: Speeding up processing (with sablotron or saxon)

Subject: Re: Speeding up processing (with sablotron or saxon)
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Mon, 12 Jul 2004 19:03:57 -0400
Hi,

At 01:33 PM 7/12/2004, you wrote:
OK, I have a piece of XSLT that processes a large XML file into smaller
chunks. The problem I have is that the deeper into the XML file I am
processing, the longer it takes. Is this just due to the way XSLT parsers
work, or can I tweak my XSL file so it processes faster?

I got the same effect when I used to process the file in one pass using
Saxon Result:document as I do now processing with separate XSL files with
either Saxon or Sablotron.


This is the separate-file XSL file (change server[@name='Ahazi'] as needed):

<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent='yes' encoding="utf-8"/>

<xsl:template match="server" />
<xsl:template match="server[@name='Ahazi']">
<resources>
<xsl:for-each
select=".//resource[not(@swgcraft_id=preceding::*/@swgcraft_id)]">

... this for-each is expensive. You are traversing everything below the matched server looking for 'resource' elements; each one you find is then examined by looking at all the elements that precede it in the document and comparing their @swgcraft_id attributes. When you have lots of elements, lots and lots of them are compared. (Performance is on the order of n^2.)


Since this happens every time the template is matched (which could itself be lots of times), it adds up -- especially for the later nodes in your set (as you noticed).
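
To put a rough number on that, here is a back-of-the-envelope count. The figure of 10,000 resources is purely an assumption for illustration (I don't know the size of your file), and it takes the worst case where no swgcraft_id repeats, so every test has to scan all the way back. Since preceding::* also visits elements that are not resources, the real count would be higher still.

```latex
\sum_{i=1}^{n} (i-1) \;=\; \frac{n(n-1)}{2},
\qquad
n = 10\,000 \;\Rightarrow\; \frac{10\,000 \times 9\,999}{2} = 49\,995\,000 \text{ comparisons}
```

Doubling the number of resources roughly quadruples the work, which is why the later parts of the file feel so much slower than the earlier ones.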

An easy tweak to improve performance would be to use keys to de-duplicate instead of doing it by hand on the preceding:: axis.

So:

<xsl:key name="resource-by-id" match="resource" use="@swgcraft_id"/>

<xsl:variable name="resources" select="//resource"/>
(binding //resource to a variable $resources so we don't have to retrieve it every single time)


then you can deduplicate in another variable declaration:

<xsl:variable name="unique-resources"
select="$resources[count(.|key('resource-by-id',@swgcraft_id)[1]) = 1]"/>


In English: $unique-resources is the collection of all resources which, when counted along with the first resource with the same swgcraft_id as themselves, amount to a single node (which is true only of the first one with each swgcraft_id).

This ought to help quite a bit.
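
Pulling the pieces together, a minimal sketch of how the key might slot into your stylesheet could look like the following. It is untested against your data and rests on a few assumptions: that only 'resource' elements carry @swgcraft_id (your original test looked at every preceding element, the key only indexes resources), and that the body of your for-each, which you didn't show, can stand in as a plain xsl:copy-of. Here the test is applied directly in the for-each rather than through the two variables, so the selection stays limited to the one server.

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <!-- Index every resource in the document by its swgcraft_id. -->
  <xsl:key name="resource-by-id" match="resource" use="@swgcraft_id"/>

  <!-- Suppress every server except the one selected below. -->
  <xsl:template match="server"/>

  <xsl:template match="server[@name='Ahazi']">
    <resources>
      <!-- Keep a resource only when it is itself the first node the key
           returns for its swgcraft_id: the union of the two is then
           a single node. -->
      <xsl:for-each
          select=".//resource[count(. | key('resource-by-id', @swgcraft_id)[1]) = 1]">
        <!-- Placeholder body (assumption): the original loop body was not shown. -->
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </resources>
  </xsl:template>
</xsl:transform>
```

As with your preceding:: test, a resource under 'Ahazi' is dropped if a resource with the same swgcraft_id appears earlier anywhere in the document, including under another server.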

Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
