Re: [xsl] Speeding up processing (with sablotron or saxon)

Subject: Re: Speeding up processing (with sablotron or saxon)
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Mon, 12 Jul 2004 19:03:57 -0400

Hi,

At 01:33 PM 7/12/2004, you wrote:

ok I have a piece of XSLT that processes a large XML file into smaller
chunks. The problem I have is that the deeper down into the XML file I am
processing the longer it takes. Is this just due to the way XSLT parsers
work or can I tweak my XSL file so it processes faster?

I get the same effect when I used to process the file as one pass using
Saxon Result:document as I do processing as separate XSL files with either
Saxon or Sablotron.


This is the separate XSL file:- (Change the server[@name='Ahazi'] as
needed)
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent='yes' encoding="utf-8"/>

<xsl:template match="server" />
<xsl:template match="server[@name='Ahazi']">
<resources>
<xsl:for-each
select=".//resource[not(@swgcraft_id=preceding::*/@swgcraft_id)]">

... this for-each is expensive. You are traversing the whole subtree of the matched server looking for 'resource' elements; each one you find is then tested by looking at every element that precedes it in the document and comparing their @swgcraft_id attributes. When you have lots of elements, lots and lots of them are compared. (With n elements, that is on the order of n^2 comparisons.)

Since this happens every time the template is matched (which could itself be lots of times), it adds up -- especially for the later nodes in your set (as you noticed).

An easy tweak to improve performance would be to use keys to de-duplicate instead of doing it by hand on the preceding:: axis.

So:

<xsl:key name="resource-by-id" match="resource" use="@swgcraft_id"/>

<xsl:variable name="resources" select="//resource"/>
(binding //resource to a variable $resources so we don't have to retrieve it every single time)

then you can deduplicate in another variable declaration:

<xsl:variable name="unique-resources" select="$resources[count(.|key('resource-by-id',@swgcraft_id)[1]) = 1]"/>

In English: $unique-resources is the collection of all resources which, when counted along with the first resource with the same swgcraft_id as themselves, amount to a single node (which is true only of the first one with each swgcraft_id).
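
Applied to the stylesheet quoted above, the key-based test might look like the following sketch. It assumes that only 'resource' elements carry @swgcraft_id, and it uses xsl:copy-of as a placeholder loop body, since the original body of the for-each was not shown. The key name 'resource-by-id' is the one declared above.

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               version="1.0">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <!-- Index every resource element by its swgcraft_id attribute. -->
  <xsl:key name="resource-by-id" match="resource" use="@swgcraft_id"/>

  <!-- Suppress all servers except the one selected below. -->
  <xsl:template match="server"/>

  <xsl:template match="server[@name='Ahazi']">
    <resources>
      <!-- Keep a resource only if it is the first node (in document
           order) that the key returns for its swgcraft_id. -->
      <xsl:for-each
          select=".//resource[count(. | key('resource-by-id', @swgcraft_id)[1]) = 1]">
        <!-- Placeholder body (assumption): the original body is elided. -->
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </resources>
  </xsl:template>
</xsl:transform>
```

Because the key indexes the whole document, a resource under this server is dropped when an earlier server already holds a resource with the same swgcraft_id, which is also how the original preceding:: test behaves.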

This ought to help quite a bit.

Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
