
RE: Coding Optimization for big files

Subject: RE: Coding Optimization for big files
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 10 Mar 2004 22:57:33 -0000
I strongly suspect that your problem is nothing to do with memory, but is
because your code has O(n^2) algorithmic complexity. You could confirm this
by plotting the elapsed time against the size of input data.
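Even before plotting anything, the two timings Diego gives in the message quoted below allow a rough two-point check. The figures are his (1.5 MB in about 100 seconds, 25 MB in about 6 hours; 100 contracts and 1500 Gr22 versus 1800 contracts and 30000 Gr22). The comparison is only a sketch, and it assumes that runtime is dominated by the lookup loop:

```latex
\begin{aligned}
\text{time ratio} &= \frac{6\ \mathrm{h}}{100\ \mathrm{s}} = \frac{21\,600\ \mathrm{s}}{100\ \mathrm{s}} = 216 \\
\text{size ratio} &= \frac{25\ \mathrm{MB}}{1.5\ \mathrm{MB}} \approx 16.7,
  \qquad \left(\frac{25}{1.5}\right)^{2} \approx 278 \\
\text{contracts} \times \text{Gr22 ratio} &= \frac{1800 \times 30\,000}{100 \times 1500} = 18 \times 20 = 360
\end{aligned}
```

Linear scaling would predict a factor of about 17 (under half an hour for the large file). The observed factor of 216 is of the same order as the squared size ratio, which is what quadratic cost would produce.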

It's fairly obvious (I hope) that your elapsed time is proportional to the
product of the number of nodes in allContracts and the number of nodes in
allSUMGr22. If (as I suspect) the sizes of these two variables are both
proportional to the input size, then that's your O(n^2). I don't understand
your code well enough to tell you how to get rid of this, but the expression

$allSUMGr22[@customer=$pIndexCustomer][@contract=$pIndexContract]

seems to be crying out to be replaced by a call on key().
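The following is a minimal sketch of that replacement. It assumes the gr22CustomerContractKey key declared in the original stylesheet is kept as it is. Instead of filtering every node of $allSUMGr22, getNumber fetches the Gr22 elements for one customer|contract pair with key(), so the $allSUMGr22 variable is no longer needed. XSLT processors typically build an index for each xsl:key, which means each lookup should no longer cost time proportional to the total number of Gr22 elements. This sketch has not been run against the real TIMM-MESSAGE data.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index of every Gr22, keyed on "customer|contract" -->
  <xsl:key name="gr22CustomerContractKey"
           match="/TIMM-MESSAGE/SUM/Gr22"
           use="concat(@customer,'|',@contract)"/>

  <!-- First Gr22 of each customer/contract pair (Muenchian grouping),
       then filtered by the same IMD predicates as the original -->
  <xsl:variable name="allContracts"
      select="/TIMM-MESSAGE/SUM/Gr22[count(. | key('gr22CustomerContractKey',
              concat(@customer,'|',@contract))[1]) = 1][IMD/servicecodeid='DNNUM'][IMD/productdes='8']"/>

  <xsl:template match="/">
    <doc_result>
      <xsl:for-each select="$allContracts">
        <contract>
          <xsl:call-template name="getNumber">
            <xsl:with-param name="pIndexCustomer" select="@customer"/>
            <xsl:with-param name="pIndexContract" select="@contract"/>
          </xsl:call-template>
        </contract>
      </xsl:for-each>
    </doc_result>
  </xsl:template>

  <xsl:template name="getNumber">
    <xsl:param name="pIndexCustomer"/>
    <xsl:param name="pIndexContract"/>
    <contract_number>
      <!-- Was: $allSUMGr22[@customer=$pIndexCustomer][@contract=$pIndexContract]/...
           which filters every Gr22 for every contract. key() fetches only
           the Gr22 elements for this pair from the index. -->
      <xsl:value-of select="key('gr22CustomerContractKey',
          concat($pIndexCustomer,'|',$pIndexContract))/IMD[productdes='8'][servicecodeid='DNNUM']/fulldesc"/>
    </contract_number>
  </xsl:template>
</xsl:stylesheet>
```

The allContracts expression already calls the same key for its Muenchian grouping, so the index is built either way. The only change is that the per-contract lookup now uses it too. As in the original, xsl:value-of returns the first matching fulldesc in document order.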

There's nothing intrinsically wrong with having variables whose values
contain many nodes. It's the way you use them that counts. 

Michael Kay


# -----Original Message-----
# From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Diego, Vitiello
# Sent: 10 March 2004 13:39
# To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
# Subject:  Coding Optimization for big files
# 
# Hi all,
# I would like your help with a performance problem I've experienced.
# I'm sure there are some workarounds to overcome that problem, for example
# physically splitting the input XML file into several chunks (I have
# already tried this and it works fine).
# But what I need is only a logical splitting, or just better use of
# variables/keys in the XSL code.
# 
# The execution time for a small-to-medium file (size: 1.5 MB, containing
# 100 contracts and 1500 Gr22) is around 100 seconds.
# The execution time for the biggest (worst-case) file (size: 25 MB,
# containing 1800 <contracts>, where each contract has several <Gr22>, for a
# total of around 30000 Gr22!!!) is 6 hours!!!
# I tried to analyse the problem, and of course it is related to loading the
# variables allSUMGr22 and allContracts into memory, and to how the XSLT
# processor accesses them.
# The goal would be, for example, to generate input XML files grouped by
# group of contracts (<SUM groupId='1'>), or to generate a different tag for
# each group (<SUM1>, <SUM2>, etc.).
# I guess I would need to define variables that don't require too much
# memory and that are able to filter the 30000 items.
# But I'm not sure that I can avoid defining big variables.
# 
# Are there any suggestions for this optimization?
# 
# Thanks in advance
# Diego
# 
# TRANSFORM.XML
# 
# <?xml version="1.0"?>
# <xsl:stylesheet version="1.0"
# xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
# 	<xsl:output version="1.0" method="xml" indent="yes"/>
# 
# 	<xsl:key name="gr22CustomerContractKey" match="/TIMM-MESSAGE/SUM/Gr22"
# 	         use="concat(@customer,'|',@contract)"/>
# 
# 	<xsl:variable name="allSUMGr22" select="/TIMM-MESSAGE/SUM/Gr22"/>
# 	<xsl:variable name="allContracts"
# 	    select="/TIMM-MESSAGE/SUM/Gr22[count(. | key('gr22CustomerContractKey',
# 	            concat(@customer,'|',@contract))[1])=1][IMD/servicecodeid='DNNUM'][IMD/productdes='8']"/>
# 
# 	<xsl:template match="/">
# 		<doc_result>
# 			<xsl:for-each select="$allContracts">
# 			<xsl:variable name="indexCustomer" select="@customer"/>
# 			<xsl:variable name="indexContract" select="@contract"/>
# 			<contract>
# 				<xsl:call-template name="getNumber">
# 					<xsl:with-param name="pIndexCustomer" select="$indexCustomer"/>
# 					<xsl:with-param name="pIndexContract" select="$indexContract"/>
# 				</xsl:call-template>
# 			</contract>
# 			</xsl:for-each>
# 		</doc_result>
# 	</xsl:template>
# 
# 	<xsl:template name="getNumber">
# 		<xsl:param name="pIndexCustomer"/>
# 		<xsl:param name="pIndexContract"/>
# 		<contract_number>
# 			<xsl:value-of
# 			    select="$allSUMGr22[@customer=$pIndexCustomer][@contract=$pIndexContract]/IMD[productdes='8'][servicecodeid='DNNUM']/fulldesc"/>
# 		</contract_number>
# 	</xsl:template>
# </xsl:stylesheet>
# 
# XML structure
# 
# <TIMM-MESSAGE>
# <SUM>
# ...other tags...
# <Gr22 customer='1' contract='1'>
# <IMD>
# 	<productdes>8</productdes>
# 	<servicecodeid>DNNUM</servicecodeid>
# 	<shortdesc></shortdesc>
# 	<fulldesc>number1</fulldesc>
# </IMD>
# ...other tags...
# </Gr22>
# ...other Gr22 related to the customer='1' contract='1'...
# 
# <Gr22 customer='1' contract='2'>
# <IMD>
# 	<productdes>8</productdes>
# 	<servicecodeid>DNNUM</servicecodeid>
# 	<shortdesc></shortdesc>
# 	<fulldesc>number2</fulldesc>
# </IMD>
# ...other tags...
# </Gr22>
# ...other Gr22 related to the customer='1' contract='2'...
# 
# ...other Gr22 related to the customer='1' for all the other contracts...
# 
# </SUM>
# </TIMM-MESSAGE>
# 
#  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list




