RE: memory usage of xslt processing

Subject: RE: memory usage of xslt processing
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 19 Apr 2006 13:59:08 +0100
XSLT processors generally read the whole document into memory. Some products
may be able to avoid this under certain circumstances, for example see
http://www.saxonica.com/documentation/sourcedocs/serial.html for Saxon.

Running one transformation per row is certainly feasible in principle though
there may be a significant start-up overhead - you'll only find out by
measurement.
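
As a rough sketch of the per-row approach (not taken from the original code): one way to keep the start-up overhead down in JAXP is to compile the stylesheet once into a Templates object and create a fresh Transformer from it for each row. The class name PerRowTransform, the helper methods, and the CSV stylesheet below are all hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PerRowTransform {
    // Hypothetical stylesheet: renders a single <row> as one comma-separated line.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:if test='position() &gt; 1'>,</xsl:if>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Escape the characters that are significant in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap one database row in the generic <row><value>..</value></row> markup.
    static String rowToXml(String[] values) {
        StringBuilder sb = new StringBuilder("<row>");
        for (String v : values) {
            sb.append("<value>").append(escape(v)).append("</value>");
        }
        return sb.append("</row>").toString();
    }

    // Run one small transformation per row; only one row is in memory at a time.
    static String transformRows(Iterable<String[]> rows) throws TransformerException {
        // Compile the stylesheet once; this is the expensive step.
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        for (String[] row : rows) {
            // A new Transformer per row reuses the compiled stylesheet.
            Transformer t = templates.newTransformer();
            t.transform(new StreamSource(new StringReader(rowToXml(row))),
                        new StreamResult(out));
        }
        return out.toString();
    }
}
```

In a real run the StringWriter would be replaced by a writer on the output file, and any document header or footer would have to be written outside the per-row transformations. Whether the remaining per-row cost is acceptable still has to be measured.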

Alternatively, why not retrieve the data from the database in
transformer-sized chunks?
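
A minimal sketch of the chunked variant, assuming the rows arrive through an Iterator and that each chunk can be transformed as an independent <dataset> document. The SAX calls mirror the init()/parse()/close() methods quoted below; the class name ChunkedTransform, the chunk size, and the stylesheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

public class ChunkedTransform {
    static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    // Hypothetical stylesheet: one semicolon-terminated line of values per <row>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'><xsl:value-of select='.'/>;</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transform the rows in chunks of at most chunkSize rows per <dataset>.
    static void transformInChunks(Iterator<String[]> rows, int chunkSize,
                                  String xsl, Writer out) throws Exception {
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        // Compile the stylesheet once and reuse it for every chunk.
        Templates templates = stf.newTemplates(new StreamSource(new StringReader(xsl)));
        while (rows.hasNext()) {
            // A fresh handler per chunk: each chunk is its own document.
            TransformerHandler ch = stf.newTransformerHandler(templates);
            ch.setResult(new StreamResult(out));
            ch.startDocument();
            ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
            for (int n = 0; n < chunkSize && rows.hasNext(); n++) {
                ch.startElement("", "row", "row", EMPTY_ATTR);
                for (String v : rows.next()) {
                    ch.startElement("", "value", "value", EMPTY_ATTR);
                    ch.characters(v.toCharArray(), 0, v.length());
                    ch.endElement("", "value", "value");
                }
                ch.endElement("", "row", "row");
            }
            ch.endElement("", "dataset", "dataset");
            // The chunk is complete here, so memory use is bounded by the chunk size.
            ch.endDocument();
        }
    }

    public static void main(String[] args) throws Exception {
        Iterator<String[]> rows = Arrays.asList(
            new String[] {"1", "dog", "an animal may be dangerous"},
            new String[] {"2", "cat", "an animal likes milk"}).iterator();
        StringWriter out = new StringWriter();
        transformInChunks(rows, 1, XSL, out);
        System.out.print(out);
    }
}
```

The chunk size trades memory against per-document overhead, so it is best chosen by measurement. This only works when the stylesheet does not need to see rows from different chunks together, for example for sorting or grouping.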

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Thomas Porschberg [mailto:thomas.porschberg@xxxxxxxxx] 
> Sent: 19 April 2006 13:36
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  memory usage of xslt processing
> 
> Hi,
> 
> I have the following task:
> Create an arbitrarily formatted file (XML, HTML, CSV, whatever) 
> based on a SELECT from a database.
> 
> One constraint is that the data fetched from the database 
> cannot be held in memory as a whole.
> Another constraint is that I cannot use XML functionality in 
> the database; I have to implement it on top of our database 
> access framework, which fetches records one at a time.
> I also have to use Java and Xalan.
> 
> My idea was to wrap every row fetched from the database 
> in simple generic XML and feed it to Xalan.
> 
> Let me give an example.
> If my result set from the database looks like:
> 
> ID  Name  Description
> --  ----  -----------
> 1  "dog"  "an animal may be dangerous"
> 2  "cat"  "an animal likes milk"
> 
> I create the following XML:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <dataset>
>  <row>
>   <value>1</value>
>   <value>dog</value>
>   <value>an animal may be dangerous</value>
>  </row>
>  <row>
>   <value>2</value>
>   <value>cat</value>
>   <value>an animal likes milk</value>
>  </row>
> </dataset>
> 
> I generate this XML as SAX events in a Java class 
> (StringArrayXMLReader) that implements the 
> org.xml.sax.XMLReader interface.
> It has three methods:
> 
> public void init() throws SAXException {
>         ch.startDocument(  );
>         ch.startElement("","dataset","dataset",EMPTY_ATTR);
> }
> 
> public void close() throws SAXException {
>         ch.endElement("","dataset","dataset");
>         ch.endDocument(  );
> }
> 
> public void parse(String[] input) throws SAXException {
>         ch.startElement("","row","row",EMPTY_ATTR);
>         for (int i = 0; i < input.length; ++i) {
>            ch.startElement("","value","value",EMPTY_ATTR);
>            ch.characters(input[i].toCharArray(), 0, input[i].length());
>            ch.endElement("","value","value");
>         }
>         ch.endElement("","row","row");
> }
> 
> The parse method creates the <row>...</row> entry for the 
> String array passed to it.
> The StringArrayXMLReader is associated with a 
> TransformerHandler, which uses an XSL stylesheet to transform 
> the XML into the desired output.
> 
> What happens is that when the fetch from the database 
> starts I call init() (and thus startDocument()), and 
> after the fetch has finished I call close() (and thus 
> endDocument()).
> I observed that the XSLT processing only starts when 
> endDocument() is called.
> This is not acceptable for me because I fear the XSLT 
> processor holds all the rows in memory until endDocument() 
> is called, in which case I risk an OutOfMemoryError.
> 
> My second idea was to eliminate the init()/close() methods 
> and to treat each <row>...</row> section as a complete 
> input document for the processor. This has the disadvantage 
> that I have to create the head and tail of the output document 
> manually (and in my example I get a NullPointerException when 
> the transformer is called a second time).
> 
> I have the following question:
> Is it possible to create the output without holding all the 
> data in memory?
> The input XML for the XSLT processing
> <dataset>
>   <row><value>...
>   <row><value>...
> </dataset>
> looks very simple, and the supplied XSL stylesheets will 
> not be complex, so my hope is to get it working.
> I also think that the task in general (producing formatted 
> output from a potentially very large data pool) should be a common one.
> Unfortunately I have not done much XSLT processing in the past, 
> so I lack experience (just a little libxslt, which I fed a DOM tree). 
> If someone has some useful links I would be very glad to hear of them. 
> My test code is available at:
> 
> http://randspringer.de/sax_row.tar and
> http://randspringer.de/sax.tar
> 
> If someone could have a look at it I would really appreciate it.
> 
> Thomas
> 
> 
> -- 
