
Re: memory usage of xslt processing

Subject: Re: memory usage of xslt processing
From: JAPISoft <public2@xxxxxxxxxxxx>
Date: Wed, 19 Apr 2006 15:29:19 +0200
Hello Michael,

Shouldn't it depend on the XPath expressions in your XSLT?

If I use "//*" on a document fragment, what could the processor do?

I was wondering whether a tool that analyses the XPath expressions in an XSLT document and builds a kind of
graph of nodes with the scope of each expression would make sense.
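
To make the idea concrete, here is a very rough sketch of such a scan. It is not an XPath parser and builds no graph: it only collects the select and test attributes of XSLT instructions and flags, by plain text matching, those that look as if they reach outside the current subtree. The class name, the list of hints and the decision to skip match patterns are all assumptions made for illustration, and the check over-approximates on purpose.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: flags XPath expressions that may need more than the current subtree. */
public class XPathScopeScan {
    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";
    // Attributes scanned; match patterns are deliberately skipped in this sketch.
    private static final String[] XPATH_ATTRS = {"select", "test"};
    // Crude textual hints (assumption): descendant-or-self shortcut, parent steps, reverse/sibling axes.
    private static final String[] WIDE_HINTS = {"//", "..", "parent::", "ancestor", "preceding", "following"};

    /** Returns the flagged expressions in document order. */
    public static List<String> wideExpressions(String stylesheetXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(stylesheetXml)));
            NodeList instructions = doc.getElementsByTagNameNS(XSL_NS, "*");
            List<String> wide = new ArrayList<>();
            for (int i = 0; i < instructions.getLength(); i++) {
                Element e = (Element) instructions.item(i);
                for (String attr : XPATH_ATTRS) {
                    String expr = e.getAttribute(attr); // "" when the attribute is absent
                    if (!expr.isEmpty() && looksWide(expr)) {
                        wide.add(expr);
                    }
                }
            }
            return wide;
        } catch (Exception ex) {
            throw new IllegalStateException("could not scan stylesheet", ex);
        }
    }

    private static boolean looksWide(String expr) {
        if (expr.startsWith("/")) {
            return true; // absolute path: starts from the document root
        }
        for (String hint : WIDE_HINTS) {
            if (expr.contains(hint)) {
                return true;
            }
        }
        return false;
    }
}
```

A stylesheet for which this returns an empty list only uses downward-looking expressions as far as the text match can tell; whether a given processor can exploit that is a separate question.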


Best regards,

A.Brillant


----- Original Message ----- From: "Michael Kay" <mike@xxxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, April 19, 2006 2:59 PM
Subject: RE: memory usage of xslt processing



XSLT processors generally read the whole document into memory. Some products
may be able to avoid this under certain circumstances, for example see
http://www.saxonica.com/documentation/sourcedocs/serial.html for Saxon.


Running one transformation per row is certainly feasible in principle though
there may be a significant start-up overhead - you'll only find out by
measurement.


Alternatively, why not retrieve the data from the database in
transformer-sized chunks?
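
A minimal sketch of the chunked approach, using only the standard JAXP API. The class name, the CSV stylesheet and the way rows arrive (an Iterator of String arrays) are assumptions for illustration, not part of the original setup. The stylesheet is compiled once into a Templates object, and each chunk is transformed as its own small <dataset> document, so only one chunk is in memory at a time. A chunk size of 1 corresponds to one transformation per row.

```java
import java.io.StringReader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: transform a large row stream chunk by chunk. */
public class ChunkedTransform {
    // Illustrative stylesheet (assumption): one comma-separated text line per <row>.
    public static final String CSV_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='row'>"
        + "<xsl:for-each select='value'>"
        + "<xsl:if test='position() != 1'>,</xsl:if><xsl:value-of select='.'/>"
        + "</xsl:for-each><xsl:text>&#10;</xsl:text>"
        + "</xsl:template></xsl:stylesheet>";

    /** Transforms the rows in chunks of chunkSize, appending every result to out. */
    public static void transform(Iterator<String[]> rows, int chunkSize,
                                 String stylesheetXml, Writer out) {
        try {
            // Compile the stylesheet once; Templates can create many Transformers.
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(stylesheetXml)));
            List<String[]> chunk = new ArrayList<>();
            while (rows.hasNext()) {
                chunk.add(rows.next());
                if (chunk.size() == chunkSize || !rows.hasNext()) {
                    // One small document per chunk, one fresh Transformer per document.
                    templates.newTransformer().transform(
                            new StreamSource(new StringReader(toXml(chunk))),
                            new StreamResult(out));
                    chunk.clear();
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("chunked transformation failed", e);
        }
    }

    /** Wraps a chunk of rows in the generic dataset/row/value markup. */
    static String toXml(List<String[]> chunk) {
        StringBuilder sb = new StringBuilder("<dataset>");
        for (String[] row : chunk) {
            sb.append("<row>");
            for (String value : row) {
                sb.append("<value>").append(escape(value)).append("</value>");
            }
            sb.append("</row>");
        }
        return sb.append("</dataset>").toString();
    }

    /** Escapes the characters that are not allowed literally in element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

This only works as is when the output of each chunk can simply be concatenated, as with CSV lines. Output that needs a single header and footer has to get those written outside the per-chunk transformation. The right chunk size is something to measure, not guess.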

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: Thomas Porschberg [mailto:thomas.porschberg@xxxxxxxxx]
Sent: 19 April 2006 13:36
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject:  memory usage of xslt processing

Hi,

I have the following task:
Create an arbitrarily formatted file (XML, HTML, CSV, whatever)
based on a SELECT from a database.

As a constraint, the data fetched from the database cannot be
held in memory as a whole.
Another constraint is that I cannot use the XML functionality of
the database; I have to implement the functionality on top of
our database access framework. This framework fetches records
one at a time.
I also have to use Java and Xalan.

My idea was to wrap every fetched row from the database
in simple generic XML and feed this to Xalan.

Here is an example:
If my result set from the database looks like:

ID  Name  Description
--  ----  -----------
1  "dog"  "an animal may be dangerous"
2  "cat"  "an animal likes milk"

I create the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
 <row>
  <value>1</value>
  <value>dog</value>
  <value>an animal may be dangerous</value>
 </row>
 <row>
  <value>2</value>
  <value>cat</value>
  <value>an animal likes milk</value>
 </row>
</dataset>

I generate this XML as SAX events in a Java class
[StringArrayXMLReader] that implements the
org.xml.sax.XMLReader interface.
It has three methods:

public void init() throws SAXException {
        ch.startDocument(  );
        ch.startElement("","dataset","dataset",EMPTY_ATTR);
}

public void close() throws SAXException {
        ch.endElement("","dataset","dataset");
        ch.endDocument(  );
}

public void parse(String [] input) throws SAXException {
        ch.startElement("","row","row",EMPTY_ATTR);
        for (int i = 0; i< input.length; ++i){
           ch.startElement("","value","value",EMPTY_ATTR);
           ch.characters(input[i].toCharArray(), 0, input[i].length());
           ch.endElement("","value","value");
        }
        ch.endElement("","row","row");
}

The parse method creates the <row>...</row> entries for the
String array passed to it.
The StringArrayXMLReader is associated with a
TransformerHandler, which uses an XSL stylesheet to transform
the XML to the desired output.

When the fetch from the database starts, I call init() (and
thus startDocument()), and after the fetch has finished I call
close() (and thus endDocument()).
I observed that the XSLT processing starts only when
endDocument() is called.
This is not acceptable for me, because I fear that the XSLT
processor holds all the rows in memory until endDocument()
is called, and in that case I risk running into an
OutOfMemoryError.
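
One way to check this observation on a given processor is to count how many characters have reached the output just before endDocument() is called. The following is a sketch under assumptions: the class name, the nested Result type and the CSV stylesheet are invented for illustration, and the SAX events mirror the init()/parse()/close() methods above. A count of zero is only a hint, since a buffering serializer can also hold output back.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical probe: how much output exists before endDocument() is fired? */
public class EndDocumentProbe {
    static final AttributesImpl EMPTY_ATTR = new AttributesImpl();
    // Illustrative stylesheet (assumption): one comma-separated text line per <row>.
    public static final String CSV_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='row'>"
        + "<xsl:for-each select='value'>"
        + "<xsl:if test='position() != 1'>,</xsl:if><xsl:value-of select='.'/>"
        + "</xsl:for-each><xsl:text>&#10;</xsl:text>"
        + "</xsl:template></xsl:stylesheet>";

    /** The measurement plus the complete output. */
    public static final class Result {
        public final int charsBeforeEndDocument;
        public final String output;
        Result(int charsBeforeEndDocument, String output) {
            this.charsBeforeEndDocument = charsBeforeEndDocument;
            this.output = output;
        }
    }

    public static Result run(String[][] rows) {
        try {
            SAXTransformerFactory stf =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler ch =
                    stf.newTransformerHandler(new StreamSource(new StringReader(CSV_XSL)));
            StringWriter out = new StringWriter();
            ch.setResult(new StreamResult(out)); // set the result before any event is fired
            ch.startDocument();
            ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
            for (String[] row : rows) {
                fireRow(ch, row);
            }
            ch.endElement("", "dataset", "dataset");
            int before = out.getBuffer().length(); // output visible so far
            ch.endDocument();
            return new Result(before, out.toString());
        } catch (Exception e) {
            throw new IllegalStateException("probe failed", e);
        }
    }

    /** Fires the SAX events for one row, as in the parse() method. */
    static void fireRow(TransformerHandler ch, String[] row) throws SAXException {
        ch.startElement("", "row", "row", EMPTY_ATTR);
        for (String value : row) {
            ch.startElement("", "value", "value", EMPTY_ATTR);
            ch.characters(value.toCharArray(), 0, value.length());
            ch.endElement("", "value", "value");
        }
        ch.endElement("", "row", "row");
    }

    public static void main(String[] args) {
        Result r = run(new String[][] {{"1", "dog"}, {"2", "cat"}});
        System.out.println("chars written before endDocument(): " + r.charsBeforeEndDocument);
        System.out.print(r.output);
    }
}
```

Running the same probe with many rows while watching heap usage would give a more direct answer than the character count alone.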

My second idea was to eliminate the init()/close() methods
and to treat one <row>...</row> section as a complete input
document for the processor. This has the disadvantage
that I have to create the head and tail of the document
manually (and in my example I get a NullPointerException when
the transformer is called twice).
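
A sketch of how this second idea could look, assuming the stylesheet is compiled once into a Templates object and a fresh TransformerHandler is created for every row document instead of reusing one handler. Whether handler reuse is what causes the NullPointerException in the posted code is a guess; the class name, the stylesheet and the head/tail strings are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical sketch: every <row> is its own input document. */
public class PerRowTransform {
    static final AttributesImpl EMPTY_ATTR = new AttributesImpl();
    // Illustrative stylesheet (assumption): one comma-separated text line per <row>.
    public static final String ROW_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='row'>"
        + "<xsl:for-each select='value'>"
        + "<xsl:if test='position() != 1'>,</xsl:if><xsl:value-of select='.'/>"
        + "</xsl:for-each><xsl:text>&#10;</xsl:text>"
        + "</xsl:template></xsl:stylesheet>";

    /** Writes head, then the transformation of each row, then tail. */
    public static String run(String[][] rows, String head, String tail) {
        try {
            SAXTransformerFactory stf =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            // Compile once; creating handlers from Templates avoids re-parsing the stylesheet.
            Templates templates =
                    stf.newTemplates(new StreamSource(new StringReader(ROW_XSL)));
            StringWriter out = new StringWriter();
            out.write(head); // head of the output is written by hand
            for (String[] row : rows) {
                // A fresh handler per document; the old one is not reused.
                TransformerHandler ch = stf.newTransformerHandler(templates);
                ch.setResult(new StreamResult(out));
                ch.startDocument();
                ch.startElement("", "row", "row", EMPTY_ATTR);
                for (String value : row) {
                    ch.startElement("", "value", "value", EMPTY_ATTR);
                    ch.characters(value.toCharArray(), 0, value.length());
                    ch.endElement("", "value", "value");
                }
                ch.endElement("", "row", "row");
                ch.endDocument(); // this row's output is complete here
            }
            out.write(tail); // tail of the output is written by hand
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("per-row transformation failed", e);
        }
    }
}
```

With this layout the stylesheet never sees more than one row, so expressions that look across rows (numbering, grouping, totals) cannot work and would have to be handled in the calling Java code.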

I have the following questions:
Is it possible to create the output without having the whole
data in memory?
The input XML for the XSLT processing
<dataset>
  <row><value>...
  <row><value>...
</dataset>
looks very simple, and the supplied XSL stylesheets will not be
complex, so my hope is to get it working.
I also think that the task in general (producing formatted
output from a potentially very large data pool) should be a common one.
Unfortunately I have not done much XSLT processing in the past,
so I lack experience (only a little libxslt, which I fed a DOM tree).
If anyone has some useful links, I would be very glad to hear of them.
My test code is available at:

http://randspringer.de/sax_row.tar and
http://randspringer.de/sax.tar

If someone could have a look at it I would really appreciate it.

Thomas

