
RE: more on XSLT processor performance

Subject: RE: more on XSLT processor performance
From: "Paulo Gaspar" <paulo.gaspar@xxxxxxxxxxxx>
Date: Wed, 2 Aug 2000 13:21:29 +0200
> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxx]On Behalf Of Kay Michael
> 
> Compiling won't solve the memory problem. If we're going to make XSLT
> processing of such large files practical, the only way we'll do it is by
> using persistent storage rather than memory for the tree.

I suggested, on the Apache Xalan-J-Dev mailing list, the use of
indexed persistent storage.

This is the relevant bit:

If you are talking about indexed XML, I also believe so.

I have several ideas on indexing XML for XPath access, but the trouble 
is always knowing what to index.

For me, an interesting transform cycle concept is:
 1. Analyze the XSLT source and figure out what kind of (XPath) 
    selections from a source document are necessary in order to get 
    all the nodes required for the transformation;
 2. Pre-parse the document, indexing only the parts found to be 
    relevant in step 1. One should end up with index information much 
    smaller than the full XML source - small enough to fit in memory;
 3. Use an "XLocator" that knows how to use this index to perform the 
    XSLT transformation.
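The three steps above can be sketched roughly as follows. This is a toy illustration under my own assumptions, not Xalan code: the regex-based "analysis" only recognizes bare element names in match/select attributes, and the index stores plain character offsets into the serialized XML.

```python
import re

def relevant_tags(xslt_source):
    # Step 1 (toy): pull element names out of simple XPath patterns
    # like match="item" or select="item" in the stylesheet source.
    return set(re.findall(r'(?:match|select)="([A-Za-z_]\w*)"', xslt_source))

def build_index(xml_text, tags):
    # Step 2 (toy): record the character offset of every start tag of
    # a relevant element; the index is far smaller than the document.
    index = {t: [] for t in tags}
    for m in re.finditer(r'<([A-Za-z_]\w*)[\s>]', xml_text):
        if m.group(1) in tags:
            index[m.group(1)].append(m.start())
    return index

def fetch(xml_text, index, tag):
    # Step 3 (toy "XLocator"): jump straight to each indexed offset
    # and slice out the element, instead of walking a full tree.
    for off in index.get(tag, []):
        end = xml_text.index('</%s>' % tag, off) + len(tag) + 3
        yield xml_text[off:end]
```

A real implementation would of course use a proper XPath analysis and a real parser for the indexing pass; the point is only that the index (a handful of integers per relevant element) is all that must stay in memory.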

Example of "parts found to be relevant": if you find that the XSLT 
only causes the selection of some elements from the XML source, then 
only the locations of those elements should be indexed.

If you use this idea to transform an XML stream, you need to save that
XML (or maybe only the relevant parts of it) to temporary disk storage
and build the index information. Only then do you proceed to generate 
the output stream.
(This covers the most generic cases; I am not considering that some 
cases could be handled on the fly, as already mentioned on this list.)
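The streaming case could look something like this sketch (again a toy under my own assumptions; it reuses the same offset-index idea, and for simplicity ignores a tag split across chunk boundaries, which a real single-pass indexer would have to handle):

```python
import re
import tempfile

def spool_and_index(stream, tags):
    # Spool the incoming XML stream to temporary disk storage while
    # building the offset index in the same pass. Only after the
    # whole stream is on disk can output generation start, since
    # XPath expressions may look backwards in the document.
    tmp = tempfile.NamedTemporaryFile('w+', suffix='.xml', delete=False)
    index = {t: [] for t in tags}
    pos = 0
    for chunk in stream:
        # Simplification: assumes no start tag straddles two chunks.
        for m in re.finditer(r'<([A-Za-z_]\w*)[\s>]', chunk):
            if m.group(1) in tags:
                index[m.group(1)].append(pos + m.start())
        tmp.write(chunk)
        pos += len(chunk)
    tmp.flush()
    tmp.close()
    return tmp.name, index
```

The transformation step would then seek into the temporary file at the indexed offsets instead of holding the whole document in memory.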


In cases where an XSLT stylesheet gets a small amount of data from
a very big XML file, this approach can be faster than trying to build
a DOM:
 - A full pass is always necessary, but afterwards you only re-read a 
   small amount of data (thanks to the indexing);
 - Even during the full pass, full parsing of the file can be avoided;
 - Creating an index can require much less processing than creating 
   a DOM;
 - Since the index requires less memory, virtual memory use is 
   avoided (less disk swapping).
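As a rough illustration of the memory argument (synthetic data; the element names and the assumption that the stylesheet only selects one element kind are invented for the example):

```python
import xml.etree.ElementTree as ET

# Synthetic "big" document: 1000 entries, of which a hypothetical
# stylesheet only ever selects the <data> elements.
xml_text = '<log>' + ''.join(
    '<entry><data>%d</data></entry>' % i for i in range(1000)) + '</log>'

# DOM-style approach: parse everything and keep every node in memory.
tree = ET.fromstring(xml_text)
all_nodes = list(tree.iter())  # one object per element, plus text etc.

# Index-style approach: a single scan that keeps only the integer
# offsets of the one element kind the stylesheet needs.
offsets = [i for i in range(len(xml_text))
           if xml_text.startswith('<data>', i)]
```

Here the tree holds 2001 element objects (with their text, attributes, and child lists), while the index is 1000 plain integers.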


I know my language is not formally correct, but...
...does this make sense?


Have fun,
Paulo Gaspar


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

