Subject: RE: optimization for very large, flat documents
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 19 Jan 2005 09:16:56 -0000

> I'm trying to process a very large (600 MB) flat XML document, a
> bibliography where each of the 400,000 entries is completely
> independent of the others.  According to the Saxon web site and
> mailing list, it'll take approx. 5-10 times that (3 GB) to hold the
> document tree in memory, which is impractical.  The Saxon mailing list
> also has some tips about how to accomplish this, but my question is:
> Why doesn't XSLT provide a way to specify that a matched node can be
> processed independently of its predecessor and successor siblings?
> Alternatively, couldn't an XSLT processor infer that from the complete
> absence of XPath expressions that refer to predecessor and successor
> siblings?

I think the reasons that XSLT vendors have not tried this approach are:

(a) there are rather few stylesheets where the technique works, and can be
seen statically to work. It's not enough that all path expressions should
select downwards: there must be no absolute path expressions, no global
variables that select from the initial context node, no keys, and probably
quite a few other conditions besides.

(b) for such stylesheets, a completely different run-time approach is
needed: effectively, a different XSLT processor.

I think that in practice if you want to do serial transformation then a
functional language is not the right answer: if you can only look at each
piece of input data once, then you need the ability to remember what you
have seen, so you need a procedural language with updatable memory. That's
why STX was invented.
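
To make the point about updatable memory concrete, here is a minimal sketch of a single serial pass that remembers what it has already seen. It is plain Python rather than STX, and the element names "entry" and "title" and the function name flag_repeats are assumptions chosen for illustration, not taken from the original poster's data.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real bibliography's element names are unknown.
SAMPLE = (b"<bib><entry><title>A</title></entry>"
          b"<entry><title>B</title></entry>"
          b"<entry><title>A</title></entry></bib>")

def flag_repeats(source, record_tag="entry", key_tag="title"):
    """Yield (position, key, seen_before) for each record in one serial pass."""
    seen = set()    # updatable memory: keys of the records already read
    position = 0    # updatable memory: running record counter
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        position += 1
        key = (elem.findtext(key_tag) or "").strip()
        yield position, key, key in seen
        seen.add(key)
        # Discard the record's content; only an empty element shell
        # remains attached to the root.
        elem.clear()

if __name__ == "__main__":
    for row in flag_repeats(io.BytesIO(SAMPLE)):
        print(row)
```

Each record is looked at exactly once, and everything the pass needs to know about earlier records lives in the two mutable variables.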

However, I think there is scope for someone to package up the idea of
running an XSLT transform on each "record" in a large file, and then
recombining the results.
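
As a rough sketch of what such a package might do, the following Python reads the file one record at a time, applies a transform to each record in isolation, and writes the results under a new wrapper element. The function transform_record is a hypothetical stand-in for running a real stylesheet on one record, and the element names "entry", "title", "item" and "items" are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file's element names are unknown.
SAMPLE = (b"<bib><entry><title>A</title></entry>"
          b"<entry><title> B </title></entry></bib>")

def transform_record(record):
    """Stand-in for an XSLT transform applied to a single record."""
    out = ET.Element("item")
    out.text = (record.findtext("title") or "").strip()
    return out

def transform_records(source, sink, record_tag="entry", root_tag="items"):
    """Transform each record independently and recombine the results."""
    sink.write(f"<{root_tag}>")
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the document element
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            result = transform_record(elem)
            sink.write(ET.tostring(result, encoding="unicode"))
            root.clear()  # detach processed records so memory stays bounded
            count += 1
    sink.write(f"</{root_tag}>")
    return count

if __name__ == "__main__":
    out = io.StringIO()
    transform_records(io.BytesIO(SAMPLE), out)
    print(out.getvalue())
```

This only works under the conditions listed in (a): the per-record transform must not look outside its own record, since by the time a record is processed its siblings have already been discarded or not yet read.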

Michael Kay
http://www.saxonica.com/ 