Re: Seeking Examples of XSLT Memory Stress
On Wed, Aug 17, 2005 at 06:53:41PM +0100, Michael Kay wrote:
>> If the document falls out of scope then both XSLT 1 and 2 allow
>> an implementation to discard it. I don't think we'll see a
>> procedural way to discard a document otherwise, except as
>> part of something like the XQuery update facility perhaps.
>
> In practice it's quite difficult to discard the document automatically. The
> spec offers two guarantees:
>
> (a) if the same document (URI) is loaded again, you'll get the same node
> identifiers
>
> (b) if the same document (URI) is loaded again, it will have the same
> content
>
> It would be possible to discard the document and achieve (a) by remembering
> the node identifiers and reusing them if needed.

Yes.

> Achieving (b) though is really hard, given that the URI might in the
> worst case identify a random number generator. The only real way to do
> it is to serialize a private copy of the document to disk.

You could also behave differently depending on the URI scheme -- an
extension to say "trust http expiry times and that the stylesheet will
take no more than 3 hours to run :-) and trust that input files won't
change on disk" might be interesting.

> The real problem though is in deciding when it's a good idea to discard the
> document. For example, if the stylesheet is working its way through the
> @href links from the primary source document, what's the chance that you'll
> want to visit the same target document more than once?

Are there some special cases that are big wins in practice? E.g.
consider:

    <xsl:template match="foo">
      <!--* load a 500MByte XML file: *-->
      <xsl:variable name="oed" select="doc('oed.xml')" />
      <!--* do stuff with the document *-->
      <xsl:element name="word-of-the-day">
        <xsl:copy-of select="$oed/dictionary/a/entry[@id = 'ascii']" />
      </xsl:element>
    </xsl:template>

If you don't know how often the template matches I can see that you
might want to cache the whole document in memory, but you have a
couple of other choices --

(1) save the result of the template -- in this case it doesn't depend
    on anything other than the input document, and I've seen this
    usage often, e.g. to get a document title

(2) drop the document if you get low on memory

This case is very clear, but I don't know at what point it stops being
optimisable, and I'm sure you've thought about it a lot more than
I have! :-)

> That's why I decided
> that in this case having a user function to tell me when the document is no
> longer needed is rather more useful.

I think it's a good compromise, but I agree with you it'd be hard to
get consensus to add that to XPath F&O.

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/
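
The options weighed in this thread can be sketched from the processor's side. What follows is only a rough illustration in Python, not how any real XSLT processor is implemented: the `DocumentPool` class, its method names, and the `make_unstable_loader` helper are all hypothetical, and an in-memory dictionary stands in for the "private copy on disk". It shows a pool that keeps content stable across reloads (guarantee (b)) by restoring a snapshot instead of re-fetching the URI, memoises a result that depends only on the loaded document, and lets the user discard a document explicitly.

```python
import itertools
from typing import Callable, Dict, Tuple


class DocumentPool:
    """Toy document pool keyed by URI (hypothetical, for illustration only)."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._live: Dict[str, str] = {}        # documents currently in memory
        self._snapshots: Dict[str, str] = {}   # stand-in for private on-disk copies
        self._results: Dict[Tuple[str, str], str] = {}  # memoised results
        self.loads = 0  # number of times the loader was actually called

    def doc(self, uri: str) -> str:
        """Return the document for uri, with stable content across calls."""
        if uri not in self._live:
            if uri in self._snapshots:
                # Restore the private copy rather than re-fetching the URI.
                self._live[uri] = self._snapshots.pop(uri)
            else:
                self._live[uri] = self._loader(uri)
                self.loads += 1
        return self._live[uri]

    def evict(self, uri: str) -> None:
        """Free the in-memory copy but keep a snapshot (e.g. when memory is low)."""
        if uri in self._live:
            self._snapshots[uri] = self._live.pop(uri)

    def discard(self, uri: str) -> None:
        """User-requested discard: forget the document, keep memoised results."""
        self._live.pop(uri, None)
        self._snapshots.pop(uri, None)

    def cached_result(self, uri: str, key: str,
                      compute: Callable[[str], str]) -> str:
        """Memoise a result that depends only on the document at uri."""
        if (uri, key) not in self._results:
            self._results[(uri, key)] = compute(self.doc(uri))
        return self._results[(uri, key)]


def make_unstable_loader() -> Callable[[str], str]:
    """Loader whose content changes on every fetch (the worst case above)."""
    versions = itertools.count(1)

    def load(uri: str) -> str:
        return f"<doc uri='{uri}' version='{next(versions)}'/>"

    return load
```

In this sketch, `evict` followed by `doc` returns the same text without calling the loader again, whereas `discard` followed by `doc` fetches afresh and may see different content. That is the price of letting the user declare that a document is no longer needed. Results memoised through `cached_result` survive both, which corresponds to choice (1) above.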