Back to one of my old complaints (nag, nag, nag ...)
I've got a very simple XSLT that stuffs two XML files into a container XML.
I just tried running it using the Stylus internal XSLT processor (5.0 build 127b) and it consumed over a gigabyte of RAM (mostly virtual, so it was real slow, thrashing the disk) trying to join a 27 MB file with a 4.5 MB file.
Now I know the DOM can consume an order of magnitude more RAM than the size of the file. But both of these files had simple structures (although neither had a schema defined), so the DOM shouldn't be that hungry.
Subject: Re: Memory hogging   Author: Minollo I.   Date: 24 Jul 2003 10:39 PM
It's surely not the DOM that's causing the memory consumption you are seeing. More likely it's a matter of the output handler and/or of large XPath expressions/XSLT variables.
Would it be possible for you to send us a testcase, maybe working on a
smaller version of the XML documents?
Subject: Re: Memory hogging   Author: Lee Humphries   Date: 25 Jul 2003 12:25 AM
Here's the XSLT in question - as you can see, it's pretty tiny. The file that it includes and throws into a variable is the 4.5 MB file.
To test it, just take any two XML files, change the file name in the document() function to match one of the files, and supply the other as the scenario test case.
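The stylesheet is essentially of this shape (a minimal sketch; fileB.xml and the container element name are placeholders standing in for the real names):

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- Pull the second document into a variable via document() -->
  <xsl:variable name="secondDoc" select="document('fileB.xml')"/>

  <!-- Wrap the scenario's input document and the second document
       in a single container element -->
  <xsl:template match="/">
    <container>
      <xsl:copy-of select="/*"/>
      <xsl:copy-of select="$secondDoc/*"/>
    </container>
  </xsl:template>
</xsl:stylesheet>
```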
One reason I assumed your implementation of the DOM might be involved is that we've experienced apparently similar issues with MSXML 4.0.
We had initially been using the MSXML DOM when importing and processing some larger XML files (from 800 KB to 60 MB); however, we found it unbelievably slow, and it got (roughly) exponentially slower as the number of elements in the XML file grew.
So we switched to using SAX.
Later, though, I wanted to eliminate some of the hand-coded validation we had done for our SAX implementation in favour of using an XSD, as the SAX-based validation also seemed to be taking too long.
So I set up a DOM instance with the XSD applied against it first, then imported the XML. Now it runs a hell of a lot faster (about three orders of magnitude), load time appears to be linear with respect to the number of elements, and memory consumption is also drastically reduced.
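For anyone else trying this, the schema-first setup looks roughly like the following JScript against MSXML 4.0 (a sketch only - the file, schema, and namespace names are placeholders, and it requires Windows with MSXML 4.0 installed):

```javascript
// Build a schema cache and attach it to the DOM document
// *before* loading, so validation happens during the parse.
var cache = new ActiveXObject("Msxml2.XMLSchemaCache.4.0");
cache.add("urn:my-namespace", "mydata.xsd");    // placeholder names

var doc = new ActiveXObject("Msxml2.DOMDocument.4.0");
doc.async = false;
doc.validateOnParse = true;
doc.schemas = cache;                            // apply the XSD first

if (!doc.load("mydata.xml")) {                  // then import the XML
    WScript.Echo(doc.parseError.reason);
}
```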
That's why I was wondering whether your DOM implementation might be struggling because these large files have no schemas defined against them.