Subject:Memory utilization high with multi pass Author:Metin Solmaz Date:07 Jul 2009 06:22 AM Originally Posted: 07 Jul 2009 06:14 AM
I have a memory problem with a stylesheet where I use the multi-pass (or pipeline) pattern. With relatively small input files, the memory utilization is not noticeable. However, with bigger input files (e.g., > 6 MB), both in xsltproc (libxslt 10118) and in Saxon (6.5), the memory utilization becomes excessive (more than 1 GB).
There are 13 passes in total (yes, I agree that seems like a lot, but it really was necessary for the problem at hand). The first phase extends the input XML with some additional information (attributes). The second phase takes the output of the first phase and extends it with even more data, and so on. The output of each phase is held in a variable.
A sketch of the multi-pass process:
<!-- start -->
<xsl:apply-templates mode="phase1" select="/"/>
<!-- end -->
Looking closer at the process, each variable (phase) is referenced *only once*, namely in the following phase. Thus, in theory, a variable could be cleaned up (removed from memory) as soon as it is no longer needed. However, assuming this is not trivial to determine (it *is* in this particular case, but probably not in general), the XSLT processor (at least xsltproc and Saxon) holds all variables in memory until they go out of scope (at the end of the template, for-each, etc.).
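For readers who want something concrete, here is a minimal two-phase sketch of the variable-based pattern described above. It is not my actual stylesheet: the attribute names `p1` and `p2` and the trivial "add an attribute" phases are placeholders, and I am assuming the EXSLT `exsl:node-set()` function (available in both libxslt and Saxon 6.5) to turn each result tree fragment back into a node-set.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:template match="/">
    <!-- Output of phase 1 is held in a variable -->
    <xsl:variable name="phase1">
      <xsl:apply-templates select="/" mode="phase1"/>
    </xsl:variable>
    <!-- Phase 2 is the only place that references $phase1 -->
    <xsl:variable name="phase2">
      <xsl:apply-templates select="exsl:node-set($phase1)" mode="phase2"/>
    </xsl:variable>
    <!-- Both variables stay in scope until this template ends -->
    <xsl:copy-of select="$phase2"/>
  </xsl:template>

  <!-- Placeholder phase 1: copy each element and add a p1 attribute.
       Text is copied by the built-in rules; comments and PIs are dropped. -->
  <xsl:template match="*" mode="phase1">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="p1">done</xsl:attribute>
      <xsl:apply-templates mode="phase1"/>
    </xsl:copy>
  </xsl:template>

  <!-- Placeholder phase 2: same idea, adds a p2 attribute -->
  <xsl:template match="*" mode="phase2">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="p2">done</xsl:attribute>
      <xsl:apply-templates mode="phase2"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With 13 such variables declared in one template, every intermediate tree is still in scope when the last phase runs, which matches the memory growth I am seeing.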
I tried an alternative, via exsl:document, with which each phase is stored in a file and loaded by the next phase via document(). But this does not solve the issue.
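Roughly, the file-based variant looked like the sketch below. The file name `phase1.xml` and the placeholder phases are made up for illustration, and the sketch assumes that the stylesheet and the output live in the same directory so that both relative URIs resolve to the same file. Reading back a file written earlier in the same transformation is processor-dependent behaviour, so treat this as a sketch rather than something guaranteed to work everywhere.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">

  <xsl:template match="/">
    <!-- Write the output of phase 1 to a file (hypothetical name) -->
    <exsl:document href="phase1.xml" method="xml">
      <xsl:apply-templates select="/" mode="phase1"/>
    </exsl:document>
    <!-- Load that file again as the input of phase 2 -->
    <xsl:apply-templates select="document('phase1.xml')" mode="phase2"/>
  </xsl:template>

  <!-- Placeholder phase 1: copy each element and add a p1 attribute -->
  <xsl:template match="*" mode="phase1">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="p1">done</xsl:attribute>
      <xsl:apply-templates mode="phase1"/>
    </xsl:copy>
  </xsl:template>

  <!-- Placeholder phase 2: copy each element and add a p2 attribute -->
  <xsl:template match="*" mode="phase2">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="p2">done</xsl:attribute>
      <xsl:apply-templates mode="phase2"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

My guess as to why this does not help: a document loaded via document() has to remain the same document for the rest of the transformation, so the processor probably keeps every loaded file in memory just like it keeps the variables.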
I guess that, currently, the only real solution is to perform the pipeline outside the template (or to use saxon:next-in-chain, but in our end solution we must use xsltproc (libxslt), as that is what we have integrated with). However, I think that, especially with the multi-pass pattern, XSLT processors could be more sophisticated and efficient with respect to memory utilization by freeing the memory of variables that are not (and will not be) referenced anymore.
Subject:Memory utilization high with multi pass Author:Metin Solmaz Date:07 Jul 2009 12:10 PM
After thinking about this further, I think I found a solution within the template. Instead of storing the intermediate steps in variables and passing them on to the next step, I passed the output of step N to step N+1 as a parameter, and so on (nested). This did indeed reduce the memory utilization to normal levels.
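A minimal sketch of what I mean by nesting, reduced to three placeholder phases. The named templates `run-phase2` and `run-phase3`, the parameter name `input`, and the attribute-adding phases are invented for this example, and `exsl:node-set()` is again assumed to be available. The point is only the shape: each intermediate result exists as a parameter of one call and goes out of scope when that call returns.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:template match="/">
    <!-- Outermost call is the last phase; the innermost content is phase 1 -->
    <xsl:call-template name="run-phase3">
      <xsl:with-param name="input">
        <xsl:call-template name="run-phase2">
          <xsl:with-param name="input">
            <xsl:apply-templates select="/" mode="phase1"/>
          </xsl:with-param>
        </xsl:call-template>
      </xsl:with-param>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical wrappers: $input is only in scope during the call -->
  <xsl:template name="run-phase2">
    <xsl:param name="input"/>
    <xsl:apply-templates select="exsl:node-set($input)" mode="phase2"/>
  </xsl:template>

  <xsl:template name="run-phase3">
    <xsl:param name="input"/>
    <xsl:apply-templates select="exsl:node-set($input)" mode="phase3"/>
  </xsl:template>

  <!-- Placeholder phases: copy each element and add one attribute -->
  <xsl:template match="*" mode="phase1">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="p1">done</xsl:attribute>
      <xsl:apply-templates mode="phase1"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" mode="phase2">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="p2">done</xsl:attribute>
      <xsl:apply-templates mode="phase2"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" mode="phase3">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="p3">done</xsl:attribute>
      <xsl:apply-templates mode="phase3"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

In this arrangement the output of phase 1 is only reachable while run-phase2 is executing, so the processor has the chance to release it before phase 3 starts. Whether it actually does so is up to the implementation, but it is consistent with the drop in memory usage I observed.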