
Subject: Re: Does <xsl:copy> use a lot of memory? Is there an alternative that is more efficient?
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Sun, 02 Sep 2012 15:31:01 +0100
Memory is used for the source document and for intermediate variables. In Saxon, and I suspect in most processors, no memory is used for the result tree provided that the transformation is writing directly to a serializer.

Intrinsically, all xsl:copy has to do is to send two events - startElement and endElement - to the serializer.
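
As a rough illustration of that event model (a sketch using Python's SAX API as an analogy, not Saxon's actual internals), a shallow copy can be written straight to a serializer as a start event and an end event, with no result tree held in memory. The names shallow_copy and emit_children are hypothetical, chosen only for this example.

```python
import io
from xml.sax.saxutils import XMLGenerator


def shallow_copy(serializer, name, attrs, emit_children):
    """Emit one element as a start event, its content, and an end event.

    Nothing is buffered here: each event is written to the output stream
    as soon as it is sent, so no result tree is built.
    """
    serializer.startElement(name, attrs)
    emit_children(serializer)  # content is streamed between the two events
    serializer.endElement(name)


out = io.StringIO()
serializer = XMLGenerator(out, encoding="utf-8")

# Stream <A> containing <B id="b">, mirroring the example in the question.
shallow_copy(
    serializer,
    "A",
    {},
    lambda s: shallow_copy(s, "B", {"id": "b"}, lambda _: None),
)

result = out.getvalue()
print(result)
```

The only state held during the copy is the call stack for the currently open elements, which is why the cost of the copy itself is small compared with holding the source document in memory.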

I would strongly suspect that the out-of-memory error occurs while the source tree is being built, and that it will happen whatever transformation you run. For a 370 MB input document, you should probably allocate at least 2 GB of memory, preferably more.

Michael Kay
Saxonica

On 02/09/2012 13:47, Costello, Roger L. wrote:
Hi Folks,

Does <xsl:copy> use a lot of memory?

Is there an alternative that is more efficient?

Consider this problem. I have an XML document in which some elements have an id attribute and others have an idref attribute. If an element A references element B, then I want to embed B inside A.

Example: I want to convert this:

<Test>
     <A idref="b" />
     <B id="b" />
</Test>

to this:

<Test>
     <A>
         <B id="b" />
     </A>
     <B id="b" />
</Test>

Notice that A references B, and after processing B is nested inside A.

Here's a template that handles elements with a reference:

<xsl:key name="ids" match="*[@id]" use="@id"/>

<xsl:template match="*[@idref]">
    <xsl:variable name="refed-element" select="key('ids', @idref)"/>
    <xsl:copy>
        <xsl:copy-of select="@* except @idref" />
        <xsl:sequence select="$refed-element" />
    </xsl:copy>
</xsl:template>


The complete program is below.

It works fine if:

(a) The XML document is small.
(b) I don't have to repeat this embedding process too many times.

However, such is not the case. I am dealing with an XML document that is 370 MB in size and has tens of thousands of references. And I have to repeat the embedding process multiple times.

Saxon gives me an "out of memory error."

I suspect the cause is the <xsl:copy> instruction. I believe it is making new copies, thereby consuming lots of memory. True?

So, is there an alternative to <xsl:copy> that is more efficient?

Is there a way to express the above template rule that is more efficient?

/Roger
-----------------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 exclude-result-prefixes="#all"
                 version="2.0">

<xsl:output method="xml" />

<xsl:key name="ids" match="*[@id]" use="@id"/>

<xsl:template match="*[@idref]">
    <xsl:variable name="refed-element" select="key('ids', @idref)"/>
    <xsl:copy>
        <xsl:copy-of select="@* except @idref" />
        <xsl:sequence select="$refed-element" />
    </xsl:copy>
</xsl:template>

<xsl:template match="node()">
    <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates />
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
