Re: Streaming with XSLT version 3.0

Subject: Re: Streaming with XSLT version 3.0
From: Radu Pisoi <radu_pisoi@xxxxxxx>
Date: Tue, 11 Mar 2014 16:33:36 +0200
Hi all,

First of all, I want to thank you all for your opinions and feedback on this thread.

Regarding the OutOfMemory problem: this can happen if the transformation result is very large and the user chooses to display it in the Results view. The Results view uses a simple text area, which cannot load that much content.

You can disable this by editing the associated transformation scenario: open the 'Output' tab and deselect all the checkboxes in the 'Show in results view as' section.

More details about how to configure the transformation scenario output can be found here:
http://oxygenxml.com/doc/ug-editor/#topics/the-output-tab.html#the-output-tab


I will add a feature request in our issue tracking system to improve the handling of this situation.

After I disabled displaying the transformation output in the Results view, I tried to transform a 3 GB file with a structure similar to the one posted by Terry. In this case I found another problem: the transformation runs 6 times slower in oXygen than from the command line.

This happens because Saxon-EE schema-based validation (-val:lax) is enabled by default when oXygen runs a transformation with the Saxon-EE processor.

The main feature of the first Saxon-EE versions was schema-aware validation (the -sa switch), so we assumed that a user who chose Saxon-EE wanted schema-aware validation.
Meanwhile, the list of features available in Saxon-EE has grown, and there are now many more reasons to use Saxon-EE.


I will add an issue in our bug tracking system to reconsider the default for this option (-val).

To disable the 'schema-aware validation' option, edit the associated scenario and press the 'Advanced Options' button located next to the Saxon-EE processor combo box. This opens a dialog that allows you to customize the Saxon-EE processor. In this dialog, choose 'Disable schema validation' for the 'Validation on source file (-val)' option.

In conclusion, if you do not show the output in the Results view and you disable schema-aware validation, you will get the same execution time when running the transformation from oXygen as from the command line.

Regards,
Radu
--
Radu Pisoi
<oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

On 3/8/2014 23:14, Terry Badger wrote:
Michael,
I did run the process successfully. See my notes here. I have reported it to Oxygen.
Details for running a large file with xslt v3 streaming
==========
Large source file is found here: http://dumps.wikimedia.org/enwiki/20130403/enwiki-20130403-pages-articles-multistream.xml.bz2
==========
Here is the result of running Saxon from a DOS shell, with a respectable 21 minutes and no out-of-memory report
C:\Temp\wiki>C:\Progra~2\Java\jre7\bin\java -Xmx180m -Xss4096k -Xms48m -cp C:/saxon/saxon9ee.jar; net.sf.saxon.Transform -TJ -t -it:main  -o:C:/Temp/wiki/out/wiki-03-output.xml C:/Temp/wiki/xsl/wiki-03.xsl
Saxon-EE 9.5.1.4J from Saxonica
Java version 1.7.0_45
Using license serial number V001638
Generating byte code...
Stylesheet compilation time: 476 milliseconds
Processing  (no source document) initial template = main
URIResolver.resolve href="../source/enwiki.xml" base="file:/C:/Temp/wiki/xsl/wiki-03.xsl"
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Writing to file:/C:/Temp/wiki/out/output-wiki-03.xml
Execution time: 21m 24.612s (1284612ms)
Memory used: 25491272
NamePool contents: 28 entries in 27 chains. 7 URIs
==========
With this xsl stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns="http://www.mediawiki.org/xml/export-0.8/"
     xpath-default-namespace="http://www.mediawiki.org/xml/export-0.8/" exclude-result-prefixes="#all"
     version="3.0">
     <xsl:output method="xml"/>
     <xsl:variable name="root" select="/"/>
     <xsl:mode streamable="yes"/>
     <xsl:template name="main">
         <xsl:stream href="../source/enwiki.xml">
             <xsl:result-document href="../out/output-wiki-03.xml">
                 <count>
                     <xsl:iterate select="mediawiki/page">
                         <xsl:param name="count" select="0" as="xs:decimal"/>
                         <xsl:next-iteration>
                             <xsl:with-param name="count" select="$count+1"/>
                         </xsl:next-iteration>
                         <xsl:on-completion>
                             <xsl:value-of select="$count"/>
                         </xsl:on-completion>
                     </xsl:iterate>
                 </count>
             </xsl:result-document>
         </xsl:stream>
     </xsl:template>
</xsl:stylesheet>
============
With this result file
<?xml version="1.0" encoding="UTF-8"?>
<count xmlns="http://www.mediawiki.org/xml/export-0.8/">13355093</count>
============
When running in Oxygen 15.2 with Saxon 9.5.1.3, using the same source and stylesheet files, we got an out-of-memory error after about an hour. I have reported it to Oxygen.



On Saturday, March 8, 2014 5:43 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
Could you try it outside oXygen? You can get a 30-day free Saxon-EE evaluation license to enable this. That will establish whether the problem is primarily a Saxon one or an oXygen one, which will make it a lot easier to help you.

Michael Kay
Saxonica

On 7 Mar 2014, at 23:10, Terry Badger <terry_badger@xxxxxxxxx> wrote:

David,
Thank you. I tried your suggestion but it still failed with an out-of-memory report.
Terry


On Friday, March 7, 2014 9:10 AM, David Rudel <fwqhgads@xxxxxxxxx> wrote:
Terry,
You can address the possibility that oXygen is simply choking on the output by wrapping your output in <xsl:result-document> instructions.

If you pipe output to a file, oXygen does not attempt to display it in
the application when the scenario completes. This would eliminate at
least one possible reason for the crash without requiring you to run
from the command line.
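
The suggestion above can be sketched roughly as follows. This is only a minimal illustration, not the stylesheet discussed in this thread: the output path out/result.xml and the copy-everything body are placeholders chosen for the example.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:template match="/">
        <!-- Hypothetical output path; a relative href is resolved against
             the base output URI of the transformation. -->
        <xsl:result-document href="out/result.xml" method="xml">
            <!-- Placeholder body: replace with the real result construction. -->
            <xsl:copy-of select="."/>
        </xsl:result-document>
    </xsl:template>
</xsl:stylesheet>
```

Because the whole result is written to the secondary output file, the principal output stays empty, so there should be nothing large left for the application to display.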

-David

On Fri, Mar 7, 2014 at 1:09 AM, Abel Braaksma (Exselt) <abel@xxxxxxxxxx> wrote:

It is also important to try to find out what is actually causing the
memory exception. If you run it from oXygen as you say, it is quite
possible that the exception comes from oXygen itself, which may not be
capable of handling the output file. This would explain the late memory
exception. To find out, simply run it from the command line and
watch what happens to memory in Task Manager.


--

"A false conclusion, once arrived at and widely accepted is not
dislodged easily, and the less it is understood, the more tenaciously
it is held." - Cantor's Law of Preservation of Ignorance.


