
Re: XSLT3 - Streaming + Recursive File Output

Subject: Re: XSLT3 - Streaming + Recursive File Output
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Aug 2016 11:02:32 -0000
> On 12 Aug 2016, at 11:23, Mailing Lists Mail daktapaal@xxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Dr. Kay,
> Thank you for your explanation. This is my first ever streaming stylesheet,
> and your explanations are very educational to me. I have some questions.
>

> In your point A, you said we can switch off multi-threading in
> xsl:result-document. How do we do that?
>

You can switch off multi-threading globally as a configuration option, for
example from the command line:

--allow-multithreading:off

(note two initial hyphens)

Alternatively, write

<xsl:result-document .... saxon:asynchronous="no"
xmlns:saxon="http://saxon.sf.net/">

to switch it off for a specific xsl:result-document instruction.
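For example, applied to the batch files from point (B) below, the attribute
might sit like this. This is an untested sketch: it assumes the xsl prefix is
already declared on the enclosing xsl:stylesheet, and the
exclude-result-prefixes attribute is an addition that keeps the saxon
namespace declaration from being copied onto the <species> output element.

```xml
<xsl:for-each-group select="*:species"
                    group-adjacent="(position()-1) idiv 1000">
  <!-- saxon:asynchronous="no" asks Saxon to evaluate this one
       xsl:result-document synchronously rather than on a separate thread;
       other xsl:result-document instructions are unaffected -->
  <xsl:result-document href="species{position()}.xml"
                       saxon:asynchronous="no"
                       exclude-result-prefixes="saxon"
                       xmlns:saxon="http://saxon.sf.net/">
    <species><xsl:copy-of select="current-group()"/></species>
  </xsl:result-document>
</xsl:for-each-group>
```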

> In point B, in the for-each-group, you typed idiv. Should it be div? Is it a
> typo, or is there a new operator called idiv?
>
>
Introduced in XPath 2.0, idiv does integer division. So with
group-adjacent="(position()-1) idiv 1000", elements 1 to 1000 have grouping
key 0, elements 1001 to 2000 have grouping key 1, etc.
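A quick way to check the arithmetic is a throwaway stylesheet (hypothetical,
not from the thread) that evaluates the same key expression for a few sample
positions; using the xsl:initial-template entry point described below, it can
be run with -it and no source document. Using div instead would give keys 0,
0.001, 0.002, ..., so every element would start a new group of one.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Named entry point: selected with -it, no source document needed -->
  <xsl:template name="xsl:initial-template">
    <!-- Same grouping-key expression as point (B), applied to
         positions either side of each batch boundary -->
    <xsl:value-of select="for $p in (1, 1000, 1001, 2000, 2001)
                          return ($p - 1) idiv 1000"/>
  </xsl:template>
</xsl:stylesheet>
```

This prints 0 0 1 1 2: positions 1 and 1000 fall in the first batch, 1001 and
2000 in the second, and 2001 starts the third.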
> Point C: changing the initial unnamed template to streamable produced no
> results; no files were generated. Also, in the examples given in the spec I
> did not see any mode on the initial template.
>
>
We would need to see how you are invoking the transformation. Sorry, I now see
there is an xsl:stream instruction inside the match="/" template, so
presumably you are supplying a dummy source document, which of course doesn't
need to be streamed. Normally I use a named template entry point for such
stylesheets. XSLT 3.0 recognizes <xsl:template name="xsl:initial-template"/>,
and in Saxon you can then use -it (with no template name) to select this as
the entry point, avoiding the need for a dummy source document.
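Putting (B) and this entry point together, the stylesheet from the original
question might be restructured roughly as follows. This is an untested sketch:
it keeps the element names, file names and xsl:fork structure from the
question, replaces the recursive named template with the xsl:for-each-group
from (B), and uses xsl:stream as Saxon 9.7 does (the final XSLT 3.0
Recommendation spells this xsl:source-document with streamable="yes").

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Streamable mode: copy everything, except species (see last template) -->
  <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point chosen by -it, so no dummy source document is needed -->
  <xsl:template name="xsl:initial-template">
    <xsl:result-document href="output/UniverseKingdom-without-species.xml">
      <xsl:stream href="UniverseKingdom.xml">
        <xsl:fork>
          <!-- Branch 1: the whole document minus its species elements -->
          <xsl:sequence>
            <xsl:apply-templates mode="stream"/>
          </xsl:sequence>
          <!-- Branch 2: species in batches of 1000, replacing the
               recursive named template that failed XTSE3430 -->
          <xsl:sequence>
            <xsl:for-each select="*:UniverseKingdom/*:AnimalKingdom">
              <xsl:for-each-group select="*:species"
                                  group-adjacent="(position()-1) idiv 1000">
                <!-- Inside the body, position() is the group number: 1, 2, ... -->
                <xsl:result-document
                    href="output/AnimalKingdomSpeciesBatch{position()}.xml">
                  <species>
                    <xsl:copy-of select="current-group()"/>
                  </species>
                </xsl:result-document>
              </xsl:for-each-group>
            </xsl:for-each>
          </xsl:sequence>
        </xsl:fork>
      </xsl:stream>
    </xsl:result-document>
  </xsl:template>

  <!-- Drop species elements from the main output -->
  <xsl:template match="*:species" mode="stream"/>
</xsl:stylesheet>
```

It could then be run with something like java -jar saxon9ee.jar
-xsl:splitter.xsl -it, adding --allow-multithreading:off if the no-threads rule
applies. If the source has more than one AnimalKingdom element, the batch
numbering restarts for each one and the file names would collide (XSLT treats
writing the same result URI twice as an error), so the href would need an
extra component in that case.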

Michael Kay
Saxonica
> Thank you, Michael, for your insights. I have learned a lot by asking the
> question.
>
> Dak
>
>
> On Aug 11, 2016 7:13 PM, "Michael Kay mike@xxxxxxxxxxxx"
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> (A) Don't equate xsl:fork with multi-threading. In fact, the current
> implementation of xsl:fork in Saxon is not multi-threaded (xsl:result-document
> might be, but you can switch it off). (Saxon's streamed processing uses a push
> model, which complicates many things, but pushing parser events to multiple
> consumers doesn't require multiple threads.)
>
> (B) I think your recursive named template can be replaced with a streamable
> call on xsl:for-each-group, something like
>
> <xsl:for-each-group select="*:species"
>                     group-adjacent="(position()-1) idiv 1000">
>   <xsl:result-document href="species{position()}.xml">
>     <species><xsl:copy-of select="current-group()"/></species>
>   </xsl:result-document>
> </xsl:for-each-group>
>
> Compared with your approach, this solution has the advantage of not imposing
> an arbitrary limit on the number of elements to be processed.
>
> (C) I would expect that the initial unnamed mode should be declared streamable.
>
> (D) In the latest XSLT 3.0 we've provided "streamable stylesheet functions"
> (not yet implemented in Saxon), but we stopped short at streamable named
> templates. However, you couldn't do this kind of batching using streamable
> stylesheet functions either. A human reader can see in your code that the Nth
> recursive call of the template is always processing nodes that are later in
> document order than the (N-1)th recursive call, but it would require a
> phenomenal amount of analysis for a theorem-prover to establish that during
> static analysis, and even if you could prove it streamable, generating a
> streamable execution plan would be far from trivial.
>
> Michael Kay
> Saxonica
>
>
> > On 11 Aug 2016, at 23:07, Mailing Lists Mail daktapaal@xxxxxxxxx
> > <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Dear All,
> > I have the following problem to solve using XSLT 3.0 streaming, which I
> > have been working on for some time now; I hit a roadblock no matter
> > which way I choose. It seems to be an interesting issue, and solving it
> > will be a very good learning experience for me.
> >
> > I have a HUGE XML file (obviously a starting point for XSLT 3.0 streaming).
> >
> > I am using: SaxonEE9-7-0-7J
> >
> > Problem Definition
> >
> > 1. Remove a set of nodes (Species) from the source tree
> > (UniverseKingdom.xml); there can be around 1,000,000 of them.
> > 2. Create a file called UniverseKingdom-without-species.xml which has
> > every element in UniverseKingdom except the Species nodes.
> > 3. Create batches of 1000 species and write them out to
> > AnimalKingdomSpeciesBatch1.xml and so on, until all the
> > Species are covered.
> >
> > So when the program runs, I get
> > 1. UniverseKingdom-without-species.xml, and
> > 2. 1000 files, each with 1000 Species, named
> > AnimalKingdomSpeciesBatch1.xml ... AnimalKingdomSpeciesBatch1000.xml
> >
> > What I have done so far (after many attempts; I thought it should
> > work, but it did not):
> > <xsl:stylesheet version="3.0"
> >    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >    xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >    <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
> >    <xsl:strip-space elements="*"/>
> >    <xsl:output method="xml" indent="yes"/>
> >    <xsl:template match="/">
> >        <xsl:result-document href="output/UniverseKingdom-without-species.xml">
> >            <xsl:stream href="UniverseKingdom.xml">
> >                <xsl:fork>
> >                    <xsl:sequence>
> >                        <xsl:apply-templates mode="stream"/>
> >                    </xsl:sequence>
> >                    <xsl:sequence>
> >                        <xsl:for-each select="*:UniverseKingdom/*:AnimalKingdom">
> >                            <!-- Call recursive template here -->
> >                            <xsl:call-template name="batch-animal-species"/>
> >                        </xsl:for-each>
> >                    </xsl:sequence>
> >                </xsl:fork>
> >            </xsl:stream>
> >        </xsl:result-document>
> >    </xsl:template>
> >    <xsl:template name="batch-animal-species">
> >        <xsl:param name="limit" select="1000000"/>
> >        <xsl:param name="batch" select="1"/>
> >        <xsl:param name="start" select="1"/>
> >        <xsl:param name="end" select="1000"/>
> >        <xsl:if test="$start &lt;= $limit">
> >            <xsl:result-document href="output/AnimalKingdomSpeciesBatch{$batch}.xml">
> >                <species>
> >                    <xsl:for-each select="*:species[position() = ($start to $end)]">
> >                        <species>
> >                            <xsl:copy-of select="."/>
> >                        </species>
> >                    </xsl:for-each>
> >                </species>
> >            </xsl:result-document>
> >            <xsl:call-template name="batch-animal-species">
> >                <xsl:with-param name="batch" select="$batch+1"/>
> >                <xsl:with-param name="start" select="$end+1"/>
> >                <xsl:with-param name="end" select="$end+1000"/>
> >            </xsl:call-template>
> >        </xsl:if>
> >    </xsl:template>
> >    <xsl:template match="*:species" mode="stream"/>
> > </xsl:stylesheet>
> >
> >
> > Here, the issue was with the template batch-animal-species. Saxon
> > throws this error:
> >
> > e:\perf\xslt3>java -jar saxon9ee.jar str.xml splitter.xsl -o:StreamAni.xml
> > Static error at xsl:template on line 22 column 91 of splitter.xsl:
> >  XTSE3430: Template rule is declared streamable but it does not
> > satisfy the streamability rules.
> >  * Operand . of CallTemplate#batch-animal-species selects streamed nodes in a
> > context that allows arbitrary navigation (line 43)
> > Errors were reported during stylesheet compilation
> >
> >
> > I know that the logic for chunking the batched files could be better,
> > or is even questionable. But I was not expecting xsl:call-template
> > to fail.
> >
> > I am hoping some XSLT 3.0 ninja warriors can help me with this issue.
> > Seriously, I cannot take no for an answer :) a lot depends on this...
> >
> > Also, if someone can think of an intelligent way to get this done with
> > smarter code, possibly without using xsl:fork, please let me know. There is
> > an admin somewhere in the system who has asked us to write code that does
> > not spawn multiple threads: he wants to be responsible for the number of
> > threads and discourages people from spawning their own. If that is not
> > possible, I will insist that forking has to be done.
> > Please help ...
> > Dak.Tap
> >
>
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
