
Re: Question on streaming and grouping with nested keys

Subject: Re: Question on streaming and grouping with nested keys
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 14 Jul 2017 14:13:23 -0000
On 14.07.2017 15:02, Felix Sasaki felix@xxxxxxxxxxxxxx wrote:


2017-07-14 14:41 GMT+02:00 Martin Honnen martin.honnen@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>:

    On 14.07.2017 14:05, Felix Sasaki felix@xxxxxxxxxxxxxx wrote:

I tried the example from Martin with

        <xsl:template match="TRANSACTION-LIST">
               <xsl:copy>
                  <xsl:for-each-group select="copy-of(TRANSACTION)"
        group-by="ITEM2/SUBITEM2/GROUPING-KEY">
                     <xsl:copy>
                        <item1-sum><xsl:value-of
        select="sum(current-group()/ITEM2/SUBITEM2.1)"/></item1-sum>

...

        It gives me an out of memory error. The input file is 160MB, but the
        individual transactions are rather small (around 20+ elements).
        The error also appears if I remove "<xsl:copy>".


160 MB doesn't sound like a file you need streaming for at all. Does
that suggestion above cause memory problems only when using
streaming (e.g. when you have <xsl:mode streamable="yes"/>) or also
without streaming?




Without streaming it works.

That sounds odd.




Thanks. Working without accumulators is fine, just trying to understand the issue. Other input files are a bit bigger, up to 1.5 GB, so having a streaming solution would be nice but it's not mandatory.

I have now tried to solve it with streaming accumulators, using


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="xs math map"
    expand-text="true"
    version="3.0">

<xsl:param name="STREAMABLE" as="xs:boolean" static="yes" select="true()"/>

<xsl:mode _streamable="{$STREAMABLE}" on-no-match="shallow-skip" use-accumulators="item1-count subitem groups"/>

<xsl:output indent="yes"/>

<xsl:accumulator name="item1-count" as="xs:integer" initial-value="0" _streamable="{$STREAMABLE}">
<xsl:accumulator-rule match="TRANSACTION" select="0"/>
<xsl:accumulator-rule match="TRANSACTION/ITEM1" select="$value + 1"/>
</xsl:accumulator>


<xsl:accumulator name="subitem" as="xs:integer" initial-value="0" _streamable="{$STREAMABLE}">
<xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()" select="xs:integer(.)"/>
</xsl:accumulator>


<xsl:accumulator name="groups" as="map(xs:string, map(xs:string, xs:integer))" initial-value="map{}" _streamable="{$STREAMABLE}">
<xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
select="let $key := string(),
$count := accumulator-before('item1-count'),
$sum := accumulator-before('subitem')
return if (not(map:contains($value, $key)))
then map:put($value, $key, map { 'count' : $count, 'sum' : $sum })
else let $value-map := $value($key)
return map:put($value, $key, map { 'count' : $count + $value-map?count, 'sum' : $sum + $value-map?sum })"/>
</xsl:accumulator>


<xsl:template match="TRANSACTION-LIST">
<xsl:copy>
<xsl:apply-templates/>
<xsl:variable name="groups" select="accumulator-after('groups')"/>
<xsl:for-each select="map:keys($groups)">
<transaction key="{.}">
<count>{$groups(.)?count}</count>
<amount>{$groups(.)?sum}</amount>
</transaction>
</xsl:for-each>
</xsl:copy>
</xsl:template>


</xsl:stylesheet>

I had thought that, when matching on a text() node, it is possible to consume its value, and Saxon does not complain about the accumulator

<xsl:accumulator name="subitem" as="xs:integer" initial-value="0" _streamable="{$STREAMABLE}">
<xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()" select="xs:integer(.)"/>
</xsl:accumulator>


However, for the more complex one


<xsl:accumulator name="groups" as="map(xs:string, map(xs:string, xs:integer))" initial-value="map{}" _streamable="{$STREAMABLE}">
<xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
select="let $key := string(),
$count := accumulator-before('item1-count'),
$sum := accumulator-before('subitem')
return if (not(map:contains($value, $key)))
then map:put($value, $key, map { 'count' : $count, 'sum' : $sum })
else let $value-map := $value($key)
return map:put($value, $key, map { 'count' : $count + $value-map?count, 'sum' : $sum + $value-map?sum })"/>
</xsl:accumulator>


it continues to complain with

Static error at xsl:accumulator-rule on line 33 column 136 of count-sum-accum1.xsl:
XTSE3430: The xsl:accumulator-rule/@select expression (or contained sequence constructor)
for a streaming accumulator must be motionless
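
For comparison, here is a minimal sketch of a streaming accumulator whose rule I would expect to count as motionless without any doubt, because its select expression uses only $value and never reads the matched node. The accumulator name transaction-count and the transactions result element are made up for illustration; they are not part of the stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    expand-text="yes"
    version="3.0">

  <xsl:mode streamable="yes" on-no-match="shallow-skip"
      use-accumulators="transaction-count"/>

  <xsl:output indent="yes"/>

  <!-- hypothetical accumulator: the rule only increments $value,
       it does not look at the matched TRANSACTION element at all -->
  <xsl:accumulator name="transaction-count" as="xs:integer"
      initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="TRANSACTION" select="$value + 1"/>
  </xsl:accumulator>

  <xsl:template match="TRANSACTION-LIST">
    <xsl:copy>
      <!-- consume the children first so that all rules have fired -->
      <xsl:apply-templates/>
      <!-- hypothetical result element reporting the final value -->
      <transactions>{accumulator-after('transaction-count')}</transactions>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The open question with the groups accumulator is whether reading the string value of the matched text() node and calling accumulator-before() still fall on the motionless side of that line.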


As I have no other implementation to test with (the Feb 2016 build of Exselt is too old to support the syntax details of the final XSLT 3.0 spec), I can't tell whether Saxon is right, and I am afraid I still get lost when doing streamability analysis by hand.

When I disable streaming, the accumulator stylesheet seems to give the right result on the following simplified test data:

<?xml version="1.0" encoding="UTF-8"?>
<TRANSACTION-LIST>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>a</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>b</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>c</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>a</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>b</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>c</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
</TRANSACTION-LIST>
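
To cross-check the accumulator result on such a small sample, a plain non-streaming grouping can serve as a reference. The following is only a sketch, assuming the element paths of the sample above and the same result element names (transaction, count, amount) as the accumulator stylesheet; it is not meant as a solution for the large inputs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    expand-text="yes"
    version="3.0">

  <xsl:output indent="yes"/>

  <xsl:template match="TRANSACTION-LIST">
    <xsl:copy>
      <!-- non-streaming: the whole input tree is held in memory -->
      <xsl:for-each-group select="TRANSACTION"
          group-by="ITEM2/SUBITEM2.2/GROUPING-KEY">
        <transaction key="{current-grouping-key()}">
          <!-- number of ITEM1 elements across the group's transactions -->
          <count>{count(current-group()/ITEM1)}</count>
          <!-- sum of the SUBITEM2.1 values across the group -->
          <amount>{sum(current-group()/ITEM2/SUBITEM2.1)}</amount>
        </transaction>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

On the sample above this should report a count of 2 and an amount of 20 for each of the keys a, b and c, which is what the accumulator version is expected to produce as well (the order of the groups may differ there, as map:keys does not guarantee any order).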
