
Re: Using sibling value in streaming mode

Subject: Re: Using sibling value in streaming mode
From: "Martynas Jusevičius martynas@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 31 Aug 2019 10:11:57 -0000
Thanks a lot for your suggestions, Martin and Michael; they were very
helpful for understanding how streaming works.

The maps are not that large, they are JSON objects coming from an API.

I see that reusing IDs should be possible, but it incurs some complexity
nonetheless. That made me wonder: do I really need to reuse the IDs
from the input, or is it sufficient to produce any kind of ID that is
stable for a given <map>?

I have tried the following:

    <xsl:accumulator name="map-id" initial-value="()" streamable="yes"
                     as="xs:string?">
       <xsl:accumulator-rule match="/array/map" select="uuid:randomUUID()"/>
    </xsl:accumulator>

and this might just work for my purposes. I considered generate-id(.)
as an option, but I need globally unique IDs rather than
document-scoped.
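
For the archive, here is a minimal sketch of the burst-mode approach
Michael describes below, which seems a reasonable fit since the maps are
small. It assumes the un-namespaced input from my original message,
exactly one <string key="id"> per <map>, and an XSLT 3.0 processor with
streaming enabled; the variable names $map and $id are only illustrative.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    expand-text="yes">

  <!-- The default (unnamed) mode is declared streamable. -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/array">
    <items>
      <!-- One downward selection: each <map> is visited in document order. -->
      <xsl:apply-templates select="map"/>
    </items>
  </xsl:template>

  <!-- Burst mode: copy one <map> into memory, then process the copy
       as an ordinary (unstreamed) tree. -->
  <xsl:template match="map">
    <xsl:variable name="map" select="copy-of(.)"/>
    <!-- Assumption: exactly one <string key="id"> per map. -->
    <xsl:variable name="id" select="string($map/string[@key = 'id'])"/>
    <xsl:for-each select="$map/string">
      <item>
        <id>{$id}</id>
        <key>{@key}</key>
        <val>{.}</val>
      </item>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because only one <map> is copied at a time, memory use should be bounded
by the largest single <map> rather than by the whole document.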

On Sat, Aug 31, 2019 at 10:25 AM Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> I think Martin has provided several options quite well, but perhaps another
> angle will also be helpful.
>
> If the maps are reasonably small, then the simplest approach is "burst-mode"
> or "windowed" streaming: in the template rule with match="map", bind a
> variable to select="copy-of(.)", and then process the tree contained in that
> variable in normal unstreamed fashion.
>
> If you want to achieve some level of streaming within the map, then clearly
> it's not going to be perfect streaming; in the worst case, if the "id" comes
> last, then you're going to have to buffer something in memory. Burst-mode
> streaming buffers the input in memory; an alternative is to buffer the output,
> which you can achieve using xsl:fork:
>
> <xsl:template match="map" mode="streamed">
>    <xsl:fork>
>      <xsl:sequence>
>         <id>{string[@key='id']}</id>
>      </xsl:sequence>
>      <xsl:sequence>
>         <xsl:apply-templates select="string[not(@key='id')]" mode="streamed"/>
>      </xsl:sequence>
>    </xsl:fork>
> </xsl:template>
>
> If the maps are too large for that to be viable, then you could go for a
> two-pass solution. In the first streamed pass over the input document,
> construct an in-memory XDM map from position to id. In the second streamed
> pass, as each <map> element is encountered, output the id obtained from this
> XDM map, and then process all the children of the map (skipping the id) in
> streamed mode.
>
> Another possibility that occurred to me is a self-merge. Use xsl:merge to
> merge the file with itself, using the <map> element's position() as the merge
> key (if that's possible); then extract the id from one of the merge inputs,
> and the other values from the other. But that still requires memory
> proportional to the largest map, because Saxon is going to hold the merge
> groups in memory (the semantics require an implicit call on snapshot()).
>
> Michael Kay
> Saxonica
>
> On 30 Aug 2019, at 22:18, Martynas Jusevičius martynas@xxxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I've started looking into streaming recently (using Saxon 9.9). I have
> a use case like this:
>
> Input:
>
> <array>
>    <map>
>       <string key="key1">value1</string>
>       ...
>       <string key="id">123456789</string>
>       ...
>       <string key="keyN">valueN</string>
>    </map>
>    ...
> </array>
>
> Required output:
>
> <items>
>    <item>
>       <id>123456789</id>
>       <key>key1</key>
>       <val>value1</val>
>    </item>
>    ...
>    <item>
>       <id>123456789</id>
>       <key>id</key>
>       <val>123456789</val>
>    </item>
>    ...
>    <item>
>       <id>123456789</id>
>       <key>keyN</key>
>       <val>valueN</val>
>    </item>
>    ...
> </items>
>
> The value of <string key="id"> is used as <id> in <item> elements. The
> problem is that <string key="id"> can occur in any position in the
> <map>.
>
> I've tried using an accumulator such as
>
> <xsl:accumulator name="map-id" initial-value="()" streamable="yes"
> as="xs:string?">
>   <xsl:accumulator-rule match="/array/map/string[@key = 'id']/text()"
> select="string(.)"/>
> </xsl:accumulator>
>
> and then
>
> <item>
>    <id><xsl:value-of select="accumulator-before('map-id')"/></id>
>    ...
> </item>
>
> That worked partially -- only for sibling <string> elements that
> followed the <string key="id">. Which is not surprising.
>
> I've also tried accumulator-after('map-id') but got:
>
>  XTSE3430: Template rule is not streamable
>  * A call to accumulator-after() is consuming when there are no
> preceding consuming instructions
>
> Is it possible to have a streaming solution in this case?
>
> Martynas