[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: How to stream-process non-XML text using unparsed

Subject: Re: How to stream-process non-XML text using unparsed-text-lines( ) ?
From: "Abel Braaksma (Exselt) abel@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 26 Jul 2014 18:46:31 -0000
Re:  How to stream-process non-XML text using  unparsed
> Is unparsed-text-lines(...) always streaming,

It depends on the processor and on the way the code is written, but I think
that was also what Michael Kay tried to point out. If you keep a reference to
the result of fn:unparsed-text-lines and you go back and forth through the
sequence, there is little chance that it will be streamed, but in the
xsl:for-each approach, and probably with a SimpleMapExpr, you can be quite
certain that it is streamed.

I.e., this is streamable, and effectively filters the set so you can work with
only those lines you are interested in:
unparsed-text-lines()!.[contains(., 'hello world')]

> or is it streaming only when the
> XSLT contains:
>
> 	<xsl:mode streamable="yes" />

Whether fn:unparsed-text or fn:unparsed-text-lines are streamable is
irrespective of the streamable="yes" mode (or, more generally, whether you are
in a streamable context or not).  The reason is subtle: those functions return
strings, not a node set, and streamable rules only apply to streamed nodes.

We have considered adding rules specifically for this situation, but it proved
too complex and/or not worth the effort (can't really recall), because the
use-cases are already covered in the definition of fn:unparsed-text-lines and
the freedom it allows processors to implement it.

>
> The below XSLT program works great. Would it work differently if I were to
> remove the xsl:mode? How do I know that the input is being streamed?

Yes, it would work the same. And the only sure way that I know of to find out
whether the input is actually being streamed is by using a large input
document and monitoring the use of memory. However, even if you do see a lot
of memory consumed, it may still use streaming, because processors might
consider a certain buffering, potentially to the size of available memory,
beneficial in a particular scenario. To cancel that out, the input file must
be larger than available memory.

Note: if you want a sure-fire cross-processor way of streaming text, you can
also create a UriResolver that returns a text document as an XML document with
a (large) set of elements each containing one line or record of the text
input. Then you can use xsl:stream and create a guaranteed streamable
stylesheet (which turns out to be relatively simple, as your input will be
flat and each time you select a text-node, the streamability rules will take
into account that it cannot contain children, so the rules are a bit more
lenient).

Cheers,

Abel Braaksma
Exselt XSLT 3.0 streaming processor
http://exselt.net

Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.