
Re: I processed a 3GB XML file ... using XSLT streaming

  • From: Dimitre Novatchev <dnovatchev@gmail.com>
  • To: Peter Hunsberger <peter.hunsberger@gmail.com>
  • Date: Fri, 13 Sep 2013 13:06:28 -0700

And because Peter mentioned this, here are two facts one should be aware of:

1. When streaming, one doesn't know until the end whether the stream is
"well-formed" or not. If a well-formedness error turns up after many
hours of processing, it is likely that some side effect (such as
sending or posting) has already been produced that cannot be undone,
or that it is too late to undo. Are we missing the "(long) transaction"
concept here?

2. People usually forget that an XML document is two-dimensional: it
has depth as well as length. Some XML documents that aren't very big in
size can still choke any streaming processor; one only needs to
construct a document with a large enough depth. As a streaming
processor needs to keep track of all ancestors of the current node, it
will crash at some depth.
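To make point 2 concrete, here is a minimal Python sketch. The handler class DepthTracker and the helper deep_document are hypothetical names, not taken from any XSLT processor. It uses the standard library's xml.sax to stream a document built only from nested <a> elements and records the stack of open ancestors that any streaming parser has to keep. The document is only a few tens of kilobytes, yet the ancestor stack grows to the full nesting depth. A real processor may keep more per ancestor than just the element name, so the depth at which a particular one fails will vary; this sketch does not reproduce an actual crash.

```python
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Records the ancestor stack that a streaming parser must maintain."""

    def __init__(self):
        super().__init__()
        self.ancestors = []  # names of the currently open elements
        self.max_depth = 0

    def startElement(self, name, attrs):
        self.ancestors.append(name)
        self.max_depth = max(self.max_depth, len(self.ancestors))

    def endElement(self, name):
        self.ancestors.pop()


def deep_document(depth):
    """Build <a><a>...</a></a>: 7 bytes per level, no text at all."""
    return ("<a>" * depth + "</a>" * depth).encode("ascii")


doc = deep_document(5000)
handler = DepthTracker()
xml.sax.parseString(doc, handler)
print(len(doc), handler.max_depth)  # 35000 5000
```

Memory for the ancestor stack grows with nesting depth, not with file size, which is why a small but deep document can hurt a streaming processor more than a large flat one.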

So, "size is not all" :), or do we need to redefine what "huge" means
wrt an XML document?
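One partial answer to the "(long) transaction" question in point 1, assuming the side effects can be deferred, is to stage them while streaming and perform them only once the parser has reached the end of a well-formed document. The Python sketch below uses xml.etree.ElementTree.iterparse; process_stream and its commit callback are hypothetical names standing in for "send or post". It is not a true transaction: holding the staged actions in memory gives back some of what streaming saves, so a real system might stage to disk, or to a target that supports rollback, instead.

```python
import io
import xml.etree.ElementTree as ET


def process_stream(source, tag, commit):
    """Stream-parse `source`, staging one action per <tag> element.

    `commit` (a hypothetical callback, e.g. "send these messages") is
    called only after the parser has reached the end of a well-formed
    document.
    """
    staged = []  # side effects we *would* perform, held back for now
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == tag:
                staged.append(elem.text)
                elem.clear()  # drop the element's content once staged
    except ET.ParseError as err:
        # Well-formedness error found late: nothing has been sent yet.
        print(f"rolled back {len(staged)} staged action(s): {err}")
        return False
    commit(staged)
    return True


sent = []
process_stream(io.BytesIO(b"<r><m>a</m><m>b</m></r>"), "m", sent.extend)
# Truncated document: the error only appears after both <m> were seen.
process_stream(io.BytesIO(b"<r><m>c</m><m>d</m>"), "m", sent.extend)
print(sent)  # ['a', 'b']
```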


Cheers,
Dimitre

On Fri, Sep 13, 2013 at 12:47 PM, Peter Hunsberger
<peter.hunsberger@gmail.com> wrote:
> It might be worth noting that streaming XML has been around for many
> years now (in spite of the W3C's belief in what makes a well-formed
> document).  I think the first custom XML parser I ever wrote was for the
> Sports Ticker data feed, probably not long after they first started up 15
> years ago...   The thing that has changed, and I think maybe the point of
> Roger's original post (whether he intended it or not), is the ability to use
> XSLT to handle the streaming data without having to resort to custom
> software.
>
> Peter Hunsberger
>
>
> On Fri, Sep 13, 2013 at 2:28 PM, Dimitre Novatchev <dnovatchev@gmail.com>
> wrote:
>>
>> > The point of streaming is (of course) being able to do such things
>> > without using much memory, even if it's slower. Not everyone has 96G, or
>> > even 16G of memory available... :-)
>>
>>
>> Absolutely true.
>>
>> Also, there could be cases where XML data is generated continuously,
>> non-stop and in real time, and must also be processed in real time.
>> In such scenarios even terabytes of RAM wouldn't help.
>>
>>
>> Cheers,
>> Dimitre
>>
>> On Fri, Sep 13, 2013 at 12:14 PM, Liam R E Quin <liam@w3.org> wrote:
>> > On Fri, 2013-09-13 at 20:53 +0200, Hermann Stamm-Wilbrandt wrote:
>> >> I did give your non-streaming stylesheet (B) a try (A).
>> >> Slight modifications were necessary to get back to XSLT 1.0
>> >> (eq -> = , doc -> document).
>> >> 16GB of memory were used (17851470K-1067694K), (D).
>> >
>> > The point of streaming is (of course) being able to do such things
>> > without using much memory, even if it's slower. Not everyone has 96G, or
>> > even 16G of memory available... :-)
>> >
>> > Liam
>> >
>> > --
>> > Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
>> > Pictures from old books: http://fromoldbooks.org/
>> > Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
>> >
>> >
>> > _______________________________________________________________________
>> >
>> > XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> > to support XML implementation and development. To minimize
>> > spam in the archives, you must subscribe before posting.
>> >
>> > [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> > Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> > subscribe: xml-dev-subscribe@lists.xml.org
>> > List archive: http://lists.xml.org/archives/xml-dev/
>> > List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>> >
>>
>>
>>
>> --
>> Cheers,
>> Dimitre Novatchev
>> ---------------------------------------
>> Truly great madness cannot be achieved without significant intelligence.
>> ---------------------------------------
>> To invent, you need a good imagination and a pile of junk
>> -------------------------------------
>> Never fight an inanimate object
>> -------------------------------------
>> To avoid situations in which you might make mistakes may be the
>> biggest mistake of all
>> ------------------------------------
>> Quality means doing it right when no one is looking.
>> -------------------------------------
>> You've achieved success in your field when you don't know whether what
>> you're doing is work or play
>> -------------------------------------
>> Facts do not cease to exist because they are ignored.
>> -------------------------------------
>> Typing monkeys will write all Shakespeare's works in 200yrs. Will they
>> write all patents, too? :)
>> -------------------------------------
>> I finally figured out the only reason to be alive is to enjoy it.
>>
>
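For the continuously generated data Dimitre describes above, and for feeds like the one Peter's custom parser handled, a parser that accepts input chunk by chunk fits better than one that reads a whole file. The sketch below is one possible approach using Python's xml.etree.ElementTree.XMLPullParser; handle_feed and the <feed>/<tick> element names are hypothetical. Each finished child of the root is detached after it is handled, so the in-memory tree stays small however long the stream runs. Note that close() is never called: a stream that never ends is also never confirmed to be well-formed, which is the first point in the message above.

```python
import xml.etree.ElementTree as ET


def handle_feed(chunks, tag):
    """Yield the text of each completed <tag> element as chunks arrive.

    Chunk boundaries may fall anywhere, even in the middle of a tag.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    depth = 0  # current nesting depth; the root element is depth 1
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem
                depth += 1
                continue
            depth -= 1
            if elem.tag == tag:
                yield elem.text
            if depth == 1:
                # A direct child of the root is complete: detach it so the
                # in-memory tree does not grow with the length of the stream.
                root.remove(elem)
    # parser.close() is deliberately not called: the stream never "ends".


# Simulated network reads; note the split inside the second <tick> tag.
chunks = [b"<feed><tick>1</tick><ti", b"ck>2</tick>", b"<tick>3</tick>"]
for value in handle_feed(chunks, "tick"):
    print(value)
```

For a stream that does end, one would call parser.close() after the last chunk; it raises ParseError if the document turned out not to be well-formed.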




