
RE: Streaming XML (WAS: More on taming SAX (was Re: [xm

  • To: "Daniela Florescu" <dflorescu@m...>,"XML Developers List" <xml-dev@l...>
  • Subject: RE: Streaming XML (WAS: More on taming SAX (was Re: ANN: Amara XML Toolkit 0.9.0))
  • From: "Dare Obasanjo" <dareo@m...>
  • Date: Tue, 28 Dec 2004 07:10:55 -0800
  • Thread-index: AcTsTt1eEeC8KsgaSEm2Yor72/U5RAAn5+IC
  • Thread-topic: Streaming XML (WAS: More on taming SAX (was Re: ANN: Amara XML Toolkit 0.9.0))

As someone who was until very recently "one of those implementers" I completely disagree with you. We had customers who wanted to process XML documents that are hundreds of megabytes to gigabytes in size and who, in certain cases, couldn't afford to materialize even a fraction of these documents. Then there were customers who wanted to process thousands of XML documents per minute and couldn't afford the overhead of object creation, memory consumption, and GC. Using XQuery or XSLT in such scenarios, even with various optimization tricks, just wouldn't cut it.
 
Every paper I've seen on streaming XML either assumes forward-only processing or is just wrong. Instead of telling folks to use Google Scholar or CiteSeer to find relevant work, are there techniques in any particular papers that you want to highlight?
 
-- 
PITHY WORDS OF WISDOM
Eat right, Exercise, Die anyway.   

________________________________

From: Daniela Florescu [mailto:dflorescu@m...]
Sent: Mon 12/27/2004 12:00 PM
To: XML Developers List
Subject: Re:  Streaming XML (WAS: More on taming SAX (was Re:  ANN: Amara XML Toolkit 0.9.0))



>     I've thought about using an XPath tracker in error reporting to
>     my library, which would be very simple to add at this point, and
>     it's necessary, I think because the document locator loses
>     meaning when I chain together a bunch of SAX filters.

..........

>
>     In any case, I'm reading through some of the other articles
>     you've been posting. This is a very interesting discussion.

I read with great interest the whole discussion about XML streaming and
SAX, and I have to admit that I am very confused by it.

Could you guys please try to clarify for me the answer to the following
question: instead of hand-coding streaming applications using SAX,
couldn't you write some XQuery code (with external functions, probably)
to do the same thing? Did you at least try? Did you try and fail? If
yes, why did it fail?
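
For readers unfamiliar with what "hand-coding a streaming application using SAX" looks like, here is a minimal sketch in Python's standard xml.sax module. The element names `item` and `price`, the class `ItemSummary`, and the function `summarize` are all hypothetical, chosen only for illustration; nothing here comes from the thread's participants. The handler keeps just a counter, a running total, and a small text buffer, so memory use does not grow with document size.

```python
import io
import xml.sax


class ItemSummary(xml.sax.ContentHandler):
    """Counts <item> elements and sums <price> text without building a tree.

    The element names are assumptions made for this illustration.
    """

    def __init__(self):
        super().__init__()
        self.count = 0
        self.total = 0.0
        self._in_price = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1
        elif name == "price":
            # Start collecting character data for this price element.
            self._in_price = True
            self._buf = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so buffer them.
        if self._in_price:
            self._buf.append(content)

    def endElement(self, name):
        if name == "price":
            self.total += float("".join(self._buf))
            self._in_price = False


def summarize(stream):
    """Parse a file-like object and return (item count, price total)."""
    handler = ItemSummary()
    xml.sax.parse(stream, handler)
    return handler.count, handler.total


if __name__ == "__main__":
    doc = b"<items><item><price>1.5</price></item><item><price>2.5</price></item></items>"
    print(summarize(io.BytesIO(doc)))
```

The state machine (the `_in_price` flag and buffer) is the part that has to be written by hand, and it is this bookkeeping that a declarative query would hide.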

My hope is that at a certain point people will stop writing low-level
code, and they'll rely on good implementations of XQuery to do the right
amount of streaming, in the optimal way. That should be the vendor's
problem, not the user's problem.

Another question: why do you people care about "perfect" streaming, i.e.
streaming with zero memory consumption? Between perfect streaming and
total materialization there is a world of possibilities, where
materialization happens but is restricted to the minimum amount of data
required to compute the answer, and only for the minimum amount of time
necessary to compute it.
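
One way to picture this middle ground is the sketch below, which uses `iterparse` from Python's standard xml.etree.ElementTree. It is only an illustration under assumed names (`sum_prices`, and the elements `item` and `price` are hypothetical), not a description of how any XQuery or XSLT engine works. Each record is materialized as a small subtree just long enough to read it, then its contents are discarded.

```python
import io
import xml.etree.ElementTree as ET


def sum_prices(stream, record_tag="item"):
    """Sum the <price> child of every record, one record at a time.

    The tag names are assumptions made for this illustration.
    """
    total = 0.0
    # "end" events fire once an element and all its children are parsed.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == record_tag:
            total += float(elem.findtext("price", default="0"))
            # Drop the record's children and text once it has been used.
            # The emptied element itself stays attached to its parent.
            elem.clear()
    return total


if __name__ == "__main__":
    doc = b"<items><item><price>1.5</price></item><item><price>2.5</price></item></items>"
    print(sum_prices(io.BytesIO(doc)))
```

Note that this is not perfect streaming: the emptied record elements remain referenced by the root, so memory still grows slowly with the number of records unless they are also removed from the parent.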

Perfect streaming happens too rarely to be of any interest. What is
interesting is all this world in between.

Anyway, I believe that people shouldn't try to hand-code their
applications using low-level APIs like SAX or StAX, but should use a
higher-level language like XSLT or XQuery, and trust the XQuery/XSLT
implementors to do a good job of minimizing memory consumption.
That's *their* job, not *yours* as users.

But anyway, for those interested in streaming XML processing, the
database research might come in handy. There have been several studies
of the problem in the literature. For example, you could find some of
them at

http://citeseer.ist.psu.edu/

by searching for "streaming XML"; starting from there you might find
some interesting papers.

Best regards, happy holidays,
Dana




-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://www.oasis-open.org/mlmanage/index.php>



