Re: [Question] How to do incremental parsing?

  • From: James Strachan <james_strachan@y...>
  • To: Tony.Coates@r..., xml-dev@l...
  • Date: Wed, 04 Jul 2001 12:18:42 +0100

From: <Tony.Coates@r...>
> On 04/07/2001 01:27:28 "Xu, Mousheng  (SEA)" wrote:
>
> >A problem of all the current XML parsers is that they at least read the
> >whole XML document into the input stream, which can consume a lot of
> >memory when the XML is big (e.g. 1 GB).
>
[snip]
>
> So, "use SAX or a persistent DOM" for large XML files/streams is what I
> would suggest.

I agree with David and Tony that both direct SAX and persistent DOMs can be
useful.

One alternative you might find useful is to use a document object model to
parse your large document, but in a 'pruning mode'. Massive documents
(e.g. 1 GB) are often database generated and can contain many 'rows'
(document fragments) which can be processed individually without requiring
the entire document in memory at once, e.g.

<products name="foo">
    <product id="1">
        <name>foo</name>
        ...
    </product>
    <product id="2">
        ...
    </product>
    ...
    <product id="10000000">
    ...
    </product>
</products>

For example, the dom4j project has an event-based callback mechanism, like
SAX, which lets you process a massive document row by row. Each 'row' can
be pruned from the tree once you have finished with it and then garbage
collected.

http://dom4j.org

The neat thing about this is you are called back with a complete valid
Document object that only contains one row (<product>) at a time and you can
still use dom4j's XPath support on all aspects of the Document as well as
using XSLT.
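
To make the row-by-row idea concrete, here is a minimal sketch that uses
only the JDK's SAX and DOM classes. It is not dom4j's own API (see the FAQ
for that); the class name RowPruner and the helper productNames are made up
for illustration. It builds a small Document per <product>, hands it to a
callback, and then drops it so it can be garbage collected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: a SAX handler that builds one small DOM Document per
// "row" element and forgets it as soon as the callback returns.
class RowPruner extends DefaultHandler {
    private final String rowName;              // element name that marks one row
    private final Consumer<Document> callback; // receives one Document per row
    private Document doc;                      // row being built, or null outside a row
    private Node current;                      // current insertion point inside doc

    RowPruner(String rowName, Consumer<Document> callback) {
        this.rowName = rowName;
        this.callback = callback;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (doc == null) {
            if (!qName.equals(rowName)) return; // outside any row: build nothing
            try {
                doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            current = doc;
        }
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        current.appendChild(e);
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (doc != null) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (doc == null) return;
        current = current.getParentNode();
        if (current == doc) {     // the row element itself just closed
            callback.accept(doc); // the callback sees a complete one-row Document
            doc = null;           // prune: nothing keeps the row alive any more
            current = null;
        }
    }

    // Example use: collect "id:name" for every <product> without ever
    // holding more than one row in memory.
    static List<String> productNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        RowPruner handler = new RowPruner("product", d -> {
            Element root = d.getDocumentElement();
            Node name = d.getElementsByTagName("name").item(0);
            names.add(root.getAttribute("id") + ":" + (name == null ? "" : name.getTextContent()));
        });
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }
}
```

For a real 1 GB file you would pass an InputSource over a file stream rather
than a String; memory use then depends on the largest single row, not on the
document size.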

There's an example in the FAQ here:-

http://dom4j.org/faq.html
http://dom4j.org/faq.html#How%20does%20dom4j%20handle%20very%20large%20XML%20documents?

James


