
RE: [Question] How to do incremental parsing?

  • From: Jeff Lowery <jlowery@s...>
  • To: "'Xu, Mousheng (SEA)'" <Mousheng.Xu@s...>,"'xml-dev@l...'" <xml-dev@l...>
  • Date: Wed, 04 Jul 2001 10:33:24 -0700

For large documents where:
	a) the target data is sparse (bits scattered throughout the
document), and
	b) the location of the data is known (which branch of the tree it
hangs off),
you might get the best performance, in both speed and memory, from a
pull-based parser such as kXML (www.enhydra.org). With a pull-based parser,
the application asks for the next event itself, so it can skip cheaply over
the nodes it isn't interested in until it reaches a node that has the
content it is looking for.
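
To make the idea concrete, here is a minimal sketch. It is an assumption-laden
illustration, not kXML code: it uses Python's standard xml.dom.pulldom module
as a stand-in pull parser, and the function name extract_text and the sample
<catalog> document are hypothetical. The input is read in chunks, uninteresting
nodes are skipped, and a DOM subtree is built only for the elements of interest.

```python
import io
from xml.dom import pulldom


def extract_text(stream, target_tag):
    """Yield the text of each <target_tag> element, one at a time.

    The stream is consumed in chunks, so the whole document is never
    held in memory at once.
    """
    events = pulldom.parse(stream)
    for event, node in events:
        # Every other event is simply skipped; no tree is built for it.
        if event == pulldom.START_ELEMENT and node.tagName == target_tag:
            # Build a small DOM subtree for this one element only.
            events.expandNode(node)
            yield "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )


# Hypothetical sample document: only the <name> elements are wanted.
sample = io.BytesIO(
    b"<catalog>"
    b"<item><name>alpha</name><blob>ignored</blob></item>"
    b"<item><name>beta</name><blob>ignored</blob></item>"
    b"</catalog>"
)
names = list(extract_text(sample, "name"))
print(names)
```

For a real 1 GB file you would pass an open file object instead of the
in-memory sample and consume the generator lazily rather than wrapping it in
list(). Whether this actually beats a SAX-style parser is untested here.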

Caveats:
1) I don't know whether anyone has actually done performance testing to
verify the performance claim above, and
2) kXML, at least, has some limitations, to quote:
- kXML does not support user defined (external) entities.
- The doctype declaration is not parsed. However, a corresponding "legacy
event" is generated by the parser, so application programmers are able to
parse the doctype declaration themselves.

> -----Original Message-----
> From: Xu, Mousheng (SEA) [mailto:Mousheng.Xu@s...]
> Sent: Tuesday, July 03, 2001 5:27 PM
> To: 'xml-dev@l...'
> Subject: [Question] How to do incremental parsing?
> 
> 
> Dear all,
> 
> A problem with all the current XML parsers is that they at least read the
> whole XML document into the input stream, which can consume a lot of memory
> when the XML is big (e.g. 1 GB).
> 
> One way to get around the problem would be to read the XML file into memory
> gradually, as needed. I would like to build such a DOM parser, but I am
> not familiar with the design of the Xerces XML parsers. Could someone give
> me a suggestion on how to tackle the problem? The most critical part
> would be the method that parses an element. If reading the whole document
> into memory is inevitable, then I would like to borrow the method that
> parses the input stream to get the next element.
> 
> Your help is highly appreciated.
> 
> Thanks in advance.
> 
> -- Mousheng Xu 
> 
> 
