RE: [Question] How to do incremental parsing?
For large documents where:

a) the target data is sparse (bits scattered throughout the large document), and
b) the location of the data is known (what branch it hangs off the tree),

then you might get the best performance, speed- and memory-wise, using a pull-based parser like kXML (www.enhydra.org). With a pull-based parser, you lightly skip over the nodes you're not interested in until you find a node that has content you're looking for.

Caveats:

1) I don't know if anyone has actually done performance testing to verify the above claim, and
2) kXML, at least, has some limitations, quote:

- kXML does not support user defined (external) entities.
- The doctype declaration is not parsed. However, a corresponding "legacy event" is generated by the parser, so application programmers are able to parse the doctype declaration themself

> -----Original Message-----
> From: Xu, Mousheng (SEA) [mailto:Mousheng.Xu@s...]
> Sent: Tuesday, July 03, 2001 5:27 PM
> To: 'xml-dev@l...'
> Subject: [Question] How to do incremental parsing?
>
> Dear all,
>
> A problem of all the current XML parsers is that they at least read
> the whole XML document into the input stream, which can consume a lot
> of memory when the XML is big (e.g. 1 GB).
>
> One way to get around the problem would be to read the XML file into
> memory gradually and when needed. I would like to build such a DOM
> parser, but I am not familiar with the design of the Xerces XML
> parsers. Could someone give me a suggestion on how to tackle on the
> problem? The most critical part would be the method to parse an
> element. If reading the whole document into memory is inevitable, then
> I would like to borrow the method which parse the input stream to get
> the next element.
>
> Your help is highly appreciated.
>
> Thanks in advance.
>
> -- Mousheng Xu
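The pull-based approach suggested in the reply can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard-library `xml.dom.pulldom` rather than kXML (whose API is not reproduced here), and the sample document, the `item`/`name` element names, and the `extract_items` helper are all hypothetical. The loop skips events for uninteresting nodes and builds a DOM subtree only for the elements it wants, which is close to the "gradual" DOM the original question asks about.

```python
import io
from xml.dom import pulldom

# Hypothetical sample: the wanted data hangs off <catalog>/<item>,
# everything under <noise> is data we do not care about.
SAMPLE = """<catalog>
  <noise><blob>skip me</blob></noise>
  <item id="a1"><name>first</name></item>
  <noise><blob>skip me too</blob></noise>
  <item id="b2"><name>second</name></item>
</catalog>"""


def extract_items(stream, tag="item"):
    """Yield (id, name) for each <tag> element, skipping all other nodes."""
    events = pulldom.parse(stream)  # reads the stream in chunks, not all at once
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Build a DOM subtree for this one element only.
            events.expandNode(node)
            node.normalize()  # merge adjacent text nodes, if any
            name = node.getElementsByTagName("name")[0]
            yield node.getAttribute("id"), name.firstChild.data


if __name__ == "__main__":
    for item_id, name in extract_items(io.StringIO(SAMPLE)):
        print(item_id, name)
```

For a real 1 GB file one would pass an open file object instead of `io.StringIO`. Whether this actually beats SAX or a full DOM on speed and memory is still subject to the first caveat in the reply: it would need to be measured.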