Re: Handling very large instance docs
On Thursday 29 April 2004 3:28 pm, you wrote:
> > >At the very least I need to be able to sequentially process a large
> > >document and extract an identified sub-tree (ideally denoted by an
> > >XPath expression) for run-of-the-mill tools to manipulate. I assume
> > >such a beast would need to be based on a SAX parser.
> >
> > I did exactly that in Python. I considered building an engine that
> > could filter SAX events to those that match a limited version of
> > XPath, but ran out of gas. I ended up with just a regular SAX
> > application.
>
> Interesting - I always thought such a thing would be useful, but haven't
> come across an implementation.

The main problem is obviously getting a good range of expression types to
evaluate correctly and at high performance; it's a hard problem. A good
starting point for research in this area is http://xmltk.sourceforge.net/.
The software there is somewhat behind in functional terms, but as a free and
easy solution for manipulating large documents it is good value.

At the 200-300Mb level I would not rule out XSLT as a solution, although you
would have to set up your environment carefully, in particular the available
memory and the choice of XSLT processor.

Kev.
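[A minimal sketch of the kind of SAX application described above: tracking the element path by hand and serializing any sub-tree matched by a simple absolute path such as /catalog/book, so memory stays bounded regardless of document size. All names here (SubtreeExtractor, extract) are illustrative, not from the thread, and only plain child steps are supported, not full XPath.]

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr

class SubtreeExtractor(xml.sax.ContentHandler):
    """Serialize every sub-tree whose element path matches `path`."""

    def __init__(self, path):
        # Only simple absolute paths like "/catalog/book" are handled.
        self.target = path.strip("/").split("/")
        self.stack = []          # path of currently open elements
        self.depth = 0           # >0 while inside a matched sub-tree
        self.out = io.StringIO()

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.depth or self.stack == self.target:
            self.depth += 1
            attr_s = "".join(
                f" {k}={quoteattr(attrs[k])}" for k in attrs.getNames()
            )
            self.out.write(f"<{name}{attr_s}>")

    def characters(self, content):
        if self.depth:
            self.out.write(escape(content))

    def endElement(self, name):
        if self.depth:
            self.out.write(f"</{name}>")
            self.depth -= 1
        self.stack.pop()

def extract(xml_text, path):
    handler = SubtreeExtractor(path)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.out.getvalue()

doc = "<catalog><book id='1'><title>SAX</title></book><meta/></catalog>"
print(extract(doc, "/catalog/book"))
```

For a genuinely large file you would feed a stream to xml.sax.parse() instead of parseString(), and write matched output to a file rather than an in-memory buffer; the event-handling logic is unchanged.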