Re: Processing huge XML files
Hi all,

Many thanks for your valuable advice. Let me give you more information about my case.

We need high-performance access to the individual parsed data values in the file. We do know the access patterns for our specific application (that is, access is not totally random), so it would be good to have an efficient persistent data structure for the parsed XML data.

Ideally, the data structure would be generic enough (with respect to both the XML schema and the access patterns) to support fast data access. At a minimum, we are looking for a method to implement a data structure customized for a specific XML schema and a defined access pattern.

I'm looking at the different technologies that some of you have suggested. Other suggestions are most welcome.

Thanks again,
Thomas

Rick Jelliffe wrote:
> From: "Michael Kay" <michael.h.kay@n...>
>
>> But really, when you get above 50Mb or so, you need to start looking at
>> XML databases.
>
> Another approach is to use streaming languages such as Perl and OmniMark
> (and, I guess, Python?), especially if you are not updating the data, just
> extracting information.
>
> Of course, you may need to take several passes. And you may need to
> have one pass of the data generate a program to be used for the next
> pass, a venerable technique that is often overlooked. But multiple
> passes with streaming languages is the way that many large-scale
> publishing systems work. A lot can depend on whether your document
> has an order that is amenable to your application: storing metadata
> and keys before the data in particular.
>
> A very typical way of constructing streaming programs on large
> data sets is to do two passes:
> 1) Run over the data and extract all information that will be needed for
>    decisions that otherwise require random access or lookahead.
> 2) Run over the data and perform the extractions/analysis, using the
>    decision points.
>
> Cheers
> Rick Jelliffe
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
>

--
Thomas Y.T. LEE
Chief Technology Officer
Center for E-Commerce Infrastructure Development (CECID)
Department of Computer Science and Information Systems
The University of Hong Kong
E-mail: ytlee@c...
URL: http://www.cecid.hku.hk
Tel: +852 22415388
Fax: +852 25474611
Room 301, Chow Yei Ching Building
Pokfulam Road, Hong Kong SAR, China
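
For illustration, the two-pass streaming approach in the quoted message might look roughly like the Python sketch below. It is only a sketch under stated assumptions: the thread gives no schema, so the element and attribute names (record, value, ref, id, target) and the helper functions are hypothetical, and the standard library's xml.etree.ElementTree.iterparse stands in for whatever streaming tool is actually used. Pass 1 collects the keys that would otherwise need lookahead; pass 2 extracts only the records those keys select.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the <ref> elements come after the <record> elements,
# so a single streaming pass could not know which records are wanted.
SAMPLE = b"""<data>
  <record id="a"><value>1</value></record>
  <record id="b"><value>2</value></record>
  <record id="c"><value>3</value></record>
  <ref target="c"/>
  <ref target="a"/>
</data>"""


def stream_elements(source, tag):
    """Yield each completed <tag> element, then empty it to limit memory use."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Drop the element's children and text once the caller is done.
            # Empty shells still stay attached to the root in this sketch.
            elem.clear()


def collect_wanted_ids(source):
    """Pass 1: gather the decision information (ids named by <ref> elements)."""
    return {ref.get("target") for ref in stream_elements(source, "ref")}


def extract_records(source, wanted):
    """Pass 2: extract values only for the records selected in pass 1."""
    return {
        rec.get("id"): rec.findtext("value")
        for rec in stream_elements(source, "record")
        if rec.get("id") in wanted
    }


def two_pass(open_source):
    """Run both passes; open_source returns a fresh stream on each call."""
    wanted = collect_wanted_ids(open_source())
    return extract_records(open_source(), wanted)


if __name__ == "__main__":
    print(two_pass(lambda: io.BytesIO(SAMPLE)))
```

For a real file, open_source could be something like lambda: open(path, "rb"), so that each pass rereads the file from the start instead of holding it in memory.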








