Re: To continue parsing after a fatal error.
Hi Dave,

I guess the diagnostic is correct. I think I should either preprocess the XML as you have said, or split the file into smaller files as Tim Bray has suggested. I hope the 'Out of memory' situation won't arise. It was precisely for this reason that I decided to use SAX rather than DOM; using DOM for an 800 MB file is unthinkable.

Thanks for your suggestion.
Anoop.

----- Original Message -----
From: David Brownell <david-b@p...>
To: Anoop A V <anoop_scorpio@h...>
Sent: Wednesday, October 24, 2001 12:26 AM
Subject: Re: To continue parsing after a fatal error.

> 800 MB ... are you sure the diagnostic is correct? That's a
> pretty big file, and C/COM/... level code could easily give
> a bogus diagnostic. I'd expect "ran out of memory".
>
> If the file has a correct XML declaration, with the right text
> encoding, then you certainly need to tell whoever produces
> the file that they've got bugs ... as a rule, MSXML has bugs
> that others don't, so if even MSXML rejects that file, then
> whoever is making that file probably has some big problems.
>
> Assuming they're not doing their job, however, you should
> still be able to solve the problem by preprocessing the XML
> to strip out illegal characters. Think of it as another processing
> stage: the first of N scans over the data just finds and removes
> those characters.
>
> It's actually a requirement of the XML spec that once a
> fatal error is found, no more data will ever be reported
> (only additional errors, and that's not required).
>
> - Dave
>
>
> ----- Original Message -----
> From: "Anoop A V" <anoop_scorpio@h...>
> To: <xml-dev@l...>
> Sent: Tuesday, October 23, 2001 10:51 AM
> Subject: To continue parsing after a fatal error.
>
> > Hi,
> > I have an 800 MB file which I need to parse. When I do this using the MSXML SAX
> > parser, I get a fatal error with the message "Invalid character found in
> > text content", and parsing stops. But I need to continue
> > parsing the file even if an invalid character is encountered. I don't mind if that
> > particular node (or nodes) is skipped, but I need to parse the whole file. This file
> > is not under my control, so there is no question of my being able to edit
> > this file and remove the invalid characters. Can anybody help?
> >
> > Thanks.
> > Anoop.