
Re: To continue parsing after a fatal error.

  • From: Anoop A V <anoop_scorpio@h...>
  • To: David Brownell <david-b@p...>
  • Date: Wed, 03 Oct 2001 10:46:12 +0530

Hi Dave,
    I guess the diagnostic is correct. And I think I should either
preprocess the XML as you have said, or split the file into smaller files as
Tim Bray has suggested. I hope the 'Out of memory' situation won't arise. It
was precisely for this reason that I decided to use SAX rather than DOM. In
fact, using DOM for an 800 MB file is unthinkable.

Thanks for your suggestion.
Anoop.
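
A minimal sketch of the preprocessing pass discussed in this thread, written in Python rather than against MSXML. It streams the input in chunks, so an 800 MB file never has to fit in memory, and drops every character outside the XML 1.0 Char production. The function name, the UTF-8 default and the chunk size are assumptions made for illustration; the thread does not say what encoding the file uses, and undecodable bytes are silently dropped here as a design choice.

```python
import codecs
import io
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(src, dst, encoding="utf-8", chunk_size=1 << 20):
    """Copy binary stream `src` to `dst`, dropping characters XML 1.0 forbids.

    Returns the number of illegal characters removed. The encoding is an
    assumption; pass the one named in the file's XML declaration.
    """
    # The incremental decoder keeps a multi-byte sequence intact when it is
    # split across two chunks. errors="ignore" drops undecodable bytes.
    decoder = codecs.getincrementaldecoder(encoding)(errors="ignore")
    removed = 0
    while True:
        chunk = src.read(chunk_size)
        text = decoder.decode(chunk, final=not chunk)
        cleaned, count = _ILLEGAL_XML_CHARS.subn("", text)
        removed += count
        dst.write(cleaned.encode(encoding))
        if not chunk:  # empty read means end of input
            break
    return removed


# Tiny in-memory demonstration. For a real file, open the source with
# open(path, "rb") and the destination with open(other_path, "wb").
demo_in = io.BytesIO(b"<a>ok\x00bad</a>")
demo_out = io.BytesIO()
strip_illegal_xml_chars(demo_in, demo_out)
```

The cleaned copy can then be handed to the SAX parser as usual. Note that this removes only the offending characters, not the nodes that contained them.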


----- Original Message -----
From: David Brownell <david-b@p...>
To: Anoop A V <anoop_scorpio@h...>
Sent: Wednesday, October 24, 2001 12:26 AM
Subject: Re:  To continue parsing after a fatal error.


> 800 MB ... are you sure the diagnostic is correct?  That's a
> pretty big file, and C/COM/... level code could easily give
> a bogus diagnostic.  I'd expect "ran out of memory".
>
> If the file has a correct XML declaration, with the right text
> encoding, then you certainly need to tell whoever produces
> the file that they've got bugs ... as a rule, MSXML has bugs
> that others don't, so if even MSXML rejects that file, then
> whoever is making that file probably has some big problems.
>
> Assuming they're not doing their job, however, you should
> still be able to solve the problem by preprocessing the XML
> to strip out illegal characters.  Think of it as another processing
> pass: the first of N scans over the data just finds and removes
> those characters.
>
> It's actually a requirement of the XML spec that once a
> fatal error is found, no more data will ever be reported
> (only additional errors, and that's not required).
>
> - Dave
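
The fatal-error rule described above can be observed with any conforming SAX parser. The sketch below uses Python's xml.sax (backed by expat), not MSXML, so it is only an illustration of the behaviour; the class and function names and the sample document are made up for the example. It records the element names reported before the parser hits an illegal character and shows that nothing after that point reaches the application.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Records the name of every element whose start tag is reported."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def parse_until_fatal(data):
    """Parse `data` (bytes); return (element names seen, fatal error or None)."""
    handler = NameCollector()
    error = None
    try:
        # The default ErrorHandler raises on fatalError, which ends the parse.
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException as exc:
        error = exc
    return handler.seen, error


# U+0001 is not a legal XML 1.0 character, so the parse fails inside <b>
# and the later <c/> element is never reported to the handler.
seen, error = parse_until_fatal(b"<r><a/><b>\x01</b><c/></r>")
```

Because the parser will not resume after the error, skipping "just that node" is not possible at the SAX level; the input has to be repaired before parsing.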
>
>
> ----- Original Message -----
> From: "Anoop A V" <anoop_scorpio@h...>
> To: <xml-dev@l...>
> Sent: Tuesday, October 23, 2001 10:51 AM
> Subject:  To continue parsing after a fatal error.
>
>
> > Hi,
> > I have an 800 MB file which I need to parse. When I do this using MSXML
> > SAX parser, I get a fatal error with the message "Invalid character
> > found in text content". And the parsing will be stopped. But I need to
> > continue parsing the file even if an invalid character is met. I don't
> > mind if that particular node(s) is skipped. But I need to parse the whole
> > file. This file is not under my control, so there is no question of my
> > being able to edit this file and remove the invalid characters. Can
> > anybody help?
> >
> > Thanks.
> > Anoop.

