
RE: To continue parsing after a fatal error.

  • From: Michael Brennan <Michael_Brennan@A...>
  • To: Anoop A V <anoop_scorpio@h...>, xml-dev@l...
  • Date: Tue, 23 Oct 2001 14:01:18 -0700

Actually, I'd say it is more likely that the XML is incorrectly identified
as UTF-8 (or lacks an encoding declaration), and is not truly UTF-8. This is
an extremely common error. Character encoding issues are poorly understood
by most developers.

Try a simple experiment: make sure the document has the following
declaration at the top:
<?xml version="1.0" encoding="ISO-8859-1"?>

See if that fixes the problem. It probably will (but if it doesn't, I'm
probably wrong and Joshua probably right regarding the problem).
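
Before editing the declaration, it can help to confirm that the bytes really are not UTF-8. What follows is only a rough sketch in Python (not something MSXML provides); the function name `is_valid_utf8` and the chunk size are my own assumptions. It reads in chunks so that an 800 MB file never has to fit in memory.

```python
import codecs
import io


def is_valid_utf8(stream, chunk_size=1 << 20):
    """Return True if the binary stream decodes cleanly as UTF-8.

    `stream` is any object with a read(n) method that returns bytes.
    The incremental decoder copes with multi-byte sequences that are
    split across chunk boundaries.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                # Flush: raises if the data ended in the middle of a sequence.
                decoder.decode(b"", final=True)
                return True
            decoder.decode(chunk)
    except UnicodeDecodeError:
        return False


# Hypothetical sample data: the same text in two encodings.
latin1_bytes = "<name>José</name>".encode("iso-8859-1")
utf8_bytes = "<name>José</name>".encode("utf-8")
print(is_valid_utf8(io.BytesIO(latin1_bytes)))
print(is_valid_utf8(io.BytesIO(utf8_bytes)))
```

A False result only says the data is not UTF-8. It does not prove the data is ISO-8859-1, since every byte sequence decodes under ISO-8859-1, so treat that declaration as a guess to be checked against the sender.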

Then tell the person who sent you the XML to read the following:
http://msdn.microsoft.com/library/default.asp?URL=/library/en-us/dnxml/html/xmlencodings.asp

Watch out, though, for typos in that article that show incorrect syntax for
HTTP headers. It gives this as an example:
Content-Type: text/html; charset:ISO-8859-1;

The correct syntax is:
Content-Type: text/html; charset=ISO-8859-1
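
To illustrate why the difference matters, here is a deliberately simplified sketch of how a consumer might pull the charset out of a header value. The function name `charset_from_content_type` is hypothetical, and this is not a full implementation of the HTTP header grammar; it only looks for `name=value` parameters.

```python
def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None.

    Simplified on purpose: parameters are split on ';' and must have
    the form name=value. Quoted values containing ';' are not handled.
    """
    _media_type, *params = value.split(";")
    for param in params:
        name, sep, val = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return val.strip().strip('"')
    return None


print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/html; charset:ISO-8859-1;"))
```

With the colon form there is no `charset=` parameter to find, so this sketch returns None for it, and a receiver behaving the same way would fall back to whatever default encoding it assumes.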

(Maybe Joshua or Julia can use their influence at Microsoft to get these
typos in an otherwise very useful article corrected?)

-----Original Message-----
From: Joshua Allen [mailto:joshuaa@m...]
Sent: Tuesday, October 23, 2001 12:40 PM
To: Anoop A V; xml-dev@l...
Cc: Julia Jia
Subject: RE:  To continue parsing after a fatal error.


This error should occur with any conforming XML processor.  It is quite
likely that the error is caused by a control character in the low ASCII
range.  The only way to avoid the problem is to clean up the XML on the
way in, before it is processed by MSXML.  And unfortunately I am not
aware of a way to do this without writing code to pipe the input stream
through a scrubber before passing it to MSXML.  Julia will know if there
are any code samples existing today (I doubt it).
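
A scrubber of the kind described above might look roughly like the following. This is a sketch in Python rather than anything tied to MSXML, and the names `scrub` and `_INVALID_XML_CHARS` are assumptions made for illustration. It drops every character outside the XML 1.0 `Char` production (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF) and works on chunks so a very large file is never held in memory.

```python
import io
import re

# Any character NOT allowed by the XML 1.0 Char production.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def scrub(text_stream, chunk_size=1 << 16):
    """Yield chunks of text with characters illegal in XML 1.0 removed.

    `text_stream` is a file-like object opened in text mode, so the
    caller must already have chosen the correct character encoding.
    """
    while True:
        chunk = text_stream.read(chunk_size)
        if not chunk:
            break
        yield _INVALID_XML_CHARS.sub("", chunk)


# Hypothetical input containing a NUL and a vertical tab.
dirty = io.StringIO("<a>ok\x00\x0b!</a>\n")
print("".join(scrub(dirty)))
```

Each cleaned chunk could then be handed to an incremental parser. Note that this silently deletes data, so it is worth logging what was removed, and it does nothing for a file whose real problem is a wrong encoding declaration.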

Thanks,
Joshua




> -----Original Message-----
> From: Anoop A V [mailto:anoop_scorpio@h...]
> Sent: Tuesday, October 23, 2001 10:51 AM
> To: xml-dev@l...
> Subject:  To continue parsing after a fatal error.
> 
> Hi,
> I have an 800 MB file which I need to parse. When I do this using MSXML SAX
> parser, I get a fatal error with the message "Invalid character found in
> text content". And the parsing will be stopped. But I need to continue
> parsing the file even if an invalid character is met. I don't mind if that
> particular node(s) is skipped. But I need to parse the whole file. This file
> is not under my control, so there is no question of my being able to edit
> this file and remove the invalid characters. Can anybody help?
> 
> Thanks.
> Anoop.
