RE: To continue parsing after a fatal error.
Actually, I'd say it is more likely that the XML is incorrectly identified as UTF-8 (or lacks an encoding declaration), and is not truly UTF-8. This is an extremely common error. Character encoding issues are poorly understood by most developers.

Try a simple experiment: make sure the document has the following declaration at the top:

<?xml version="1.0" encoding="ISO-8859-1"?>

See if that fixes the problem. It probably will (but if it doesn't, I'm probably wrong and Joshua probably right regarding the problem).

Then tell the person who sent you the XML to read the following:

http://msdn.microsoft.com/library/default.asp?URL=/library/en-us/dnxml/html/xmlencodings.asp

Although watch out for the typos that show incorrect syntax for HTTP headers. They show this as an example:

Content-Type: text/html; charset:ISO-8859-1;

The correct syntax is:

Content-Type: text/html; charset=ISO-8859-1

(Maybe Joshua or Julia can use their influence at Microsoft to get these typos in an otherwise very useful article corrected?)

-----Original Message-----
From: Joshua Allen [mailto:joshuaa@m...]
Sent: Tuesday, October 23, 2001 12:40 PM
To: Anoop A V; xml-dev@l...
Cc: Julia Jia
Subject: RE: To continue parsing after a fatal error.

This error should occur with any conforming XML processor. It is quite likely that the error is caused by a control character in the low ASCII range. The only way to avoid the problem is to clean up the XML on the way in, before it is processed by MSXML. And unfortunately I am not aware of a way to do this without writing code to pipe the input stream through a scrubber before passing it to MSXML. Julia will know if there are any code samples existing today (I doubt it).

Thanks,
Joshua

> -----Original Message-----
> From: Anoop A V [mailto:anoop_scorpio@h...]
> Sent: Tuesday, October 23, 2001 10:51 AM
> To: xml-dev@l...
> Subject: To continue parsing after a fatal error.
>
> Hi,
> I have an 800 MB file which I need to parse. When I do this using MSXML
> SAX parser, I get a fatal error with the message "Invalid character found
> in text content". And the parsing will be stopped. But I need to continue
> parsing the file even if an invalid character is met. I don't mind if that
> particular node(s) is skipped. But I need to parse the whole file. This
> file is not under my control, so there is no question of my being able to
> edit this file and remove the invalid characters. Can anybody help?
>
> Thanks.
> Anoop.
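
For illustration, the "scrubber" idea from Joshua's message (clean the input stream before the parser sees it) might look roughly like the sketch below. It is not an MSXML sample: it assumes Python and its built-in expat-based SAX parser, and the names scrub_chunks, parse_scrubbed and TextCollector are made up for this example. The encoding argument is how you would try the ISO-8859-1 experiment suggested above without editing the file. Note that the sketch silently drops the offending characters, so it only makes sense if losing them is acceptable.

```python
import codecs
import io
import re
import xml.sax

# Characters that XML 1.0 forbids in a document: the C0 controls other than
# tab, LF and CR, plus U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def scrub_chunks(stream, encoding="utf-8", chunk_size=64 * 1024):
    """Yield decoded text from a binary stream with forbidden characters removed.

    The file is read in fixed-size chunks, so a very large input never has to
    fit in memory. Bytes that cannot be decoded in `encoding` become U+FFFD
    instead of raising an error.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        cleaned = _INVALID_XML_CHARS.sub("", decoder.decode(chunk))
        if cleaned:
            yield cleaned
    # Flush whatever the decoder is still holding (e.g. a truncated sequence).
    cleaned = _INVALID_XML_CHARS.sub("", decoder.decode(b"", final=True))
    if cleaned:
        yield cleaned


def parse_scrubbed(stream, handler, encoding="utf-8", chunk_size=64 * 1024):
    """Feed the scrubbed text to a SAX parser incrementally."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for text in scrub_chunks(stream, encoding, chunk_size):
        parser.feed(text)
    parser.close()


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)


if __name__ == "__main__":
    # A tiny stand-in for the real file: \x01 would normally be a fatal error.
    sample = io.BytesIO(b"<a>ok\x01bad</a>")
    collector = TextCollector()
    parse_scrubbed(sample, collector)
    print("".join(collector.text))
```

For a real file you would open it in binary mode and pass the file object as stream, for example parse_scrubbed(open(path, "rb"), handler, encoding="iso-8859-1") if the UTF-8 label turns out to be wrong.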