Re: 5-byte UTF-8 encoding not supported
At 22:53 16-12-2001, Neeraja Divakaruni wrote:
>We are getting an exception while parsing the XML document using Oracle
>parser "parseCLOB" procedure.. The exception is "5-byte UTF-8 encoding
>not supported".
>
>One more observation is the other foreign characters like æ , Æ , Ø (
>these are also danish characters) etc we are getting ane exception "
>Invalid UTF8 encoding".
>What can be the possible causes for these two exceptions ?? Please do
>respond..

It sounds like the parser is trying to parse the CLOB as UTF-8 despite its actual encoding. The character "ø" is 0xF8 (11111000) in ISO 8859-1, which would be interpreted as the start of a 5-byte UTF-8 sequence. The other characters you mention (æ is 0xE6, Æ is 0xC6, Ø is 0xD8) look like the start of 2- or 3-byte sequences, but the bytes that follow them in ISO 8859-1 text are normally not valid UTF-8 continuation bytes, hence "Invalid UTF8 encoding".

What code are you using to parse the CLOB and to set the encoding?

I suspect that, rather than simply inserting an XML declaration in the CLOB, you need to actually instruct the parser what encoding to use for reading the input.

~Chris
-- 
Christopher R. Maden, Principal Consultant, HMM Consulting Int'l, Inc.
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://www.hmmci.com/ > <URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA
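To make the byte-level reasoning concrete, here is a minimal Python sketch (not the Oracle parser's code, and not PL/SQL). It assumes the CLOB really holds ISO 8859-1 bytes, which is only a guess from the symptoms. The helper names `utf8_lead_length`, `is_valid_utf8`, and `latin1_to_utf8` are made up for illustration, and the lead-byte table follows the original UTF-8 definition that still allowed 5- and 6-byte sequences.

```python
def utf8_lead_length(byte: int) -> int:
    """Sequence length implied by a lead byte under the original UTF-8
    definition (up to 6 bytes); 0 if the byte cannot start a sequence."""
    if byte < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if byte < 0xC0:
        return 0  # 10xxxxxx: continuation byte, never a starter
    if byte < 0xE0:
        return 2  # 110xxxxx
    if byte < 0xF0:
        return 3  # 1110xxxx
    if byte < 0xF8:
        return 4  # 11110xxx
    if byte < 0xFC:
        return 5  # 111110xx
    if byte < 0xFE:
        return 6  # 1111110x
    return 0      # 0xFE and 0xFF never occur in UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as (modern) UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are assumed to be ISO 8859-1 as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    # Show how a UTF-8 decoder would classify each Danish character's
    # single ISO 8859-1 byte.
    for ch in "øæÆØ":
        b = ch.encode("iso-8859-1")[0]
        print(f"{ch}  0x{b:02X}  {b:08b}  implied length: {utf8_lead_length(b)}")

    sample = "Søren".encode("iso-8859-1")  # hypothetical document content
    print(is_valid_utf8(sample))                  # misread as UTF-8
    print(is_valid_utf8(latin1_to_utf8(sample)))  # after re-encoding
```

The same idea applies whatever the tool: either hand the parser bytes that really are UTF-8, or tell it the true encoding of the input.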