Re: Parsing Kanji (Japanese) characters...
From: <nizar.hirani@c...>

> Is the SAX Parser able to handle Kanji characters? Any help/pointers are
> appreciated.

The problem is probably that your document is encoded in an encoding that uses escape sequences. When it is read using a different encoding (e.g. the default encoding of UTF-8), the ESC character is correctly flagged as being a problem.

There are three main Japanese encodings in common use: ISO 2022, Shift JIS and EUC. All of these have various variants and extensions. Documents can also be in Unicode encodings, which have variants of their own.

It is a very good thing that XML systems can often detect that your data has been mislabelled, isn't it! Otherwise, if you add the wrong data to a database, that database will have been corrupted.

Your text is probably encoded using the ISO-2022-JP (JIS) encoding.

If you are working with Far Eastern data much, I recommend you read Ken Lunde's "Chinese Japanese Korean Vietnamese Information Processing" from O'Reilly. It is an amazing book. On the WWW see http://lfw.org/text/jp.html#iso2022

Cheers
Rick Jelliffe
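The failure described above can be reproduced with a minimal Python sketch. It is an illustration only, not the original poster's setup: the `<greeting>` document and the `try_parse` helper are hypothetical, and Python's ElementTree (backed by expat) stands in for whichever SAX parser was in use. The text is decoded in application code before parsing rather than relying on the parser to support ISO-2022-JP itself, which is an assumption made to keep the sketch portable.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with no encoding declaration, so a parser
# given the raw bytes will assume the XML default of UTF-8.
doc = "<greeting>漢字</greeting>"

# ISO-2022-JP switches character sets with ESC (0x1B) sequences.
raw = doc.encode("iso2022_jp")
print(b"\x1b" in raw)  # the escape byte is present in the encoded data


def try_parse(data):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# Read as UTF-8, the ESC byte is a control character that XML 1.0
# does not allow, so the parser refuses the document.
mislabelled = try_parse(raw)

# Decoding with the encoding the bytes were really written in
# recovers the text before it reaches the parser.
decoded = try_parse(raw.decode("iso2022_jp"))

print(mislabelled, decoded)
```

In a real document the cleaner fix is usually to state the actual encoding in the XML declaration, provided the parser in use supports that encoding.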