Re: Use of UTF-8 and UTF-16
Rick Jelliffe wrote:

> For CJK (Chinese, Japanese, Korean) XML documents, where three (or six)
> bytes may be used by UTF-8 instead of UCS-16's two (or four), UTF-16 files
> will usually be smaller.

First, a correction: UTF-8 never uses six bytes for anything. The largest UTF-8 character you'll ever see is 4 bytes wide.

UTF-16 files may well be smaller, but it's not a sure thing. Even Chinese XML contains lots of ASCII characters such as <, >, &, =, ", and the space, and each of those takes one byte in UTF-8 but two in UTF-16. Text-heavy documents like novels and stories may well be smaller in UTF-16. Technical documents that also contain the digits 0-9 and other ASCII characters may even be larger in UTF-16.

Either way, the size difference is not likely to be important. The reasons for choosing UTF-8 have little to do with size. See http://www-128.ibm.com/developerworks/xml/library/x-utf8/ for a slightly longer discussion of this issue.

--
Elliotte Rusty Harold elharo@m...
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
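
To make the size argument concrete, here is a minimal Python sketch that counts the encoded bytes of two small XML fragments. The sample strings and the helper name `encoded_sizes` are made up for illustration and are not from the original thread; byte order marks are deliberately left out of the counts.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text, excluding any BOM."""
    # "utf-16-le" is used instead of "utf-16" so that no 2-byte BOM is prepended.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical text-heavy fragment: 100 Chinese characters, 7 ASCII markup characters.
prose = "<p>" + "中文" * 50 + "</p>"

# Hypothetical markup-heavy fragment: 2 Chinese characters, 37 ASCII characters.
markup_heavy = '<item id="1024" price="99.50">中文</item>'

# A character outside the Basic Multilingual Plane (U+20000):
# four bytes in UTF-8, and a four-byte surrogate pair in UTF-16.
supplementary = "\U00020000"

for label, sample in [("prose", prose),
                      ("markup-heavy", markup_heavy),
                      ("supplementary", supplementary)]:
    utf8_size, utf16_size = encoded_sizes(sample)
    print(f"{label}: UTF-8 = {utf8_size} bytes, UTF-16 = {utf16_size} bytes")
```

With these particular samples the prose fragment is smaller in UTF-16 (214 bytes against 307), the markup-heavy fragment is smaller in UTF-8 (43 bytes against 78), and the supplementary character takes four bytes either way. Real documents fall somewhere in between, depending on their ratio of markup to text.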








