Re: Microsoft FUD on binary XML...
Alaric B Snell <alaric@a...> wrote at Fri, 21 Nov 2003 13:36:24 +0000:
> Tony Graham wrote:
>
> > Changing UTF-16 Chinese to UTF-8 means a 50% size increase for the
> > Chinese characters in the Basic Multilingual Plane (i.e., most of the
> > Chinese characters in the message) since as UTF-16, one Chinese
> > character is 16 bits, and as UTF-8, one Chinese character is three
> > bytes.
>
> Exactly - efficient representation of Unicode text currently sadly
> involves the user or the application doing a frequency analysis and
> deciding whether to use UTF-8 or UTF-16... I think very, very, few do
> this right now; UTF-8 seems the almost ubiquitous choice, mainly due to
> the software industry being driven from places that use the Roman alphabet.
>
> Perhaps we need a new UTF that loses many of UTF-8's nice properties with
> respect to lexical sorting and so on, but is less discriminatory against
> character sets that live far into the BMP, perhaps working along the
> lines of:

For a moment there, I thought you were inventing SCSU [1].

You might also be interested in BOCU-1 [2].

Regards,

Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708

[1] http://www.unicode.org/reports/tr6/
[2] http://www.unicode.org/notes/tn6/
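
The size arithmetic quoted above (two bytes per BMP Chinese character in UTF-16, three in UTF-8) can be checked with a minimal Python sketch. The helper names `encoded_sizes` and `smaller_encoding` and the sample strings are illustrative assumptions, not anything from the thread. The "frequency analysis" is reduced here to simply encoding the text both ways and comparing lengths.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-be" is used so that no byte order mark is counted.
        "utf-16": len(text.encode("utf-16-be")),
    }


def smaller_encoding(text: str) -> str:
    """Pick whichever encoding is shorter for this text (UTF-8 wins ties)."""
    sizes = encoded_sizes(text)
    return min(sizes, key=sizes.get)


chinese = "中文字符"  # four BMP Chinese characters (hypothetical sample)
latin = "text"        # four ASCII characters (hypothetical sample)

print(encoded_sizes(chinese))     # {'utf-8': 12, 'utf-16': 8}
print(encoded_sizes(latin))       # {'utf-8': 4, 'utf-16': 8}
print(smaller_encoding(chinese))  # utf-16
print(smaller_encoding(latin))    # utf-8
```

For the Chinese sample, 12 bytes against 8 is the 50% increase mentioned in the quoted message; for the ASCII sample the relationship is reversed, which is why the choice depends on the text. This sketch does not attempt SCSU or BOCU-1, which are separate compression schemes.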