Re: Microsoft FUD on binary XML...
Elliotte Rusty Harold wrote:
>
> One should keep in mind that Chinese and similar languages are quite
> compressed to start with, far more so than English text is. For example,
> in UTF-8 the English word "tree" takes four bytes. The Japanese word for
> tree takes three bytes.
>

Good point, actually... I suppose that any language which needs more than 256 code points in everyday use is quite likely to be a language that uses one code point per word.

So languages like Arabic, which are alphabet-based but not very compact in UTF-8 because their characters sit at high code points (although I'm not sure how high, so I don't know whether they would mainly be 2 or 3 bytes or whatever), would be better served by an encoding that mainly uses a shiftable window with single-byte characters, I guess.

ABS
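For anyone who wants to check the byte counts discussed above, here is a minimal Python sketch. The helper name utf8_len and the choice of sample words are assumptions made for illustration (in particular the Arabic word for "tree" is not from the thread); it only measures UTF-8 length and says nothing about how a windowed single-byte encoding would compare.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


# Sample words, all meaning "tree". The Arabic sample is an illustrative
# assumption, not something taken from the original messages.
samples = {
    "English": "tree",
    "Japanese": "\u6728",                  # 木
    "Arabic": "\u0634\u062c\u0631\u0629",  # شجرة
}

for language, word in samples.items():
    print(f"{language}: {len(word)} code points, {utf8_len(word)} bytes in UTF-8")
```

On these samples, the English and Arabic words are both four code points long, but the Arabic one takes twice as many bytes, since each of its letters falls in the two-byte range of UTF-8.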