UTF-8 space efficiency
UTF-8 requires:

- 1 byte each for the ASCII characters (U+0000 to U+007F), known as the "Basic Latin" block in Unicode. This is better than UTF-16, and no worse than ASCII.
- 2 bytes each for characters from U+0080 to U+07FF. This includes Latin-1, Latin Extended-A, Latin Extended-B, the IPA extensions, spacing modifier letters, combining diacritical marks, Greek, Cyrillic, Armenian, Hebrew, and Arabic. This is no worse than UTF-16.
- 3 bytes each for U+0800 to U+FFFF. This is worse than UTF-16, and it covers the vast majority of Unicode characters in use today.
- 4 bytes each for U+10000 to U+10FFFF. This is no worse than UTF-16, and it potentially covers many more characters than in all the previous ranges put together, but it is currently nearly unused. In the future, it is likely to be used only for extremely unusual scripts, such as Sumerian cuneiform, the Tengwar of Feanor, Egyptian hieroglyphics, hieratic, and demotic. So I doubt this will be of much import to many people.

UTF-8 has some other advantages over UTF-16. It has no byte-order ambiguity, and for many purposes -- such as string-searching -- it can be treated as simply a string of bytes. And it's "filesystem-safe", which means you can use it in filenames without modifying the filesystem to be Unicode-aware.

From my provincial American point of view, it looks like a space win, too -- most of my text is ASCII anyway. Obviously if I were mostly using one of the numerous scripts in the UTF-8-disadvantaged area, I might feel differently. But to tell the truth, most of the space on my disk and bandwidth on my modem isn't consumed by text, anyway. A 50% increase in the size of all that text wouldn't bother me much.

--
<kragen@p...> Kragen Sitaker <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08. Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
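
The byte counts listed in the message above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `utf8_len` and `utf16_len` are made up for this example and are not from the original post, and the sample characters are arbitrary picks from each range. The last lines illustrate the string-searching point, assuming both the text and the search pattern are valid UTF-8.

```python
def utf8_len(cp: int) -> int:
    """Bytes needed to encode one code point in UTF-8 (hypothetical helper)."""
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4


def utf16_len(cp: int) -> int:
    """Bytes needed in UTF-16: 2 in the BMP, 4 (a surrogate pair) above it."""
    return 2 if cp <= 0xFFFF else 4


# One arbitrary sample per range: ASCII, Latin-1, Cyrillic, CJK, cuneiform.
samples = ["A", "\u00e9", "\u0416", "\u4e2d", "\U00012000"]

for ch in samples:
    cp = ord(ch)
    # Cross-check the range table against Python's own encoders.
    assert utf8_len(cp) == len(ch.encode("utf-8"))
    assert utf16_len(cp) == len(ch.encode("utf-16-be"))
    print(f"U+{cp:04X}: UTF-8 {utf8_len(cp)} bytes, UTF-16 {utf16_len(cp)} bytes")

# Byte-level substring search: the result is a byte offset, not a
# character index (the two-byte U+00EF shifts it by one here).
haystack = "na\u00efve caf\u00e9".encode("utf-8")
needle = "caf\u00e9".encode("utf-8")
byte_offset = haystack.find(needle)
print(byte_offset)
```

Only the 3-byte range (U+0800 to U+FFFF) comes out larger in UTF-8 than in UTF-16, which is the 50% increase the message refers to.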