Re: UTF-8 vs UTF-16...?
According to the latest Unicode book (is it version 2.0? Or 3.0?), UTF-8 does not allow you to encode more than the first 17 planes of ISO 10646. If I remember correctly, the formats are (omitting leading zero bits in the output):

one byte:    0xxxxxxx                            -> xxxxxxx
two bytes:   110yyyyy 10xxxxxx                   -> yyy yyxxxxxx
three bytes: 1110zzzz 10yyyyyy 10xxxxxx          -> zzzzyyyy yyxxxxxx
four bytes:  11110uuu 10uuzzzz 10yyyyyy 10xxxxxx -> uuuuu zzzzyyyy yyxxxxxx

(The four-byte characters are the ones encoded with surrogate pairs in UTF-16, where the high surrogate carries wwww, and uuuuu is wwww+1.) I may be mistaken about this one; my book is at home.

No five-byte or longer sequences are listed. No valid sequences starting with more than four ones are listed. Presumably these two omissions correspond, and an extended UTF-8 with these additions would allow you to handle larger character sets. It may be that other standards actually specify such an extended UTF-8.

So "bigger character range" is probably not a valid reason for wanting to use UTF-8 -- quite aside from the question of whether you really need more than the million or so characters UTF-16 can encode -- because UTF-8 decoders implemented according to Unicode's spec will choke if you try to encode bigger characters in it.

-- 
<kragen@p...> Kragen Sitaker <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08. Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
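
For anyone who wants to check the bit layouts in the message above, here is a minimal Python sketch of an encoder restricted to the one- to four-byte forms. The function names utf8_encode and surrogate_bits are made up for this illustration, and rejecting lone surrogate code points is an assumption taken from how current UTF-8 encoders behave, not something stated in the post. It is a sketch, not a reference implementation.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using only the one- to four-byte forms."""
    if cp < 0 or cp > 0x10FFFF:
        # Four bytes carry 21 bits, but only planes 0..16 are allowed.
        raise ValueError("code point outside the first 17 planes")
    if 0xD800 <= cp <= 0xDFFF:
        # Assumption: surrogate code points are rejected, as modern encoders do.
        raise ValueError("lone surrogate code point")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def surrogate_bits(cp: int) -> tuple[int, int]:
    """Return (uuuuu, wwww) for a code point above U+FFFF.

    uuuuu is the plane number carried by the four-byte UTF-8 form;
    wwww is the four-bit field in the UTF-16 high surrogate.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    high = 0xD800 | ((cp - 0x10000) >> 10)   # 110110ww wwzzzzyy
    return cp >> 16, (high >> 6) & 0xF


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x10000, 0x10FFFF):
        print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

Calling utf8_encode(0x110000) raises ValueError, which is the "choke" case: there is no four-byte (or shorter) form for anything past plane 16. surrogate_bits shows the off-by-one relation between the UTF-8 plane bits and the UTF-16 high surrogate for the supplementary planes.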