
Re: UTF-8 vs UTF-16...? (Was: Feeling good about SML)

  • From: Tony Graham <tgraham@m...>
  • To: xml-dev@i...
  • Date: Wed, 17 Nov 1999 10:51:56 -0400 (EST)

At 17 Nov 1999 14:29 GMT, Steve Schafer wrote:
 > On 17 Nov 1999 13:24:27 +0100, you wrote:
 > >Not sure if I understand the UTF-16 bit above, but I'm reading this:
 > >        <URL:http://www.unicode.org/unicode/faq/#UTF-16 and UCS-4>
 > >to UTF-16 being able to represent the full UCS-4, which is what you
 > >say UTF-8 can do, if I interpret you correctly...?
 > Section C.3 of the Unicode 2.0 spec, paragraph 4:
 > "UTF-16 does not support the representation of all the UCS-4 code
 > space but is limited to the BMP and the next 16 planes...."

True, but that's more code values than anybody expects to ever
standardise (although that's the opinion of the same people that
thought that they'd never need more than the BMP).

All of the currently defined Unicode and ISO/IEC 10646 characters
(both standards define the same characters) are in the BMP.  It won't
be long until characters are defined in Plane 1 and Plane 2 (with
possible spill-over into Plane 3), plus planes 15 and 16 are reserved
for private use.

Currently the only things defined beyond Plane 16 of Group 00 (i.e.
beyond the characters addressable with UTF-16) are more areas
available for private use.
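
To see where the "BMP and the next 16 planes" limit comes from, here is
a minimal Python sketch of the surrogate-pair arithmetic.  The function
and constant names are my own, chosen for illustration; the ranges are
the standard high and low surrogate ranges.

```python
# Surrogate ranges used by UTF-16 to reach beyond the BMP.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def decode_surrogate_pair(high, low):
    """Combine a high and a low surrogate into one code point."""
    if not (HIGH_START <= high <= HIGH_END):
        raise ValueError("not a high surrogate")
    if not (LOW_START <= low <= LOW_END):
        raise ValueError("not a low surrogate")
    # Each surrogate carries 10 bits; the result is offset past the BMP.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The largest pair gives the highest code point UTF-16 can address.
highest = decode_surrogate_pair(HIGH_END, LOW_END)
print(f"highest code point: U+{highest:X}, plane {highest >> 16}")
```

The largest pair lands in plane 16, so anything beyond that plane has
no UTF-16 representation.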

The fuss over UTF-8 or UTF-16 is over the number of bytes used to
represent the characters in the BMP, i.e. the currently defined
characters.  UTF-16 uses two bytes per character there.  UTF-8 uses
one byte per character for the ASCII characters, two bytes per
character for the relatively small range from U+0080 to U+07FF, and
three bytes per character for the rest of the BMP, which is most of
it.  Both UTF-8 and UTF-16 use four bytes per character to represent
the characters in planes 1 to 16.

(There's also UTF-32, which is four bytes per character for all the
characters that you can represent with UTF-16.)
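
The byte counts above are easy to check.  This is only an illustrative
Python sketch; the helper name and the sample characters are my own
choices, and the "-be" codec variants are used so that no byte order
mark is counted.

```python
def encoded_sizes(ch):
    """Return byte counts for one character in UTF-8, UTF-16, UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),  # explicit byte order, so no BOM
        len(ch.encode("utf-32-be")),
    )


samples = [
    ("A", "ASCII"),
    ("\u00e9", "Latin small e with acute"),
    ("\u4e2d", "CJK ideograph in the BMP"),
    ("\U00010000", "first code point of Plane 1"),
]

for ch, label in samples:
    u8, u16, u32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X} {label}: UTF-8={u8} UTF-16={u16} UTF-32={u32}")
```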

UTF-8 is efficient if you use a lot of ASCII, e.g. if you're an
English speaker and all you use is ASCII, but it's more bytes per
character than UTF-16 for a whole lot of other scripts (plus it's more
bytes per character than a lot of current script-specific encodings).
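
As a rough illustration of that trade-off, the Python sketch below
compares total sizes for three short strings.  The sample words and the
legacy encodings picked for comparison (ISO 8859-7 for Greek, Shift_JIS
for Japanese) are my own assumptions, not anything from the thread.

```python
def size_report(text, legacy_codec=None):
    """Return encoded sizes in bytes for a string, keyed by encoding name."""
    report = {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),  # no BOM counted
    }
    if legacy_codec is not None:
        report[legacy_codec] = len(text.encode(legacy_codec))
    return report


english = "hello"
greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"  # Greek word, 8 letters
japanese = "\u65e5\u672c\u8a9e"  # three kanji

print("English: ", size_report(english))
print("Greek:   ", size_report(greek, "iso8859_7"))
print("Japanese:", size_report(japanese, "shift_jis"))
```

For the English string UTF-8 is half the size of UTF-16, for the Greek
one the two tie while the script-specific encoding is half of either,
and for the Japanese one UTF-8 is the largest of the three.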

So the issue isn't how many characters the different encodings can
represent, but how efficiently (or how uniformly) they represent the
currently defined characters.


Tony Graham
Tony Graham                            mailto:tgraham@m...
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

