

Bob Foster wrote:

> I'm puzzled. What is the "aha moment" here? Your point seems to be that Java
> char != Unicode character. True. Exactly like UTF-8 octet != Unicode
> character. The fact that half a surrogate pair is not a Unicode character
> doesn't seem like breaking news.

The 'aha' moment was realizing that it's safer to use strings rather 
than characters as the primitives of your API, because what looks like 
a single character to a human may be a composition of several Unicode 
characters, which the program sees as a string.
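
To make that concrete, here is a minimal Java sketch, not taken from the 
essays. The class name GraphemeDemo and the helper countGraphemes are 
hypothetical, chosen only for illustration. It uses the standard 
java.text.BreakIterator to count what a reader would perceive as 
characters, and compares that with the number of Java chars and Unicode 
code points in the same string.

```java
import java.text.BreakIterator;

public class GraphemeDemo {
    // Hypothetical helper: counts user-perceived characters by walking
    // the character boundaries reported by BreakIterator.
    static int countGraphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points,
        // displayed as one accented letter.
        String eAcute = "e\u0301";
        // U+1D11E MUSICAL SYMBOL G CLEF: one code point, stored in Java
        // as a surrogate pair (two chars).
        String gClef = "\uD834\uDD1E";

        for (String s : new String[] { eAcute, gClef }) {
            System.out.println("chars=" + s.length()
                    + " codePoints=" + s.codePointCount(0, s.length())
                    + " perceived=" + countGraphemes(s));
        }
    }
}
```

Neither a 16-bit nor a 32-bit char type could hold the first string in 
a single value, which is why an API that accepts and returns strings is 
the safer choice.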

> Do you mean to say that use of UTF-16 character encoding in a programming
> language is broken as designed? In the perfect language of your own design,
> would you have the "char" type be 32 bits? Is that what this is all about?

I'm in the middle of a series of essays on this over at 'ongoing'.
-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)


