Re: Java/Unicode brain damage
David Brownell scripsit:

> It would likely be instructive to have someone explain
> the senses in which "char" is, and isn't, a character.

A Java char is a 16-bit unsigned integral value. Unicode characters need 21 bits to represent fully, since code points run from 0 to 10FFFF. UTF-16 is a representation scheme in which the Unicode characters with values between 0 and D7FF or between E000 and FFFF are represented by a single 16-bit value, and the rest are represented by two consecutive 16-bit values, the first ranging from D800 to DBFF and the second ranging from DC00 to DFFF. Fortunately, all the commonly used Unicode characters are of the first kind.

> Likewise the senses in which combining marks relate
> to the concept of a character ... "character" is actually
> a rather complex notion, and ISO-10646 code points
> are (as I understand) not necessarily going to be able
> to represent a "character" either (32 bits v. 16).

Indeed, "characters" in this sense (often called "graphemes" by Unicode people, though a better term is sought) can contain arbitrarily long strings of Unicode characters: in European scripts, a base letter may be followed by up to three diacritics in practice, and in theory there is no limit at all; Korean syllables are composed of up to three letters; Indic syllables can have any number of basic letters separated by viramas and non-joiners, followed by a vowel sign and possibly a diacritic; Tibetan script is much the same, except that the consonants after the first are represented by separate "subjoined" letters, so no virama is needed.

--
John Cowan cowan@c...
One art/there is/no less/no more/All things/to do/with sparks/galore
        --Douglas Hofstadter
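
The UTF-16 arithmetic described above can be sketched in Java roughly as follows. This is an illustration only, not code from the original thread: the class name Utf16Sketch and its methods encode and countGraphemes are hypothetical, and the code-point methods on String assume Java 5 or later. The grapheme count leans on java.text.BreakIterator, whose notion of a user-perceived character depends on the JDK's rules and may not match every script discussed here.

```java
import java.text.BreakIterator;

class Utf16Sketch {
    // Hypothetical helper: encode one code point as UTF-16 code units,
    // following the ranges given in the message.
    static char[] encode(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF
                || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            throw new IllegalArgumentException("not an encodable code point");
        }
        if (codePoint <= 0xFFFF) {
            // 0..D7FF and E000..FFFF fit in a single 16-bit value.
            return new char[] { (char) codePoint };
        }
        int offset = codePoint - 0x10000;                // 20 bits remain
        char high = (char) (0xD800 + (offset >> 10));    // D800..DBFF
        char low = (char) (0xDC00 + (offset & 0x3FF));   // DC00..DFFF
        return new char[] { high, low };
    }

    // Hypothetical helper: count boundaries reported by the JDK's
    // character BreakIterator, an approximation of "graphemes".
    static int countGraphemes(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1D11E lies above FFFF, so it needs two chars.
        String clef = new String(encode(0x1D11E));
        System.out.println(clef.length());                         // 2
        System.out.println(clef.codePointCount(0, clef.length())); // 1

        // A base letter followed by a combining acute accent:
        // two chars, two code points, but one letter to a reader.
        String accented = "e\u0301";
        System.out.println(accented.length());                     // 2
        System.out.println(countGraphemes(accented));
    }
}
```

The built-in Character.toChars performs the same encoding; the hand-written version is shown only to make the two surrogate ranges visible.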