
Re: Java/Unicode brain damage

  • From: John Cowan <cowan@m...>
  • To: David Brownell <david-b@p...>
  • Date: Fri, 27 Jul 2001 00:17:40 -0400 (EDT)

David Brownell scripsit:

> It would likely be instructive to have someone explain
> the senses in which "char" is, and isn't, a character.

A Java char is a 16-bit unsigned integral value.  Unicode characters require
21 bits of unsigned integer to fully represent them.  UTF-16 is a
representation scheme in which the Unicode characters with values
between 0 and D7FF or between E000 and FFFF (hexadecimal) are represented by
a single 16-bit value, and the rest are represented by two
consecutive 16-bit values, the first ranging from D800 to DBFF and the
second ranging from DC00 to DFFF.

Fortunately, all the commonly used Unicode characters are of the
first kind.
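
The UTF-16 scheme described above can be sketched in a few lines of Java.
This is only an illustration, not code from any library: the class name
Utf16Sketch and the methods encode and decode are made up for this example.
It assumes the usual arithmetic of subtracting 0x10000 and splitting the
remaining 20 bits into two 10-bit halves.

```java
// Hypothetical illustration; Utf16Sketch, encode and decode are invented names.
final class Utf16Sketch {

    /** Encodes one Unicode character (0..10FFFF hex) as one or two 16-bit values. */
    static char[] encode(int codePoint) {
        boolean surrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
        if (codePoint < 0 || codePoint > 0x10FFFF || surrogate) {
            throw new IllegalArgumentException("not an encodable Unicode character");
        }
        if (codePoint <= 0xFFFF) {
            // First kind: a single 16-bit value.
            return new char[] { (char) codePoint };
        }
        // Second kind: 20 bits remain after subtracting 0x10000.
        int v = codePoint - 0x10000;
        char first = (char) (0xD800 + (v >> 10));     // top 10 bits, D800..DBFF
        char second = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits, DC00..DFFF
        return new char[] { first, second };
    }

    /** Reverses encode() for a two-value pair. */
    static int decode(char first, char second) {
        return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
    }

    public static void main(String[] args) {
        char[] one = encode(0x00E9);   // LATIN SMALL LETTER E WITH ACUTE
        char[] two = encode(0x10400);  // a character above FFFF
        System.out.println(one.length);                               // prints 1
        System.out.printf("%04X %04X%n", (int) two[0], (int) two[1]); // prints D801 DC00
        System.out.printf("%X%n", decode(two[0], two[1]));            // prints 10400
    }
}
```

So a String holding the single character U+10400 has a length of 2 when
measured in chars, which is the sense in which a char is not a character.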

> Likewise the senses in which combining marks relate
> to the concept of a character ... "character" is actually
> a rather complex notion, and ISO-10646 code points
> are (as I understand) not necessarily going to be able
> to represent a "character" either (32 bits v. 16).

Indeed, "characters" in this sense (often called "graphemes"
by Unicode people, though a better term is sought) can
contain arbitrarily long strings of Unicode characters:

In European scripts, a base letter may be followed by up to
three diacritics in practice, and in theory there is no
limit at all;

Korean syllables are composed of up to three letters;

Indic syllables can have any number of basic letters
separated by viramas and non-joiners, followed by
a vowel sign and possibly a diacritic;

Tibetan script is much the same, except that the
consonants after the first are represented by
separate "subjoined" letters, so no virama is needed.
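
The European case above (a base letter followed by diacritics) can be
shown with a small Java sketch.  The class GraphemeSketch and the method
countGraphemes are names invented for this example.  It relies on
java.text.BreakIterator, whose character instance reports boundaries
between user-perceived characters; how well it handles the Korean, Indic
and Tibetan cases depends on the Java version, so only the diacritic case
is shown.

```java
import java.text.BreakIterator;

// Hypothetical illustration; GraphemeSketch and countGraphemes are invented names.
final class GraphemeSketch {

    /** Counts user-perceived characters by walking BreakIterator boundaries. */
    static int countGraphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        // Each successful next() steps over one user-perceived character.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Base letter "a" followed by two combining diacritics:
        // U+0308 COMBINING DIAERESIS and U+0301 COMBINING ACUTE ACCENT.
        String s = "a\u0308\u0301";
        System.out.println(s.length());        // prints 3 (three chars, three Unicode characters)
        System.out.println(countGraphemes(s)); // prints 1 (one "character" in the reader's sense)
    }
}
```

Code that indexes or truncates a String by char position can therefore
cut such a sequence apart even when no surrogate pairs are involved.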

-- 
John Cowan                                   cowan@c...
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter
