
RE: Java/Unicode brain damage

  • From: Miles Sabin <msabin@i...>
  • To: xml-dev@l...
  • Date: Fri, 27 Jul 2001 09:17:34 +0100

David Brownell wrote,
> Miles Sabin wrote,
> > A Java 'char' is a 16 bit data type, so it simply isn't possible 
> > for it to directly represent a Unicode character. 
>
> Could you elaborate?

[I'll use Tim's 'jchar' and 'uchar']

Tim's and John's replies are exactly right as far as a single jchar is 
concerned: a single jchar in isolation can't represent uchars outside 
the BMP, and it can represent non-uchars (eg. surrogate values).
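To make the single-jchar case concrete, here is a small sketch. It assumes Java 7 or later (Character.isBmpCodePoint and Character.isSurrogate postdate this message), the class and helper names are hypothetical, and U+1D11E (MUSICAL SYMBOL G CLEF) is just an arbitrary non-BMP example.

```java
// Minimal sketch; assumes Java 7+ for Character.isBmpCodePoint/isSurrogate.
public class SingleCharDemo {

    // True if the code point fits in a single 16-bit char, i.e. is in the BMP.
    static boolean fitsInOneChar(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    // True for char values in U+D800..U+DFFF: legal char values that are
    // not, on their own, Unicode characters.
    static boolean isSurrogateValue(char c) {
        return Character.isSurrogate(c);
    }

    public static void main(String[] args) {
        System.out.println(fitsInOneChar('A'));            // true
        System.out.println(fitsInOneChar(0x1D11E));        // false: above U+FFFF
        System.out.println(Character.charCount(0x1D11E));  // 2 chars needed
        System.out.println(isSurrogateValue('\uD800'));    // true
    }
}
```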

But of course jchars often don't appear in isolation. In char[]s and
in java.lang.Strings they appear in sequences, and in those cases
pairs of adjacent jchars can represent non-BMP uchars. Pairs of jchars
can also represent all sorts of other nonsense too, but that's not
necessarily a problem unless you absolutely insist that semantic
constraints be enforced programmatically.
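A sketch of the paired case, assuming a modern JDK (Character.toChars and String.codePointCount arrived in Java 5, well after this message); the class and method names are illustrative only. It shows one non-BMP uchar occupying two jchars, and the same two jchars in reverse order still forming a perfectly legal String that no longer encodes any uchar.

```java
// Minimal sketch; assumes Java 5+ for Character.toChars/codePointCount.
public class SurrogatePairDemo {

    // Build a String from a code point; for non-BMP code points
    // Character.toChars yields a two-char surrogate pair.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D11E);
        System.out.println(clef.length());                          // 2 chars
        System.out.println(clef.codePointCount(0, clef.length()));  // 1 code point
        System.out.printf("%04X %04X%n",
                (int) clef.charAt(0), (int) clef.charAt(1));        // D834 DD1E

        // Same two chars, reversed: still a valid String, but now two
        // unpaired surrogates rather than one character.
        String reversed = new String(new char[] { clef.charAt(1), clef.charAt(0) });
        System.out.println(reversed.codePointCount(0, reversed.length())); // 2
    }
}
```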

> The word "character" is heavily overloaded, but I think it's clear 
> that in at least one sense a Java "char" _is_ what folk call a 
> "character".  That's just how the word is used, even if it's 
> arguably sloppy usage for other contexts.
>
> It would likely be instructive to have someone explain the senses in 
> which "char" is, and isn't, a character.

I don't think that can be done. A jchar is a 16 bit unsigned scalar.
Its association with a uchar is pretty much conventional, although
that association is almost always made. There's no way of telling from 
just the syntax of a Java program whether or not a jchar (or jbyte, or 
jint, or anything else for that matter) is or isn't being used to
represent a uchar. To tell that you have to know what the program
means.
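The point that a jchar is just a 16 bit unsigned number shows up directly in ordinary Java arithmetic. This sketch uses hypothetical names and only core language features: a char widens to int without sign extension, wraps modulo 2^16, and can be summed like any other integer, with nothing in the types recording whether the values stand for uchars.

```java
// Minimal sketch: char treated purely as an unsigned 16-bit integer.
public class CharAsScalarDemo {

    // Sum chars as plain numbers; the types carry no "character" meaning.
    static int sum(char[] values) {
        int total = 0;
        for (char v : values) {
            total += v;  // char widens to int with zero extension (unsigned)
        }
        return total;
    }

    public static void main(String[] args) {
        char max = '\uFFFF';
        System.out.println((int) max);       // 65535: never negative
        char wrapped = (char) (max + 1);
        System.out.println((int) wrapped);   // 0: wraps modulo 2^16
        System.out.println(sum(new char[] { 1, 2, 3 }));  // 6
        char c = 'A';
        c++;                                 // plain arithmetic; c is now 'B'
        System.out.println(c);
    }
}
```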

So I think it boils down to this: a jchar is a 16 bit unsigned scalar 
which is typically appropriate for representing a BMP uchar; and jchar 
sequences are typically appropriate for representing uchar sequences. 
With the proviso that some jchars (resp. jchar sequences) don't
represent legal uchars (legal uchar sequences).
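One concrete reading of "legal jchar sequence" is well-formed UTF-16, where every surrogate belongs to a correctly ordered high/low pair. A minimal check under that assumption follows; it relies on the Java 5+ methods Character.isHighSurrogate and Character.isLowSurrogate, the method name is hypothetical, and it checks surrogate pairing only, not any other Unicode constraint.

```java
// Minimal sketch; assumes "legal" means well-formed UTF-16 (Java 5+).
public class Utf16Check {

    // True if every surrogate in s is part of a correctly ordered pair.
    static boolean isWellFormedUtf16(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be immediately followed by a low one.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++;  // skip the low half of the pair
                } else {
                    return false;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate is unpaired.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedUtf16("abc"));           // true
        System.out.println(isWellFormedUtf16("\uD834\uDD1E"));  // true
        System.out.println(isWellFormedUtf16("\uDD1E\uD834"));  // false
        System.out.println(isWellFormedUtf16("x\uD800"));       // false
    }
}
```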

Oh, I guess I should point out that the above is my view, and doesn't
necessarily represent that of the JSR 51 EG (or anyone else, for that
matter ;-)

Cheers,


Miles

-- 
Miles Sabin                                     InterX
Internet Systems Architect                      27 Great West Road
+44 (0)20 8817 4030                             Middx, TW8 9AS, UK
msabin@i...                               http://www.interx.com/

