RE: Java/Unicode brain damage
David Brownell wrote,
> Miles Sabin wrote,
> > A Java 'char' is a 16 bit data type, so it simply isn't possible
> > for it to directly represent a Unicode character.
>
> Could you elaborate?

[I'll use Tim's 'jchar' and 'uchar']

Tim's and John's replies are exactly right as far as a single jchar is concerned: a single jchar in isolation can't represent uchars outside the BMP, and it can represent non-uchars (e.g. surrogate values).

But of course jchars often don't appear in isolation. In char[]s and in java.lang.Strings they appear in sequences, and in those cases pairs of adjacent jchars can represent non-BMP uchars. Pairs of jchars can represent all sorts of other nonsense too, but that's not necessarily a problem unless you absolutely insist that semantic constraints be enforced programmatically.

> The word "character" is heavily overloaded, but I think it's clear
> that in at least one sense a Java "char" _is_ what folk call a
> "character". That's just how the word is used, even if it's
> arguably sloppy usage for other contexts.
>
> It would likely be instructive to have someone explain the senses in
> which "char" is, and isn't, a character.

I don't think that can be done. A jchar is a 16 bit unsigned scalar. Its association with a uchar is pretty much conventional, although that association is almost always made. There's no way of telling from just the syntax of a Java program whether or not a jchar (or jbyte, or jint, or anything else for that matter) is or isn't being used to represent a uchar. To tell that you have to know what the program means.

So I think it boils down to this: a jchar is a 16 bit unsigned scalar which is typically appropriate for representing a BMP uchar; and jchar sequences are typically appropriate for representing uchar sequences. With the proviso that some jchars (resp. jchar sequences) don't represent legal uchars (resp. legal uchar sequences).
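To make the jchar/uchar distinction concrete, here is a minimal sketch. It assumes the code point methods on java.lang.Character and java.lang.String that were added in J2SE 5.0, which postdates this thread; the class name JcharDemo and its helpers are hypothetical names chosen for illustration, not part of any library.

```java
public class JcharDemo {
    // Hypothetical helper: counts uchars (Unicode code points) in a jchar sequence.
    // A well-formed surrogate pair counts as one; a lone surrogate also counts as one.
    public static int ucharCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Hypothetical helper: true if every surrogate jchar in the sequence belongs to
    // a high/low pair, i.e. the sequence represents a legal uchar sequence.
    public static boolean isWellFormed(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate is "nonsense".
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+10400 lies outside the BMP, so it needs two jchars.
        String nonBmp = new String(Character.toChars(0x10400));
        System.out.println(nonBmp.length());       // 2 (jchars)
        System.out.println(ucharCount(nonBmp));    // 1 (uchar)
        System.out.println(isWellFormed(nonBmp));  // true

        // A lone surrogate is a perfectly good jchar but not a legal uchar.
        System.out.println(isWellFormed("\uD801")); // false
    }
}
```

Note that nothing in the type system stops a program from building the ill-formed sequence; the check is a semantic constraint that the program has to choose to enforce.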
Oh, I guess I should point out that the above is my view, and doesn't necessarily represent that of the JSR 51 EG (or anyone else, for that matter ;-)

Cheers,

Miles

--
Miles Sabin
InterX
Internet Systems Architect
27 Great West Road
Middx, TW8 9AS, UK
+44 (0)20 8817 4030
msabin@i...
http://www.interx.com/