Re: Java/Unicode brain damage
I think Tim's response was closest to hitting the issue that I was thinking about. There are lots of senses of "character". Merging some of Tim's and John's input, at least (!) these senses are in common use:

- jchar (Java "char") ... ~UCS-2 character, which in very early days seems to have meant "Unicode" (1.0?);
- xchar (XML Character) ... ~Unicode character, one or two "jchar" (Miles called this "uchar");
- graphemes (typographic/display) ... 1-to-N xchar.

John's examples of complex graphemes (some European scripts, Korean, Indic, Tibetan, ...) are probably worth looking at in the current Unicode book, for anyone who hasn't seen that already ... :)

"jchar" arrays (including java.lang.String) clearly don't talk in terms of single "character" unless you're talking in the restricted sense of "jchar" (or Win32's version of the C/C++ 'wchar_t'), or are content with:

> The worrying thing is that for 99.9999999999% of all
> real-world XML processing, if you pretend that a jchar
> represents an xchar, you won't get in any trouble.

Twelve-nines ... whoa! :) That's probably true for graphemes as well, unless you're working in scripts such as those which John mentioned. I'm not sure I'd buy twelve-nines though!

> So
> I bet there's a huge amount of java code out there right
> now that makes this assumption. I don't think we have
> much understanding now as to what flavor of breakage is
> apt to occur when (if) non-BMP data starts flowing
> through such code. -Tim

Depending on how much work they do with those jchars, and what kind, maybe no breakage at all. Just don't assume that "character" and "jchar" are ever going to be the same. People dealing with graphemes (details of display/output) are likely very conscious of such issues already.

- Dave
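To make the three senses above concrete, here is a minimal Java sketch that counts the same string three ways. It is an illustration only, not anything from the thread: the class and method names (`CharSenses`, `jcharCount`, `xcharCount`, `graphemeCount`) are made up for this example, it assumes a runtime with the code point methods on `java.lang.String` (Java 5 or later), and it uses `java.text.BreakIterator` as a rough stand-in for grapheme boundaries, which may not match every script's notion of a display unit.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class CharSenses {
    // "jchar" sense: number of Java char values (UTF-16 code units).
    static int jcharCount(String s) {
        return s.length();
    }

    // "xchar" sense: number of Unicode code points; a surrogate pair counts once.
    static int xcharCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Grapheme sense (approximation): number of character boundaries
    // reported by BreakIterator, which groups a base letter with its
    // combining marks.
    static int graphemeCount(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1D11E (a non-BMP character, stored as a surrogate pair)
        // followed by 'e' plus U+0301 COMBINING ACUTE ACCENT.
        String s = "\uD834\uDD1E" + "e\u0301";
        System.out.println("jchars:    " + jcharCount(s));
        System.out.println("xchars:    " + xcharCount(s));
        System.out.println("graphemes: " + graphemeCount(s));
    }
}
```

For plain BMP text without combining marks, all three counts agree, which is why code that treats a jchar as a character so rarely gets into trouble. Code that only copies or compares whole strings is unaffected either way; the counts matter once code indexes, truncates, or reverses by jchar.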