There is a lot of confusion arising from sentences that use "character" without clarifying whether it is the glyph, collatable unit, Unicode code, or UTF-* code that is meant. The confusion is one of wrong expectations.

You are right that a Java Character is a UTF-16 code. But making Java Characters into 24- or 32-bit codes would still not make them characters in the plain English sense (which is closest to "collatable units"). A combining umlaut is not really a character, for example; radicals are not independent characters, though they may have codepoints.

So, paradoxically, an API that handles real characters properly probably never has arguments or return results of Character (or something that is 8, 16, 24, or 32 bits) but instead uses String (and its variants). One reason I like normalization is that it removes as many combining character sequences as possible, making Java Character = collatable unit more often. So surrogates may add a level of handling for Java Characters, but they don't add any more complexity for Java characters (taken in the plain English/collatable unit sense).

The semantics of String and Character in the Java documentation may indeed need to be updated now that Unicode goes beyond the BMP (reminiscent of the banks of the Rhine overflowing in Wagner). Probably the Java documentation should be proofread to check that "character" is never used when "Character" is meant. But it is not that the length of a String is no longer reliably the number of characters. It never was; it is the number of Characters, to labour my point.

Anyway, my point was that Java, one of the leaders of the pack, is still catching up, not that it has arrived at Unicode 4.0. Hence, as long as XML 1.1 is out there in PR for any early implementers who need it, the critical path for getting better Unicode 3.2+ support in XML is not an XML 1.1 REC (nor even IRIs) but API/platform infrastructure. That is why I suggested that XML 1.1 is not urgent.
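To make the Character/character distinction concrete, here is a rough sketch of how the counts diverge for one String. It assumes a later Java release than the one under discussion (String.codePointCount and java.text.Normalizer arrived afterwards), and the class and method names are made up for illustration. BreakIterator's character boundaries are only an approximation of "collatable units", not a definition of them.

```java
import java.text.BreakIterator;
import java.text.Normalizer;

// Hypothetical illustration class; none of these names come from the post.
public class CharacterCounts {

    // Number of Java Characters, i.e. UTF-16 code units.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Rough count of "plain English" characters, using the JDK's
    // character-boundary iterator as a stand-in for collatable units.
    static int userCharacters(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    // Normalization Form C: composes combining sequences where a
    // precomposed code point exists.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String combining = "u\u0308";      // u + COMBINING DIAERESIS
        String surrogates = "\uD834\uDD1E"; // U+1D11E, outside the BMP

        for (String s : new String[] { combining, nfc(combining), surrogates }) {
            System.out.println(utf16Units(s) + " Characters, "
                    + codePoints(s) + " code points, "
                    + userCharacters(s) + " characters");
        }
    }
}
```

The combining sequence is two Characters but one character both before and after normalization; NFC merely brings the Character count into line. The surrogate pair is likewise two Characters for one character, which is why an API that takes and returns String sidesteps the question.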
I think Tim read my "only just reaching" as "has already reached", which is not what I intended to say.

Cheers
Rick Jelliffe