Re: Postel's law, exceptions
Rick Jelliffe scripsit:

> They have *almost* been abstracted away: a Java "character" is UTF-16.
> Some Unicode characters require more than one Java "character" to
> represent them. All *implementations* of characters have one (or more)
> underlying encoding. A nominal getEncoding() method on a Java 1.n
> character stream even TeeWriter should always produce "UTF-16".

Well, if you like. But *diversity* of encodings is lost.

> This should upset no-one, because some real characters may require
> more than one Unicode "character" to represent them, anyway.
> Take Vietnamese, please: if I have a u with a horn accent above plus
> a dot underneath [1], that is one real character (according to what
> people think of as characters) but three Unicode characters, 3 UTF-16
> characters, 6 bytes of storage.

Actually, you can also represent any Vietnamese letter with a single
Unicode (and UTF-16) character, U+1EF1 in this case.

The story with Vietnamese, for those who are curious, is that it has
12 vowel letters (a e i o u y a-breve a-circ e-circ o-circ o-horn u-horn),
each of which may bear one of five tone marks (acute, grave, hook above,
tilde, dot below).

--
John Cowan <jcowan@r...>
http://www.ccil.org/~cowan
http://www.reutershealth.com

It was impossible to inveigle
Georg Wilhelm Friedrich Hegel
Into offering the slightest apology
For his Phenomenology.
        --W. H. Auden, from "People" (1953)
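
The two representations discussed in this message (u plus two combining marks versus the single character U+1EF1) can be compared directly. The following is a minimal Python sketch, not anything from the original thread: the helper name `utf16_units` is hypothetical, and the combining code points U+031B (horn) and U+0323 (dot below) are the standard Unicode ones for the marks described. It counts UTF-16 code units, which is the quantity a Java `String.length()` call would report.

```python
import unicodedata

# Decomposed spelling: base letter "u", combining horn (U+031B),
# combining dot below (U+0323).
decomposed = "u\u031b\u0323"

# Precomposed spelling: the single code point U+1EF1.
precomposed = "\u1ef1"


def utf16_units(s):
    """Hypothetical helper: number of 16-bit code units in the UTF-16 form of s."""
    return len(s.encode("utf-16-le")) // 2


def utf16_bytes(s):
    """Hypothetical helper: number of bytes in the UTF-16 form of s (no BOM)."""
    return len(s.encode("utf-16-le"))


# Canonical normalization converts between the two spellings.
nfc = unicodedata.normalize("NFC", decomposed)   # compose
nfd = unicodedata.normalize("NFD", precomposed)  # decompose

print(utf16_units(decomposed), utf16_bytes(decomposed))
print(utf16_units(precomposed), utf16_bytes(precomposed))
print(nfc == precomposed, nfd == decomposed)
```

Under these assumptions the decomposed spelling takes three code units (six bytes) and the precomposed one takes a single code unit (two bytes), and normalizing to NFC or NFD maps each spelling onto the other, so both points made in the exchange hold at once.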