Re: Unicode, xml:lang, and variant glyphs
Rick Jelliffe wrote:

> FACT: Many times that someone says two characters are variants and should be
> unified, someone else has used them not as variants. Hence the Unicode
> compatibility area.

Unicode had to be round-trip compatible with many character sets formed on different principles. The KSC character sets, e.g., encode some hanja (Chinese characters) more than once if they have more than one reading, for the sake of making hanja-hangeul conversions easy. Nobody denies that these are the same *characters*; even their glyphs are bit for bit the same.

> Oops I meant Russian and Byelorussian (or Kazakh or Ukrainian) where some of
> the national characters have a different form.

I don't know about this. Are there really glyphic differences? I know about the character-level differences, like Ukrainian using GHE WITH UPTURN except for a period from Stalin's time until a few years ago, when Ukrainians were forced to use GHE indiscriminately for both GHE and GHE WITH UPTURN.

I also know about Polish accents, which are properly placed lower over the character than similar-looking Western accents. That certainly is a glyph difference that fine Polish typography should take into account, but getting it wrong does not interfere with *meaning*: it is not a plaintext distinction. (See below.)

A borderline case is 8859-2's use of S WITH CEDILLA and T WITH CEDILLA to represent Romanian's S and T WITH COMMA BELOW. This is finally being undone, so that Turkish can keep S WITH CEDILLA and Romanian will get a proper S WITH COMMA BELOW. (Nobody actually needs T WITH CEDILLA.) My *National Geographic* world map uses S WITH CEDILLA in Romanian place names, but you have to look closely and compare with Turkish place names to be sure.

> Are you saying that characters carry information, and never glyphs (or
> character + locale + markup)?

No, I am talking about the CJK case specifically. A unified font may look ugly, and certainly shouldn't be used for fine typography, but a language indicator is neither necessary nor sufficient to solve this problem. This is not to say that in documents to be finely rendered, an attribute called "cjkv-typographic-tradition" might not be useful.

> if it is mathematics, then the font definitely
> carries information that the unified character does not.

Which is why there are a whole bunch of "letterlike symbols" for math purposes.

> If you have a
> multi-language dictionary or a list of names which requires exactness, the
> font (or markup which selects the font) again is important.

Sure, font is important when it's important. My claim is confined to this: that for plain-text purposes, Han unification does not obscure anything essential.

> "Harder to read" is no criterion at all. If it is harder to read, it is
> because it has lost information.

Au contraire. The Unicode definition of a "plain text distinction" is one which is necessary for mere legibility.

--
John Cowan http://www.ccil.org/~cowan cowan@c...
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
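
The character-level distinctions discussed in the message can be inspected with a small Python sketch. It is only an illustration: the helper names `describe` and `same_after_nfc` are made up here, and the choice of U+F900 as a sample round-trip duplicate from the Unicode compatibility area is an assumption for the example. It shows that the cedilla and comma-below letters are separate characters, while a compatibility ideograph is canonically equivalent to its unified counterpart.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def same_after_nfc(a: str, b: str) -> bool:
    """True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # Turkish-style cedilla letters versus Romanian comma-below letters:
    # four distinct characters with four distinct names.
    for ch in ("\u015E", "\u0218", "\u0162", "\u021A"):
        print(describe(ch))

    # A compatibility ideograph (assumed sample: U+F900) and the unified
    # ideograph it duplicates (U+8C48) compare equal after normalization.
    print(same_after_nfc("\uF900", "\u8C48"))

    # S WITH CEDILLA and S WITH COMMA BELOW stay distinct: this is a
    # character-level difference, not a glyph variation.
    print(same_after_nfc("\u015E", "\u0218"))
```

Normalization says nothing about the glyph-level differences (accent placement, CJK typographic traditions); those remain a matter for fonts or markup, as the message argues.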