Unicode, xml:lang, and variant glyphs
Rick Jelliffe wrote:

> Not so. The additions are use composed of standard radicals and
> combinations. There are various projects around (such as C.C.Hsieh in
> Taiwan) to figure out encodings to "spell" Han ideographs by component
> radicals.

I'm glad to hear about this; I find the IRG archives utterly impenetrable.

> I guess the point is that John thinks that if an XML system can produce
> characters which a recipient system cannot process, because it does not use
> ISO 10646, that is not something that CDATA sections should be used to
> address. I think his reasons are that he cannot see it in the spec.
[...]
> I think a lot
> of people now think that any non-ISO10646 system is for losers anyway
> (except for whatever character set they use, probably).

Well, actually I would say the latter rationale has more effect on me
than the former, if I must choose either. It just seemed to me that
using CDATA sections to constrain the behavior of editors was not
particularly user-friendly; if the user wants a character, let her
have it, using a character reference if possible. In general,
transcoding XML documents involves inserting NCRs as needed, unless
the target is UTF-8 or UTF-16.

> The primary purpose of xml:lang, as far as I am concerned, should be to
> convey the information lost by ISO 10646 unification: where the Japanese and
> Chinese glyphs

Actually, the problem isn't that clearcut. As John Jenkins posted to
the Unicode list last year:

# FACT. It is true that some Unihan characters are typically written
# differently within the Japanese, Taiwanese, Korean, and Mainland Chinese
# typographic traditions.
#
# FACT. These differences of writing style are within the general range of
# allowable differences within each typographic tradition.
#
# E.g., the official "Taiwanese" glyph for U+8349 ("grass") per ISO/IEC
# 10646 uses four strokes for the "grass" radical, whereas the PRC,
# Japanese, and Korean glyphs use three.
# As it happens, Apple's LiSung
# Light font for Big Five (which follows the "Taiwanese" typographic
# tradition) uses three strokes.
#
# (This is easily confirmed by accessing
# http://www.unicode.org/unihan/unihan.acgi$8349.)
#
# FACT. Japanese users prefer to see Japanese text written with "Japanese"
# glyphs.
#
# FACT. It is also acceptable to Japanese users to see Chinese text
# written with "Japanese" glyphs.
#
# E.g., I just borrowed from Lee Collins a standard Japanese dictionary
# which quotes Chinese authors (e.g., Mencius) to show how a character is
# used. When doing so, they use "Japanese" glyphs, not Chinese ones.
#
# In particular, it is acceptable within Japanese typography for a small
# stretch of Chinese quoted in a predominantly Japanese text to be written
# with "Japanese" glyphs.
#
# FACT. Han unification allows for the possibility that a Japanese user
# might be required to use a Chinese font to display some Japanese text
# (e.g., if it uses a rare kanji).
#
# FACT. Ditto for JIS or an ISO 2022-based solution.
#
# FACT. Unicode doesn't include all the characters in actual use in Japan
# today, particularly for personal names.
#
# FACT. Neither does JIS or an ISO 2022-based solution. There are vendor
# sets which include many of these characters, and Unicode is working with
# the IRG and East Asian national bodies to add them.

> (or Polish and Russian)

How's that again? Polish uses Latin, Russian uses Cyrillic! What
could possibly count as a unification between these two?? *Nobody*
thinks that LATIN LETTER A and CYRILLIC LETTER A should be unified....

> for a unified character differ, then
> I think transcoding and unifying the characters into ISO 10646 can lose
> information unless the xml:lang attribute is set.

It doesn't lose information about meaning. It may make characters
harder to read, but the distinction is one of typographic tradition,
not language, and can cross languages.

--
John Cowan    http://www.ccil.org/~cowan    cowan@c...
    You tollerday donsk? N. You tolkatiff scowegian? Nn.
    You spigotty anglease? Nnn. You phonio saxo? Nnnn.
        Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
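
The remark in the message about transcoding ("inserting NCRs as needed, unless the target is UTF-8 or UTF-16") can be illustrated with a minimal Python sketch. It is not taken from the message. The function name `transcode_text` is hypothetical, and the sketch assumes its input is character data or an attribute value only, since XML does not recognize character references inside CDATA sections, comments, or names. It relies on Python's standard `xmlcharrefreplace` error handler, which substitutes a decimal numeric character reference for each character the target encoding cannot represent.

```python
def transcode_text(text: str, target_encoding: str) -> bytes:
    """Encode XML character data for target_encoding (hypothetical helper).

    Characters the target encoding cannot represent become decimal
    numeric character references (NCRs) such as &#33609;.
    Assumption: `text` is element content or an attribute value, not a
    CDATA section, comment, or name, where NCRs are not recognized.
    """
    return text.encode(target_encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    sample = "grass: \u8349"  # U+8349, the character discussed in the message

    # ISO 8859-1 has no U+8349, so an NCR is inserted.
    print(transcode_text(sample, "iso-8859-1"))  # b'grass: &#33609;'

    # UTF-8 can represent every character, so no NCR is needed.
    print(transcode_text(sample, "utf-8"))  # b'grass: \xe8\x8d\x89'
```

A real transcoder would also have to rewrite the document's encoding declaration and handle markup separately from text. This sketch covers only the character-data step.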