Re: Unicode, xml:lang, and variant glyphs
Rick Jelliffe wrote:

> If a XML document arrives with an encoding in the XML header of SJIS it will
> have been created on a Japanese editor: in the absence of any information to
> the contrary, shouldn't it be displayed using Japanese fonts?

Perhaps as a heuristic. But I find it very hard to swallow that the charset encoding of a document is part of its semantics. Would you assume that, in the absence of other evidence, a document in ASCII was in en-US? And if so, what assumption would you make about an 8859-1 document?

> And if a
> document arrives in Big Five, shouldn't it be displayed in the absence of
> anything else, using a (presumably traditional but this is not clear cut
> now) Chinese font?

Note: Contrary to a common assumption, Unicode does *not* unify simplified hanzi with their traditional counterparts.

> I am very loath to say "everything that you need to know arrives marked-up
> explicitly" in this particular case. For example, if a Japanese document
> arrives in XML, and it was originally encoded in shift-JIS, then we should
> have a suspicion that when there is a backslash character, a Yen glyph might
> be intended.

If it is really encoded in SJIS, then a \x5C byte represents a yen character, not a backslash, and had better be treated as such by the application. Of course, since the document character set is always 10646, a &#x5C; character reference means a backslash, not a yen symbol. Ditto for KSC with a won symbol (U+20A9).

> As far as the plain-text distinction, were laypeople actually tested for
> this, or is it the conjecture of scholars who already know all the variants
> and their connections (no disrespect intended)?

I don't know, as I am not part of the Ideographic Rapporteur Group and find their documents very hard to follow.

> The plain-text criterion may be good for character-set people. But there is
> no reason to assume that preserving minimal readability is a criterion good
> enough for documents.

No doubt it is not. The point is that anything that is not a plain text distinction should be encoded using our favorite markup mechanism: XML.

> I guess this is the PDF versus SGML debate writ small;
> should fidelity to the originating publication be the policy or should
> rendering be determined by the setup of the receiver.

In the end the receiver always controls: a variant PDF renderer could exist, although there's no reason for it to. Fidelity to the originating publication is a reasonable goal, but requires reasonable cooperation.

> And maybe it is a
> content-related thing too: the closer text is to literature or names, the
> greater the chance that the sender intends a particular glyph variant for
> the character they chose.

Very true, which is why I am interested to hear about methods for explicitly encoding variants.

-- 
John Cowan http://www.ccil.org/~cowan cowan@c...
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
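As a postscript to the \x5C point in the reply above, here is a minimal Python sketch of the distinction. It is an illustration under stated assumptions, not anyone's actual implementation: the table BYTE_5C_MEANING, its charset labels, and both helper functions are hypothetical names invented for this example. The only library call is xml.etree.ElementTree.fromstring. The sketch contrasts a raw 0x5C byte, whose meaning depends on which ISO 646 variant the charset builds on, with an XML character reference, which always names a 10646 code point.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table (labels are illustrative, not registered names):
# what the single byte 0x5C stands for under three ISO 646 national variants.
BYTE_5C_MEANING = {
    "us-ascii": "\u005C",          # REVERSE SOLIDUS (backslash)
    "jis-x-0201-roman": "\u00A5",  # YEN SIGN
    "ks-x-1003": "\u20A9",         # WON SIGN
}


def read_byte_5c(charset: str) -> str:
    """Return the character that byte 0x5C represents in `charset`."""
    return BYTE_5C_MEANING[charset]


def resolve_char_ref(ref: str) -> str:
    """Let an XML parser expand a character reference such as '&#x5C;'.

    The result does not depend on any legacy charset: the reference
    names a code point in the document character set (ISO 10646).
    """
    return ET.fromstring(f"<c>{ref}</c>").text


if __name__ == "__main__":
    # Same byte, three different characters, depending on the charset.
    for name in BYTE_5C_MEANING:
        print(f"byte 0x5C in {name}: U+{ord(read_byte_5c(name)):04X}")

    # A character reference always yields the code point it names.
    for ref in ("&#x5C;", "&#xA5;", "&#x20A9;"):
        print(f"{ref} resolves to U+{ord(resolve_char_ref(ref)):04X}")
```

An application that wants a yen or won sign to survive transcoding would therefore need to map the byte on input according to the declared encoding, while leaving character references alone.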








