RE: Unicode, xml:lang, and variant glyphs
Rick:
> > Are you saying that characters carry information, and never
> > glyphs (or character + locale + markup)?

> From: John Cowan
> No, I am talking about the CJK case specifically. A unified font
> may look ugly, and certainly shouldn't be used for fine typography,
> but a language indicator is neither necessary nor sufficient to
> solve this problem.

But I am not thinking "What is sufficient?", I am thinking "Is something being lost here?" and "How nice is the thing being lost?"

If an XML document arrives with an encoding of SJIS in the XML header, it will have been created on a Japanese editor: in the absence of any information to the contrary, shouldn't it be displayed using Japanese fonts? And if a document arrives in Big Five, shouldn't it be displayed, in the absence of anything else, using a Chinese font (presumably traditional, but this is not clear cut now)?

In XML terms, if there is no xml:lang in effect, and the sender wrote using a different script variant to the receiver, mightn't heuristic defaulting of xml:lang based on originating character set (and, for example, originating country in the URL) be the desired behaviour for some? And if it is desired in that circumstance, wouldn't it be useful to preserve that information when cutting-and-pasting documents or transcluding portions? (The XML encoding PI presumably will not survive in the grove of every document, so I am not sure it could reliably be available in the case of transcluded data.)

I am very loath to say "everything that you need to know arrives marked-up explicitly" in this particular case. For example, if a Japanese document arrives in XML, and it was originally encoded in Shift-JIS, then we should have a suspicion that when there is a backslash character, a Yen glyph might be intended.
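To make the defaulting idea concrete, here is a minimal sketch in Python of the kind of heuristic described above. It is an illustration under stated assumptions, not a proposal for any particular processor: the function name `default_lang`, the two lookup tables, and the choice of `zh-TW` for Big Five are all hypothetical, and a real table would need far more care.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical table: originating encoding -> assumed language tag.
# Big Five -> "zh-TW" is only a guess at "traditional Chinese".
ENCODING_TO_LANG = {
    "shift_jis": "ja",
    "sjis": "ja",
    "big5": "zh-TW",
    "big_five": "zh-TW",
}

# Hypothetical table: country code at the end of the host name -> language tag.
TLD_TO_LANG = {
    "jp": "ja",
    "tw": "zh-TW",
}


def default_lang(
    explicit_lang: Optional[str],
    encoding: Optional[str],
    url: Optional[str] = None,
    receiver_lang: Optional[str] = None,
) -> Optional[str]:
    """Pick a language tag to drive font selection.

    Order of preference: explicit xml:lang, then the originating
    encoding, then the country in the URL, then the receiver's locale.
    """
    # An xml:lang that is in effect always wins.
    if explicit_lang:
        return explicit_lang

    # Otherwise guess from the encoding the document arrived in.
    if encoding:
        key = encoding.strip().lower().replace("-", "_").replace(" ", "_")
        if key in ENCODING_TO_LANG:
            return ENCODING_TO_LANG[key]

    # Otherwise guess from the country code of the originating host.
    if url:
        host = urlparse(url).hostname or ""
        tld = host.rsplit(".", 1)[-1].lower()
        if tld in TLD_TO_LANG:
            return TLD_TO_LANG[tld]

    # Nothing better is known: fall back to the receiver's own setting.
    return receiver_lang
```

With these assumed tables, `default_lang(None, "SJIS")` gives `"ja"`, while a document that carries its own xml:lang keeps it regardless of encoding. Note that the guess is only useful if it is recorded somewhere that survives cut-and-paste or transclusion, which is the point at issue.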
I know that it would be better to encode the document properly first, but it seems that a policy of choosing a variant font based on the sending encoding (or, for that matter, the country in the URL) is just as legitimate a default policy as just using the current locale's variant font at the receiver.

> My claim is confined
> to this: that for plain-text purposes, Han unification does not
> obscure anything essential.

As for the plain-text distinction, were laypeople actually tested for this, or is it the conjecture of scholars who already know all the variants and their connections (no disrespect intended)? As is the case with fraktur for English readers, if you have not been taught the characters you cannot read them, and if you have been taught them you cannot be tested for whether you can read them.

The plain-text criterion may be good for character-set people. But there is no reason to assume that preserving minimal readability is a criterion good enough for documents. I guess this is the PDF versus SGML debate writ small: should fidelity to the originating publication be the policy, or should rendering be determined by the setup of the receiver? And maybe it is a content-related thing too: the closer text is to literature or names, the greater the chance that the sender intends a particular glyph variant for the character they chose.

Rick Jelliffe

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)