
Re: Unicode, xml:lang, and variant glyphs

  • From: John Cowan <cowan@l...>
  • To: XML Dev <xml-dev@i...>
  • Date: Thu, 05 Nov 1998 16:13:27 -0500

Rick Jelliffe wrote:

> If an XML document arrives with an encoding of SJIS in the XML header, it will
> have been created in a Japanese editor: in the absence of any information to
> the contrary, shouldn't it be displayed using Japanese fonts?

Perhaps as a heuristic.  But I find it very hard to swallow that
the charset encoding of a document is part of its semantics.
Would you assume that, in the absence of other evidence, a
document in ASCII was in en-US?  And if so, what assumption
would you make about an 8859-1 document?

> And if a
> document arrives in Big Five, shouldn't it be displayed, in the absence of
> anything else, using a (presumably traditional, but this is not clear-cut
> now) Chinese font?

Note:  Contrary to a common assumption, Unicode does *not* unify
simplified hanzi with their traditional counterparts.
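Treating the encoding "perhaps as a heuristic" can be sketched as a fallback that explicit markup always overrides. This is a minimal illustration under stated assumptions: the function name `effective_lang` and the `ENCODING_LANG_GUESS` table are hypothetical, and ASCII and 8859-1 are deliberately absent because, as argued above, they say nothing reliable about language.

```python
from typing import Optional

# Hypothetical fallback table: only encodings tied closely to one script
# community are listed. ASCII and ISO-8859-1 are left out on purpose.
ENCODING_LANG_GUESS = {
    "shift_jis": "ja",
    "sjis": "ja",
    "big5": "zh-Hant",  # "traditional" is a presumption, not a certainty
}


def effective_lang(xml_lang: Optional[str], encoding: Optional[str]) -> Optional[str]:
    """Pick the language used for font/glyph selection.

    Explicit xml:lang always wins; the charset is consulted only as a
    last-resort heuristic. None means "no basis for a guess".
    """
    if xml_lang:
        return xml_lang
    if encoding:
        return ENCODING_LANG_GUESS.get(encoding.strip().lower())
    return None
```

For example, `effective_lang(None, "Shift_JIS")` gives `"ja"`, but `effective_lang("ko", "Shift_JIS")` gives `"ko"`, and `effective_lang(None, "ISO-8859-1")` gives `None`.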
 
> I am very loath to say "everything that you need to know arrives marked up
> explicitly" in this particular case. For example, if a Japanese document
> arrives in XML, and it was originally encoded in Shift-JIS, then we should
> suspect that where there is a backslash character, a yen glyph might
> be intended.

If it is really encoded in SJIS, then an \x5C byte represents
a yen character, not a backslash, and had better be treated as such
by the application.  Of course, since the document character set is
always 10646, a &#x5C; character reference means a backslash, not a
yen symbol.  Ditto for KSC with a won symbol (U+20A9).
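The distinction between a raw 0x5C byte and a `&#x5C;` reference can be sketched as a two-stage parse: bytes are decoded according to the declared charset, then numeric character references are resolved against 10646 regardless of charset. This is a deliberately narrow sketch: the names `LEGACY_5C`, `decode_single_byte`, `expand_char_refs`, and `parse` and the charset keys `"sjis"`/`"ksc"` are hypothetical, it handles only the single-byte range, and it follows the reading given above rather than any particular codec (many real decoders, including Microsoft's cp932 mapping, return U+005C for that byte).

```python
import re

# What a lone 0x5C byte denotes under each (hypothetical) charset key,
# per the reading above: yen for SJIS, won for KSC.
LEGACY_5C = {"sjis": "\u00a5", "ksc": "\u20a9"}

# Numeric character references: hex (&#x5C;) or decimal (&#92;).
CHAR_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);")


def decode_single_byte(raw: bytes, charset: str) -> str:
    """Decode bytes restricted to 0x00-0x7F (the single-byte roman half).

    Double-byte sequences are out of scope: in SJIS a 0x5C byte can also be
    the second byte of a kanji, so a real decoder must track byte pairs.
    """
    if any(b > 0x7F for b in raw):
        raise ValueError("sketch handles only single-byte text")
    out = []
    for b in raw:
        if b == 0x5C and charset in LEGACY_5C:
            out.append(LEGACY_5C[charset])
        else:
            out.append(chr(b))
    return "".join(out)


def expand_char_refs(text: str) -> str:
    """Resolve numeric references as 10646 code points, whatever the charset."""
    def repl(m: "re.Match[str]") -> str:
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        return chr(code)
    return CHAR_REF.sub(repl, text)


def parse(raw: bytes, charset: str) -> str:
    # Decode first, then expand references. The reference "&#x5C;" is spelled
    # with the bytes & # x 5 C ;, none of which is 0x5C, so it survives step 1.
    return expand_char_refs(decode_single_byte(raw, charset))
```

So `parse(b"C:\\dir", "sjis")` yields `"C:\u00a5dir"` (a yen sign), while `parse(b"a&#x5C;b", "sjis")` yields `"a\\b"` (a backslash).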

> As far as the plain-text distinction, were laypeople actually tested for
> this, or is it the conjecture of scholars who already know all the variants
> and their connections (no disrespect intended)?

I don't know, as I am not part of the Ideographic Rapporteur Group
and find their documents very hard to follow.

> The plain-text criterion may be good for character-set people. But there is
> no reason to assume that preserving minimal readability is a criterion good
> enough for documents.

No doubt it is not.  The point is that anything that is not a plain
text distinction should be encoded using our favorite markup mechanism:
XML.
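Putting such distinctions in markup means a renderer can read them back out. A small sketch using Python's standard `xml.etree.ElementTree` shows `xml:lang` being inherited by descendants until overridden, as the XML 1.0 Recommendation specifies; the function name `text_runs` is hypothetical, and mapping each language tag to a font is left to the application.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_runs(elem: ET.Element, inherited: Optional[str] = None) -> List[Tuple[str, Optional[str]]]:
    """Return (text, language) pairs in document order.

    A child's xml:lang overrides the inherited value; otherwise the parent's
    value carries down. Tail text belongs to the parent, so it keeps the
    parent's language.
    """
    lang = elem.get(XML_LANG, inherited)
    runs = []
    if elem.text:
        runs.append((elem.text, lang))
    for child in elem:
        runs.extend(text_runs(child, lang))
        if child.tail:
            runs.append((child.tail, lang))
    return runs
```

With `<p xml:lang="ja">\u76f4<name xml:lang="zh-Hans">\u76f4</name></p>`, the same character U+76F4 comes back once tagged `ja` and once tagged `zh-Hans`, which is exactly the information a renderer needs to choose different glyphs.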

> I guess this is the PDF versus SGML debate writ small:
> should fidelity to the originating publication be the policy, or should
> rendering be determined by the setup of the receiver?

In the end the receiver always controls: a variant PDF renderer
could exist, although there's no reason for it to.  Fidelity to
the originating publication is a reasonable goal, but requires
reasonable cooperation.

> And maybe it is a
> content-related thing too: the closer text is to literature or names, the
> greater the chance that the sender intends a particular glyph variant for
> the character they chose.

Very true, which is why I am interested to hear about methods for
explicitly encoding variants.
 
-- 
John Cowan	http://www.ccil.org/~cowan		cowan@c...
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

