
Re: Unicode, xml:lang, and variant glyphs

  • From: John Cowan <cowan@l...>
  • To: XML Dev <xml-dev@i...>
  • Date: Thu, 05 Nov 1998 12:16:41 -0500

Rick Jelliffe wrote:

> FACT: Many times that someone says two characters are variants and should be
> unified, someone else has used them not as variants. Hence the Unicode
> compatibility area.

Unicode had to be round-trip compatible with many character sets formed
on different principles.  The KSC character sets, for example, encode some
hanja (Chinese characters) more than once if they have more than
one reading, for the sake of making hanja-hangeul conversions easy.
Nobody denies that these are the same *characters*; even their glyphs
are bit for bit the same.
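
A minimal sketch of what that round-trip duplication looks like today,
using Python's standard unicodedata module.  The choice of U+F900 as the
sample is an assumption made for illustration: it sits in the CJK
Compatibility Ideographs block, whose first rows are generally understood
to come from the KS C 5601 duplicates.

```python
import unicodedata

# Sample pair chosen for illustration (an assumption, not from the post):
compat = "\uf900"    # duplicate kept for round-trip compatibility
unified = "\u8c48"   # the ordinary unified ideograph

# The two code points have different names...
names = (unicodedata.name(compat), unicodedata.name(unified))
print(names)

# ...but canonical normalization treats them as the same character.
normalized = unicodedata.normalize("NFC", compat)
same_after_nfc = normalized == unified
print(same_after_nfc)
```

Note that normalizing such text gives up the round trip back to the
legacy encoding, which is the whole reason the duplicates exist.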

> Oops, I meant Russian and Byelorussian (or Kazakh or Ukrainian), where some of
> the national characters have a different form.

I don't know about this.  Are there really glyphic differences?
I know about the character-level differences, like Ukrainian using
GHE WITH UPTURN except for a period from Stalin's time until a few years
ago, when they were forced to use GHE indiscriminately for GHE and
GHE WITH UPTURN.

I also know about Polish accents, which are properly placed lower
over the character than similar-looking Western accents.  That
certainly is a glyph difference that fine Polish typography should
take into account, but getting it wrong does not interfere with
*meaning*: it is not a plaintext distinction.  (See below.)

A borderline case is 8859-2's use of S WITH CEDILLA and T WITH
CEDILLA to represent Romanian's S and T WITH COMMA BELOW.  This is
finally being undone, so that Turkish can keep S WITH CEDILLA and
Romanian will get a proper S WITH COMMA BELOW.  (Nobody actually
needs T WITH CEDILLA.)  My *National Geographic* world map uses
S WITH CEDILLA in Romanian place names, but you have to look closely
and compare with Turkish place names to be sure.
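
As a hedged illustration of the character-level distinction, the sketch
below maps the cedilla code points to their comma-below counterparts.
The function name romanianize and the sample word are hypothetical, and
the mapping is only appropriate for text already known to be Romanian;
applied to Turkish it would corrupt a legitimate S WITH CEDILLA.

```python
import unicodedata

# Cedilla forms (as in 8859-2) mapped to the comma-below forms.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # capital S with cedilla -> S with comma below
    "\u015f": "\u0219",  # small s with cedilla   -> s with comma below
    "\u0162": "\u021a",  # capital T with cedilla -> T with comma below
    "\u0163": "\u021b",  # small t with cedilla   -> t with comma below
})

def romanianize(text: str) -> str:
    """Hypothetical helper: call only on text known to be Romanian."""
    return text.translate(CEDILLA_TO_COMMA)

legacy = "Bucure\u015fti"   # spelled with s-cedilla, legacy style
fixed = romanianize(legacy)
print(unicodedata.name(legacy[6]))
print(unicodedata.name(fixed[6]))
```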
 
> Are you saying that characters carry information, and never glyphs (or
> character + locale + markup)?

No, I am talking about the CJK case specifically.  A unified font
may look ugly, and certainly shouldn't be used for fine typography,
but a language indicator is neither necessary nor sufficient to
solve this problem.

This is not to say that in documents to be finely rendered, an
attribute called "cjkv-typographic-tradition" might not be
useful.
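
A rough sketch of how such markup might sit alongside xml:lang, read
with Python's standard ElementTree.  The element names, the attribute
values, and the sample ideograph U+76F4 are all assumptions made for
illustration; only the attribute name comes from the paragraph above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: the same unified character in two traditions.
doc = ET.fromstring(
    "<doc>"
    '<p xml:lang="ja" cjkv-typographic-tradition="ja">\u76f4</p>'
    '<p xml:lang="zh-TW" cjkv-typographic-tradition="zh-Hant">\u76f4</p>'
    "</doc>"
)

rows = [
    (p.get(XML_LANG), p.get("cjkv-typographic-tradition"), p.text)
    for p in doc
]
texts = {p.text for p in doc}

# The plain text is identical; only the rendering hints differ.
print(rows)
print(len(texts))
```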

> if it is mathematics, then the font definitely
> carries information that the unified character does not.

Which is why there are a whole bunch of "letterlike symbols" for
math purposes.
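
To make that concrete, here is a small sketch with Python's unicodedata
module.  The three sample symbols are picked for illustration.  Each has
its own code point and name, and only compatibility normalization (NFKC)
folds it back to the plain letter, discarding the mathematical
distinction.

```python
import unicodedata

# Three sample symbols from the Letterlike Symbols block.
symbols = ["\u210b", "\u211d", "\u2102"]

symbol_names = [unicodedata.name(ch) for ch in symbols]

# NFC keeps the symbols distinct from ordinary letters...
kept = [unicodedata.normalize("NFC", ch) for ch in symbols]

# ...while NFKC folds them to the base letters.
folded = [unicodedata.normalize("NFKC", ch) for ch in symbols]

for name, plain in zip(symbol_names, folded):
    print(name, "->", plain)
```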

> If you have a
> multi-language dictionary or a list of names which requires exactness, the
> font (or markup which selects the font) again is important.

Sure, font is important when it's important.  My claim is confined
to this: that for plain-text purposes, Han unification does not
obscure anything essential.
 
> "Harder to read" is no criterion at all. If it is harder to read, it is
> because it has lost information.

Au contraire.  The Unicode definition of a "plain text distinction"
is one which is necessary for mere legibility.

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@c...
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
