
Re: Non-Unicode Character Sets

  • From: John Cowan <cowan@l...>
  • To: xml-dev@i...
  • Date: Sat, 29 Jan 2000 14:20:32 -0500 (EST)

Paul Prescod scripsit:

> I am told that conversion of some character sets through Unicode is
> lossy and cannot be round-tripped. But it occurs to me that as long as
> one has the private use area, "unknown" characters can always be
> preserved.

Mappings have to serve various purposes: not just round-trippability,
which could be achieved by any arbitrary 1-1 mapping, but also
usefulness.  Not all character set standards agree on what counts
as a character, as opposed to a mere variant that need not be
represented.  Most of Unicode's compatibility characters were added
in order to satisfy these rather disjoint needs.

For example, the Korean standard KSC 5601 provides distinct codepoints
for different "readings" of Chinese characters (hanja) used in Korean writing.
The great bulk of all Chinese characters have only a single reading in
Korean (unlike Japanese), but some few have two, three, or more.
Providing distinct codepoints eased mappings between Korean hanja
and native Korean writing, as each hanja could be given a unique

Unicode, however, unified Chinese characters into a single repertoire.
In order to permit round-tripping between KSC 5601 and Unicode,
compatibility characters were added to Unicode for each of the
multi-mapped hanja.
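
To make the compatibility-character point concrete, here is a minimal Python sketch. It assumes that U+F914, U+F95C and U+F9BF are three of the compatibility ideographs in question, all duplicates of the unified ideograph U+6A02; the choice of this particular character is mine, not something stated above. The three codepoints are distinct in Unicode, so a table mapping them one-to-one to the Korean standard can round-trip, but Unicode normalization folds all of them into the unified character.

```python
import unicodedata

# Assumed example: three compatibility ideographs that duplicate U+6A02.
compat = ["\uF914", "\uF95C", "\uF9BF"]
unified = "\u6A02"

for ch in compat:
    folded = unicodedata.normalize("NFC", ch)
    # Each compatibility ideograph has its own codepoint and name...
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # ...but normalization replaces it with the unified ideograph.
    print(f"  NFC -> U+{ord(folded):04X}")

# Distinct before normalization, identical after it.
distinct_before = len(set(compat))
distinct_after = len({unicodedata.normalize("NFC", ch) for ch in compat})
```

The practical consequence is that the round trip survives only as long as the text is not normalized along the way.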

The character set CNS 11643 was not given this treatment, however,
and its (few) multiple mappings do not have Unicode equivalents.  Therefore,
round-tripping is not possible.
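
The failure mode can be sketched with a toy table. The legacy byte values below (0xA1, 0xA2, 0xA3) are hypothetical and are not real CNS 11643 codes; the only point is that when two legacy codepoints map to the same Unicode character, the reverse mapping has to pick one of them and the other is lost.

```python
# Hypothetical legacy-to-Unicode table: 0xA1 and 0xA2 are two legacy
# codepoints for the same ideograph, which Unicode encodes only once.
TO_UNICODE = {0xA1: "\u6A02", 0xA2: "\u6A02", 0xA3: "\u4E2D"}


def build_reverse(to_unicode):
    """Build the Unicode-to-legacy table, keeping the lowest legacy code."""
    reverse = {}
    for code, ch in sorted(to_unicode.items()):
        reverse.setdefault(ch, code)
    return reverse


FROM_UNICODE = build_reverse(TO_UNICODE)


def round_trip(code):
    """Convert a legacy code to Unicode and back again."""
    return FROM_UNICODE[TO_UNICODE[code]]


# Legacy codes that do not survive the trip through Unicode.
lost = [code for code in TO_UNICODE if round_trip(code) != code]
print([hex(code) for code in lost])
```

Adding a compatibility character (or a private-use codepoint) for 0xA2 would make the table one-to-one again, which is the treatment KSC 5601 received.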

> Is there any character set in the world that cannot be considered a
> "subset of Unicode"?

The CCCII standard and its superset EACC (aka ANSI Z39.64) have
many multiple mappings and will not roundtrip through Unicode.

John Cowan                                   cowan@c...
       I am a member of a civilization. --David Brin


