
Re: XML Blueberry (non-ASCII name characters in Japan)

  • From: Joel Rees <rees@s...>
  • To: Rick Jelliffe <ricko@a...>
  • Date: Wed, 11 Jul 2001 13:13:52 +0900

Thanks for the virtual links, Rick.

600 fundamental components and 16 composition functions? That's not going to
help in developing an extensible character encoding that computers can use.
(Or, rather, that humans can use on computers.)

Crud. (Pardon me.)

I had hoped that the reduction of pinyin to the keyboard was a good
indication that the 230 or so radicals I am familiar with from Japanese
could be further reduced to good effect. I know the ideographs have been
built with ad-hoc rules, and that systematizations from one era have been
overwritten by systematizations from the next, but I keep hoping.

Actually, I have some vague ideas about formalizing a dual encoding. The
simple scalar encoding (thus, a single code point) would be used to
reference pre-composed/pre-rendered characters, but each character would
also have a standard vector encoding, a string of position-code:radical
pairs. To send a non-standard character with a document, it would be defined
in a document header in three parts: the vector encoding, an assignment
to an arbitrary scalar code drawn from a private use area, and a
graphic description of the non-standard character as it should be composed.
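
A minimal sketch of that three-part header definition, in Python. Everything
named here is an assumption made for illustration: the class and function
names, the "L"/"R" position codes, and the opaque string standing in for the
graphic description. Only the private use area range (U+E000 to U+F8FF in the
Basic Multilingual Plane) comes from Unicode itself.

```python
from dataclasses import dataclass

# Basic Multilingual Plane private use area, as defined by Unicode.
PUA_START, PUA_END = 0xE000, 0xF8FF


@dataclass(frozen=True)
class CharacterDefinition:
    """One header entry for a non-standard character (hypothetical layout)."""
    scalar: int     # the single code point assigned from the private use area
    vector: tuple   # (position-code, radical) pairs
    graphic: str    # opaque description of how the glyph should be drawn


def define_characters(entries):
    """Assign each (vector, graphic) entry the next free private-use scalar."""
    header = {}
    for offset, (vector, graphic) in enumerate(entries):
        scalar = PUA_START + offset
        if scalar > PUA_END:
            raise ValueError("private use area exhausted")
        header[scalar] = CharacterDefinition(scalar, tuple(vector), graphic)
    return header


def expand(text, header):
    """Replace defined private-use scalars by their vector encoding.

    Characters with no header entry pass through unchanged.
    """
    result = []
    for ch in text:
        definition = header.get(ord(ch))
        result.append(definition.vector if definition else ch)
    return result


# Illustrative only: "L" and "R" are made-up position codes.
header = define_characters([([("L", "木"), ("R", "魚")], "glyph-0")])
document = "a" + chr(PUA_START)
expanded = expand(document, header)
```

The scalar form keeps the document text one code point per character, while
the header lets a receiver that has never seen the character recover its
composition.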

And then we start mucking around with parsing problems, and it occurs to me
that we need a fourth part for the definition, a set of attributes for the
non-standard character, to tell the parser how to parse it. By this time I
get to feeling giddy, like I'm walking on a high-wire, and I give up. Well,
sometimes I get far enough to think about using
position:orientation:scaling:component 4-tuples, and to think that
several compositing schemes should be supported. And then my job keeps
calling me back.
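
For what it is worth, the 4-tuple idea can be sketched as a tiny text format.
This is a guess at one possible shape, not a worked-out design: the ";"
separator between tuples, the field types (orientation as integer degrees,
scaling as a float), and the names Part, parse_vector and serialize_vector are
all assumptions.

```python
from typing import NamedTuple


class Part(NamedTuple):
    """One position:orientation:scaling:component 4-tuple (hypothetical types)."""
    position: str      # made-up position code, e.g. "L" or "R"
    orientation: int   # rotation in degrees
    scaling: float     # fraction of the character cell
    component: str     # the radical or component itself


def parse_vector(encoded):
    """Parse 'pos:orient:scale:comp' tuples separated by ';' into Part records."""
    parts = []
    for field in encoded.split(";"):
        position, orientation, scaling, component = field.split(":")
        parts.append(Part(position, int(orientation), float(scaling), component))
    return parts


def serialize_vector(parts):
    """Inverse of parse_vector for values that print compactly with 'g'."""
    return ";".join(
        f"{p.position}:{p.orientation}:{p.scaling:g}:{p.component}" for p in parts
    )


encoded = "L:0:0.5:木;R:0:0.5:魚"
parsed = parse_vector(encoded)
```

Supporting several compositing schemes would presumably mean adding a scheme
identifier in front of the tuple list, which this sketch leaves out.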

Color me a confused idealist.

Joel Rees

PS: I have some friends who insist that the Japanese had Kanji before it was
(re-?) introduced from China. They use some historical oddities to argue
that Kanji should be considered a separate and independent writing system
from the Han characters. Isn't it wonderful to live in a world with lots of
friendly holes to fall into?

----- Original Message -----
From: "Rick Jelliffe" <ricko@a...>
To: <xml-dev@l...>
Cc: "www-xml-blueberry-comments" <www-xml-blueberry-comments@w...>
Sent: Tuesday, July 10, 2001 9:17 PM
Subject: Re: XML Blueberry (non-ASCII name characters in Japan)


> In Unicode 3.1 there are added special function characters for allowing new
> characters to be composed positionally from parts.  These are intended for
> very rare or new characters only.
>
> There have been several thousand years of research into what the primitive
> components of Han ideographs are.  It is only now that we have computers and
> large databases of characters that it is feasible to try out different
> alternatives.  At Academia Sinica, for example, my friend Prof. C.C. Hsieh
> devised a system with about 600 components and I think 16 composition
> functions (side-by-side, for instance) which can represent about 98% of the
> Hanyu lexicon.
>
> Unicode went with a simpler set of functions, but at the cost that the
> functions allow some ambiguity: there may be more than one way to represent
> the same character.  This may be fine for text, but not good for names, where
> normalization and comparison are their destiny.
>
> (I don't think these function characters are suitable for use in names,
> b.t.w.)
>
> Cheers
> Rick Jelliffe
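
The ambiguity Rick mentions can be seen in a few lines of Python. This sketch
assumes the function characters he means are Unicode's Ideographic Description
Characters (U+2FF0 and following), and it uses 湘 as an illustrative example:
the character can be described as 氵 beside 相, or as 氵, 木 and 目 in a row.
The helper names are made up.

```python
import unicodedata

# Two description sequences for the same character, 湘 (illustrative choice).
two_part = "\u2ff0\u6c35\u76f8"          # ⿰ 氵 相  (left-to-right, two parts)
three_part = "\u2ff2\u6c35\u6728\u76ee"  # ⿲ 氵 木 目 (left-middle-right)


def nfc(text):
    """Apply standard Unicode canonical composition."""
    return unicodedata.normalize("NFC", text)


def same_name(a, b):
    """Compare two names the way a normalizing parser might."""
    return nfc(a) == nfc(b)


# Normalization leaves both sequences untouched, so they never compare equal.
unchanged = nfc(two_part) == two_part and nfc(three_part) == three_part
equal = same_name(two_part, three_part)
```

Here `equal` is False even though a human reader would take both sequences to
mean the same character, which is the problem for names.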

