Re: XML Blueberry (non-ASCII name characters in Japan)
Thanks for the virtual links, Rick.

600 fundamental components and 16 composition functions? That's not going to help in developing an extensible character encoding that computers can use. (Or, rather, that humans can use on computers.) Crud. (Pardon me.) I had hoped that the reduction of pinyin to the keyboard was a good indication that the 230 or so radicals I am familiar with from Japanese could be further reduced to good effect. I know the ideographs have been built with ad hoc rules, and that the systematizations of one era have been overwritten by those of the next, but I keep hoping.

Actually, I have some vague ideas about formalizing a dual encoding. The simple scalar encoding (thus, a single code point) would be used to reference pre-composed/pre-rendered characters, but each character would also have a standard vector encoding: a string of position-code:radical pairs. To send a non-standard character with a document, it would be defined in a document header in three parts: the vector encoding, an assignment to an arbitrary scalar code drawn from a private use area, and a graphic description of the non-standard character as it should be composed.

And then we start mucking around with parsing problems, and it occurs to me that the definition needs a fourth part: a set of attributes for the non-standard character that tells the parser how to parse it. By this time I get to feeling giddy, like I'm walking on a high wire, and I give up. Well, sometimes I get far enough to think about using position:orientation:scaling:component 4-tuples, and to think that several compositing schemes should be supported. And then my job keeps calling me back.

Color me a confused idealist.

Joel Rees

PS: I have some friends who insist that the Japanese had Kanji before it was (re-?)introduced from China. They use some historical oddities to argue that Kanji should be considered a writing system separate from and independent of the Han characters.
Isn't it wonderful to live in a world with lots of friendly holes to fall into?

----- Original Message -----
From: "Rick Jelliffe" <ricko@a...>
To: <xml-dev@l...>
Cc: "www-xml-blueberry-comments" <www-xml-blueberry-comments@w...>
Sent: Tuesday, July 10, 2001 9:17 PM
Subject: Re: XML Blueberry (non-ASCII name characters in Japan)

> In Unicode 3.1 there are added special function characters for allowing new
> characters to be composed positionally from parts. These are intended for
> very rare or new characters only.
>
> There has been several thousand of years of research into what the primitive
> components of Han ideographs are. It is only now that we have computers and
> large databases of characters that it is feasible to try out different
> alternatives. At Academia Sinica, for example, my friend Prof. C.C. Hsieh
> devised a system with about 600 components and I think 16 composition
> functions (side-by-side) which can represent about 98% of the Hanyu lexicon.
>
> Unicode went with a simpler set of functions, but at the expense that the
> functions allow some ambiguity: there may be more than one way to represent
> the same character. This may be fine for text, but not good for names where
> normalization and comparison is their destiny.
>
> (I don't think these function characters are suitable for use in names,
> b.t.w.)
>
> Cheers
> Rick Jelliffe
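Rick's point about ambiguity can be made concrete. The function characters he mentions are presumably Unicode's Ideographic Description Characters (U+2FF0 to U+2FFB; later Unicode versions added a few more). These characters describe an ideograph in prefix notation: each operator is followed by two operands, or by three operands for U+2FF2 and U+2FF3. The Python sketch below is an illustration, not a validator. It treats any non-operator character as a leaf and ignores the further constraints the standard places on well-formed sequences. The function names are hypothetical.

```python
# Ideographic Description Characters U+2FF0..U+2FFB (the original twelve).
IDC = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}
# U+2FF2 (left-middle-right) and U+2FF3 (above-middle-below) take three
# operands; the other ten take two.
TERNARY = {"\u2FF2", "\u2FF3"}


def _parse(seq: str, pos: int):
    """Parse one IDS node starting at seq[pos]; return (node, next_pos)."""
    if pos >= len(seq):
        raise ValueError("IDS ended before all operands were supplied")
    head = seq[pos]
    if head not in IDC:
        return head, pos + 1  # leaf: any non-operator character is a component
    arity = 3 if head in TERNARY else 2
    operands = []
    pos += 1
    for _ in range(arity):
        node, pos = _parse(seq, pos)
        operands.append(node)
    return (head, *operands), pos


def parse_ids(seq: str):
    """Parse a complete prefix-notation IDS into a nested tuple tree."""
    tree, end = _parse(seq, 0)
    if end != len(seq):
        raise ValueError(f"trailing characters after IDS: {seq[end:]!r}")
    return tree


def components(tree):
    """Flatten a parsed IDS into its leaf components, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in components(child)]


# Two valid descriptions of 森 ("forest", three trees): one character, two strings.
top_over_grove = parse_ids("\u2FF1木林")
top_over_pair = parse_ids("\u2FF1木\u2FF0木木")
print(top_over_grove)                   # ('⿱', '木', '林')
print(top_over_pair)                    # ('⿱', '木', ('⿰', '木', '木'))
print(top_over_grove == top_over_pair)  # False
```

Both sequences describe the same character, but a plain string or tree comparison treats them as different. This is why Rick considers such sequences a poor fit for names, which must be normalized and compared.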
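Joel's dual-encoding idea can be pictured as a small data structure. Each non-standard character in the document header carries four parts: a vector encoding (position-code:radical pairs), a scalar code drawn from a private use area, a graphic description, and parser attributes. The Python sketch below is only one possible reading of that proposal. The class and method names, the position codes ("L", "R", "T", "B"), and the choice of the BMP Private Use Area (U+E000 to U+F8FF) are all assumptions. It does not model the position:orientation:scaling:component 4-tuple variant.

```python
from dataclasses import dataclass, field

# BMP Private Use Area (U+E000..U+F8FF). The proposal only says "a private
# use area", so this particular range is an assumption.
PUA_START, PUA_END = 0xE000, 0xF8FF


@dataclass(frozen=True)
class Component:
    position: str  # hypothetical position code, e.g. "L", "R", "T", "B"
    radical: str   # the radical or component character placed there


@dataclass
class CharDefinition:
    """One header entry for a non-standard character (hypothetical format)."""
    vector: tuple  # tuple of Component: the standard vector encoding
    scalar: int    # arbitrary code point drawn from the PUA
    glyph: str     # graphic description of the composed character
    attributes: dict = field(default_factory=dict)  # hints for the parser

    def vector_string(self) -> str:
        """Serialize the vector encoding as position-code:radical pairs."""
        return " ".join(f"{c.position}:{c.radical}" for c in self.vector)


class DocumentHeader:
    """Collects definitions; within one document, equal vectors share a scalar."""

    def __init__(self) -> None:
        self._by_vector: dict = {}
        self._next = PUA_START

    def define(self, vector, glyph: str, **attributes) -> CharDefinition:
        key = tuple(vector)
        if key in self._by_vector:
            return self._by_vector[key]  # already defined in this document
        if self._next > PUA_END:
            raise OverflowError("private use area exhausted")
        entry = CharDefinition(key, self._next, glyph, dict(attributes))
        self._by_vector[key] = entry
        self._next += 1
        return entry


header = DocumentHeader()
pair = [Component("L", "木"), Component("R", "木")]
entry = header.define(pair, glyph="outline-data", name_char=True)
print(hex(entry.scalar), entry.vector_string())  # 0xe000 L:木 R:木
```

Keying the header on the vector means a repeated composition reuses its scalar. However, Rick's ambiguity remark still applies: two different vectors that describe the same glyph would receive two different scalars unless the vectors were first normalized.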