
Re: Closing Blueberry

  • From: Joel Rees <rees@s...>
  • To: Elliotte Rusty Harold <elharo@m...>
  • Date: Mon, 23 Jul 2001 17:13:14 +0900

Elliotte,

You've written a pretty good summary (of the visible tip of the iceberg) of
the long term problem. You are correct that fixing the character set to a
specific version of UNICODE is going to continually force the problem to be
re-visited.

About new words in Japanese, borrowed words are usually written in katakana,
as you seem to be aware. (This is current practice, which reverses practice
around the turn of the last century, which was also a reversal of practice.
Interesting history there.) Some borrowed words, like "double" {daburu} and
"software bug" {baggu} become so accepted that native speakers write them in
hiragana (and/or begin to conjugate them as Japanese). If the level of
acceptance is high, these words can be written with existing Kanji, implying
a new reading for the Kanji. In some cases, new Kanji are invented for these
words, but that is rather rare, at least at this time, and such invented
Kanji definitely do not survive into generally available computerized
documents.

It might interest you to know that some native Japanese words have never
been assigned Kanji, at least, not according to the dictionaries. However,
with the proliferation of word processors, it has become much easier to use
Kanji that one doesn't really remember how to write by hand. This means that
many words that have been traditionally written in hiragana, except by
professors and people putting on airs, are now commonly typed as Kanji in
word processor documents.

But the bulk of new ideographs, as I understand it, are in specialty fields,
highly technical terms for which phonetic kana would simply not carry the
semantic load. (This is similar to our turning to Latin for names of newly
discovered species and diseases.) Most people do not need these new
characters, but the people who need them really do need them.

Current work-arounds for the new technical Kanji are all proprietary. The
researchers have some gaiji ("foreign character", i.e. private-use character)
editor, their group all use the same word processor, and they pass around
their file of gaiji. As you might imagine, this drives the choice of word
processor for the research group, and it tends to discourage use of personal
equipment in research. When they publish, their printing company has to
build a one-shot gaiji font.

I think a real solution to the creativity problem requires a fundamental
shift in the method of encoding ideographs. At a minimum, we have to be able
to define new characters on the fly, complete with parsing information. We
have some of the technology for on-the-fly character image definition and
rendering, but it's expensive. We don't have the technology to handle adding
parsing information to characters on the fly, but I think XML/SGML and
UNICODE are finally giving us the tools to take it on. It might not be that
hard. Encoding the on-the-fly characters for transmission will require a
base of known characters and a common method for attaching the definitions
of the non-common characters used.
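
As a rough illustration of the "known base plus attached definitions" idea, here is a minimal Python sketch. It assumes the non-common characters travel as private-use code points and that the document carries a table of definitions for them. The table layout and its field names (components, reading) are hypothetical, invented only for this example; the one real API used is the standard unicodedata module.

```python
import unicodedata


def is_private_use(ch):
    """True if ch is a private-use code point (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


def missing_definitions(text, definitions):
    """Return the private-use characters in text with no attached definition."""
    missing = []
    for ch in text:
        if is_private_use(ch) and ch not in definitions and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical attached definitions. A real scheme would also need the glyph
# image and fuller parsing information; these fields are placeholders.
definitions = {
    "\ue000": {"components": ["木", "電"], "reading": "denki"},
}

# U+E000 is defined above, U+E001 is not.
text = "新字\ue000と\ue001"
print([f"U+{ord(c):04X}" for c in missing_definitions(text, definitions)])
# prints ['U+E001']
```

A receiver could refuse, or at least flag, any document for which this list is not empty, instead of silently rendering whatever its local gaiji font happens to hold at that code point.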

(UNICODE might provide the base set, but it feels like a poor fit to me --
too many universally defined characters, among other things. Moving the
burden of the on-the-fly characters onto XML, as some have suggested, would
of course add another parse layer, and would definitely require a new
version number.)

But I think Blueberry shouldn't need to wait for on-the-fly character
creativity.

I wonder if the version number or XML declaration could be modified to
include a field for specific UNICODE version number reference, as has been
alluded to on the list. A simple linear progression, tying character issues
to version number, seems too limited.

I'm sure this has been considered, but what would the arguments be against
declaring the UNICODE version number in the encoding clause?

<?xml version="1.0.1" encoding="UNICODE-3.1"?>

No, we are going to want to be able to do something like

<?xml version="1.0.1" encoding="mojikyo" encoding-reference="UNICODE-3.1"?>
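
To make the idea a little more concrete, here is a small Python sketch of what a processor might do with such a label. The "UNICODE-3.1" syntax is only the suggestion above, not anything in the XML specification, and parse_encoding_label and unassigned_in_3_2 are names made up for this example. Python's standard library ships only one historical snapshot of the character database, unicodedata.ucd_3_2_0 (Unicode 3.2.0), so the sketch uses that as a stand-in for a declared 3.1.

```python
import re
import unicodedata


def parse_encoding_label(label):
    """Split a hypothetical label like 'UNICODE-3.1' into ('UNICODE', (3, 1)).

    Ordinary labels such as 'UTF-8' come back unchanged with version None.
    """
    m = re.fullmatch(r"(UNICODE)-(\d+(?:\.\d+)+)", label)
    if m is None:
        return label, None
    return m.group(1), tuple(int(part) for part in m.group(2).split("."))


def unassigned_in_3_2(text):
    """Characters in text that Unicode 3.2.0 leaves unassigned (category 'Cn')."""
    old = unicodedata.ucd_3_2_0
    return [ch for ch in text if old.category(ch) == "Cn"]


print(parse_encoding_label("UNICODE-3.1"))  # prints ('UNICODE', (3, 1))
print(parse_encoding_label("UTF-8"))        # prints ('UTF-8', None)

# U+1F600 was assigned long after Unicode 3.2, so it is reported here.
print([f"U+{ord(c):04X}" for c in unassigned_in_3_2("A\u20ac\U0001F600")])
```

The point of the second function is that a declared version gives the parser something to check against: a character outside the declared repertoire becomes a detectable error instead of a silent interoperability problem.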

Joel Rees
programmer -- rees@m...
----------------------------------------------------
To be a tree supporting all information,
  giving root to the chaos
    and branches to the trivia,
      information breathing anew --
        This is the aim of Yggdrasill.
============================XML as Best Solution===
Media Fusion Co., Ltd.