Re: How to process Japanese Code with XMLDSO(MS-XML)
Rick Jelliffe wrote:
> MURATA Makoto wrote:
>
> > More than one conversion procedures certainly exist. The more I think about
> > this issue, the more pessimistic I become.
>
> On the other hand, there has just been little practical requirement for
> everyone to synchronize until now. It is still the earliest days of
> Unicode and XML

This is correct. We still have hope.

> deployment, so cheer up! perhaps a strong request stating XML's needs to
> JIS and Microsoft (and ISO) can force a resolution.

> If everyone remains stubborn, then the only thing to do is for IANA to
> register three different character sets. And perhaps XML will need another
> pre-defined attribute to indicate which character set variant is in use in
> an element, to handle cut-and-paste. What a cock-up. In the meantime, I
> guess the appropriate strategy is "damage control": as many Japanese
> implementors as possible should adopt a single mapping. Can you recommend one?

What I have in mind is as follows:

First, we should clarify the definition of the charset "shift_JIS" registered at IANA. I believe the least common denominator, which is JIS X0201 + JIS X0208:1997, should be adopted as the coded character set of SHIFT_JIS. NEC extensions and IBM extensions should be eliminated. 0x5C is backslash rather than yen sign, and 0x7E is tilde rather than overline.

Second, we should revise the Japanese profile for XML and encourage the use of character entities to represent conversion-error-prone characters rather than using them directly. (See the end of this message.) Then, it will become easier for users to make conversion-error-free documents. User-friendly XML processors should warn users when documents in EUC-jp, iso-2022-jp, or shift_jis contain such characters.

> deployed. It is better to converge on a single mapping, even if that
> mapping is not satisfactory to everyone (i.e. JIS).

Actually, I am not optimistic about this, because there are many conversion policies.
For example, Microsoft maps 0x5C (sjis) to 0x005C (unicode), but the glyph for yen sign is used for this code point. Microsoft converts NEC extensions and IBM extensions to Unicode characters. On the other hand, Java ignores NEC extensions and IBM extensions. (What happens if J++ is used? I do not know.) Apple appears to use more than one conversion table.

Rick Jelliffe wrote:
> I have made mapping tables for entity references to thousands of
> characters and glyphs.

You are talking about SPREAD entities. I recently tried to find ERCS documents, but I could find only a few. In my understanding, names of SPREAD entities contain hexadecimal numbers. But XML already has hexadecimal character entities. I would rather use natural language markup such as &enkigou; (enkigou should be in kanji).

Here is a list of conversion-error-prone characters.

< YEN SIGN                > BACKSLASH
< OVERLINE                > TILDE
< OVERLINE                > FULLWIDTH MACRON
< EM DASH                 > HORIZONTAL BAR
< BACKSLASH               > FULLWIDTH BACKSLASH
< WAVE DASH               > FULLWIDTH TILDE
< DOUBLE VERTICAL LINE    > PARALLEL TO
< MINUS SIGN              > FULLWIDTH HYPHEN-MINUS
< YEN SIGN                > FULLWIDTH YEN SIGN
< CENT SIGN               > FULLWIDTH CENT SIGN
< POUND SIGN              > FULLWIDTH POUND SIGN
< NOT SIGN                > FULLWIDTH NOT SIGN
< TILDE                   > FULLWIDTH TILDE
< BROKEN BAR              > FULLWIDTH BROKEN BAR

Makoto

Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@a...

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
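The proposal above (warn when a document contains conversion-error-prone characters, and write them as character references instead) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original message: the function names `find_error_prone` and `to_char_refs` are hypothetical, the characters are identified by the Unicode names given in the list above, "FULLWIDTH BACKSLASH" is assumed to mean FULLWIDTH REVERSE SOLIDUS, and ASCII backslash and tilde are deliberately left out because they are too common in ordinary text to flag.

```python
import unicodedata

# Unicode names for the characters in the list above.
# Assumptions of this sketch: "FULLWIDTH BACKSLASH" is taken to mean
# FULLWIDTH REVERSE SOLIDUS, and ASCII backslash (0x5C) and tilde (0x7E)
# are not flagged.
ERROR_PRONE_NAMES = [
    "YEN SIGN", "OVERLINE", "FULLWIDTH MACRON", "EM DASH", "HORIZONTAL BAR",
    "FULLWIDTH REVERSE SOLIDUS", "WAVE DASH", "FULLWIDTH TILDE",
    "DOUBLE VERTICAL LINE", "PARALLEL TO", "MINUS SIGN",
    "FULLWIDTH HYPHEN-MINUS", "FULLWIDTH YEN SIGN", "CENT SIGN",
    "FULLWIDTH CENT SIGN", "POUND SIGN", "FULLWIDTH POUND SIGN",
    "NOT SIGN", "FULLWIDTH NOT SIGN", "BROKEN BAR", "FULLWIDTH BROKEN BAR",
]

# Map each character to its name so that warnings can be descriptive.
ERROR_PRONE = {unicodedata.lookup(name): name for name in ERROR_PRONE_NAMES}


def find_error_prone(text):
    """Return (offset, character, Unicode name) for each risky character."""
    return [
        (offset, ch, ERROR_PRONE[ch])
        for offset, ch in enumerate(text)
        if ch in ERROR_PRONE
    ]


def to_char_refs(text):
    """Replace each risky character with a hexadecimal character reference."""
    return "".join(
        f"&#x{ord(ch):X};" if ch in ERROR_PRONE else ch for ch in text
    )


if __name__ == "__main__":
    sample = "100\u00a5 \u301c 200\u00a5"
    for offset, ch, name in find_error_prone(sample):
        print(f"warning: offset {offset}: U+{ord(ch):04X} {name}")
    print(to_char_refs(sample))
```

A processor could run the check after decoding a document labelled EUC-jp, iso-2022-jp, or shift_jis, and offer the rewritten text as a conversion-safe alternative. Numeric references are used here only because they need no DTD; named entities such as the `&enkigou;` suggested above would need declarations.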