[offtopic] Re: Microsoft FUD on binary XML...
Tim Bray wrote:
> On Nov 22, 2003, at 3:37 PM, Alaric B Snell wrote:
>
>> Good point, actually... I suppose that, in general, any language
>> which uses more than 256 code points in general use is actually quite
>> likely to be a language that uses one code point per word.
>
> No, actually. I don't know much about Chinese, but the average number
> of characters/word in Japanese is two point something; you have to
> learn 1700 or so characters to get out of Japanese high school, and
> literate people pick up quite a few more. Korean Hangul are syllabics
> and thus there are naturally several per word.

Chinese words are often deemed to be made of two characters: Beijing.
The very common 4 character parallel epigram (such as "crouching tiger
hidden dragon") uses this.

On the other hand, one remarkable thing about Chinese is that lay people
often do not have a strong idea of "word" at all. Not one of my various
Chinese friends could even name, off hand, the Chinese word for "word".
De Francis' "The Chinese Language" says they go from characters to ideas
to sentences rather than letters to words to ideas to sentences, where a
character is halfway between our letters and a word.

I guess it is like in English: is "white space" or "whitespace" one word
or two words?

Cheers
Rick Jelliffe