Rick Jelliffe brings one of the most complete and coherent Eastern/Western perspectives I've ever encountered, and his proposal says:
"A Nuke document is UTF-8 in its external form. Inside a program, after parsing, it would typically use UTF16."
Yes, we all know about the politics and inertia that have affected the uptake of Unicode in some geographies, but the "UTF-8 or UTF-16" restriction is there for a very strong pragmatic reason. Dealing with a pretty open-ended world of character sets, as in XML 1.0, is one of the biggest factors that complicate and slow down parsers, even if you get someone else (e.g. ICU) to do the relatively hard bits.
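To see how much work the open-ended model forces on a parser, consider the encoding sniffing an XML 1.0 processor must perform before it can even read the document's own encoding declaration. Here's a rough Python sketch, loosely following the autodetection heuristics of XML 1.0 Appendix F; the function name and the shortcuts taken are mine, and a real parser handles many more cases:

```python
def sniff_xml_encoding(data: bytes) -> str:
    """Guess the character encoding of an XML document from its first bytes.

    Simplified illustration of XML 1.0 Appendix F autodetection; a
    "UTF-8 or UTF-16 only" format would reduce this to the BOM checks.
    """
    # Byte-order marks are unambiguous.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    # No BOM: infer an encoding family from the byte pattern of '<?'.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"          # big-endian UTF-16 without BOM
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"          # little-endian UTF-16 without BOM
    if data.startswith(b"<?xml"):
        # Could still be UTF-8, ISO-8859-*, Shift_JIS, ... -- the real
        # answer is buried in encoding="..." inside the declaration,
        # which must now be parsed in the guessed family.
        decl = data[:100].decode("ascii", errors="replace")
        if 'encoding="' in decl:
            return decl.split('encoding="')[1].split('"')[0].lower()
        return "utf-8"              # XML default: no BOM, no declaration
    if data.startswith(b"\x4c\x6f\xa7\x94"):
        return "ebcdic"             # EBCDIC code pages need yet another pass
    return "utf-8"
```

Every branch after the BOM checks is complexity that a UTF-*-only format simply never pays for.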
If we want a strong diversity of well-performing and conforming tools, which I suspect is an important component of success for most of us considering XML-NG, I think "UTF-*-only" is the simple reality. For me, UTF-8 or UTF-16 is certainly an improvement over JSON's UTF-8-only approach.
I'm curious as to how that JSON limitation is affecting trends in text processing conventions in non-Western countries as "Web 2.0" becomes pervasive.
Uche Ogbuji http://uche.ogbuji.net
Poetry ed @TNB: http://www.thenervousbreakdown.com/author/uogbuji/
Founding Partner, Zepheira http://zepheira.com