Re: Specifying a Unicode subset
On Mon, Oct 21, 2002 at 12:27:15PM -0400, John Cowan wrote:
> tblanchard@m... scripsit:
>
> > Lets move on. UTF-8 is your transfer encoding, use UCS-2 in memory
> > (unless planning to process ancient Sumerian or something - then use
> > UCS-4) and lets all move on to something remotely interesting.
>
> In CJK environments, using UTF-16 for transfer makes sense, because UTF-8
> imposes a 50% growth in the size of native-language characters.
> That's basically why XML requires both UTF-8 and UTF-16 support of all
> conforming parsers.

And using UCS-2 for memory encoding is also, in a lot of cases, a really
bad choice. Processor performance is cache-dependent nowadays. Filling
half of the data you process with zero bytes can simply thrash your caches.
I will stick to UTF-8 internally; it also allows some processors to use
hardcoded CISC instructions for 0-terminated C strings (IIRC the Power
line of processors has such a set of instructions).

Daniel

--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard@r... | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
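
The size trade-off discussed above can be checked with a small sketch. It is only an illustration, not anything from libxml: the function names and the sample strings are made up for this example, and "utf-16-le" is used so that no byte-order mark is counted. It compares the encoded size of ASCII-heavy markup and of BMP CJK text in both encodings, and measures how many bytes of the 16-bit form are zero.

```python
# Illustrative sketch: encoded sizes of the same text in UTF-8 and UTF-16.
# Function names and sample strings are hypothetical, chosen for this example.

def encoded_sizes(text: str) -> dict:
    """Return the code point count and the byte length in each encoding."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-le" avoids counting the 2-byte byte-order mark.
        "utf16_bytes": len(text.encode("utf-16-le")),
    }

def zero_byte_fraction(text: str) -> float:
    """Fraction of bytes equal to 0 when the text is stored as UTF-16."""
    data = text.encode("utf-16-le")
    return data.count(0) / len(data) if data else 0.0

ascii_sample = "<title>Specifying a Unicode subset</title>"
cjk_sample = "日本語の文字列"  # 7 code points, all in the BMP above U+07FF

if __name__ == "__main__":
    for label, sample in (("ASCII markup", ascii_sample), ("CJK text", cjk_sample)):
        print(label, encoded_sizes(sample), zero_byte_fraction(sample))
```

For the ASCII sample every second byte of the 16-bit form is zero, which is the cache cost described above. For the CJK sample, UTF-8 needs three bytes per character against two for UTF-16, which is the 50% growth John Cowan mentions.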