Re: Specifying a Unicode subset


To: John Cowan <jcowan@r...>
Subject: Re: Specifying a Unicode subset
From: Daniel Veillard <veillard@r...>
Date: Tue, 22 Oct 2002 17:37:10 -0400
Cc: tblanchard@m..., xml-dev@l..., Gustaf Liljegren <gustaf.liljegren@x...>
In-reply-to: <200210211640.MAA28778@m...>; from jcowan@r... on Mon, Oct 21, 2002 at 12:27:15PM -0400
References: <AF104122-E511-11D6-BFB3-0030657E2F34@m...> <200210211640.MAA28778@m...>
Reply-to: veillard@r...
User-agent: Mutt/1.2.5.1i

On Mon, Oct 21, 2002 at 12:27:15PM -0400, John Cowan wrote:
> tblanchard@m... scripsit:
> 
> > Let's move on.  UTF-8 is your transfer encoding, use UCS-2 in memory
> > (unless planning to process ancient Sumerian or something - then use
> > UCS-4) and let's all move on to something remotely interesting.
> 
> In CJK environments, using UTF-16 for transfer makes sense, because UTF-8
> imposes a 50% growth in the size of native-language characters.
> That's basically why XML requires both UTF-8 and UTF-16 support of all
> conforming parsers.

  And using UCS-2 as the in-memory encoding is also, in a lot of cases,
a really bad choice. Processor performance is cache-bound nowadays.
Filling the caches with zeros for half of the data you process can simply
thrash them. I will stick to UTF-8 internally; it also allows
some processors to use hardcoded CISC instructions for 0-terminated C
strings (IIRC the Power line of processors has such a set of instructions).
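
A rough way to check both size arguments above, as a sketch only: it assumes Python 3, and the two sample strings and the helper name encoded_sizes are made up for illustration, not taken from this thread. It counts how many bytes each string needs in UTF-8 and in UTF-16 (which matches UCS-2 for BMP-only text) and how many of the UTF-16 bytes are zero.

```python
# Illustrative sample strings (assumptions, not from the thread).
samples = {
    "ascii markup": "<title>Hello</title>",
    "japanese text": "日本語のテキスト",
}

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes, zero_bytes_in_utf16) for text."""
    utf8 = text.encode("utf-8")
    # "utf-16-le" emits no BOM; for BMP-only text it is byte-identical to UCS-2.
    utf16 = text.encode("utf-16-le")
    return len(utf8), len(utf16), utf16.count(0)

for label, text in samples.items():
    u8, u16, zeros = encoded_sizes(text)
    print(f"{label}: utf-8={u8} bytes, utf-16={u16} bytes, "
          f"zero bytes in utf-16={zeros}")
```

For the ASCII sample every second UTF-16 byte is zero, which is the cache-filling effect described above; for the Japanese sample UTF-8 needs three bytes per character against two in UTF-16, which is the 50% growth John mentions. Real documents mix markup and text, so the actual ratio depends on the data.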

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@r...  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
