Re: Specifying a Unicode subset
Uh, yeah, it sort of depends on your processing model, I guess. The main reason I use UCS-2 is that I can get to character n in constant time. With UTF-8, character n is reached in time proportional to n. Maybe that's OK for you; I don't find it so great, though.

On Tuesday, October 22, 2002, at 11:37 PM, Daniel Veillard wrote:

> On Mon, Oct 21, 2002 at 12:27:15PM -0400, John Cowan wrote:
>> tblanchard@m... scripsit:
>>
>>> Let's move on. UTF-8 is your transfer encoding; use UCS-2 in memory
>>> (unless planning to process ancient Sumerian or something - then use
>>> UCS-4) and let's all move on to something remotely interesting.
>>
>> In CJK environments, using UTF-16 for transfer makes sense, because UTF-8
>> imposes a 50% growth in the size of native-language characters.
>> That's basically why XML requires both UTF-8 and UTF-16 support of all
>> conforming parsers.
>
> And using UCS-2 for the in-memory encoding is also, in a lot of cases,
> a really bad choice. Processor performance is cache-bound nowadays.
> Filling the caches with zeros for half of the data processed can simply
> trash them. I will stick to UTF-8 internally; it also allows
> some processors to use hardcoded CISC instructions for 0-terminated C
> strings (IIRC the POWER line of processors has such a set of
> instructions).
>
> Daniel
>
> --
> Daniel Veillard | Red Hat Network https://rhn.redhat.com/
> veillard@r... | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
> http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/