
Re: Unicode confusion

  • From: David Brownell <david-b@p...>
  • To: xml-dev@i...
  • Date: Tue, 04 Jan 2000 08:56:35 -0800

David Megginson wrote:
> 
> roddey@u... writes:
> 
> > If anything, it should go the other way. Unicode should be the core
> > API, and there should be helper API to allow the use of local code
> > page chars where necessary. Everything should be set up to optimize
> > use of the Unicode API, with local code page use paying the price,
> > since Unicode is the more desirable format.

I took that as referring to 16-bit character codes vs variable width
or 32-bit ones.  And when I take it that way, I agree!  (However, the
notion of a "Unicode API" struck me as strange; the spec has no API.)


> No one's disagreeing with the use of Unicode; we're talking about
> which character encoding we'll use to represent it.  You can represent
> Unicode in variable-width 8-bit or 16-bit encodings or in fixed-width
> 32-bit encodings.
> 
> Note that Java uses UTF-16, which isn't quite fixed-width, though no
> one really notices.

... no one really notices "yet"!  Unicode is still rolling out, in the
big picture, and most people now using it have little reason to notice.

One way that UTF-16 (and Unicode) aren't fixed width is that there
can exist "surrogate pairs", where two 16-bit values get combined to
represent a character in a range that can't be represented in 16 bits.
(For those who didn't know that!)  It's the existence of such pairs
which makes some folk argue that a 32-bit character code is the way to
go (and they persuaded most SysV UNIX platforms to put a 32-bit wchar_t
in their ABI, accordingly).
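
To make the surrogate pair arithmetic concrete, here is a small Python
sketch (not from the original thread).  The function names are made up
for illustration; the only library call relied on is str.encode().  It
splits a code point above U+FFFF into its high and low 16-bit halves
and checks the result against what a UTF-16 encoder produces.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) 16-bit surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000          # 20 bits remain after the offset
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def utf16_code_units(text):
    """Return the 16-bit code units of text as encoded in UTF-16 (big-endian)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    clef = "\U0001D11E"                    # MUSICAL SYMBOL G CLEF, outside 16 bits
    print([hex(u) for u in utf16_code_units(clef)])
    print([hex(u) for u in to_surrogate_pair(0x1D11E)])
```

Any code that indexes a UTF-16 string by 16-bit unit can land between
the two halves, which is the sense in which the encoding "isn't quite
fixed-width".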

However, another way they aren't fixed width is that "combining"
characters get used.  Things like diacritical marks aren't always
part of the base character; they can be separate codes that follow it.
In my book, the additional existence of such features means there's
no point in a 32-bit character code, since even apps using a full
ISO 10646 encoding (32-bit) still need to deal with such issues.
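
A rough Python illustration of the combining character point (again
not from the original thread).  Python strings are sequences of full
code points, so this is the "32-bit" view, and one visible character
can still take two of them.  The helper group_with_marks() is a
hypothetical name and only a crude approximation: it attaches
combining marks to the preceding code point and is not a full text
segmentation algorithm.

```python
import unicodedata


def group_with_marks(text):
    """Group each base code point with the combining marks that follow it."""
    groups = []
    for ch in text:
        # unicodedata.combining() returns a nonzero class for combining marks
        if groups and unicodedata.combining(ch):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


if __name__ == "__main__":
    composed = "\u00e9"        # e with acute as a single code point
    decomposed = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

    print(len(composed), len(decomposed))
    print(composed == decomposed)
    print(unicodedata.normalize("NFC", decomposed) == composed)
    print(len(group_with_marks(decomposed + "a")))
```

The two spellings compare unequal as raw code points and only match
after normalization, so a wider character code does not by itself give
"one code per character".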

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)


