Re: SAX/C++: UTF-8 v UTF-16

  • From: James Clark <jjc@j...>
  • To: David Megginson <david@m...>
  • Date: Fri, 03 Dec 1999 09:50:11 +0700

David Megginson wrote:

> 4. Hold my nose and use UTF-8 rather than UTF-16, for compatibility
>    with most existing C++ code.

I would say there is at least as much C++ code using UTF-16 as using
UTF-8. On Windows at least, UTF-16 is much more common. The DOM mandates
UTF-16, so if SAX mandated UTF-8 there would be an unfortunate mismatch.
This is a tough one, because there's a lot more diversity in the C++
world.  My preference would be not to mandate either UTF-8 or UTF-16
exclusively.  There are lots of apps using UTF-8 and there are lots of
apps using UTF-16; if you exclude either, then a lot of apps will take a
major performance/convenience hit. Expat allows a choice at compile-time
between UTF-8 and UTF-16, and there are big projects using both (e.g., Perl
uses UTF-8 and Mozilla uses UTF-16).

There are a couple of possible solutions:

1. A lo-tech solution.  Provide a SAXChar typedef, and define everything
in terms of SAXChar.  SAXChar gets typedefed to either char or unsigned
short depending on whether SAX_UNICODE is defined or not.  It's up to
implementations to decide whether to support both or just one, and up to
clients to decide whether to work with both or to require one.
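
As a minimal sketch of the typedef approach, assuming a single
SAX_UNICODE build switch: the interface below is written once in terms
of SAXChar. The getLength member, the saxStrlen helper and the
SingleAttribute class are hypothetical names added for illustration
only; they are not part of any agreed SAX/C++ interface.

```cpp
#include <cassert>
#include <cstddef>

// Build switch from the text: define SAX_UNICODE for UTF-16 code units,
// leave it undefined for UTF-8 code units.
#ifdef SAX_UNICODE
typedef unsigned short SAXChar;  // one UTF-16 code unit
#else
typedef char SAXChar;            // one UTF-8 code unit
#endif

// The interface mentions only SAXChar, never char or unsigned short.
class AttributeList {
public:
  virtual ~AttributeList() {}
  virtual int getLength() = 0;                  // hypothetical member
  virtual const SAXChar *getName(int pos) = 0;
};

// Hypothetical helper: length in code units of a zero-terminated string.
inline std::size_t saxStrlen(const SAXChar *s) {
  assert(s != 0);
  std::size_t n = 0;
  while (s[n] != 0) ++n;
  return n;
}

// Hypothetical one-attribute implementation, just enough to exercise
// the interface in either build configuration.
class SingleAttribute : public AttributeList {
public:
  explicit SingleAttribute(const SAXChar *name) : name_(name) {}
  int getLength() { return 1; }
  const SAXChar *getName(int pos) { return pos == 0 ? name_ : 0; }
private:
  const SAXChar *name_;  // not owned
};
```

Client code that sticks to SAXChar and helpers like saxStrlen compiles
unchanged in both configurations, but a given binary only ever sees one
of the two encodings.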

A variation on this is to allow both UTF-8 and UTF-16 variants to exist
in a single library.  To do this, you can do something along the lines
of

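// UTF-16 variant: names are returned as 16-bit code units.
class AttributeList16 {
public:
  virtual const unsigned short *getName(int pos) = 0;
};

// UTF-8 variant: names are returned as bytes.
class AttributeList8 {
public:
  virtual const char *getName(int pos) = 0;
};

// Clients that only need one encoding use the unsuffixed name.
#ifdef SAX_UNICODE
typedef AttributeList16 AttributeList;
#else
typedef AttributeList8 AttributeList;
#endif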

2. A hi-tech solution.  Do what the Standard C++ library does and make
the interface a template in the character type.  This is the cleanest
solution, but lots of C++ projects eschew templates on portability
grounds.
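
A rough sketch of what the template approach might look like, in the
spirit of std::basic_string. The names BasicAttributeList, getLength,
countNameUnits and OneAttribute are assumptions made up for this
example, not a proposed interface.

```cpp
#include <cassert>
#include <cstddef>

// The interface is parameterized on the code unit type.
template <class CharT>
class BasicAttributeList {
public:
  virtual ~BasicAttributeList() {}
  virtual int getLength() = 0;                // hypothetical member
  virtual const CharT *getName(int pos) = 0;
};

// Both encodings can coexist in one library as instantiations.
typedef BasicAttributeList<char> AttributeList8;
typedef BasicAttributeList<unsigned short> AttributeList16;

// Client code written once for any code unit type: total number of
// code units across all attribute names.
template <class CharT>
std::size_t countNameUnits(BasicAttributeList<CharT> &attrs) {
  std::size_t total = 0;
  for (int i = 0; i < attrs.getLength(); ++i) {
    const CharT *name = attrs.getName(i);
    assert(name != 0);
    while (*name != 0) { ++total; ++name; }
  }
  return total;
}

// Hypothetical one-attribute implementation for illustration.
template <class CharT>
class OneAttribute : public BasicAttributeList<CharT> {
public:
  explicit OneAttribute(const CharT *name) : name_(name) {}
  int getLength() { return 1; }
  const CharT *getName(int pos) { return pos == 0 ? name_ : 0; }
private:
  const CharT *name_;  // not owned
};
```

Unlike the typedef approach, a single program can use both
instantiations side by side, at the cost of requiring a compiler with
dependable template support.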

If you feel that one needs to be mandated, I would pick UTF-16.

James





