
Re: SAX/C++: Changes for C++

  • From: Michael Fuller <msf@m...>
  • To: David Megginson <david@m...>
  • Date: Mon, 6 Dec 1999 15:04:18 +1100

On Thu, Dec 02, 1999 at 04:38:11PM -0500, David Megginson wrote:
> Here are some of the differences between the SAX/Java interfaces and the 
> SAX/C++ interfaces:
> 
> - lots of const
> - C++ const char * for Java String throughout (and, thus, UTF-8
>   instead of UTF-16)
> - InputSource doesn't have an equivalent of Java Reader (no getReader
>   method)

I don't mind if the character container is unsigned short or wchar_t
(it doesn't really matter if wchar_t is 32 bits on some platforms as
it's easy enough to convert to/from where required), but put me down
as another vote for UTF-16 rather than UTF-8.

Given that the point of Unicode is to support I18N, why choose as a default
a format that typically has a 50% size overhead for non-European languages?
Many parsers and applications happily work internally using UTF-16;
why not standardize that as the default SAX character encoding?
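
To make the 50% figure concrete, here is a minimal sketch that counts the
bytes each encoding needs for a single code point. The function names are
illustrative only and are not part of any SAX proposal. Most CJK characters
sit in the range U+0800 to U+FFFF, where UTF-8 needs three bytes and UTF-16
needs two.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to encode one Unicode code point in UTF-8.
std::size_t utf8_bytes(std::uint32_t cp) {
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;     // e.g. Latin supplements, Greek, Cyrillic
    if (cp < 0x10000) return 3;   // rest of the BMP, including most CJK
    return 4;                     // supplementary planes
}

// Bytes needed to encode one Unicode code point in UTF-16.
std::size_t utf16_bytes(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;  // one code unit, or a surrogate pair
}
```

For U+65E5 this gives 3 bytes in UTF-8 against 2 in UTF-16, which is the 50%
overhead; for ASCII the comparison goes the other way (1 byte against 2).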

Suggestion:
    Do what the Java SAX interface did: optionally provide *both*
    ByteStream and CharacterStream components in an InputSource object

Applications can treat the ByteStream as a stream of bytes whose encoding
can either be auto-detected, or is explicitly indicated by the Encoding.
However, a CharacterStream would always be a sequence of UTF-16 characters.
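
A rough sketch of what such an InputSource might look like in C++ follows.
Everything here is an assumption made for illustration: the CharacterStream
interface, the StringCharacterStream helper, and the use of char16_t (which
postdates this thread) as the UTF-16 code unit type. It is not the proposed
SAX/C++ interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <string>

// Hypothetical: a source of UTF-16 code units, analogous to Java's Reader.
class CharacterStream {
public:
    virtual ~CharacterStream() {}
    // Copies up to maxUnits code units into buffer; returns the number copied.
    virtual std::size_t read(char16_t* buffer, std::size_t maxUnits) = 0;
};

// Hypothetical helper: a CharacterStream over an in-memory UTF-16 string.
class StringCharacterStream : public CharacterStream {
public:
    explicit StringCharacterStream(const std::u16string& text)
        : text_(text), pos_(0) {}
    std::size_t read(char16_t* buffer, std::size_t maxUnits) override {
        std::size_t n = std::min(maxUnits, text_.size() - pos_);
        std::copy(text_.begin() + pos_, text_.begin() + pos_ + n, buffer);
        pos_ += n;
        return n;
    }
private:
    std::u16string text_;
    std::size_t pos_;
};

// Carries an optional byte stream (plus encoding hint) and an optional
// character stream; the caller sets whichever it has. Streams are not owned.
class InputSource {
public:
    InputSource() : byteStream_(nullptr), characterStream_(nullptr) {}

    void setByteStream(std::istream* s) { byteStream_ = s; }
    std::istream* getByteStream() const { return byteStream_; }

    void setEncoding(const std::string& e) { encoding_ = e; }
    const std::string& getEncoding() const { return encoding_; }

    void setCharacterStream(CharacterStream* s) { characterStream_ = s; }
    CharacterStream* getCharacterStream() const { return characterStream_; }

private:
    std::istream* byteStream_;          // raw bytes; encoding detected or given
    std::string encoding_;              // empty means "auto-detect"
    CharacterStream* characterStream_;  // always UTF-16 code units
};
```

A parser written against this sketch could prefer the character stream when
one is set and fall back to the byte stream otherwise, as the Java InputSource
documentation describes for Reader versus InputStream.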
    
> - SAXException does not allow an embedded exception, because there's
>   no need to tunnel exceptions in C++ (you can always throw any
>   exception)

Unless you use exception specifications (throw() lists) in function
declarations, as the Java spec did with its throws clauses.
In that case, you need to be able to embed exceptions...
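
The following is a minimal sketch of the embedding idea, assuming a modern
compiler. It uses std::exception_ptr (C++11, so not available when this was
written) to carry the original exception, and it states the "only
SAXException may escape" restriction in a comment, because dynamic exception
specifications were removed in C++17. The class layout, loadRecord,
startElementCallback, and embeddedMessage are all hypothetical.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical SAXException that can carry an embedded exception.
class SAXException : public std::exception {
public:
    explicit SAXException(const std::string& message,
                          std::exception_ptr embedded = std::exception_ptr())
        : message_(message), embedded_(embedded) {}
    const char* what() const noexcept override { return message_.c_str(); }
    std::exception_ptr getException() const { return embedded_; }
private:
    std::string message_;
    std::exception_ptr embedded_;  // null when nothing is embedded
};

// Hypothetical application code that fails with a non-SAX exception.
void loadRecord() { throw std::runtime_error("database unavailable"); }

// A callback whose contract permits only SAXException to escape,
// so anything else is wrapped before it leaves.
void startElementCallback() {
    try {
        loadRecord();
    } catch (const SAXException&) {
        throw;  // already the permitted type
    } catch (...) {
        throw SAXException("handler failed", std::current_exception());
    }
}

// Returns the embedded exception's message, or "" if none is embedded.
std::string embeddedMessage(const SAXException& e) {
    if (!e.getException()) return "";
    try {
        std::rethrow_exception(e.getException());
    } catch (const std::exception& inner) {
        return inner.what();
    } catch (...) {
        return "unknown";
    }
}
```

The caller catches SAXException and can still reach the original error
through getException(), which is the role the embedded exception plays in
the Java interface.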

> - DocumentHandler::characters and DocumentHandler::ignorableWhitespace 
>   don't need the 'start' argument, since they can be passed a pointer
>   to the start position in an existing array (that's not possible in
>   Java)
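
As a small illustration of the point, assuming a two-argument characters()
callback: the parser can pass buffer + start directly, so the handler never
needs the offset. CollectingHandler and reportRun are hypothetical names used
only for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical handler using the two-argument form (pointer and length).
class CollectingHandler {
public:
    void characters(const char* ch, std::size_t length) {
        text_.append(ch, length);  // ch already points at the first character
    }
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

// Parser side: report buffer[start, start + length) without a 'start'
// argument by advancing the pointer instead.
void reportRun(CollectingHandler& handler, const char* buffer,
               std::size_t start, std::size_t length) {
    handler.characters(buffer + start, length);
}
```

Given the buffer "<p>hello</p>", calling reportRun with start 3 and length 5
hands the handler exactly "hello".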

Yup.

> - HandlerBase omitted, since the classes can contain their own default 
>   implementations

I think this has been covered by others; if we define SAX/C++ using
abstract classes, then we need HandlerBase and the Impl classes back
for convenience.
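
A sketch of that arrangement, with a deliberately reduced set of callbacks:
DocumentHandler is a pure abstract interface, HandlerBase supplies do-nothing
defaults, and an application overrides only what it needs. The member list is
trimmed for brevity and CharCounter is a hypothetical example class; none of
this is the agreed SAX/C++ definition.

```cpp
#include <cstddef>

// Pure abstract interface: every callback must be implemented by someone.
class DocumentHandler {
public:
    virtual ~DocumentHandler() {}
    virtual void startDocument() = 0;
    virtual void endDocument() = 0;
    virtual void characters(const char* ch, std::size_t length) = 0;
};

// Convenience base class: empty default implementations of every callback.
class HandlerBase : public DocumentHandler {
public:
    void startDocument() override {}
    void endDocument() override {}
    void characters(const char*, std::size_t) override {}
};

// Hypothetical application handler: overrides only characters().
class CharCounter : public HandlerBase {
public:
    CharCounter() : count_(0) {}
    void characters(const char*, std::size_t length) override {
        count_ += length;
    }
    std::size_t count() const { return count_; }
private:
    std::size_t count_;
};
```

Without HandlerBase, CharCounter would have to define startDocument() and
endDocument() itself just to be instantiable.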

> - I haven't figured out what to do with Parser::setLocale yet

Michael
____________________________________________
http://www.mds.rmit.edu.au/~msf/
Multimedia Databases Group, RMIT, Australia.



