Re: SAX InputSource and character streams

  • From: Rob Lugt <roblugt@e...>
  • To: David Megginson <david@m...>, xml-dev@l...
  • Date: Mon, 12 Mar 2001 15:49:50 +0000

David Megginson wrote:-

> Mike Brown writes:
>
>  > My question was, when supplying a character stream to the parser, is it
>  > reasonable to expect that the parser will not complain if the encoding
>  > declaration says the encoding is (was) something the parser does not
>  > support?
>
>  > XML seems to assume that every parsed entity that a processor
>  > encounters consists of encoded characters (bytes, essentially),
>  > whereas in practice we obviously have parsers that accept the
>  > entities as characters.
>
> Hmm -- I can see two reasonable arguments here:
>
> 1. With a Java character stream, there's no way to know what the
> original encoding might have been, so the encoding declaration is
> moot.

Agreed.

> 2. A Java character stream is presented (more-or-less) in UTF-16, so
> the encoding declaration, if present, should agree with that.

I don't agree with this suggestion for the following reasons:-

1) What's the point?  The XML processor has no need to do anything with the
encoding declaration since it already has a character stream.  A SAX
processor doesn't even have to report the encoding to the application.

2) Perhaps most importantly, this undermines the responsibility on the
application to provide a valid character stream.  I would argue that by
passing a character stream, the application undertakes to perform all the
encoding-related tasks of an XML processor and thereby relieves the SAX
processor of that task.

3) The process that created the character stream (possibly by decoding a
byte stream) might have to search for and replace the encoding declaration.
This is both undesirable additional work and requires an understanding of
XML syntax which, IMHO, is not appropriate for a (possibly generic) decoding
utility.

4) You focus on Java.  This is understandable given the origins of SAX but,
thanks to the simplicity and elegance of the SAX interface, it has broken
free and is now implemented in a number of languages.  On some platforms C++
is not constrained by 16-bit characters and can present the application with
full UCS-4 characters.
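
To illustrate points 2 and 3, here is a minimal Java sketch of an
application that takes on the decoding itself and hands the SAX parser a
character stream.  The class and method names (CharStreamDemo, parseText)
are made up for illustration, and the sketch assumes the JAXP parser that
ships with the JDK.  The bytes are UTF-8 but the document carries a stale
ISO-8859-1 declaration; the decoder is generic and never looks at it.
Whether a given parser then ignores the declaration is exactly the question
under discussion, so treat the behaviour as parser-dependent.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: the application decodes, the parser only sees characters.
class CharStreamDemo {

    // Decodes the bytes with the charset the application knows to be correct,
    // parses the resulting character stream, and returns the text content.
    static String parseText(byte[] bytes, Charset charset) {
        StringBuilder text = new StringBuilder();
        try {
            // Generic decoding step: no knowledge of XML syntax is needed here.
            Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset);

            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            };

            // InputSource(Reader) supplies a character stream rather than a byte stream.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(reader), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The declaration says ISO-8859-1, but the bytes really are UTF-8.
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u20AC</a>";
        byte[] utf8 = doc.getBytes(StandardCharsets.UTF_8);
        System.out.println(parseText(utf8, StandardCharsets.UTF_8));
    }
}
```

The decoding step knows nothing about XML and leaves the declaration
untouched, which is only workable if the parser does not insist that the
declaration match the character stream.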

Regards
Rob Lugt
ElCel Technology



