
Re: SAX InputSource and character streams

  • From: David Brownell <david-b@p...>
  • To: Rob Lugt <roblugt@e...>, Mike Brown <mike@s...>,xml-dev@l...
  • Date: Tue, 20 Feb 2001 10:27:34 -0800

> > When constructing a SAX InputSource from a character stream
> > (java.io.Reader), is it correct to assume that any encoding
> > declaration given in the document will be ignored?

No ... the XML 1.0 spec does however say (near the end of 4.3.3) that

    it is an error for an entity including an encoding declaration to be presented
    to the XML processor in an encoding other than that named in the declaration.

Translated to English, a SAX processor MAY report an error if the encoding
declaration is wrong, but it's not required to.  In the XML spec, a plain
"error" is wording that accommodates variations in vendor implementations
(unlike "fatal" errors, which MUST be reported, and validity errors).


> Obviously it is an implementational thing, but I would argue that it makes
> no sense for a SAX parser to try to validate the encoding string contained
> within a character stream (java.io.Reader).

Java doesn't really make it easy to figure out what the input encoding was,
unless you just happen to be using an InputStreamReader so you can use
getEncoding() ... and then can translate from those "Java encoding names"
(not really documented last I checked) back to the real world.  So there's
no "100% reliable" check for whether the encoding name matches.
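
As a rough illustration of that best-effort check, here is a minimal sketch.
It assumes the java.nio.charset.Charset class (added to Java after this message
was written) as the way to map names like "UTF8" back to real-world names; the
class and method names (ReaderEncodingCheck, sameCharset, matchesDeclared) are
made up for the example.  When the Reader is not an InputStreamReader, the
sketch gives up and returns null rather than guessing.

```java
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class ReaderEncodingCheck {
    /** True if both names resolve to the same charset; false if they differ or either is unknown. */
    static boolean sameCharset(String a, String b) {
        try {
            // Charset.forName accepts aliases, including historical Java names such as "UTF8".
            return Charset.forName(a).equals(Charset.forName(b));
        } catch (IllegalArgumentException e) {
            // Covers illegal names, unsupported charsets, and null names.
            return false;
        }
    }

    /**
     * Best-effort comparison of a Reader's encoding with the declared one.
     * Returns null when the Reader does not expose its encoding at all.
     */
    static Boolean matchesDeclared(Reader in, String declared) {
        if (!(in instanceof InputStreamReader)) {
            return null; // e.g. StringReader: nothing to compare against
        }
        String actual = ((InputStreamReader) in).getEncoding();
        if (actual == null) {
            return null; // getEncoding() returns null once the stream is closed
        }
        return Boolean.valueOf(sameCharset(actual, declared));
    }
}
```

A caller could treat FALSE as grounds for a warning and null as "can't tell",
which matches the point that no fully reliable check exists.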

My conclusion is that it's worth a warning if things don't check out, since
it's easy enough to create a Reader that's using the wrong encoding.  It's
just bits ... and so long as '<', '>', '&' and a few other characters get read
correctly, the XML might actually parse ... but give garbage because the
non-markup characters were misinterpreted.  Consider the different
ISO-8859 encodings --- I could easily see that happening.
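
To make the ISO-8859 case concrete, here is a small sketch that feeds the same
bytes to a SAX parser through Readers using two different encodings.  The
class name WrongReaderEncoding and the sample document are invented for the
example, and it assumes the JRE ships an ISO-8859-2 charset.  The sample
deliberately has no encoding declaration, so the result doesn't depend on how
a particular parser treats a mismatch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WrongReaderEncoding {
    // "<p>", the single byte 0xA3, "</p>".  0xA3 is the pound sign (U+00A3)
    // in ISO-8859-1 and L-with-stroke (U+0141) in ISO-8859-2.
    static final byte[] DOC = {'<', 'p', '>', (byte) 0xA3, '<', '/', 'p', '>'};

    /** Parses DOC through a Reader using the given charset and returns the character data seen. */
    static String textSeenBy(String charsetName) {
        final StringBuilder text = new StringBuilder();
        Reader in = new InputStreamReader(
                new ByteArrayInputStream(DOC), Charset.forName(charsetName));
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(in),
                    new DefaultHandler() {
                        @Override
                        public void characters(char[] ch, int start, int length) {
                            text.append(ch, start, length);
                        }
                    });
        } catch (Exception e) {
            // Sketch only: wrap parser and I/O failures in an unchecked exception.
            throw new IllegalStateException(e);
        }
        return text.toString();
    }
}
```

Both calls parse cleanly, because the markup characters sit in the ASCII range
that the two encodings share; only the content differs.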

It's pretty clear that the XML spec allows that to be treated as a fatal
error, so I'd never assume that an encoding would be ignored.

- Dave

