XML-DEV Mailing List Archive
Re: SAX InputSource and character streams
David Megginson wrote:-
> Mike Brown writes:
>
> > My question was, when supplying a character stream to the parser, is it
> > reasonable to expect that the parser will not complain if the encoding
> > declaration says the encoding is (was) something the parser does not
> > support?
> >
> > XML seems to assume that every parsed entity that a processor encounters
> > consists of encoded characters (bytes, essentially), whereas in practice
> > we obviously have parsers that accept the entities as characters.
>
> Hmm -- I can see two reasonable arguments here:
>
> 1. With a Java character stream, there's no way to know what the
>    original encoding might have been, so the encoding declaration is
>    moot.

Agreed.

> 2. A Java character stream is presented (more-or-less) in UTF-16, so
>    the encoding declaration, if present, should agree with that.

I don't agree with this suggestion for the following reasons:-

1) What's the point? The XML processor has no need to do anything with
   the encoding declaration since it already has a character stream. A
   SAX processor doesn't even have to report the encoding to the
   application.

2) Perhaps most importantly, this undermines the application's
   responsibility to provide a valid character stream. I would argue
   that by passing a character stream, the application undertakes to
   perform all the encoding-related tasks of an XML processor and thereby
   relieves the SAX processor of that task.

3) The process that created the character stream (possibly by decoding a
   byte stream) might have to search for and replace the encoding
   declaration. This is undesirable additional work, and it requires an
   understanding of XML syntax which, IMHO, is not appropriate for a
   (possibly generic) decoding utility.

4) You focus on Java. This is understandable given the origins of SAX
   but, thanks to the simplicity and elegance of the SAX interface, it
   has broken free and is now implemented in a number of languages. On
   some platforms C++ is not constrained by 16-bit characters and can
   present the application with full UCS-4 characters.

Regards
Rob Lugt
ElCel Technology
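
As a rough illustration of argument 1 and reason 1) above, the sketch below hands a SAX parser an already-decoded character stream whose encoding declaration disagrees with its content. It uses Python's standard xml.sax module, one of the non-Java SAX ports, rather than the Java API discussed in this thread; that choice is an assumption made only to keep the example short and runnable. The names TextCollector, parse_character_stream and DOCUMENT are hypothetical. With the expat-based reader shipped with CPython 3 the declaration appears to be disregarded and the characters arrive as supplied; other SAX implementations may behave differently.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class TextCollector(ContentHandler):
    """Hypothetical handler that accumulates character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_character_stream(text):
    """Parse already-decoded XML text and return its character data."""
    source = InputSource()
    # The application supplies characters, not bytes, so any decoding
    # has already happened before the parser sees the document.
    source.setCharacterStream(io.StringIO(text))
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return "".join(handler.chunks)


# The declaration claims ISO-8859-1, yet the euro sign (U+20AC) has no
# representation in that encoding, so the declaration cannot describe
# the stream that is actually handed to the parser.
DOCUMENT = '<?xml version="1.0" encoding="ISO-8859-1"?><doc>\u20ac</doc>'

if __name__ == "__main__":
    # Compare the parsed text with the character that was supplied.
    print(parse_character_stream(DOCUMENT) == "\u20ac")
```

Under these assumptions the application, not the parser, is the party vouching for the characters, which is the division of responsibility argued for in reason 2).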