
Re: encoding problem fixed

  • From: John Cowan <cowan@l...>
  • To: XML Dev <xml-dev@i...>
  • Date: Fri, 30 Jul 1999 13:56:49 -0400

David Brownell wrote:

> Actually, that's not correct either.  My general advice is to pass a
> URI to the parser -- which is required to do the correct thing! -- and
> in those rare cases that can't be done:
> 
>     * If the data is externally typed according to character set,
>       you MUST use some Reader ... e.g. given a MIME type of
>       "application/xml;charset=Big5", then use a reader set
>       up to use the "Big5" encoding (a Chinese encoding).  There
>       isn't much choice of classes; InputStreamReader, or a custom
>       reader that understands that encoding.
> 
>     * If the data is NOT externally typed, then you MUST rely on
>       the XML parser's autodetection ... pass an InputStream.

This is all quite sound, and I was wrong to overlook the case of
external charset information.
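
For concreteness, the two cases David describes might be sketched in Java roughly as follows. This is only an illustration under my own assumptions: the class name ExternalCharset and both helper methods are hypothetical, and the MIME parameter parsing is deliberately simplistic. Only InputSource, InputStreamReader and Charset are real library classes.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import org.xml.sax.InputSource;

class ExternalCharset {
    /** Returns the charset parameter of a MIME type string, or null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** Externally typed data gets a Reader; untyped data stays a byte stream. */
    static InputSource toInputSource(InputStream in, String contentType) {
        String charset = charsetOf(contentType);
        if (charset != null) {
            // Charset.forName throws an unchecked exception for unknown names.
            return new InputSource(new InputStreamReader(in, Charset.forName(charset)));
        }
        // No external label: leave autodetection to the XML parser.
        return new InputSource(in);
    }
}
```

With "application/xml;charset=Big5" this hands the parser a character stream decoded as Big5 (provided the JVM supports that charset); with a bare "application/xml" the parser gets the raw bytes.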
 
> > Actually, it's doing what it's expected to: reading the native charset,
> > CP-1252.  (Unix JVMs use 8859-1 instead.)
> 
> Those are actually system-specific defaults ... many localized versions
> of those environments work differently.  For example UNIX JVMs may well
> use the "EUC-JP" coding in Japan, or MS-Windows the "Shift_JIS".

Reasonable.
 
> In fact, my own basic guidance is never to pass any sort of I/O stream
> (InputStream -or- Reader!) to the parser; let the parser work from the
> URI, if at all possible.  It's normally quite possible, and it's a lot
> less likely to handle the encodings wrong than application code!!
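
A minimal sketch of the "just give the parser a URI" approach, assuming the SAX parser bundled with a current JDK rather than any parser discussed in this thread. The class ParseFromUri and its methods are made up for illustration; the temp file stands in for whatever resource the URI would really name.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class ParseFromUri {
    /** Parses the document at the given URI and returns its character data. */
    static String textOf(String uri) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // The parser opens the URI itself and works out the encoding.
        SAXParserFactory.newInstance().newSAXParser().parse(uri, handler);
        return text.toString();
    }

    /** Writes a small ISO-8859-1 document to a temp file and parses it by URI. */
    static String demo() {
        try {
            Path file = Files.createTempFile("encoding-demo", ".xml");
            try {
                String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><p>caf\u00e9</p>";
                Files.write(file, xml.getBytes(StandardCharsets.ISO_8859_1));
                return textOf(file.toUri().toString());
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The application code never names an encoding here; the parser reads the encoding declaration from the bytes it fetched.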

This leads to an interesting question: what do various XML parsers
do when fetching http: URIs that produce explicit charset declarations?
Someone should try Aelfred, etc. and see if the header-level charset
declaration is respected, overriding the internal encoding declaration.
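
This does not answer the HTTP question for Aelfred or any other parser, but the same conflict can be reproduced locally at the Reader level. The sketch below assumes the SAX parser bundled with the JDK; the class ExternalVersusInternal and its method names are hypothetical. The document is encoded as UTF-8 but (wrongly) declares ISO-8859-1 internally, so the two ways of handing it to the parser should disagree.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ExternalVersusInternal {
    // Bytes are UTF-8, but the internal declaration claims ISO-8859-1.
    static final byte[] DOC =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><p>caf\u00e9</p>"
            .getBytes(StandardCharsets.UTF_8);

    /** Parses the source and returns the concatenated character data. */
    static String parse(InputSource source) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    /** External label wins: the Reader has already decoded the bytes as UTF-8. */
    static String viaReader() {
        return parse(new InputSource(new InputStreamReader(
            new ByteArrayInputStream(DOC), StandardCharsets.UTF_8)));
    }

    /** No external label: the parser trusts the internal declaration. */
    static String viaByteStream() {
        return parse(new InputSource(new ByteArrayInputStream(DOC)));
    }
}
```

Given a character stream, a SAX parser disregards the encoding declaration, so viaReader() should return the intended text. Given the byte stream, the parser decodes the two UTF-8 bytes of the accented letter as two ISO-8859-1 characters, so viaByteStream() should return a garbled result. Whether a parser fetching an http: URI treats the header-level charset the same way as the Reader case is exactly the thing someone would have to test per parser.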

-- 
	John Cowan	http://www.ccil.org/~cowan	cowan@c...
Schlingt dreifach einen Kreis um dies! / Schliesst euer Aug vor heiliger Schau,
Denn er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
			-- Coleridge / Politzer

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)