Re: Text/xml with omitted charset parameter
From: "Bjoern Hoehrmann" <derhoermi@g...>

> So, who tells me I
> am wrong and text/xml documents without charset parameter may still be
> UTF-8 encoded (and use non-ASCII characters)? Apache uses text/xml as
> default type for .xml documents, are they asking for interoperability
> problems or what?

The only ways out of encoding hell are:

- no data interchange between people on different systems
- everyone adopts a common character repertoire and encoding (e.g. UTF-8)
- everyone labels their data, and all protocols strictly require accurate labelling

The first of these is silly. The second can only come slowly, if it comes. XML supports the last two: a common character repertoire and required labelling.

It is no surprise that this falls apart as soon as we get layers which do not transfer encoding information. But that is not a sign that labelling is bad; it merely means that the other layers need to be attended to.

The impracticality of the defaulting rules that MIME uses is the culprit. They cover up an issue that should be explicitly handled and should have widespread awareness. (It is nice to see things like a text open dialog box on Mac OS X having an encoding selection option, in this regard. Shame Java doesn't have it as standard.)

"Text" entities do not exist. There is only text in a particular encoding. The long-standing and slack policy of defaulting to the locale's character encoding has meant that our API infrastructure does not provide enough mechanisms for passing information about character encoding. (IMHO, the C type that is both byte and char is the big villain; the Java approach of specifically saying "a character is Unicode using UTF-16" is a big way out of the problem, but even Java's API did not at first take the external-internal encoding transition seriously.)

We take it for granted that our programming languages should have a data type for "float" which is different from "integer", and we are taught the difference in computer courses. Yet I believe that most IT students are not taught anything about character sets or character encodings.

Cheers
Rick Jelliffe
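
A minimal sketch of the point that there is only text in a particular encoding, and of why accurate labelling matters: the same bytes give different results depending on which charset the receiver assumes. The helper name `decode_labelled` is hypothetical and is not taken from any library or from the original message; it simply refuses to guess when no label is supplied.

```python
from typing import Optional

# "café" encoded as UTF-8; the é becomes the two bytes 0xC3 0xA9.
payload = "caf\u00e9".encode("utf-8")


def decode_labelled(data: bytes, charset: Optional[str]) -> str:
    """Decode data with its declared charset; refuse to guess if unlabelled.

    Hypothetical helper for illustration only.
    """
    if charset is None:
        raise ValueError("no charset label supplied; refusing to guess")
    return data.decode(charset)


# Correct label: the original text comes back.
as_utf8 = decode_labelled(payload, "utf-8")

# Wrong label that happens to accept every byte: silently wrong text.
as_latin1 = decode_labelled(payload, "iso-8859-1")

# Wrong label that rejects bytes above 0x7F: fails loudly.
try:
    decode_labelled(payload, "us-ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(repr(as_utf8), repr(as_latin1), ascii_failed)
```

The ISO-8859-1 case is the dangerous one: nothing fails, yet the reader sees two characters where the writer meant one. A default applied by a layer that never saw the real label can produce this kind of silent corruption.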