Re: IE5.0 does not conform to RFC2376
MURATA Makoto wrote:
>
> I believe that IE 5.0 does not conform to RFC2376 (XML Media Types),
> of which I am a co-author.
>
> As for the XML media type "text/xml", the charset parameter in the
> MIME header is authoritative. Encoding declarations have to be ignored
> so that transcoding is possible.

So, if the file is saved to some local browser cache and then re-read, it may have no MIME header, so the encoding declaration is then authoritative.

Why can't the transcoding proxy also rewrite the encoding declaration, since it is rewriting the file anyway? It is trivially easy to find, process, and change. I imagine that someone could take some generic charset-converting code and make an XML-aware transcoding servlet that rewrote the encoding declaration in about, what, an hour? If someone does this, I will see about getting it included in the next Jigsaw version.

> However, IE 5.0 appears to always ignore the charset parameter and use
> the BOM or encoding declaration only. Therefore, IE 5.0 does not conform to
> RFC 2376.

Okay. But does RFC 2376 conflict with the XML 1.0 Recommendation?

> Proof: I made a UTF-8 XML document which also parses even when it is assumed as
> Shift_JIS. Then, I provided the correct charset parameter "utf-8"
> in the MIME header by configuring Apache and provided an encoding declaration
> "Shift_JIS" in the XML document. Such mismatch is perfectly legal and
> usual when proxies perform code conversion. I tried this document with IE 5.0.
> Incorrect characters were displayed. Q.E.D.

Okay, proof accepted.

> When the charset parameter is not specified, it is assumed as US-ASCII.

Wow. So, what this RFC says is that, when text/xml is used in email and over HTTP, the encoding declaration is *always ignored*. That is a pretty big change and, frankly, IMHO ill-advised.
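The rewrite asked about earlier in this reply (a transcoder that also updates the encoding declaration) can be sketched roughly as follows. This is a minimal Python sketch, not the servlet mentioned above. The function name `transcode_xml` and its parameters are hypothetical, and it assumes the caller already knows the source encoding (for example from the charset parameter).

```python
import re

# Leading XML declaration: the version part, then an optional encoding part.
_XML_DECL = re.compile(
    r'^(<\?xml\s+version\s*=\s*(?:"[^"]*"|\'[^\']*\'))'
    r'(\s+encoding\s*=\s*(?:"[^"]*"|\'[^\']*\'))?'
)

def transcode_xml(data: bytes, source: str, target: str) -> bytes:
    """Re-encode an XML document and relabel its encoding declaration.

    Hypothetical helper: `source` must be the real encoding of `data`.
    """
    text = data.decode(source)
    text = text.lstrip("\ufeff")  # drop a byte order mark left after decoding
    match = _XML_DECL.match(text)
    if match:
        # Keep the version part, replace (or add) the encoding part,
        # and leave anything after it (such as standalone) untouched.
        text = f'{match.group(1)} encoding="{target}"' + text[match.end():]
    else:
        # No XML declaration at all: add one that names the new encoding.
        text = f'<?xml version="1.0" encoding="{target}"?>\n' + text
    return text.encode(target)
```

For example, `transcode_xml(body, "shift_jis", "utf-8")` would return UTF-8 bytes whose declaration reads `encoding="utf-8"`, so the internal label and the bytes stay in agreement after conversion.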
> If you are using Apache and overriding by AddType is allowed, you only have to
> create a file named .htaccess in your directory and write a line as below:
>
> AddType "text/xml; charset=utf-8" xml

Correction: if you are the *administrator* of an Apache server. One of the ways in which the Web has changed over the last 5 years is that the percentage of Web authors who also administer the site that they serve from has dropped from a substantial majority to an insignificant minority.

What this RFC appears to do is remove author control over correctly labelling the encoding, and ensure that most if not all XML documents get incorrectly labelled as US-ASCII. Then, if the parser is working correctly, it will complain about all bytes with value >127 being "illegal characters" and halt with a fatal error[1].

So, this RFC removes at a stroke the possibility of authors correctly labelling the encoding of their XML documents and takes us back to that dark time (the present) when the majority of, say, Japanese Web content was mis-labelled. And it seems to have done this simply to save a very small part of coding effort for people writing transcoders.

I suspect that this was not the desired result. This could have been avoided:

1) Require an explicit charset for overriding the internal encoding declaration, so if one really wants to re-label a document as US-ASCII one actually has to send it out as text/xml; charset="US-ASCII"

2) Define the absence of an explicit charset in the MIME header not as "US-ASCII" but as "use encoding in XML instance", in accordance with the XML 1.0 Recommendation.

3) Encourage transcoding software to rewrite the internal encoding declaration.

4) Make suitable transcoding software freely available, so that the cost of not complying with point 3 (write your own) is higher than the cost of complying with it (use a pre-built one).
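To make the difference concrete, here is a small Python sketch comparing the text/xml rule as described in this thread with the rule proposed in points 1 and 2. All function names are hypothetical. The sketch assumes an ASCII-compatible document, ignores BOM detection, and falls back to UTF-8 when there is no encoding declaration, as XML 1.0 does for documents without a BOM.

```python
import re
from email.message import Message

# Encoding name inside a leading XML declaration (ASCII-compatible input only).
_ENC_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def charset_param(content_type: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def declared_encoding(body: bytes):
    """Return the lowercased name from the XML encoding declaration, or None."""
    match = _ENC_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None

def encoding_rfc2376(content_type: str, body: bytes) -> str:
    # Rule as described in this thread: the charset parameter is
    # authoritative, and a missing charset means US-ASCII.
    return charset_param(content_type) or "us-ascii"

def encoding_proposed(content_type: str, body: bytes) -> str:
    # Points 1 and 2: only an explicit charset overrides the document;
    # otherwise use the encoding declaration (UTF-8 if there is none).
    return charset_param(content_type) or declared_encoding(body) or "utf-8"
```

With a UTF-8 document that declares `encoding="utf-8"` and is served as plain `text/xml`, the first rule yields `us-ascii`, so any byte above 127 fails to decode, while the proposed rule yields `utf-8`. With an explicit `charset`, both rules agree and the MIME header wins.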
Please consider points 1 and 2 to be a defect report on RFC2376

--
Chris

[1] http://www.w3.org/TR/REC-xml.html#charencoding

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)