Re: IE5.0 does not conform to RFC2376
David Brownell wrote:
> Chris Lilley wrote:
> >
> > What this RFC appears to do is remove author control over correctly
> > labelling the encoding, and ensure that most if not all XML documents
> > get incorrectly labelled as US-ASCII.
>
> Not at all. The best default MIME content type for all web
> servers is "application/xml".

Why? Do you consider anything not written in US-ASCII as a text document? I think the Unicode Consortium would disagree with you there. You don't actually show that application/xml is better, because you say:

> Without a "charset=Big5" or
> similar declaration, then the XML processor's autodetection
> kicks in ... minimally handling UTF-8 and UTF-16, and quite
> commonly handling a variety of additional encodings.

If it has poor code to autodetect, it has poor code for both text/xml and application/xml. But it need not autodetect; in fact, autodetection is a bad thing. I was not suggesting autodetection, quite the converse. Rather, in the absence of an explicit MIME charset parameter, it should use the encoding declaration. If there is none, then the document is in UTF-8 or UTF-16 and the XML spec tells you how to determine which [1].

If the processor is unable to deal with a particular encoding (8859-15, for example) then that is still the case whether the information was conveyed in a charset parameter on the MIME type (text/xml) or in the encoding declaration in the entity (application/xml). So, in what way is application/xml any better?

So, the only difference between text/xml and application/xml in this regard is that the former *requires* the client to ignore the encoding declaration in the entity and forces an interpretation of US-ASCII in all cases.

Now, the default for text/* over HTTP is ISO-8859-1 and the default for XML in the absence of an encoding declaration is UTF-8 or UTF-16.
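As a rough illustration of the two rules being compared (this is not code from the thread, and the function names are made up for the example), the sketch below picks an encoding for an XML entity. It assumes the behaviour described above: an explicit charset parameter always wins; without one, text/xml is treated as US-ASCII, while application/xml falls back to the XML spec rules (encoding declaration, else UTF-16 if a byte order mark is present, else UTF-8). It only handles ASCII-compatible declarations and ignores the other cases covered in the XML spec.

```python
import re
from typing import Optional

# Hypothetical helper names; a simplified sketch, not a conforming processor.
# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of an ASCII-compatible byte stream.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_spec_encoding(body: bytes) -> str:
    """Encoding by the XML spec rules, used when no charset parameter exists."""
    # A UTF-16 byte order mark settles the question.
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    # Skip a UTF-8 byte order mark so the declaration is still at offset 0.
    if body.startswith(b"\xef\xbb\xbf"):
        body = body[3:]
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # No declaration and no UTF-16 byte order mark: UTF-8.
    return "utf-8"


def effective_encoding(media_type: str, charset: Optional[str], body: bytes) -> str:
    """Encoding a receiver would use under the rules discussed in this thread."""
    if charset:
        # An explicit MIME charset parameter takes precedence for both types.
        return charset.lower()
    if media_type == "text/xml":
        # The text/xml default: the in-document declaration is ignored.
        return "us-ascii"
    # application/xml without charset: defer to the entity itself.
    return xml_spec_encoding(body)


doc = b'<?xml version="1.0" encoding="Big5"?><a/>'
print(effective_encoding("text/xml", None, doc))
print(effective_encoding("application/xml", None, doc))
```

With no charset parameter, the same document declaring encoding="Big5" is read as us-ascii when served as text/xml and as big5 when served as application/xml, which is the difference being argued over here.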
My position is that the most preferable option when registering text/xml would have been to use the rules in the XML spec (UTF-8 or UTF-16, unless there is an encoding declaration).

> For example, Sun's XML processor handles about 140 encodings
> at last count ... and _does_ conform to RFC 2376.

You mean, when receiving a message body labelled as text/xml (via email or via HTTP) it ignores the encoding declaration, assumes US-ASCII, signals a fatal error because of invalid byte sequences in the file and then halts? Great ;-(

> > So, this RFC removes at a stroke the possibility of authors correctly
> > labelling the encoding of their XML documents and takes us back to that
> > dark time (the present) when the majority of, say, Japanese Web content
> > was mis-labelled. And it seems to have done this simply to save a very
> > small part of coding effort for people writing transcoders.
>
> Again, no it doesn't. The idea is to get the web server to
> attach the correct MIME content type, which is NOT "text/xml"
> in many/most cases.

So, your position is that since text/xml is unusable, best use application/xml instead? Surely it would have been better not to make text/xml unusable? Or if that was thought unreasonable, then why register text/xml at all?

> Authors must rely on the administrator
> not breaking their content, and this is part of it.

Authors would love to rely on this, but have learned not to. The vast majority of content authors have *no control whatsoever* over server configuration. This isn't 1993; assuming that the person who wrote the content is also the person who administers the server is totally unwarranted. For 99.9995% of the folks, they sign up with an ISP; they get around 5 Megs of web space and they are allowed to upload documents there. They share that server with thousands of other users.
The server is not chosen by them, and is configured with all the default settings, and the ISP will not change them no matter how many reasoned emails are sent by users. So, users cannot choose the MIME type that is used and certainly do not have the control to allow different documents to be served up with different MIME parameters depending on the encoding of their various documents.

Which is my concern; control is removed from the users (who get to author the documents, and are in a position to do the right thing) and put in the hands of ISP administrators (who are installing new web servers at a rate of several a day, and do not want any special cases or anything that is not right out of the box).

Merely saying "so, ignore text/xml and use application/xml" does not help matters; it's a workaround, not a solution.

[1] http://www.w3.org/TR/REC-xml#charencoding

--
Chris

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)