Re: IE5.0 does not conform to RFC2376
MURATA Makoto wrote:
>
> Chris Lilley wrote:
> > The vast majority of content authors have *no control whatsoever* on
> > server configuration. This isn't 1993; assuming that the person who
> > wrote the content is also the person who administers the server is
> > totally unwarranted.
>
> To overcome this problem, Uchida-san is proposing a convention for WWW server
> configurations. His proposal is already used by some ISPs in Japan. It is
> available at:
>
> http://www.asahi-net.or.jp/~sd5a-ucd/docs/suffix_guideline_981106.txt

It's good to see a concrete proposal. On the other hand, relying on a complex convention of filename suffixes is problematic:

- it either requires content negotiation to be enabled (something not all servers can do) or it results in a multiplicity of URIs for the same resource.
- it requires all content authoring applications to know about it and to offer to save using this naming convention
- it duplicates (and may contradict) the XML encoding declaration
- the information stored in this way may be lost when saving local copies to systems which do not allow double dots in filenames or which have other restrictions. The XML encoding declaration, on the other hand, is much more robust in the face of the multiplicity of file systems in use.

The second condition is made harder because two alternative syntaxes are proposed in this note - so a content author has to know which convention is used on a particular server. Also, the note says that these are only two of many possibilities.

An alternative method for achieving the same result is to use a filter (this can be done in Apache and in Jigsaw) which automatically emits the correct charset parameter based on reading the encoding declaration in the XML instance. This can easily cache its results, and need not result in processing overhead on each request.
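[Editorial illustration, not part of the original message.] The kind of filter described above might be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the names `sniff_declared_encoding` and `content_type_for` are hypothetical, the sniffing handles only a byte order mark and ASCII-compatible encoding declarations (no EBCDIC, no BOM-less UTF-16), and it is not how any Apache or Jigsaw module is actually written.

```python
import os
import re
from functools import lru_cache

# Matches the encoding declaration inside an XML declaration, e.g.
# <?xml version="1.0" encoding="Shift_JIS"?>
_ENCODING_RE = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_declared_encoding(head: bytes) -> str:
    """Guess the encoding from the first bytes of an XML entity."""
    # A UTF-16 byte order mark settles the question before any parsing.
    if head.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # Skip a UTF-8 byte order mark, if any, before looking for the declaration.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    match = _ENCODING_RE.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: the XML 1.0 default applies.
    return "utf-8"


@lru_cache(maxsize=1024)
def _content_type_cached(path: str, mtime_ns: int) -> str:
    # The modification time is part of the cache key, so an edited file
    # is re-read while an unchanged file costs only a stat() per request.
    with open(path, "rb") as f:
        head = f.read(256)
    return "text/xml; charset=" + sniff_declared_encoding(head)


def content_type_for(path: str) -> str:
    """Return a Content-Type header value derived from the file itself."""
    return _content_type_cached(path, os.stat(path).st_mtime_ns)
```

Keying the cache on the modification time is one way to get the "cache its results" behaviour mentioned above; a real server filter would presumably hook into the server's own caching instead.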
Of course, this still requires work - for example, to ensure that it is included in the standard Apache distribution; but it is easier than trying to get the hundreds of authoring tools to support a couple of naming conventions which may in any case be hard to deal with on some platforms (platforms are still in use which have trouble with .html, for example ;-)

> Chris Lilley wrote:
> >
> > But not necessarily everyone's favourite. It is a good choice for
> > Japanese, because Kanji use less bytes per character in UTF-16 than in
> > UTF-8.
> >
> > > (In the case that the charset is broken, autodetection of
> > > UTF-16 is very easy.
> >
> > But autodetection should not be required; users can label their
> > documents correctly.
>
> To me, the biggest advantage of UTF-16 is that UTF-16 XML documents can parse
> only as UTF-16. Even if the charset parameter is incorrect, UTF-16 XML documents
> do not parse incorrectly (and error recovery is very reliable).

I am wary of relying on error recovery. If it doesn't work well, then there is reduced interoperability because of variation; if it does work well, or seems to work well in some cases, then people just use it all the time.

> Chris Lilley wrote:
> > On the other hand, if the RFC had been written as I suggested, saying
> > that a charset parameter overrode *if present* but that *if absent*, the
> > rules in the XML recommendation were followed, then you would need no
> > server reconfiguration and the rules to follow to have the encoding
> > information correctly conveyed to the client would have been a matter of
> > public record in the XML recommendation rather than private convention.
> > A big win for interoperability, if that had happened.
>
> At *IETF*, the default of the charset parameter for text/HTML *is* 8859-1.

Yes, which is different to the default for text/* - this demonstrates that it is possible to give a more specific rule for a particular registration.
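[Editorial illustration, not part of the original message.] The rule quoted above (the charset parameter wins *if present*; *if absent*, the XML Recommendation's own rules apply) could be sketched as below. The function name `effective_encoding` is hypothetical, and the fallback is deliberately simplified to a byte order mark check plus an ASCII-readable encoding declaration. For contrast, RFC 2376 instead treats a text/xml entity with no charset parameter as us-ascii.

```python
import re
from typing import Optional

# charset parameter of a Content-Type value, quoted or unquoted.
_CHARSET_RE = re.compile(r'charset\s*=\s*"?([^";\s]+)"?', re.IGNORECASE)

# Encoding declaration inside the XML declaration.
_DECL_RE = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_encoding(content_type: Optional[str], body: bytes) -> str:
    """Pick the encoding under the 'override if present' rule."""
    # 1. A charset parameter, if present, overrides everything else.
    if content_type:
        param = _CHARSET_RE.search(content_type)
        if param:
            return param.group(1).lower()
    # 2. Otherwise fall back to what the document says about itself.
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    if body.startswith(b"\xef\xbb\xbf"):
        body = body[3:]
    decl = _DECL_RE.match(body)
    if decl:
        return decl.group(1).decode("ascii").lower()
    # 3. No label anywhere: the XML 1.0 default.
    return "utf-8"
```

Under this rule the same bytes are interpreted identically whether they arrive over HTTP without a charset parameter or are read from a local file, which is the property argued for in this thread.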
I gave an example of a particular rule for text/xml which would have saved all this bother.

> You might want to change this first.

Why? It is XML we are speaking of here.

> It is going to be very difficult or
> impossible, since HTTP and MIME people will disagree.

I think you mean HTTP and Mail (SMTP/IMAP/POP). MIME is used by both email and HTTP.

> There have been a lot of discussion about this issue. None of your arguments
> are new to me. In fact, my original opinion was not so different from yours but
> I have changed my mind during the discussion. More about this, see the archive
> of the XML SIG (around April and May of 1998).

OK, I will check this out. I cannot of course discuss such material in this forum, however. Perhaps you could post your technical reasons for the change of direction here?

> > Murata-san, you asked why a W3C team person was criticising this RFC in
> > public. It is because the mission of W3C is to improve interoperability,
> > so it is my duty to do so.
>
> You might want to check what the W3C I18N WG has said to the XML CG. If
> W3C strongly recommends the use of the charset parameter, the world will
> change.

Sure, in the absence of any other indication, server-applied labelling is certainly better than no labelling or guesswork. I have nothing against the use of the charset parameter. But, if it is not present, then the XML Rec says exactly what should happen; careful wording which this RFC nullifies.

Problems arise if an XML file is saved from the Web to a local filesystem, perhaps for further editing; the MIME charset information is lost. It could perhaps be stored in some way - but there is already a standard way - the XML encoding declaration. And if the charset parameter is present, then it should say the same thing as the encoding declaration.
The best way to ensure this is to treat the XML encoding declaration as the primary metadata resource and to programmatically derive the charset parameter from this; greater robustness is at once achieved and also harmonisation of the MIME and XML labelling.

> XML is the last chance.

I agree, it is important to get it right.

> I am strongly advocating the use of the
> charset parameter in Japan whenever possible.

Great. On the other hand, you seem to be trying to do so by enforcing a different default charset than that in the XML Recommendation, which means that local files and remote files work differently; this is clearly not desirable.

> On the other hand, if even a
> W3C team member does not respect the consensus, there is not much hope.

I think that last comment was beneath you, and would thank you to restrict yourself to technical argument. However, I will point out that it is the consensus of the XML 1.0 Recommendation that I am respecting - and that the RFC does not, by altering the meaning of the default encoding. It could have been harmonised with the XML Rec; it was not.

Redundancy can be good; a charset parameter and an XML encoding declaration that say the same thing and work the same way, which is what I was suggesting, is good. What you are suggesting, which is a charset parameter and an XML encoding declaration that work in different ways, is clearly suboptimal.

--
Chris

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)








