XML-DEV Mailing List Archive

Politics, and UTF-8+names considered harmful for text
Tim Bray wrote:
> Only because such a revision is not politically viable. The only
> advantage of the +names approach is that it doesn't touch XML.

But because this is a new encoding (and there have been no successful new encodings for years AFAIK), it will take at least 3-5 years to be deployed as part of standard distributions such as Java, depending on the attitude of the vendors. Vendors such as MS and Sun probably see it as a waste of time that does not fit in with their Unicode strategy and tools.

So the only likely implementation route is for parser writers to add it (or for implementers to add it to entity management) on a product-by-product basis. But if you have a majority of parser vendors supporting it as an XML add-on, you already have the quorum for getting an XML revision. So arguments for it on the basis of realistic pragmatism don't make any sense to me.

Adding together the W3C HTML/XHTML people + the W3C Schema people + the MathML people + the XSLT people (all of whom have languages that are being held back by named character references being tied to DTDs) + the I18n WG gives a group hardly without political clout in the W3C. This is a very different issue to the Unicode upgrade issue of 1.1.

Furthermore, adopting XML's entity or NCR mechanism without also adopting a header mechanism for non-XML uses, to allow in-band signalling that the encoding is in use, is positively damaging. It creates a dialect of UTF-8 that can only be detected by someone who already knows that the data may be using this convention, who checks whether it contains things that look like delimiters, and who judges that they are being used as delimiters.

At the moment, life is simple: you can look at the byte patterns in a file and know that it is UTF-8. There is very little chance of a misdiagnosis, because no other encoding really has the same modified Huffman signature.
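To make the detection point concrete, here is a minimal Python sketch, not a definitive detector. The function names, the sample strings and the regular expression for "things that look like delimiters" are all hypothetical, chosen only for illustration. It shows that a strict UTF-8 decode tells UTF-8 apart from a legacy encoding by byte patterns alone, while a name reference such as &eacute; is plain ASCII and passes the same check whether it is meant as a reference or as literal text.

```python
import re

# Hypothetical pattern for "things that look like delimiters":
# an ampersand, a name, and a semicolon. This is an assumption made for
# illustration, not the syntax of any actual UTF-8+names proposal.
NAME_LIKE = re.compile(rb"&[A-Za-z][A-Za-z0-9]*;")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8 (strict decode)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def has_name_like_sequences(data: bytes) -> bool:
    """Return True if the bytes contain something shaped like &name;."""
    return NAME_LIKE.search(data) is not None


# Sample inputs (illustrative only).
utf8_sample = "café".encode("utf-8")        # real UTF-8, multi-byte e-acute
latin1_sample = "café".encode("latin-1")    # legacy encoding, lone 0xE9 byte
names_sample = b"caf&eacute;"               # name used as a reference
literal_sample = b"Type &eacute; to get an accented e"  # name as literal text

if __name__ == "__main__":
    for label, sample in [
        ("utf8", utf8_sample),
        ("latin1", latin1_sample),
        ("names", names_sample),
        ("literal", literal_sample),
    ]:
        print(label, is_valid_utf8(sample), has_name_like_sequences(sample))
```

The byte-level check separates the first two samples, but the last two give identical answers from both functions. Deciding whether &eacute; is a delimiter or ordinary text takes knowledge that is not in the bytes, which is the misdiagnosis risk described above.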
I don't know why on earth we would want to put ourselves in the same kind of position that the Japanese are in with text: they have a couple of alternate mappings in some vendors' versions of various encodings, which adds complication.[1] Why would we want to get into a similar situation?

Cheers
Rick Jelliffe

[1] http://www.w3.org/TR/2000/NOTE-japanese-xml-20000414/