
Politics, and UTF-8+names considered harmful for text


Tim Bray wrote:

> Only because such a revision is not politically viable.  The only 
> advantage of the +names approach is that it doesn't touch XML. 

But because this is a new encoding (and there have been no successful new encodings for years AFAIK), it will take at least 3-5 years to be deployed as part of standard distributions such as Java, depending on the attitude of the vendors. Vendors such as MS and Sun probably see it as a waste of time that does not fit in with their Unicode strategy and tools.

So the only likely implementation route is for parser writers to add it (or for implementers to add it to entity management) on a product-by-product basis. But if you have a majority of parser vendors supporting it as an XML add-on, you already have the quorum for getting an XML revision.

So arguments for it on the basis of realistic pragmatism don't make any sense to me.

Adding together the W3C HTML/XHTML people, the W3C Schema people, the MathML people and the XSLT people (all of whom have languages that are being held back by named character references being tied to DTDs), plus the I18n WG, gives a group hardly lacking political clout in the W3C. This is a very different issue from the Unicode upgrade issue of 1.1.

Furthermore, adopting XML's entity or NCR mechanism without also adopting a header mechanism for non-XML uses, one that allows in-band signalling that the encoding is in use, is positively damaging. It creates a dialect of UTF-8 that can only be detected by someone who knows that the data may be using this convention, by checking whether it contains things that look like delimiters and judging whether they are being used as delimiters.
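
To make the detection problem concrete, here is a minimal Python sketch. It assumes the convention reuses XML-style `&name;` references inside otherwise ordinary UTF-8; the `NAMES` table and every function name are hypothetical, chosen only for illustration. The same bytes decode to different text depending on whether the reader assumes the convention, and the only available test is a guess based on delimiter-like text.

```python
import re

# Hypothetical entity table for an imagined "UTF-8+names" convention;
# a real proposal would define its own (much larger) set of names.
NAMES = {"amp": "&", "lt": "<", "eacute": "\u00e9"}

# XML-style named reference: '&', a name, ';'.
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_plain(data: bytes) -> str:
    """Ordinary UTF-8: '&eacute;' is just eight characters of text."""
    return data.decode("utf-8")


def decode_utf8_names(data: bytes) -> str:
    """UTF-8+names: known references are replaced; unknown ones are kept."""
    text = data.decode("utf-8")
    return ENTITY.sub(lambda m: NAMES.get(m.group(1), m.group(0)), text)


def guess_uses_names(data: bytes) -> bool:
    """The only signal available: does anything look like a delimiter?"""
    return ENTITY.search(data.decode("utf-8")) is not None


menu = b"Caf&eacute; menu"
print(decode_plain(menu))        # Caf&eacute; menu
print(decode_utf8_names(menu))   # Café menu

# Plain UTF-8 prose that merely mentions an entity is a false positive.
print(guess_uses_names(b"Write &amp; in HTML"))  # True
print(guess_uses_names(b"Plain text"))           # False
```

The false positive is the point: at the byte level, plain UTF-8 that happens to mention `&amp;` is indistinguishable from data using the convention.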

At the moment, life is simple: you can look at the byte patterns in a file and know that it is UTF-8. There is very little chance of a misdiagnosis, because no other encoding really has the same modified-Huffman signature.
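
As a rough illustration of that byte-pattern check, the following Python sketch tests whether data follows UTF-8's lead-byte/continuation-byte structure. The function `looks_like_utf8` is written for this example and is deliberately simplified (it does not reject overlong three- and four-byte forms or encoded surrogates); in practice `bytes.decode("utf-8")` performs full validation.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Simplified check of UTF-8's byte structure.

    Each lead byte announces how many 10xxxxxx continuation bytes follow.
    Overlong 3/4-byte forms and encoded surrogates are not rejected here.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:              # 0xxxxxxx: ASCII
            need = 0
        elif 0xC2 <= b <= 0xDF:   # 110xxxxx: two-byte sequence
            need = 1
        elif 0xE0 <= b <= 0xEF:   # 1110xxxx: three-byte sequence
            need = 2
        elif 0xF0 <= b <= 0xF4:   # 11110xxx: four-byte sequence
            need = 3
        else:                     # stray continuation byte or invalid lead
            return False
        if i + need >= n:         # sequence truncated by end of data
            return False
        for k in range(1, need + 1):
            if (data[i + k] & 0xC0) != 0x80:   # must be 10xxxxxx
                return False
        i += need + 1
    return True


print(looks_like_utf8("café".encode("utf-8")))        # True
print(looks_like_utf8("café".encode("latin-1")))      # False
print(looks_like_utf8("日本語".encode("shift_jis")))  # False
```

Latin-1 "café" fails because the byte 0xE9 announces a three-byte sequence that never arrives, and Shift_JIS fails on its first lead byte; this structural rigidity is why misdiagnosis is rare.
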
I don't know why on earth we would want to put ourselves in the same kind of position as the Japanese are in with text: some vendors' versions of various encodings have a couple of alternate mappings, which adds complication.[1] Why would we want a similar situation?
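
For one concrete instance of the alternate-mapping problem described in [1], Python's standard codecs can serve as a sketch: the `shift_jis` codec and the `cp932` codec (Microsoft's Shift_JIS variant) decode the same two bytes to different Unicode characters.

```python
# The same Shift_JIS byte pair, decoded with two vendors' mapping tables.
data = b"\x81\x60"

jis = data.decode("shift_jis")   # JIS X 0208-based mapping
ms = data.decode("cp932")        # Microsoft's variant of Shift_JIS

print(f"shift_jis: U+{ord(jis):04X}")  # U+301C (WAVE DASH)
print(f"cp932:     U+{ord(ms):04X}")   # U+FF5E (FULLWIDTH TILDE)
```

A receiver that guesses the wrong variant gets different text from identical bytes, with nothing in the data to signal which reading was intended.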

Cheers
Rick Jelliffe

[1] http://www.w3.org/TR/2000/NOTE-japanese-xml-20000414/



