
Blueberry/Unicode/XML

  • From: Tim Bray <tbray@t...>
  • To: xml-dev@l...
  • Date: Mon, 09 Jul 2001 21:33:12 -0700

Boy, this one's tough.  I buy neither Elliotte's assertion that
changing XML is unthinkable, nor John Cowan's assertion that the
depth of the cultural affront to users of pre-Unicode-3.1 
languages is so high as to outweigh consideration of cost.

I just went and reviewed the Blueberry requirements at
http://www.w3.org/TR/xml-blueberry-req and I'm not very comfy
with them.  There is repeated and specific reference to the
problem being that posed by Unicode 3.1.  The problem isn't
3.1, it's that Unicode is an unfinished standard that
continues to grow actively, whereas it would be nice if
we could declare XML syntax finished and go back to our
plows.

XML 1.0 took a design decision in favor of enumeration of 
name characters, simply because the alternative - outsourcing 
the problem to the Unicode/ISO10646 process - had two 
problems:

(a) We didn't know them well enough to trust them, and
(b) writing a satisfying set of rules for XML name chars
    based solely on Unicode metadata is pretty hard.

The force of argument (b) is unabated.  (a) seems less of
a worry now simply because the Unicode and XML gangs have 
gotten pretty comfy with each other.  But I do have a worry
at the back of my mind whether the W3C *institutionally* 
ought to trust the consortium *institutionally* with 
something of this magnitude.  And what happens if ISO and
Unicode stop getting along one of these centuries?  Whose
side is XML on?

A few weeks ago, I was in favor of leaving it the way it
is, but only by about 55-45.  I found the most convincing
argument on the other side was the person who postulated
a Khmer user typing away in emacs and having a disconnect
because there are lots of characters they can use for 
people's names but not as attribute names.  On the other
hand, this problem is not unique to Khmer - just ask 
Mr. O'Hara.

And the notion of having a single monolithic XML whose
interoperability, while not perfect, is pretty $#!%* good,
partially based on those unwieldy character-class 
productions, is something that it will hurt to lose.  And
it is a reasonable position to say "The markup name character 
class snapshot was based on Unicode 2.0, sorry 'bout that."

Realistically, there are 3 options:

1. Leave it the way it is.
2. Do Blueberry and then repeat the process for Unicode 3.2
   and 4.0 and so on every couple of years forever.
3. Bite the bullet, write the rules in terms of Unicode
   metadata and go to a pure use-by-reference architecture,
   probably adding a syntactic signal to reference the
   Unicode version number.

I think (3.) will prove to be really hard to do well - and 
then the Unicode metadata fields might get changed and screw
it all up.  I think (2.) is not unreasonable, but has the 
institutional disadvantage that the XML standardization effort 
has to become an ongoing process ad infinitum.  
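To make point (b) and option (3) concrete, here is a minimal Python sketch of a name-character rule written purely against Unicode metadata. The category sets are an assumption, loosely following the general-category guidance in XML 1.0 Appendix B (start characters from Ll, Lu, Lo, Lt, Nl; further name characters from Mc, Me, Mn, Lm, Nd; plus the ASCII specials), and the function names are hypothetical. It is an illustration, not the XML 1.0 production.

```python
import unicodedata

# Hypothetical metadata-driven rule (NOT the XML 1.0 grammar), loosely
# modeled on the general-category guidance in XML 1.0 Appendix B.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_CATEGORIES = NAME_START_CATEGORIES | {"Mc", "Me", "Mn", "Lm", "Nd"}
EXTRA_START_CHARS = {"_", ":"}
EXTRA_NAME_CHARS = {".", "-", "_", ":"}


def is_name_start_char(ch: str) -> bool:
    # Decided solely by the runtime's Unicode database, i.e. use-by-reference.
    return ch in EXTRA_START_CHARS or unicodedata.category(ch) in NAME_START_CATEGORIES


def is_name_char(ch: str) -> bool:
    return ch in EXTRA_NAME_CHARS or unicodedata.category(ch) in NAME_CATEGORIES


def is_name(s: str) -> bool:
    return bool(s) and is_name_start_char(s[0]) and all(is_name_char(c) for c in s[1:])


if __name__ == "__main__":
    print("Unicode database:", unicodedata.unidata_version)
    for candidate in ["hara", "O'Hara", "\u1780", "l\u00b7l", "_x:y-z.1", "1abc"]:
        print(repr(candidate), is_name(candidate))
```

The answers depend on whichever Unicode database the runtime ships (`unicodedata.unidata_version`), which is exactly the coupling option (3) implies. The sketch also shows why such rules are hard to get right: U+1780 KHMER LETTER KA (category Lo) is accepted, as in the Khmer case above, and Mr. O'Hara's apostrophe (Po) is rejected either way, but U+00B7 MIDDLE DOT, which XML 1.0 explicitly lists as an Extender name character, is also rejected because its category is Po.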

I still go for (1.).  My opposition to NEL has hardened,
because of a strong fear that this one will cause real 
wreckage on a widespread basis, not just in linguistic
corner cases.
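The post does not spell out what wreckage NEL might cause. As one hedged illustration (an assumption, not a scenario stated here): byte 0x85 is NEL (U+0085) in ISO-8859-1 but HORIZONTAL ELLIPSIS (U+2026) in windows-1252, and legacy text is often mislabeled between the two. The sketch compares XML 1.0-style line-end handling with a hypothetical NEL-aware splitter; both function names are made up for illustration.

```python
# Byte 0x85 means different characters depending on the declared encoding.
raw = b"Wait\x85 what?"
as_latin1 = raw.decode("latin-1")  # U+0085 NEXT LINE (NEL), a C1 control
as_cp1252 = raw.decode("cp1252")   # U+2026 HORIZONTAL ELLIPSIS


def split_lines_xml10(text: str) -> list[str]:
    # XML 1.0 end-of-line handling: CR LF and lone CR become LF; only LF ends a line.
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")


def split_lines_with_nel(text: str) -> list[str]:
    # Hypothetical NEL-aware rule: CR NEL and lone NEL also become LF.
    text = text.replace("\r\n", "\n").replace("\r\x85", "\n")
    text = text.replace("\r", "\n").replace("\x85", "\n")
    return text.split("\n")


if __name__ == "__main__":
    print(split_lines_xml10(as_latin1))     # one line; NEL is just content
    print(split_lines_with_nel(as_latin1))  # the same bytes now form two lines
    print(split_lines_with_nel(as_cp1252))  # correctly labeled ellipsis: one line
```

Under the XML 1.0 rule a stray 0x85 stays inside one line of content; under a NEL-aware rule, the same mislabeled bytes silently gain a line break.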

But I really can't see how anyone can get behind any of 
these positions and feel entirely comfortable with where
they find themselves standing.  I sure don't. -Tim

