
Re: Well-formed Blueberry

  • From: Elliotte Rusty Harold <elharo@m...>
  • To: xml-dev@l..., www-xml-blueberry-comments@w...
  • Date: Sun, 15 Jul 2001 11:28:08 -0400

Some more thoughts on requirements for well-formed Blueberry documents:

If the need to stream documents makes it too problematic to require that only documents that actually use Blueberry name characters may carry a Blueberry declaration, then I propose a weaker alternative:

Only documents whose encoding declaration explicitly names a character set that can include Blueberry characters are allowed to have a Blueberry declaration. For example,

<?xml version="1.1" encoding="ISO-8859-1"?>

would be malformed. However, these would be well-formed:

<?xml version="1.1" encoding="UTF-8"?>
<?xml version="1.1" encoding="UTF-16"?>
<?xml version="1.1" encoding="UCS-4"?>

(I'm just using version="1.1" here to make my point. The details are not affected by what the Blueberry declaration eventually looks like.)

I further propose that the encoding declaration must be explicit. That is, this is malformed even though the default character set is UTF-8:

<?xml version="1.1"?>

My logic is that many authors just write this when what they really mean is encoding="US-ASCII". I do not think requiring Blueberry documents to explicitly specify UTF-8 is an onerous burden. Note that this does not change the default character set for <?xml version="1.1"?>, which would still be UTF-8.
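
To make the proposed rule concrete, here is a minimal Python sketch of the check a parser might run on the XML declaration. It is an illustration only: the function name, the regular expression, and the allow-list are my assumptions, version="1.1" merely stands in for whatever the Blueberry declaration ends up looking like, and the three encodings listed are just the ones named above rather than a complete enumeration.

```python
import re

# Hypothetical allow-list: only the three encodings named in the examples above.
# A real list would have to enumerate every Blueberry-capable encoding.
BLUEBERRY_CAPABLE = {"UTF-8", "UTF-16", "UCS-4"}

# Simplified pattern for an XML declaration: version, optional encoding,
# optional standalone. Groups 1, 3 and 5 capture the opening quote characters.
_DECL = re.compile(
    r'^<\?xml\s+version\s*=\s*(["\'])(?P<version>[^"\']+)\1'
    r'(?:\s+encoding\s*=\s*(["\'])(?P<encoding>[A-Za-z][A-Za-z0-9._-]*)\3)?'
    r'(?:\s+standalone\s*=\s*(["\'])(?:yes|no)\5)?'
    r'\s*\?>'
)


def blueberry_declaration_ok(decl: str) -> bool:
    """Return True if the declaration satisfies the proposed rule."""
    m = _DECL.match(decl)
    if m is None:
        return False  # not a recognizable XML declaration at all
    if m.group("version") != "1.1":
        return True  # the rule constrains only Blueberry documents
    encoding = m.group("encoding")
    # A Blueberry document must name its encoding explicitly, and that
    # encoding must be able to represent Blueberry characters.
    return encoding is not None and encoding.upper() in BLUEBERRY_CAPABLE
```

Under this sketch, the ISO-8859-1 declaration and the declaration with no encoding are both rejected, while the UTF-8, UTF-16, and UCS-4 declarations pass. A version="1.0" declaration is unaffected either way.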

There are not that many encodings that can handle the Blueberry characters, basically just several variants of Unicode, one Japanese character set, and possibly a couple of Chinese character sets. Most of the scripts that are at issue here (Amharic, Khmer, Burmese, etc.) have never had a standard encoding prior to Unicode. Indeed, that is exactly the reason it took until Unicode 3.0 to decide how to encode them. It was not possible to simply transpose an existing national character set. There have been numerous proposals for alternative encodings of Unicode lately, but all of them have been shot down with extreme hostility by the Unicode Consortium. Thus I do not think it would be a huge problem to enumerate all the encodings anybody is likely to want for Blueberry characters. Certainly, any new encodings that do arise in the future should be round-trippable to standard Unicode encodings.
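
One rough way to start such an enumeration is to test candidate encodings against sample characters from the affected scripts. The sketch below uses Python's built-in codecs. The sample code points (one each from the Ethiopic, Khmer, and Myanmar blocks), the helper names, and the candidate list are my own choices for illustration, and UTF-32 stands in for UCS-4.

```python
# One sample letter from each script mentioned above (chosen for illustration).
SAMPLES = {
    "Ethiopic": "\u1200",  # used for Amharic
    "Khmer": "\u1780",
    "Myanmar": "\u1000",   # used for Burmese
}


def can_encode(text: str, encoding: str) -> bool:
    """Return True if the named codec can represent every character in text."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def capable_encodings(candidates):
    """Keep the candidates that can encode all of the sample characters."""
    return [
        enc for enc in candidates
        if all(can_encode(ch, enc) for ch in SAMPLES.values())
    ]


if __name__ == "__main__":
    print(capable_encodings(["utf-8", "utf-16", "utf-32", "iso-8859-1", "us-ascii"]))
```

Passing a few samples is only a necessary condition, not proof that an encoding covers every Blueberry character, so a real list would still need to be checked against the full repertoire.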
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@m... | Writer/Programmer |
+-----------------------+------------------------+-------------------+ 
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      | 
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+
