
Character Encoding and the XML PR (was Re: PR.xml)

  • From: David Megginson <ak117@f...>
  • To: xml-dev@i...
  • Date: Fri, 16 Jan 1998 11:38:21 -0500

Peter Murray-Rust writes:

 > Thanks. I am also aware of it now :-).  Can I make the assumption that:
 > 
 > 	- ISO-8859-1 and UTF-8 look identical to not-very-experienced humans.

They look identical to most English speakers, but differ in their
treatment of accented characters (> 0x7f), so French and German
speakers probably notice.
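
As a small illustration of that difference, here is a Python sketch (the helper name encoded_bytes is made up for this example) that encodes the same text both ways and compares the bytes:

```python
# Compare how the same text is encoded in ISO-8859-1 and in UTF-8.
def encoded_bytes(text):
    """Return (iso_8859_1_bytes, utf8_bytes) for the given string."""
    return text.encode("iso-8859-1"), text.encode("utf-8")

plain_latin1, plain_utf8 = encoded_bytes("hello")
accent_latin1, accent_utf8 = encoded_bytes("\u00e9")  # e with acute accent

print(plain_latin1 == plain_utf8)  # True: ASCII-only text gives the same bytes
print(accent_latin1.hex())         # e9   (one byte in ISO-8859-1)
print(accent_utf8.hex())           # c3a9 (two bytes in UTF-8)
```

A lone 0xe9 byte is not a valid UTF-8 sequence, which is why a parser that assumes UTF-8 complains about ISO-8859-1 text containing accented characters.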

 > 	- in principle I should be able to sort this by adding something like
 > 
 > <?xml version="1.0" encoding="ISO-8859-1"?>
 > 	to the top of the document

Correct.  The other alternative is to configure your web server to
send the encoding ISO-8859-1 in the HTTP header for this document if
the text/xml MIME type is approved, but the problem will reappear if
you download the file and then parse it on your own system.

 > 	- in practice this fails because by the time it gets to the encoding
 > declaration it has already assumed the encoding is UTF-8 and has crashed :-)

It should not fail with AElfred -- I just downloaded the PR and added
your XML declaration to the top, and AElfred reported no errors.  

In fact, the XML declaration is guaranteed to use only ASCII
characters, which are the same in UTF-8 and ISO-8859-*.  AElfred is
very careful not to read too far into the document until it
has discovered whether there is an explicit encoding declaration.
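
The idea can be sketched roughly as follows in Python. This is not AElfred's code, only an illustration: the helper declared_encoding and its regular expression are assumptions, and it only covers encodings in which the declaration appears as plain ASCII bytes (so not UCS-2 or UCS-4).

```python
import re

# Hypothetical helper, not taken from AElfred: match an XML declaration
# at the very start of the data and capture the encoding name.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(data, default="UTF-8"):
    """Return the encoding named in the XML declaration, or the default."""
    match = _DECL.match(data[:200])  # inspect only a short prefix
    if match is None:
        return default
    return match.group(1).decode("ascii")

doc = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<p>caf\xe9</p>'
encoding = declared_encoding(doc)
text = doc.decode(encoding)  # decode the rest only once the encoding is known
print(encoding)
```

Because the prefix is examined as raw bytes, nothing beyond the declaration is decoded until the encoding has been chosen.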

 > I am not quite clear why we need this problem. Do different tools emit
 > different encodings? If so, what should I work with? Can I convert this
 > document? 

ISO-8859-1, which is used for most web pages, contains characters only
for Western European languages.  UTF-8 can encode any Unicode
character up to 0xffff (and a little higher with surrogates), so it can
handle Kanji, Han Chinese, Arabic, etc.  The PR rightly specifies that
any entity that begins with neither an encoding declaration nor a
byte-order mark (for UCS-2) should be assumed to be encoded in UTF-8.
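
A simplified Python sketch of the byte-order-mark part of that rule is below. The function name guess_encoding is hypothetical, and the sketch deliberately ignores both the encoding declaration and UCS-4, so it is not a complete detector.

```python
import codecs

def guess_encoding(data):
    """Guess an entity's encoding from its first bytes (simplified)."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    # No byte-order mark (and, in this sketch, no declaration check):
    # fall back to UTF-8.
    return "utf-8"

print(guess_encoding(b"\xfe\xff\x00<"))  # utf-16-be
print(guess_encoding(b"<doc/>"))         # utf-8
```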

Conversion should be fairly simple -- take a look at the AElfred
source to see how the different encodings are constructed.  Just for
the record, AElfred accepts the following encodings, and to my
knowledge, supports them completely and correctly to the extent
allowed by Java's 16-bit characters and by surrogates:

- UTF-8
- ISO-10646-UCS-2 (both byte orders)
- ISO-10646-UCS-4 (four byte orders)
- UTF-16
- ISO-8859-1
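
For the conversion mentioned above, a minimal Python sketch (the function name latin1_to_utf8 is made up for illustration) is to decode the bytes as ISO-8859-1 and re-encode them as UTF-8:

```python
def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 bytes as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")

converted = latin1_to_utf8(b"caf\xe9")
print(converted)  # b'caf\xc3\xa9'
```

If the document carries an encoding declaration, it would also need to be updated or removed after conversion so that it matches the new bytes.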


All the best,


David

-- 
David Megginson                 ak117@f...
Microstar Software Ltd.         dmeggins@m...
      http://home.sprynet.com/sprynet/dmeggins/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)


  • References:
    • PR.xml
      • From: Peter Murray-Rust <peter@u...>
    • PR.xml
      • From: David Megginson <ak117@f...>
    • Re: PR.xml
      • From: Peter Murray-Rust <peter@u...>
