
Re: Is it a well-formedness error to use a character not in th

  • From: Liam R E Quin <liam@w3.org>
  • To: Greg Hunt <greg@firmansyah.com>
  • Date: Thu, 18 Mar 2010 20:25:43 -0400

On Fri, 2010-03-19 at 09:55 +1100, Greg Hunt wrote:
> Is a substitution character (x'1a' in many single byte character sets
> or 65533 in UTF-8) a legal character?  I have a case where x'1a'
> appears not to be legal in a document with an encoding specified as
> ISO-8859-1.

When the encoding is ISO 8859-1, the XML parser reads individual bytes
("octets", as standards people often say, in case someone starts making
9-bit computers again) and maps them from ISO 8859-1 into Unicode.
Numeric character references like &#x1a; are always taken as Unicode
code points, whatever the encoding of the document.
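
A small sketch of that mapping, using Python's standard
xml.etree.ElementTree parser (my choice for illustration, not something
from the thread). The document is declared as ISO 8859-1 and holds the
byte 0xE9 followed by the reference &#x20ac;. The byte is mapped from
ISO 8859-1 into Unicode; the reference is read as a Unicode number, even
though ISO 8859-1 has no byte for that character at all.

```python
import xml.etree.ElementTree as ET

# Declared as ISO 8859-1: the raw byte 0xE9 is e-acute in that encoding.
# The reference &#x20ac; is the euro sign, which ISO 8859-1 cannot encode.
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9&#x20ac;</a>'

text = ET.fromstring(doc).text

# The byte was mapped to U+00E9; the reference was taken as U+20AC directly.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)
```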

Having said that, as others pointed out, 0x1a (decimal 26, ASCII SUB)
is never allowed as a literal character in an XML document. You can
write it as &#26; or &#x1a; only in XML 1.1, but since its meaning is
not well defined, you should not do this.
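
As a rough check of the XML 1.0 side of this, the sketch below encodes
the Char production from the XML 1.0 Recommendation as a function and
also asks Python's xml.etree.ElementTree what it accepts. That parser is
built on expat, which handles XML 1.0 only, so the XML 1.1 behaviour is
not demonstrated here. The helper names is_xml10_char and is_well_formed
are mine, made up for this example.

```python
import xml.etree.ElementTree as ET

def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def is_well_formed(data: bytes) -> bool:
    """True if the XML 1.0 parser accepts data without a parse error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# 0x1A is outside Char; 65533 (U+FFFD) from the question is inside it.
print(is_xml10_char(0x1A), is_xml10_char(0xFFFD))

# The parser rejects 0x1A both as a literal byte and as a reference.
print(is_well_formed(b"<a>\x1a</a>"), is_well_formed(b"<a>&#x1a;</a>"))
```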

The most common reason people think they want to do this :-) is that
they have in fact some other character set, such as one of the Windows
"code pages", using some of the characters between 0 and 32 for actual
characters, rather than as device control codes. In that case, you
need to set the encoding correctly, or to use a conversion utility
such as (on Linux) iconv.
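
For anyone without iconv to hand, the same kind of conversion can be
sketched in a few lines of Python. This assumes you already know the
real source encoding; windows-1252 (Python codec name "cp1252") and the
sample bytes are only an illustration, and transcode_to_utf8 is a
made-up helper name.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes from source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

# In windows-1252 the byte 0x80 is the euro sign (U+20AC).
sample = b"<price>\x80 10</price>"
converted = transcode_to_utf8(sample, "cp1252")
print(converted)
```

If the document carries an XML declaration with an encoding name, that
has to be updated (or removed) after the conversion, since this sketch
only changes the bytes.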

The other thing that can happen is that an HTTP server sends a
charset parameter, e.g. windows-1252, but the Web browser ignores
it and does not pass it to its XML parser. The charset
parameter was originally (as Mike Kay mentioned) supposed to
override the encoding in the document, but this turns out to be
a disaster. For this reason, application/xml (which does not
allow an intermediate proxy to rewrite the data) is preferred
these days over text/xml for use with MIME in HTTP and email.
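
To see whether you are in that situation, you can compare the charset
parameter in the Content-Type header with the encoding named in the XML
declaration. The sketch below does only that comparison; it does not
decide which one should win. The function names are invented for the
example, and the regular expression is a simplification that assumes an
ASCII-compatible encoding and a declaration at the very start of the
body.

```python
import re
from email.message import Message

def header_charset(content_type: str):
    """Return (media type, charset or None) from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_content_charset()

# Simplified match for: <?xml version="1.0" encoding="..."?>
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(body: bytes):
    """Return the lower-cased encoding named in the XML declaration, or None."""
    m = _DECL.match(body)
    return m.group(1).decode("ascii").lower() if m else None

def encodings_disagree(content_type: str, body: bytes) -> bool:
    """True when both header and document name an encoding and they differ."""
    _, charset = header_charset(content_type)
    declared = declared_encoding(body)
    return charset is not None and declared is not None and charset != declared

body = b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'
print(encodings_disagree("text/xml; charset=windows-1252", body))
print(encodings_disagree("application/xml", body))
```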

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org


