  • From: "Michael Kay" <mike@s...>
  • To: "'Greg Hunt'" <greg@f...>
  • Date: Thu, 18 Mar 2010 23:29:45 -0000

Hex 1a is not a legal character in XML 1.0. It is legal in XML 1.1 provided it is written as a character reference, &#x1a;
 
I'm not sure offhand about 65533. You can look it up as easily as I can.
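
For anyone who wants to check both code points mechanically, here is a minimal sketch that transcribes the Char production of XML 1.0 and the Char and RestrictedChar productions of XML 1.1 (section 2.2 of each spec) into Python. The function names are made up for illustration and come from no library.

```python
# Hypothetical helpers: ranges copied from section 2.2 of XML 1.0 and XML 1.1.

def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if the code point matches the XML 1.1 Char production."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """True if XML 1.1 allows the code point only as a character reference."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


for cp in (0x1A, 65533):
    print(
        f"U+{cp:04X}: XML 1.0 Char={is_xml10_char(cp)}, "
        f"XML 1.1 Char={is_xml11_char(cp)}, "
        f"XML 1.1 restricted={is_xml11_restricted(cp)}"
    )
```

By these ranges, 0x1A is outside XML 1.0 Char and is a restricted character in XML 1.1, while 65533 (U+FFFD) sits at the top of the [#xE000-#xFFFD] range and so is a legal character in both versions.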

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay



From: Greg Hunt [mailto:greg@f...]
Sent: 18 March 2010 22:55
To: Michael Kay
Cc: xml-dev@l...
Subject: Re: Is it a well-formedness error to use a character not in the encoding specified by the XML declaration?

Is a substitution character (x'1a' in many single byte character sets or 65533 in UTF-8) a legal character?  I have a case where x'1a' appears not to be legal in a document with an encoding specified as ISO-8859-1.

On Fri, Mar 19, 2010 at 6:01 AM, Michael Kay <mike@s...> wrote:
It's not well-formed.

From the XML 1.0 spec [1]:
"It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding."

 
Unless of course there is a "higher-level protocol" that tells you it's really a different encoding. (The term "higher-level protocol" is not really defined. I think they had in mind the media-type from the HTTP content header. In terms of the protocol stack, that of course is a lower-level protocol. But it's sufficiently woolly that a phone call from the sender to say "Oops, I meant EBCDIC" would be enough to make the document well-formed.)
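
The fatal-error rule quoted above can be seen with an ordinary parser. This is a rough sketch using Python's standard xml.etree.ElementTree (which is backed by expat); the helper name `parses` and the sample documents are invented for illustration. The same byte 0xE9 is fine when the declaration says ISO-8859-1 and fatal when it says UTF-8, because a lone 0xE9 is not a legal UTF-8 byte sequence.

```python
import xml.etree.ElementTree as ET


def parses(doc: bytes) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Byte 0xE9 is 'e acute' in ISO-8859-1 but an incomplete sequence in UTF-8.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9</a>'
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?><a>\xe9</a>'

# Byte 0x1A decodes fine in ISO-8859-1, but the resulting character
# is not a legal XML 1.0 character, so the document is still rejected.
sub_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\x1a</a>'

print("ISO-8859-1 with 0xE9:", parses(latin1_doc))
print("UTF-8 with 0xE9:     ", parses(utf8_doc))
print("ISO-8859-1 with 0x1A:", parses(sub_doc))
```

Note the two different failure modes: the UTF-8 case fails on the encoding rule, whereas the 0x1A case fails on the character-range rule even though the byte is valid in the declared encoding.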
 

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay

 


