- From: "Michael Kay" <mike@s...>
- To: "'Greg Hunt'" <greg@f...>
- Date: Thu, 18 Mar 2010 23:29:45 -0000
hex 1a is not a legal character in XML 1.0. It is legal in XML
1.1 provided it is written as a character reference, &#x1a;
I'm not sure offhand about 65533. You can look it up as
easily as I can.
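For anyone who wants to look it up mechanically, here is a minimal sketch in Python. It assumes the Char production from the XML 1.0 Recommendation and the Char and RestrictedChar productions from XML 1.1; the function names are made up for illustration and do not come from any library.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.1 Char production."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """True if cp is an XML 1.1 RestrictedChar (character reference only)."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


# Report on the two code points from the question.
for cp in (0x1A, 65533):
    print(hex(cp), is_xml10_char(cp), is_xml11_char(cp), is_xml11_restricted(cp))
```

Under those productions, hex 1a fails the XML 1.0 test and is a restricted character in XML 1.1, while 65533 (U+FFFD) falls inside the range #xE000-#xFFFD and passes both.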
Regards,
Michael Kay http://www.saxonica.com/ http://twitter.com/michaelhkay
Is a substitution character (x'1a' in many single byte character
sets or 65533 in UTF-8) a legal character? I have a case where x'1a'
appears not to be legal in a document with an encoding specified as
ISO-8859-1.
On Fri, Mar 19, 2010 at 6:01 AM, Michael Kay <mike@s...> wrote:
It's not well-formed.
From the XML 1.0 spec [1]: "It is a fatal error if an XML entity is
determined (via default, encoding declaration, or higher-level protocol) to
be in a certain encoding but contains byte sequences that are not legal in
that encoding."
Unless of course there is a "higher-level protocol" that tells you it's
really a different encoding. (The term higher-level protocol is not really
defined. I think they had in mind the media-type from the HTTP content
header. In terms of the protocol stack, that of course is a lower-level
protocol. But it's sufficiently woolly that a phone call from the sender to
say "Oops, I meant EBCDIC" would be enough to make the document
well-formed.)
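The two failure modes discussed in this thread are easy to confuse: a byte sequence that is not legal in the declared encoding, and a byte that decodes fine but yields a character XML does not allow. A rough sketch of the difference, assuming Python's built-in codecs and the expat-based xml.etree.ElementTree parser (which handles XML 1.0 only); the sample documents and the parses() helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Byte 0xFF can never occur in UTF-8, so decoding fails outright.
try:
    b"\xff".decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Every byte value is defined in ISO-8859-1, so 0x1A decodes without error
# to U+001A, a character outside the XML 1.0 Char production.
decoded = b"\x1a".decode("iso-8859-1")


def parses(doc: bytes) -> bool:
    """Hypothetical helper: True if the parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Same declared encoding; only the byte in the element content differs.
bad = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\x1a</a>'
good = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9</a>'

print(utf8_ok, repr(decoded), parses(bad), parses(good))
```

In this sketch the ISO-8859-1 document containing x'1a' is rejected because of the character it decodes to, not because the byte is illegal in the encoding.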
Regards,
Michael Kay http://www.saxonica.com/ http://twitter.com/michaelhkay