David Carlisle wrote:
Unfortunately, that says it all. Control characters are not allowed in UTF-8 and as a result, are not allowed in XML, when the encoding is UTF-8 (making XML not well-formed)

Colin Adams wrote:
Unfortunately, that says it all. Control characters are not allowed in UTF-8 and as a result,

You are all so alert! Like I said to Florent earlier today: I shouldn't post too late anymore.

Yet, reading these posts, I had to look up the details, just out of curiosity. The Unicode Standard 4.0 (I know, XML requires at least v3.1) says in chapter 15.1, and I quote:

"There are 65 code points set aside in the Unicode Standard for compatibility with the C0 and C1 control codes [....] U+0000 - U+001F, U+007F, U+0080 - U+009F."

Reading on reveals that in UTF-8 they are represented by their hexadecimal value (<03> for x03, etc.), padded with one NUL byte in UTF-16 and three NUL bytes in UTF-32. This means that the byte x08 is indeed legal in UTF-8 (note that for the higher range, U+0080 - U+009F, UTF-8 encodes to a two-byte sequence).

Thanks for pointing me to this.

Cheers,
-- Abel Braaksma
http://abelleba.metacarpus.com
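
To make the byte-level point above concrete, here is a minimal Python sketch (not part of the original post). It prints the encoded bytes of one C0 control (U+0008) and one C1 control (U+0085) in UTF-8, UTF-16 and UTF-32, and then checks whether an XML 1.0 parser accepts a raw x08 byte. The helper names `encoded_bytes` and `is_well_formed` are hypothetical, chosen only for this illustration, and the big-endian codec variants are an assumption made so that no byte order mark appears in the output.

```python
import xml.etree.ElementTree as ET


def encoded_bytes(code_point):
    """Return the hex bytes of one code point in three encoding forms."""
    ch = chr(code_point)
    # Big-endian variants are used so that no byte order mark is emitted.
    return {
        enc: ch.encode(enc).hex(" ")
        for enc in ("utf-8", "utf-16-be", "utf-32-be")
    }


def is_well_formed(xml_bytes):
    """Report whether Python's XML 1.0 parser (expat) accepts the input."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # C0 control: one byte in UTF-8, NUL-padded in UTF-16 and UTF-32.
    print(encoded_bytes(0x08))
    # C1 control: two bytes in UTF-8.
    print(encoded_bytes(0x85))
    # The x08 byte is valid UTF-8, yet XML 1.0 still rejects the character.
    print(is_well_formed(b"<a>\x08</a>"))
    # Tab (x09) is one of the few C0 controls that XML 1.0 does allow.
    print(is_well_formed(b"<a>\t</a>"))
```

The two checks are kept separate on purpose: whether a byte sequence is valid UTF-8 and whether the resulting character is permitted in an XML 1.0 document are distinct questions, answered by different specifications.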