
Re: Need to remove unusual character in source

Subject: Re: Need to remove unusual character in source
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 27 Sep 2006 20:12:11 +0200
David Carlisle wrote:
Unfortunately, that says it all. Control characters are not allowed in UTF-8 and as a result, are not allowed in XML, when the encoding is UTF-8 (making XML not well-formed).

Not so, UTF-8 can encode control characters, but they are not allowed in XML 1.0 (whatever the encoding).

David

Colin Adams wrote:
Unfortunately, that says it all. Control characters are not allowed in UTF-8 and as a result,

Oh yes they are!

You are all so alert! Like I said to Florent earlier today: I shouldn't post too late anymore. Yet, reading these posts, I had to look up the details, just out of curiosity. The Unicode Standard 4.0 (I know, XML requires at least v3.1) says in chapter 15.1, and I quote:


"There are 65 code points set aside in the Unicode Standard for compatibility with the C0 and C1 control codes [....] U+0000 - U+001F, U+007F, U+0080 - U+009F."

Reading on reveals that in UTF-8 the controls in the range U+0000 - U+007F are each represented by a single byte equal to their hexadecimal value (<03> for x03 etc.), padded with one NUL in UTF-16 and three NULs in UTF-32. This means that a byte x08 is indeed legal in UTF-8 (note that for the higher range, U+0080 - U+009F, UTF-8 will encode to a two-byte sequence).
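To make the byte layouts concrete, here is a minimal sketch in Python (not something from the thread; the helper name encoded_hex is made up for illustration). It prints the encoded bytes of a few control code points, using the big-endian UTF-16 and UTF-32 variants so that no byte order mark is added:

```python
def encoded_hex(code_point: int, encoding: str) -> str:
    """Return the bytes of a single code point in `encoding`, as a hex string."""
    return chr(code_point).encode(encoding).hex()


# A few C0 controls, DEL, and the two ends of the C1 range.
for cp in (0x03, 0x08, 0x7F, 0x80, 0x9F):
    print(
        f"U+{cp:04X}",
        "utf-8:", encoded_hex(cp, "utf-8"),
        "utf-16-be:", encoded_hex(cp, "utf-16-be"),
        "utf-32-be:", encoded_hex(cp, "utf-32-be"),
    )
```

For U+0008 this should show a single byte 08 in UTF-8, 0008 in UTF-16 and 00000008 in UTF-32, while U+0080 becomes the two bytes c2 80 in UTF-8. None of this changes what XML 1.0 accepts: the encoding is fine, but the character itself may still be rejected by the parser.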

Thanks for pointing me to this.

Cheers,

-- Abel Braaksma
  http://abelleba.metacarpus.com
