
RE: control characters

  • From: Eldar Musayev <eldarm@m...>
  • To: xml-dev@x...
  • Date: Wed, 21 Jun 2000 16:47:13 -0700

0x009f
Out of context, you are right. However, it is still using something already
allocated, because 0x0080-0x009F are different control characters than
0x0000-0x001F. And, by the way, in some applications characters like
"break-permitted-here" or "no-break-here" may be even more widely used than
national encodings.
On the other hand, in this sense my note was also out of context. Within the
context of the original question, the original proposal to use the private use
range sounds like the correct one.

 > -----Original Message-----
 > From: John Cowan [mailto:jcowan@r...]
 > Sent: Wednesday, June 21, 2000 12:53 PM
 > To: Eldar Musayev; xml-dev@x...
 > Subject: Re: control characters
 > 
 > 
 > Eldar Musayev wrote:
 > 
 > > In a case you may be interested: there is a lot of charsets/encodings using
 > > this range as well.
 > 
 > Encodings using the *bytes* 0x7F to 0x9F aren't the issue.  What counts here
 > is the Unicode *characters* U+007F to U+009F, which are solely the control
 > characters.
 > 
 > E.g. Win1252 uses 0x80 to encode EURO SIGN, but the corresponding Unicode
 > character is U+20AC, which is what counts for XML.
 > 
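
The byte-versus-character distinction in the quoted message can be checked
with a short sketch. It assumes Python's standard "cp1252" and "latin-1"
codecs as stand-ins for Win1252 and ISO-8859-1; the variable names are only
illustrative.

```python
# The same byte, 0x80, read under two different encodings.
raw = b"\x80"

# Windows-1252 assigns byte 0x80 to EURO SIGN, so the character is U+20AC.
euro = raw.decode("cp1252")

# ISO-8859-1 maps each byte to the code point of the same value,
# so here the character is U+0080, a C1 control character.
c1_control = raw.decode("latin-1")

print(f"cp1252:  U+{ord(euro):04X}")
print(f"latin-1: U+{ord(c1_control):04X}")
```

What an XML parser sees is the decoded character, so only the second case
puts a character from the U+0080 to U+009F range into the document.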
...quote

>
>The workaround I usually suggest is to represent control characters
>with (references to) characters from the Unicode private use range.
>This makes the necessary transformation a simple character
>substitution (which can even be just a subtraction - no need for a
>table).
>
>  -- Richard
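
A minimal sketch of the substitution Richard describes might look like the
following. The base U+E000 (start of the BMP private use area) and the
function names are assumptions made for illustration; the original message
does not say which private use code points to use.

```python
PUA_BASE = 0xE000  # assumed base; any agreed-upon private use offset would do

# C0 controls that XML 1.0 already allows literally: tab, LF, CR.
XML_ALLOWED = {0x09, 0x0A, 0x0D}


def escape_controls(text):
    """Shift disallowed C0 controls (U+0000-U+001F) into the private use area."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20 and cp not in XML_ALLOWED:
            out.append(chr(cp + PUA_BASE))  # plain addition, no table
        else:
            out.append(ch)
    return "".join(out)


def unescape_controls(text):
    """Reverse escape_controls by subtracting the same offset."""
    out = []
    for ch in text:
        cp = ord(ch) - PUA_BASE
        if 0 <= cp < 0x20 and cp not in XML_ALLOWED:
            out.append(chr(cp))
        else:
            out.append(ch)
    return "".join(out)


original = "a\x01b\tc\x1f"
escaped = escape_controls(original)
restored = unescape_controls(escaped)
```

The round trip is lossless only if the source data never contains the
private use characters chosen as targets.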

Actually, as someone has already pointed out, 0x007F - 0x009F are fair game 
for XML documents, and Unicode has these defined as control character 
aliases.

Mapping 0x0000 - 0x001F to the private use area sounds like the "correct" 
Unicode thing to do, but for US-ASCII/UTF-8 documents I would map to 0x0080 
- 0x009F instead.
This way you preserve the deprecated Anglo-centric, English-only, bigoted 
assumption of 1 character == 1 byte.

The only downside is that someone might actually have data in this range. I 
think this is about as likely as someone having data in the private use 
area.
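
As a rough sketch of the alternative mapping, the code below shifts the
disallowed C0 controls up by 0x80 and then measures how many bytes one mapped
character takes in a few encodings. The names C1_OFFSET, to_c1 and
has_c1_data are hypothetical. One caveat worth checking: the one-byte
property holds in a single-byte encoding such as ISO-8859-1, whereas UTF-8
encodes U+0080 to U+009F as two bytes each.

```python
C1_OFFSET = 0x80  # shifts U+0000-U+001F onto U+0080-U+009F
XML_ALLOWED = {0x09, 0x0A, 0x0D}  # tab, LF, CR are legal in XML 1.0 as-is


def to_c1(text):
    """Map disallowed C0 controls onto the C1 range by adding 0x80."""
    return "".join(
        chr(ord(ch) + C1_OFFSET)
        if ord(ch) < 0x20 and ord(ch) not in XML_ALLOWED
        else ch
        for ch in text
    )


def has_c1_data(text):
    """Report whether the input already uses U+0080-U+009F (the downside case)."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


mapped = to_c1("\x01")

# Encoded size of the mapped character, compared with a private use target.
latin1_len = len(mapped.encode("latin-1"))
utf8_len = len(mapped.encode("utf-8"))
pua_utf8_len = len("\ue001".encode("utf-8"))

print(latin1_len, utf8_len, pua_utf8_len)
```

Running has_c1_data on the input before mapping is one way to detect the
collision mentioned above rather than assume it never happens.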

XSLT will not _ALWAYS_ give you a perfect output format.
XML --> XSLT --> simple_text_filter seems like a win to me.


