RE: control characters
Out of context, you are right. However, it is still using something already allocated, because 0x0080-0x009F are different control characters than 0x0000-0x001F. And, by the way, in some applications characters like "break-permitted-here" or "no-break-here" should be even more popular than national encodings.

On the other hand, in this sense my note was also out of context. Within the context of the original question, the original proposal to use the private use range sounds like the correct one.

> -----Original Message-----
> From: John Cowan [mailto:jcowan@r...]
> Sent: Wednesday, June 21, 2000 12:53 PM
> To: Eldar Musayev; xml-dev@x...
> Subject: Re: control characters
>
> Eldar Musayev wrote:
>
> > In a case you may be interested: there is a lot of charsets/encodings using
> > this range as well.
>
> Encodings using the *bytes* 0x7F to 0x9F aren't the issue. What counts here
> is the Unicode *characters* U+007F to U+009F, which are solely the control
> characters.
>
> E.g. Win1252 uses 0x80 to encode EURO SIGN, but the corresponding Unicode
> character is U+20AC, which is what counts for XML.

...quote

>The workaround I usually suggest is to represent control characters
>with (references to) characters from the Unicode private use range.
>This makes the necessary transformation a simple character
>substitution (which can even be just a subtraction - no need for a
>table).
>
> -- Richard

Actually, as someone has already pointed out, 0x007F - 0x009F are fair game for XML documents, and Unicode has these defined as control character aliases. Mapping 0x0000 - 0x001F to the private use area sounds like the "correct" Unicode thing to do, but for US-ASCII/UTF-8 documents I would map to 0x0080 - 0x009F instead. This way you preserve the deprecated, Anglo-centric, English-only, bigoted assumption of 1 character == 1 byte. The only downside is that someone might actually have data in this range. I think this is about as likely as someone having data in the private use area.
XSLT will not _ALWAYS_ give you a perfect output format. XML --> XSLT --> simple_text_filter seems like a win to me.

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@x...&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************
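
The private-use substitution Richard describes, together with a simple text filter applied after XSLT, might look roughly like the sketch below. Everything named here is an assumption for illustration and does not come from the thread: the base offset `PUA_BASE` (U+E000, the start of the BMP Private Use Area), the function names, and the choice to remap only the C0 controls that XML 1.0 disallows (U+0000-U+001F except TAB, LF and CR).

```python
# Hypothetical offset: U+E000 is the first code point of the BMP Private Use Area.
PUA_BASE = 0xE000

# C0 controls that XML 1.0 does allow in documents: TAB, LF, CR.
XML_ALLOWED_C0 = {0x09, 0x0A, 0x0D}


def escape_controls(text: str) -> str:
    """Before writing XML: move disallowed C0 controls into the private use range."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20 and cp not in XML_ALLOWED_C0:
            out.append(chr(PUA_BASE + cp))  # plain addition, no lookup table
        else:
            out.append(ch)
    return "".join(out)


def unescape_controls(text: str) -> str:
    """The 'simple text filter' run on the XSLT output: undo the substitution."""
    out = []
    for ch in text:
        cp = ord(ch) - PUA_BASE
        if 0 <= cp < 0x20 and cp not in XML_ALLOWED_C0:
            out.append(chr(cp))  # plain subtraction restores the control character
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    original = "field1\x01field2\x1fend\n"
    escaped = escape_controls(original)
    assert unescape_controls(escaped) == original
    print([hex(ord(c)) for c in escaped if ord(c) >= PUA_BASE])
```

The filter cannot tell a substituted control character from genuine private-use data in U+E000-U+E01F, which is the collision risk discussed above. The bytes-per-character question depends on the encoding: a character such as U+0081 is one byte in ISO-8859-1 but two bytes in UTF-8, and the U+E0xx characters used here take three bytes in UTF-8.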