RE: Supporting Unicode (was Some comments on the 1.1 draft)

  • To: xml-dev@l...
  • Subject: RE: Supporting Unicode (was Some comments on the 1.1 draft)
  • From: Rob Griffin <Rob.Griffin@o...>
  • Date: Thu, 20 Dec 2001 16:59:55 +1100
  • Cc: Rick Jelliffe <ricko@a...>

Wasn't one of the design goals of XML to be human readable?

How do I read it? I display the document on my screen,
or I print it out. Surely having the fewest possible control
characters in the document makes that more readily achievable.
I don't want to have to use a hex editor to see the 'real'
contents of a document. Nor do I want my printer to go ballistic
or print blocks in place of control characters.

If serializers want to use XML then they should obey the
rules. OK, it's less efficient to have to encode all the
control characters as character references, but after all
they chose a relatively inefficient format in the first place.
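
As a rough illustration of what "encode all the control characters as character references" could look like on the serializer side, here is a minimal Python sketch. The function names are hypothetical, and the ranges are an assumption: they follow the restricted characters of the XML 1.1 Recommendation as eventually published, which may differ from the draft under discussion in this thread. TAB, LF, CR and NEL are passed through unchanged.

```python
def is_restricted(cp: int) -> bool:
    """True for C0/C1 controls (and DEL) that this sketch writes as references.

    Assumption: ranges taken from the published XML 1.1 Recommendation,
    not from the 2001 draft. TAB (0x09), LF (0x0A), CR (0x0D) and
    NEL (0x85) are deliberately excluded.
    """
    return (
        0x01 <= cp <= 0x08
        or cp in (0x0B, 0x0C)
        or 0x0E <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def escape_controls(text: str) -> str:
    """Replace restricted control characters with hexadecimal character references.

    This handles only control characters; escaping of '&' and '<' is a
    separate step that a real serializer would also need.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # U+0000 is not allowed in XML at all, not even as a reference.
            raise ValueError("U+0000 cannot be represented in XML")
        out.append(f"&#x{cp:X};" if is_restricted(cp) else ch)
    return "".join(out)
```

With this, a BEL character in element content would be written as `&#x7;`, so the document stays displayable and printable as plain text, at the cost of a few extra bytes per control character.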

Rob Griffin
Quest Software

E-mail: Rob.Griffin@o...
Web site: http://www.quest.com 

> -----Original Message-----
> From: Rick Jelliffe [mailto:ricko@a...]
> Sent: Thursday, December 20, 2001 4:23 PM
> To: xml-dev@l...
> Subject: Supporting Unicode (was Some comments on the 1.1 draft)
>
> From John Cowan:
> > However, the control characters are *characters*, not really very
> > different from other control characters in the Unicode space
> > which are already allowed: not only the ISO C1 controls, but
> > also such things as: the Mongolian variant controls (and the
> > Unicode 3.2 generic variant controls); the bidi marks, overrides,
> > etc; and the music symbol begins/ends.
>
> The Unicode recommendations w.r.t. control characters are in
>  http://www.unicode.org/unicode/uni2book/ch13.pdf
> That makes it clear that control characters are unlike other
> characters, for which Unicode provides "semantics". The only C0 or C1
> characters for which Unicode provides "semantics" are TAB, CR, LF and NEL.
> Unicode completely defers the use and semantics of the other control
> characters to whatever makes sense for the application in question.
>
> There is no justification for saying "we need to support the C0 and C1
> characters in order to support Unicode" because Unicode does not
> require any such thing.
>
> But what if we do decide to support these control characters: what does
> it mean?  It means that we recognize their semantics, according to which
> it is inappropriate to embed most of them (e.g. EOF, BS, BELL, flow control,
> etc) in a text file for transmission anyway.
>
> Cheers
> Rick Jelliffe

