
Re: Unicode surrogate block in XML?

  • From: "Paul W. Abrahams" <abrahams@v...>
  • To: XMLDev list <xml-dev@i...>
  • Date: Fri, 17 Sep 1999 22:16:29 -0400

Tony Graham (tgraham@m...)
Fri, 17 Sep 1999 01:15:51 -0400 (EST)

>> In any XML document, you can make numeric references to any Unicode
character in the range #x10000 to #x10FFFF (as well as to any other
legal character number).  These references are independent of the
encoding used in the XML document. <<
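
A minimal sketch of the quoted point, assuming Python's standard-library xml.etree.ElementTree parser (chosen here only for illustration; no particular parser is named in this thread). The sample document and the element name "a" are hypothetical. The same numeric reference is parsed from a UTF-8 and a UTF-16 serialization and should yield the same single character either way.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element name "a" is arbitrary.
DOC = '<?xml version="1.0" encoding="{enc}"?><a>&#x10000;</a>'

def parse_text(encoding: str) -> str:
    """Serialize DOC in the given encoding, parse it, and return the element text."""
    data = DOC.format(enc=encoding).encode(encoding)
    return ET.fromstring(data).text

utf8_text = parse_text("UTF-8")
# Python's "UTF-16" codec prepends a byte order mark to the encoded bytes.
utf16_text = parse_text("UTF-16")
```

In both cases the character reference is written with the same ASCII-range characters; only the bytes used to store those characters differ between the two files.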

Is it really correct to refer to #x10FFFF, say, as a Unicode
character, since Unicode characters are limited to 16 bits?  I'd think
it's necessary here to refer to that as a UCS-4 character.

>> The sequence of #xD800 #xDC00 is the two Surrogate code values that
address #x10000.  That four-byte sequence may occur in a UTF-16
encoded file to represent #x10000.  In contrast, "&#xD800;&#xDC00;" in
an XML document is two illegal character references in a row. <<
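
The arithmetic behind that surrogate pair can be sketched as follows. This is an illustration under stated assumptions, not anything taken from Tony's message: the helper names to_surrogates and is_rejected are hypothetical, and the parser is again Python's standard-library xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a code point in #x10000..#x10FFFF into (high, low) surrogate values."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the range addressed by surrogates")
    offset = cp - 0x10000               # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low

def is_rejected(xml_text: str) -> bool:
    """Return True if the parser refuses the document (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return True
    return False

pair = to_surrogates(0x10000)
# Big-endian UTF-16 without a byte order mark: the four bytes D8 00 DC 00.
utf16_bytes = chr(0x10000).encode("utf-16-be")
surrogate_refs_rejected = is_rejected("<a>&#xD800;&#xDC00;</a>")
single_ref_rejected = is_rejected("<a>&#x10000;</a>")
```

The surrogate values belong to the UTF-16 byte stream, while a character reference names the character number itself, which is why the first document is refused and the second is accepted.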

I've been trying to fathom the distinction between Unicode and UTF-16,
if there is one, and how these in turn relate to the UCS-2 encoding of
ISO 10646.  There's also the question of whether an XML document can
be stored directly in Unicode, or whether instead it must be stored in
either UTF-8 or UTF-16, as Section 2.2 seems to imply when it says
``all XML processors must accept the UTF-8 and UTF-16 encodings of
10646''.  The latter appears to be the case; but if it isn't, then
how would an XML document be stored directly in Unicode?  I've
pondered both Appendix C of the Unicode Standard and the relevant part
of the FAQ on the Unicode website, and I'm still unclear about all of
this.  (By the way, the FAQ erroneously refers to UTF as the Unicode
Transformation Format rather than the UCS transformation format.)
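
One way to see the difference between a character number and its stored forms is to print the bytes each encoding produces. The sketch below assumes Python's built-in codecs; the function name encoded_forms is hypothetical, and Python's "utf-32-be" codec stands in for UCS-4, which is an assumption about naming that holds only for character numbers up to #x10FFFF.

```python
def encoded_forms(cp: int) -> dict:
    """Return the hex bytes of one character number in three encoding forms."""
    ch = chr(cp)
    codecs = (
        ("UTF-8", "utf-8"),
        ("UTF-16", "utf-16-be"),   # big-endian, no byte order mark
        ("UCS-4", "utf-32-be"),    # assumption: utf-32-be used as a stand-in for UCS-4
    )
    return {name: ch.encode(codec).hex() for name, codec in codecs}

bmp_forms = encoded_forms(0x20AC)      # a character that fits in 16 bits
supp_forms = encoded_forms(0x10000)    # a character that needs surrogates in UTF-16
```

For #x20AC the UTF-16 form is just the 16-bit value itself, whereas for #x10000 it is the surrogate pair D800 DC00 and the four-byte form holds the number directly.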

In any event, thanks, Tony, for your very enlightening response to my
original query.

Paul Abrahams



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)


