
Re: Character Range: surrogate blocks

  • From: John Cowan <cowan@l...>
  • To: xml-dev@i...
  • Date: Sat, 17 Oct 1998 18:57:26 -0400 (EDT)

Richard Emberson scripsit:

> To extend the available characters in Unicode one
> can use two 16-bit characters with surrogate blocks.

Well, no.  One uses the 16-bit *codes* in the surrogate blocks.
They aren't *characters* (what in Unicode is called *abstract characters*)
at all.

> Now in production rule #2 titled Character Range 
> surrogate blocks are explicitly excluded (along 
> with FFFF and FFFE). 

Correct, because these codes do not represent characters.
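
As a rough illustration of what production 2 admits, here is a minimal Python sketch. The function name is_xml_char is hypothetical, and the ranges are written from my reading of the Char production in XML 1.0 (tab, LF, CR, then three ranges), so check them against the spec before relying on them.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production.

    Assumed ranges: #x9 | #xA | #xD | [#x20-#xD7FF]
                    | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF        # stops just below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD      # resumes after them; omits FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # the characters beyond 16 bits
    )
```

The surrogate codes D800-DFFF fall in the gap between the second and third ranges, which is how they end up excluded.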

> Does that mean that if one were reading a character
> stream that included characters not in the basic 
> set of Unicode characters (those not using surrogate
> blocks) that it would be a wellformedness violation?

I don't understand this question.  Is there an extra "not" somewhere?

> There are the extra, beyond 16-bit, characters specified
> by the spec in production rule #2 as "[#x10000-#x10FFFF]".
> Is this how Unicode characters that use the surrogate
> blocks get represented in an XML document?

In UTF-16 (= Unicode) representation, yes.  In UTF-8 representation
they are represented as the appropriate 4-byte sequences.
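
To make the 4-byte UTF-8 form concrete, here is a small Python sketch that builds it by hand for a code point in 10000-10FFFF. The helper name utf8_four_bytes is hypothetical; the bit layout (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx) is the standard UTF-8 one.

```python
def utf8_four_bytes(cp: int) -> bytes:
    """Encode a code point in 0x10000..0x10FFFF as its 4-byte UTF-8 sequence."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    return bytes([
        0xF0 | (cp >> 18),           # lead byte carries the top 3 bits
        0x80 | ((cp >> 12) & 0x3F),  # then three continuation bytes,
        0x80 | ((cp >> 6) & 0x3F),   # each carrying 6 bits
        0x80 | (cp & 0x3F),
    ])


# The same character in the two representations mentioned above.
as_utf8 = utf8_four_bytes(0x10000)
as_utf16 = chr(0x10000).encode("utf-16-be")  # a surrogate pair, big-endian
```

For code point 10000 this gives the bytes F0 90 80 80 in UTF-8 and the pair D800 DC00 in UTF-16.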

> Is there
> an algorithm for the conversions defined somewhere? 

Same as Unicode or ISO 10646 UTF-16, namely: a codepoint in the
range D800-DBFF followed by one in the range DC00-DFFF represents
the character whose code is (first-D800)*400+(second-DC00)+10000
(hex arithmetic).
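
The formula above can be sketched in Python as follows. The function names combine_surrogates and split_code_point are hypothetical, and the reverse direction is simply the same arithmetic inverted; treat this as an illustration, not a reference implementation.

```python
def combine_surrogates(first: int, second: int) -> int:
    """Combine a D800-DBFF code and a DC00-DFFF code into one character code."""
    if not 0xD800 <= first <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {first:#06x}")
    if not 0xDC00 <= second <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {second:#06x}")
    return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000


def split_code_point(cp: int) -> tuple[int, int]:
    """Inverse: split a code in 10000-10FFFF into its surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"no surrogate pair needed or possible: {cp:#x}")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)
```

For example, combine_surrogates(0xD800, 0xDC00) yields 0x10000 and combine_surrogates(0xDBFF, 0xDFFF) yields 0x10FFFF, the two ends of the range in production 2.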

> Short of getting a copy of the Unicode 2.0 spec, is there 
> anywhere where the conversion algorithm is documented?

http://www.cm.spyglass.com/unicode/standard/wg2n1035.html#x10

> Why was it decided to exclude the use of surrogate 
> block-based Unicode characters within XML documents?

What is excluded is surrogate codes appearing in unpaired form.
These could be generated, e.g., by the UTF-8 sequence ED A0 80 = U+D800.
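
To see how ED A0 80 yields D800, here is a short Python sketch. The helper names are hypothetical. The first function applies the plain 3-byte UTF-8 bit layout with no validity checks; the second shows that Python 3's own strict decoder refuses the same bytes.

```python
def naive_three_byte_decode(data: bytes) -> int:
    """Unpack a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx), unchecked."""
    b0, b1, b2 = data
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def strict_decode_fails(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder rejects the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


lone_surrogate = bytes([0xED, 0xA0, 0x80])
code = naive_three_byte_decode(lone_surrogate)   # 0xD800
rejected = strict_decode_fails(lone_surrogate)   # True
```

A decoder that skips the range check hands the application an unpaired surrogate code, which is exactly the case production 2 rules out.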

-- 
John Cowan					cowan@c...
		e'osai ko sarji la lojban.


