Re: Char & Java implementation

  • From: Richard Tobin <richard@c...>
  • To: Jeni Tennison <jft@P...>, xml-dev@i...
  • Date: Wed, 4 Mar 1998 10:52:48 GMT

> [2]  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
>               | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
>                                   ^^^^^^^^^^^^^^^^^^
> 
> Am I right in thinking that, since the indicated characters are longer than
> 16 bits, they can't be represented in Java with the char data type, and int
> must be used instead?

The answer to this explains the otherwise mysterious missing range
D800 to DFFF (all values here are hexadecimal).  These 2 * 2^10 missing
characters can be used in pairs to represent the first 2^20 characters
above FFFF.  For x in the range 0 to FFFFF, the character 10000 + x is
represented by the pair D800 + (x >> 10), DC00 + (x & 3FF).
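
As a minimal sketch of the arithmetic above (the class and method names
are made up for illustration, not taken from any parser), the split into
a pair of 16-bit values and the reverse mapping might look like this in
Java:

```java
// Hypothetical helper illustrating the pair arithmetic described above.
final class SurrogateSketch {

    /** Splits a character in 10000..10FFFF (hex) into {first, second} of its pair. */
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException(
                "not in 10000-10FFFF: " + Integer.toHexString(codePoint));
        }
        int x = codePoint - 0x10000;                 // 20-bit offset above FFFF
        char first  = (char) (0xD800 + (x >> 10));   // top 10 bits
        char second = (char) (0xDC00 + (x & 0x3FF)); // bottom 10 bits
        return new char[] { first, second };
    }

    /** Reverses toSurrogates; assumes the two values form a valid pair. */
    static int fromSurrogates(char first, char second) {
        return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1D11E);
        // prints "D834 DD1E"
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);
    }
}
```

Both halves fit in a Java char, so int is only needed while doing the
arithmetic on the character above FFFF.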

Since none of the characters above FFFF are name characters, they are
irrelevant to the syntax of XML, and you don't need to convert the
pairs of "surrogates" into the characters they represent - you can
just pass them through to the application.

So, working in 16-bit units, you can treat the range of legal characters
as being 9, A, D, 20-FFFD.
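
A sketch of that simplified test on a Java char follows; the names are
hypothetical. Note that it deliberately lets D800-DFFF through, on the
assumption that pairing is checked separately:

```java
// Hypothetical check for the simplified legal range 9, A, D, 20-FFFD,
// applied to one 16-bit unit at a time.
final class XmlCharSketch {

    static boolean isLegalUnit(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xFFFD);
    }

    public static void main(String[] args) {
        System.out.println(isLegalUnit('A'));           // true
        System.out.println(isLegalUnit((char) 0x0));    // false
        System.out.println(isLegalUnit((char) 0xFFFE)); // false
    }
}
```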

There are a few things you have to take account of:

- the surrogates must appear in pairs in the input, one in the range
  D800-DBFF followed by one in the range DC00-DFFF

- if a character reference (such as &#x10000;) refers to a character in
  the range 10000-10FFFF it should be converted to a pair of surrogates
  before it is passed to the application

- a character reference must not expand to a character in the surrogate
  range D800-DFFF.
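
The three points above could be sketched roughly as follows. This is
only an illustration under my own assumptions: the class and method
names are invented, the input is assumed to be already decoded into
16-bit units, and the numeric value of the reference is assumed to be
already parsed.

```java
// Hypothetical sketch of the three checks listed above.
final class CharRefSketch {

    static boolean isFirstHalf(char c)  { return c >= 0xD800 && c <= 0xDBFF; }
    static boolean isSecondHalf(char c) { return c >= 0xDC00 && c <= 0xDFFF; }

    /** True if every surrogate in s is part of a D800-DBFF, DC00-DFFF pair. */
    static boolean surrogatesPaired(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isFirstHalf(c)) {
                if (i + 1 >= s.length() || !isSecondHalf(s.charAt(i + 1))) {
                    return false;   // first half with nothing valid after it
                }
                i++;                // skip the second half of the pair
            } else if (isSecondHalf(c)) {
                return false;       // second half without a first half
            }
        }
        return true;
    }

    /** Turns the numeric value of a reference into one or two 16-bit units. */
    static String expandCharRef(int value) {
        if (value >= 0x10000 && value <= 0x10FFFF) {
            int x = value - 0x10000;
            return "" + (char) (0xD800 + (x >> 10)) + (char) (0xDC00 + (x & 0x3FF));
        }
        boolean legal = value == 0x9 || value == 0xA || value == 0xD
            || (value >= 0x20 && value <= 0xD7FF)    // stops short of D800-DFFF
            || (value >= 0xE000 && value <= 0xFFFD);
        if (!legal) {
            throw new IllegalArgumentException(
                "illegal character reference: " + Integer.toHexString(value));
        }
        return String.valueOf((char) value);
    }
}
```

Here the reference is checked against the full Char production rather
than the simplified 20-FFFD range, since a reference to D800-DFFF has
to be rejected.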

I think, but I'm not certain, that this encoding only applies to UTF-16
and not UCS-2 (which would mean that the surrogate characters are an
error if encountered in a UCS-2 stream).  Can anyone confirm/deny this?

-- Richard
