
Re: Genx


Tim Bray scripsit:

> Almost.  How about we leave it as wchar_t, but *not* UTF-16, so a value  
> that's in a surrogate block is an error.  Then we change the name from  
> codePoint (which could be interpreted as meaning "UTF-16 Code Point") to  
> something more explicit like
> 
> numericValueCorrespondingToAUnicodeCharacterAsInUPlusFourHexDigitsIsThatClear
> 
> John Cowan has suggested that "codeUnit" might be a good name, I'd be  
> inclined to "uniChar", any other ideas?

I must have unintentionally misled you.  A "code point" is an integer
in the range 0-0x10FFFF; Unicode maps characters to code points.  "Code
units" are chunks o' bits:  UTF-8, UTF-16, and UTF-32 map code points to
8-bit code units, 16-bit code units, and 32-bit code units respectively.
"UTF-16 code point" is a contradiction in terms.
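
To make the distinction concrete, here is a minimal sketch in C of how one
code point becomes either UTF-16 or UTF-8 code units.  The function names
to_utf16 and to_utf8 are made up for illustration; they are not part of
genx or of any standard library.

```c
#include <stdint.h>

/* Illustrative helpers only (hypothetical names, not genx API). */

/* Is cp a code point that may be encoded (0-0x10FFFF, no surrogates)? */
static int is_scalar(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Encode one code point as 16-bit code units.
   Returns the number of units written (1 or 2), or 0 if cp is invalid. */
static int to_utf16(uint32_t cp, uint16_t out[2])
{
    if (!is_scalar(cp))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate  */
    return 2;
}

/* Encode one code point as 8-bit code units.
   Returns the number of units written (1 to 4), or 0 if cp is invalid. */
static int to_utf8(uint32_t cp, uint8_t out[4])
{
    if (!is_scalar(cp))
        return 0;
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}
```

For the code point 0x10330 (GOTHIC LETTER AHSA), to_utf16 yields the two
code units D800 DF30 and to_utf8 yields the four code units F0 90 8C B0:
one code point, different numbers of code units.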

However, on reflection I think that the Right Thing is to use
wchar_t directly in the API, since the whole point of using it is for
compatibility with other wchar_t-aware routines, either standardized
or platform-specific.  There is no point in hiding it behind a type name.
(As I said, if your platform has 8-bit wchar_t's, you deserve to lose.)

> If someone wants to put a generic UTF-16 processor on top of genx, that  
> would be fine.  I don't see the demand for supporting it at the input  
> end of genx because the UTF-16 centric languages like Java and C# have  
> decent xml-writing software already. -Tim

C and C++ on the Windows platform *are* UTF-16 centric.  If you put
a Gothic character into an L"..." string, for example, it will produce
a string which is three wchar_t's long on Windows, whereas on Unix it
will be two wchar_t's long (including the trailing 0 in both cases).  As I
said, the additional code for converting UTF-16 (as opposed to UTF-32)
into UTF-8 is very small, and can be conditionalized on sizeof(wchar_t).
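
A rough sketch of what I mean, assuming a caller-supplied output buffer.
The names put_utf8 and wcs_to_utf8 are hypothetical, chosen for this
example only; nothing here is actual genx code.  The only UTF-16-specific
part is the one if-block guarded by sizeof(wchar_t) == 2.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: write cp as UTF-8 into out (room for 4 bytes).
   Returns bytes written, or 0 if cp is a surrogate or above 0x10FFFF. */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical converter: 0-terminated wchar_t string to 0-terminated
   UTF-8 in out[0..cap-1].  Returns the byte count excluding the
   terminator, or (size_t)-1 on bad input or insufficient space. */
static size_t wcs_to_utf8(const wchar_t *in, unsigned char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;
    while (*in != 0) {
        uint32_t cp = (uint32_t)*in++;
        size_t k;

        if (sizeof(wchar_t) == 2) {
            /* UTF-16 platform: combine a surrogate pair into one code point. */
            cp &= 0xFFFF;
            if (cp >= 0xD800 && cp <= 0xDBFF) {
                uint32_t lo = (uint32_t)*in & 0xFFFF;
                if (lo < 0xDC00 || lo > 0xDFFF)
                    return (size_t)-1;   /* unpaired high surrogate */
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                in++;
            }
        }
        if (cap - n < 5)                 /* up to 4 bytes plus terminator */
            return (size_t)-1;
        k = put_utf8(cp, out + n);
        if (k == 0)                      /* stray surrogate or out of range */
            return (size_t)-1;
        n += k;
    }
    out[n] = 0;
    return n;
}
```

With 32-bit wchar_t's the guarded block is dead code and any value in a
surrogate block is rejected by put_utf8, which matches the "surrogate is
an error" rule proposed above.  With 16-bit wchar_t's a well-formed pair
is accepted and an unpaired surrogate is still an error.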

-- 
As you read this, I don't want you to feel      John Cowan 
sorry for me, because, I believe everyone       jcowan@r...
will die someday.    -- From a Nigerian-type    http://www.reutershealth.com
                        scam spam I got         http://www.ccil.org/~cowan

