Tim Bray wrote:
> jcowan wrote:
> > C and C++ on the Windows platform *are* UTF-16 centric. If you put
> > a Gothic character into a "..."L string, for example
>
> So you're saying that it would be satisfactory for genx to infer that if
>
>     sizeof(wchar_t) == 2
>
> then the values are UTF-16 coded units? -Tim

I'd say that depends on what degree of portability you're after, and on whether or not you use any of the wcs* or mb* standard library routines. If you want strictly-conforming C, that is *not* a safe assumption. If OTOH you only need it to be portable to a plurality of relatively modern, not-too-badly-braindamaged systems, it's probably OK.

More specifically: if sizeof(wchar_t) == 2 and NBBY == 8, then you can safely assume that a wchar_t can hold a UCS-2 code point. You should *not* assume that the compiler and C standard library will interpret them as such. Nor should you assume that the compiler and C standard library will interpret multibyte sequences as UTF-8 (many don't). You should *definitely* not assume that wchar_t's are UTF-16 coded units: any implementation that does so is just plain wrong -- UTF-16 is a variable-width encoding (unless you restrict it to the BMP, in which case it's the same as UCS-2).

--Joe English

  jenglish@f...