On Jan 21, 2004, at 11:57 AM, jcowan@r... wrote:

>> The 'codePoint' typedef may be problematic:
>>
>> // Unicode code points (4-byte int on most systems)
>> typedef wchar_t codePoint;
>>
>> ...
> I have argued privately that wchar_t is in fact the Right Thing here
> despite its variability in size (UTF-32 on Unix platforms, UTF-16 on
> Windows), because it makes genx compatible with both standardized and
> non-standardized facilities, most especially "..."L strings. Some
> conditional logic will be needed to interpret the input as UTF-16 or
> UTF-32, which can be based on sizeof(wchar_t). Hypothetical platforms
> where sizeof(wchar_t) == 1 can be neglected.

Almost. How about we leave it as wchar_t, but *not* UTF-16, so a value
that's in a surrogate block is an error. Then we change the name from
codePoint (which could be interpreted as meaning "UTF-16 Code Point") to
something more explicit like

numericValueCorrespondingToAUnicodeCharacterAsInUPlusFourHexDigitsIsThatClear

John Cowan has suggested that "codeUnit" might be a good name; I'd be
inclined to "uniChar". Any other ideas?

If someone wants to put a generic UTF-16 processor on top of genx, that
would be fine. I don't see the demand for supporting it at the input end
of genx because the UTF-16 centric languages like Java and C# have decent
XML-writing software already.

-Tim
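
For concreteness, here is a minimal sketch of the proposal above: the type stays wchar_t, each value is taken to be one Unicode character's number, and anything in the surrogate block is rejected. The names uniChar and uniCharIsValid are hypothetical (no name has been settled in this thread), and this is not taken from the genx source.

```c
#include <wchar.h>

/* Hypothetical name; "codeUnit" and "uniChar" are both only suggestions. */
typedef wchar_t uniChar;

/*
 * Returns 1 if c can stand for a single Unicode character, 0 otherwise.
 * The value is treated as the character's number (as in U+XXXX), never
 * as half of a UTF-16 pair, so the surrogate block is an error.
 */
static int uniCharIsValid(uniChar c)
{
  /* Converting to unsigned long turns a negative wchar_t into a large
     value, which the upper-bound check below then rejects. */
  unsigned long v = (unsigned long) c;

  if (v > 0x10FFFFUL)                   /* beyond the Unicode range */
    return 0;
  if (v >= 0xD800UL && v <= 0xDFFFUL)   /* surrogate block */
    return 0;
  return 1;
}
```

Where wchar_t is only 16 bits wide, the upper-bound test can never fail, and characters above U+FFFF simply cannot be passed in through this type.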