Unicode confusion
>> If anything, it should go the other way. Unicode should be the core
>> API, and there should be helper API to allow the use of local code
>> page chars where necessary. Everything should be set up to optimize
>> use of the Unicode API, with local code page use paying the price,
>> since Unicode is the more desirable format.
>
>No one's disagreeing with the use of Unicode; we're talking about
>which character encoding we'll use to represent it. You can represent
>Unicode in variable-width 8-bit or 16-bit encodings or in fixed-width
>32-bit encodings.
>
>Note that Java uses UTF-16, which isn't quite fixed-width, though no
>one really notices.
>

Our parser already adapts to whether the native wchar_t is 16 or 32 bits, though it still uses surrogates and stores 16-bit code units in the 32-bit values when it's 32 bits. However, it could also pretty reasonably adapt to not using surrogates if the local wchar_t is 32 bits.

I guess it comes down to whatever the local system's wide character APIs expect. If they expect 32-bit values without surrogates, then it would be kind of necessary to give them that. If they expect 16-bit code units with surrogates, regardless of the fact that wchar_t is perhaps 32 bits, then it would be best to give them that.

Going this far would require some support in parsers that might not be common, but I think that we could do it reasonably in the Xerces/XML4C stuff without too much pulling out of hair or added complexity. The internalization of text into the local format is pretty constrained. The big issue, though, is that you are kind of dependent upon what transcoding package you use. For those encodings that we handle intrinsically, we could do this well enough. But we allow each platform to use its own transcoding mechanism if they choose to, and they probably are going to support one scheme or the other.
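To make the surrogate question concrete, here is a minimal C sketch; it is an illustration only, not Xerces/XML4C code. `utf16_encode` shows why UTF-16 is not quite fixed-width (code points above U+FFFF take two 16-bit units), and `collapse_surrogates` shows the extra step a parser would need in order to hand surrogate-free UTF-32 values to a platform whose 32-bit wchar_t APIs expect them. Both function names, and the policy of passing unpaired surrogates through unchanged, are hypothetical choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Surrogate ranges reserved by Unicode for UTF-16. */
#define HI_SURR_MIN 0xD800u
#define HI_SURR_MAX 0xDBFFu
#define LO_SURR_MIN 0xDC00u
#define LO_SURR_MAX 0xDFFFu

/* Encode one Unicode scalar value as UTF-16.
   Returns the number of 16-bit units written to out (1 or 2). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Precondition: a valid scalar value (not a surrogate, not above U+10FFFF). */
    assert(cp <= 0x10FFFFu && !(cp >= HI_SURR_MIN && cp <= LO_SURR_MAX));
    if (cp < 0x10000u) {            /* BMP: one unit, same value */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000u;                 /* 20 bits remain */
    out[0] = (uint16_t)(HI_SURR_MIN + (cp >> 10));    /* high 10 bits */
    out[1] = (uint16_t)(LO_SURR_MIN + (cp & 0x3FFu)); /* low 10 bits */
    return 2;
}

/* Hypothetical helper: a buffer of 32-bit units that still holds UTF-16
   surrogate pairs (as described for a 32-bit wchar_t) is rewritten in place
   so each pair becomes one code point. Unpaired surrogates are copied as-is.
   Returns the new length in units. */
static size_t collapse_surrogates(uint32_t *buf, size_t len)
{
    size_t r = 0, w = 0;
    while (r < len) {
        uint32_t u = buf[r];
        if (u >= HI_SURR_MIN && u <= HI_SURR_MAX && r + 1 < len &&
            buf[r + 1] >= LO_SURR_MIN && buf[r + 1] <= LO_SURR_MAX) {
            /* Recombine: 0x10000 + (high bits << 10) + low bits */
            buf[w++] = 0x10000u + ((u - HI_SURR_MIN) << 10)
                                + (buf[r + 1] - LO_SURR_MIN);
            r += 2;
        } else {
            buf[w++] = u;           /* BMP value or lone surrogate */
            r += 1;
        }
    }
    return w;
}
```

For example, U+1F600 encodes to the pair 0xD83D 0xDE00, and collapsing the 32-bit sequence {0x41, 0xD83D, 0xDE00, 0x42} yields {0x41, 0x1F600, 0x42}. On a system whose 32-bit wide-character APIs expect surrogates anyway, the collapse step would simply be skipped.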
Hopefully they would support the local scheme, but you could also choose to use some portable package such as ICU, which is going to do one thing.

So, perhaps the question is: Are there any systems out there which use 32-bit wchar_t *and* expect that surrogates will not be used?

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
roddey@u...

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)