On Thu, 2002-01-10 at 16:24, David Brownell wrote:
> > It might be worth noting the current discussion on xml-dev (or content
> > thereof) regarding surrogate pairs, as SAX relies on the Java char and
> > String constructs throughout.
>
> I'll catch up on that, but my advice on that point is unlikely to
> change. As I've pointed out in an upcoming O'Reilly book
> (you might have heard about it, called "SAX2" ... ;-) surrogate
> pairs aren't the only place that a Java "char" doesn't match
> a "character" ... there are also composed characters to
> worry about, even in the absence of surrogate pairs.

Sure thing, all advertising for our joint projects aside... I just
suspect the point's worth making a little more strongly, as so many of
us have been brainwashed to think Java char=Unicode character.
Surrogate pairs whacked me a lot harder over the head than I thought,
and Java doesn't seem to take note.

> Point is that anyone working at the "character" level MUST
> NOT ASSUME that such characters consist only of a single
> Java "char" value. And that'd be true even if "char" were
> to make an incompatible change, and acquire a few extra
> bits at the left so that surrogates could in some cases be
> eliminated.

So could the paragraph above appear in the documentation somewhere? I
think that would take care of all my concerns.

--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com
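
The two cases discussed in this message can be illustrated with a minimal sketch. It assumes a Java runtime later than the ones available when the message was written, since it relies on String.codePointCount (added in Java 5) and java.text.Normalizer (added in Java 6). The class name CharUnits and its helper methods are hypothetical, chosen only for this illustration; they are not part of SAX.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class CharUnits {
    // Number of Java "char" values (UTF-16 code units) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a well-formed surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of code points after canonical composition (NFC).
    static int codePointsAfterNfc(String s) {
        return codePoints(Normalizer.normalize(s, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // MUSICAL SYMBOL G CLEF (U+1D11E), written as a surrogate pair.
        String gClef = "\uD834\uDD1E";
        // Letter "e" followed by COMBINING ACUTE ACCENT: one visible character.
        String eAcute = "e\u0301";

        System.out.println("G clef:  " + codeUnits(gClef) + " chars, "
                + codePoints(gClef) + " code point(s)");
        System.out.println("e-acute: " + codeUnits(eAcute) + " chars, "
                + codePoints(eAcute) + " code point(s), "
                + codePointsAfterNfc(eAcute) + " after NFC");
    }
}
```

In the first string a single character occupies two char values because of the surrogate pair. In the second there are no surrogates at all, yet what a reader sees as one character still spans two char values until it is normalized, which is the composed-character case mentioned above.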



