

> I just suspect the point's worth making a little more strongly, as so
> many of us have been brainwashed to think Java char=Unicode character. 
> Surrogate pairs whacked me a lot harder over the head than I thought,
> and Java doesn't seem to take note.

True for most folk.  XML made me get my hands dirty with
I18N stuff, and surrogate pairs took a while for me to grok.  I
don't think they'll be intuitive to most folk, who've rarely had
to look at such I18N issues.


> > Point is that anyone working at the "character" level MUST
> > NOT ASSUME that such characters consist only of a single
> > Java "char" value.  And that'd be true even if "char" were
> > to make an incompatible change, and acquire a few extra
> > bits at the left so that surrogates could in some cases be
> > eliminated.
> 
> So could the paragraph above appear in the documentation somewhere?  I
> think that would take care of all my concerns.

Yes, I was thinking of doing that, once I've read the other thread
a bit more closely to make sure I pick up any other details.  That
should make it into the SAX2 r2 ContentHandler docs, and maybe
also LexicalHandler.comment() if I get ambitious.
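
To make the point concrete, here is a minimal sketch of counting
characters rather than "char" values in the kind of char[] slice that
ContentHandler.characters() hands over.  The class and method names
are hypothetical, and it assumes the surrogate helpers on
java.lang.Character that arrived in Java 5; it illustrates the issue
and is not proposed SAX documentation or API.

```java
class SurrogateDemo {
    // Hypothetical helper: counts Unicode characters (code points) in a
    // slice shaped like the arguments to characters(char[], int, int).
    // A well-formed surrogate pair counts once; an unpaired surrogate
    // is counted as one unit by itself.
    static int countCodePoints(char[] ch, int start, int length) {
        int count = 0;
        int end = start + length;
        for (int i = start; i < end; i++) {
            if (Character.isHighSurrogate(ch[i])
                    && i + 1 < end
                    && Character.isLowSurrogate(ch[i + 1])) {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1D538 lies outside the BMP, so UTF-16 needs two char values.
        char[] text = "a\uD835\uDD38b".toCharArray();
        System.out.println(text.length);                           // 4
        System.out.println(countCodePoints(text, 0, text.length)); // 3
    }
}
```

The slice holds four "char" values but only three characters, which
is exactly the assumption that code working at the "character" level
must not get wrong.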

- Dave


