Elliotte Rusty Harold wrote:

> It could be worse, though. You could be using C, and trying to decode
> UTF-8. :-)

?? It's about 10 lines of code, and has been written lots of times now. Last time I needed it I couldn't find one with the exact buffer interface I needed, so I coded it up from scratch in the course of an afternoon and it worked first time. The spec is hardly unclear, and it's a set of shift/mask operations that are processor-friendly. You need to use a loop iterator that advances by each character's byte length rather than the for (i = 0; string[i]; i++) idiom, big deal.

UTF-8 only really causes extra work when you want per-character addressing into big strings, because then you need an indirect table of byte offsets - the most common case I can think of is maintaining on-screen render state. But in most apps it's more common to point into text at a few places (tags, word-starts, search matches), in which case you needed that indirect array anyhow.

Conclusion: somewhat to my surprise, I find that for a lot of C tasks, you can keep your text in UTF-8 and work with it that way very efficiently.

Elliotte is right about the irritating fact that a Java "char" isn't an XML character. The nasty fact is that I suspect many Java application programmers will end up simply blowing off non-BMP text, either through ignorance or based on a decision that it's not cost-effective.

-Tim
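
For illustration, here is a minimal sketch of the kind of shift/mask decoder and loop iterator described above. It is not the code referred to in the message; the names utf8_decode and utf8_count and the pointer-plus-length buffer interface are assumptions made up for this example. It rejects overlong forms, surrogates, and values above U+10FFFF, which is one reasonable reading of the spec rather than the only possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): decode one code point
 * from the buffer s[0..len). Returns the number of bytes consumed (1-4),
 * or 0 if the input is truncated or malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    /* Smallest code point that legitimately needs n bytes, indexed by n. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    uint32_t cp;

    assert(s != NULL && out != NULL);
    if (len == 0) return 0;

    /* The lead byte gives the sequence length and the top payload bits. */
    if (s[0] < 0x80)                { n = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead */

    if (n > len) return 0;          /* sequence runs past end of buffer */

    /* Each continuation byte is 10xxxxxx and contributes six bits. */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min_cp[n]) return 0;               /* overlong encoding */
    if (cp > 0x10FFFF) return 0;                /* beyond Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate code point */

    *out = cp;
    return n;
}

/* Hypothetical loop iterator: step through the buffer one character at a
 * time instead of one byte at a time. Returns the number of code points,
 * or (size_t)-1 if the buffer is not well-formed UTF-8. */
size_t utf8_count(const unsigned char *s, size_t len)
{
    size_t pos = 0, count = 0;

    while (pos < len) {
        uint32_t cp;
        size_t n = utf8_decode(s + pos, len - pos, &cp);
        if (n == 0) return (size_t)-1;
        pos += n;   /* advance by the character's byte length */
        count++;
    }
    return count;
}
```

Recording pos for selected characters inside the utf8_count loop would give the indirect table of byte offsets mentioned above for the per-character addressing case.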



