Re: ANN: Gorille 0.3
From: "Richard Tobin" <richard@c...>

> >The nasty fact is that
> >I suspect many Java application programmers will end up
> >simply blowing off non-BMP text either through ignorance
> >or based on a decision that it's not cost-effective.
>
> It depends what they want to do with it. Won't they just end up
> passing it through as pairs of surrogates?

And also, do surrogate pairs really introduce any issues that are not already present in combining character sequences?

I have been going through this recently for our markup editor. For the first version, we have decided to not-barf-but-not-provide-support-for combining character sequences or surrogates, because the 1 Java char = 1 glyph assumption makes life very easy.

Using IBM's International Components for Unicode (bulk kudos to Mark Davis), it is quite straightforward to add composing normalization (Normalization Form C) to data import and to character entry in an interactive application. This means that your application uses precomposed characters wherever Unicode provides them, rather than combining character sequences. For most Latin-script Western languages, Unicode provides precomposed characters: enough even to support Vietnamese, with its multiple levels of accent.

The other issue here is that the 1 Java char = 1 glyph assumption does not imply that every character is the same width: if you support proportional-width characters, you can still support Chinese and Japanese.

The W3C I18n WG has a new version of their "Character Model for the WWW" at http://www.w3.org/TR/ which is looking pretty good. It is really well written, and anyone who wants to get a grip on internationalization or character issues should find it a good place to start.

Cheers
Rick Jelliffe
Topologi Pty. Ltd.
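A minimal Java sketch of the surrogate and normalization points above, assuming Java 7 or later. It uses the JDK's own java.text.Normalizer in place of the IBM classes mentioned in the post (ICU4J offers equivalent normalization). The helper names normalizeOnImport and isOneCharPerCharacter are hypothetical, not part of any library. The sketch shows that a non-BMP character occupies two Java chars (a surrogate pair), that a combining sequence also occupies more than one char, and that NFC collapses common Latin cases, including a Vietnamese double-accented vowel, into single precomposed chars.

```java
import java.text.Normalizer;

/**
 * Sketch (not a library API): how surrogate pairs and combining sequences
 * break the "1 Java char = 1 glyph" assumption, and how NFC normalization
 * on import removes most combining sequences for Latin scripts.
 * Uses java.text.Normalizer (JDK) instead of the IBM/ICU classes.
 */
final class CharModelDemo {

    /** Hypothetical import hook: compose to NFC so precomposed chars are used where they exist. */
    static String normalizeOnImport(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    /**
     * Hypothetical check: true if every Java char in the text stands alone,
     * i.e. the text has no surrogate halves and no combining marks.
     */
    static boolean isOneCharPerCharacter(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isSurrogate(c)) {
                return false; // half of a non-BMP character
            }
            int type = Character.getType(c);
            if (type == Character.NON_SPACING_MARK
                    || type == Character.ENCLOSING_MARK
                    || type == Character.COMBINING_SPACING_MARK) {
                return false; // part of a combining character sequence
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF is outside the BMP: one code point, two chars.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length() + " chars, "
                + clef.codePointCount(0, clef.length()) + " code point"); // 2 chars, 1 code point

        // "e" followed by COMBINING ACUTE ACCENT: a two-char combining sequence.
        String decomposed = "e\u0301";
        String composed = normalizeOnImport(decomposed);
        System.out.println(decomposed.length() + " -> " + composed.length()); // 2 -> 1

        // Vietnamese: e + dot below + circumflex composes to U+1EC7.
        System.out.println(normalizeOnImport("e\u0323\u0302").equals("\u1EC7")); // true

        System.out.println(isOneCharPerCharacter(composed)); // true
        System.out.println(isOneCharPerCharacter(clef));     // false
    }
}
```

NFC cannot help where Unicode has no precomposed form: "q" followed by a combining acute accent stays two chars after normalization. That is why an editor relying on the 1 char = 1 glyph assumption would still want a check along the lines of isOneCharPerCharacter to decide when to fall back to not-barf-but-not-support behaviour.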