Tim Bray wrote:
>
> So I think it would be appropriate, in this discussion,
> to have some people in the mainframe trenches give us
> a briefing on the scale and the difficulty of the problems
> they face, and for some of our i18n gurus to highlight
> the problems faced by an XML language designer who wants
> to use one of the newly-added languages.

CR, LF and NEL are not the only space characters in Unicode.

I can't say I'm an i18n expert, and it's been a while since I've touched a mainframe terminal, but when I did, the software I wrote spoke Hebrew to its users.

Now, Hebrew is written right to left. Of course, Latin characters or digits may be found in Hebrew documents, and these are written left to right. There is an elaborate algorithm to determine whether a particular character should go to the right or to the left of the preceding one, the so-called bi-di algorithm. But there are cases where this algorithm is non-deterministic, and so special characters were introduced in Unicode -- right-to-left space and left-to-right space.

Why not add these two to the S production for the sake of Hebrew and Arabic users? There's no end to what can be regarded as whitespace.

Ari.
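
For anyone who wants to check the character properties being discussed, here is a rough sketch in Python using the standard `unicodedata` module. It compares the four characters in the XML 1.0 S production (space, tab, CR, LF) against a handful of other candidates. The candidate list and the `describe` helper are my own illustrative choices, not anything from the XML spec. Note that Unicode names the two directional characters LEFT-TO-RIGHT MARK (U+200E) and RIGHT-TO-LEFT MARK (U+200F).

```python
import unicodedata

# The S (white space) production from XML 1.0: space, tab, CR, LF.
XML_S = {"\u0020", "\u0009", "\u000D", "\u000A"}

# Illustrative candidates only; this selection is an assumption,
# not an exhaustive list of what Unicode treats as whitespace.
CANDIDATES = {
    "NEL": "\u0085",
    "NO-BREAK SPACE": "\u00A0",
    "LINE SEPARATOR": "\u2028",
    "IDEOGRAPHIC SPACE": "\u3000",
    "LEFT-TO-RIGHT MARK": "\u200E",
    "RIGHT-TO-LEFT MARK": "\u200F",
}


def describe(ch):
    """Hypothetical helper: collect the properties relevant to this thread."""
    return {
        "in_xml_s": ch in XML_S,
        "category": unicodedata.category(ch),      # e.g. Zs, Cc, Cf
        "bidi": unicodedata.bidirectional(ch),     # e.g. L, R, WS, B
        "python_isspace": ch.isspace(),
    }


for name, ch in CANDIDATES.items():
    print(f"U+{ord(ch):04X} {name}: {describe(ch)}")
```

None of the candidates is in S, and Python's own `str.isspace` already disagrees with S on several of them, which rather supports the point that the boundary is a matter of choice.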