Re: Unicode normalization in XML 1.1
From: "Michael Kay" <michael.h.kay@n...>

> While this policy makes sense, its translation into rules for software
> components is unfortunately full of absurdities. The fact that the
> character model [1] bans text processing software from doing
> normalization [2] means that senders are going to have a tough job
> meeting the requirement to normalize the text, because they won't be
> able to find any text processing software that does the job for them.
>
> [1] http://www.w3.org/TR/charmod/
> [2] Section 4.4: "A text processing component .... must not normalize suspect text".

When reading Charmod, you must keep the context in mind: it is written "for interoperable text manipulation on the World Wide Web". Note that the spec talks of Web components: servers, proxies, clients, etc. It does not talk of text processing applications in general.

In other words, Charmod may well not apply to

 * systems of private exchange (where interoperability can be defined by agreement rather than policy)
 * intranets (where you are not on the World Wide Web)
 * processing documents locally on a machine (again, you are not on the World Wide Web)

For example, my company's editor normalizes all incoming text. But I do not believe that goes against Charmod. Indeed, it is one way to ensure early uniform normalization.

What Charmod does is say that it is the sender's job to make sure their data is uniformly normalized: the receiver should not have to worry. Rather than considering it an added burden on senders, we can think of it as letting recipients off the hook: you don't have to normalize both your inputs and your outputs, just your outputs. And, if you are a sender, you can largely do this by selecting an output encoding that only has normalized characters.

To implement it, you might put a switch in SAXON for "server mode" which switches off any normalization.
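As a rough illustration of such a switch, here is a minimal sketch in Java. It is not SAXON code: the class name OutputNormalizer, the serverMode flag and the method names are all hypothetical, and NFC is assumed as the target form. The only library calls are java.text.Normalizer.normalize and Normalizer.isNormalized from the standard JDK.

```java
import java.text.Normalizer;

/**
 * Hypothetical sketch of a "normalize on output" policy with a server-mode
 * switch. Not part of SAXON or any real product; NFC is an assumption.
 */
final class OutputNormalizer {
    /** When true, text is passed through untouched (the "server mode" idea). */
    private final boolean serverMode;

    OutputNormalizer(boolean serverMode) {
        this.serverMode = serverMode;
    }

    /** Returns the text to emit: unchanged in server mode, NFC otherwise. */
    String prepareOutput(String text) {
        if (serverMode) {
            return text;
        }
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    /** True if the text is not already in NFC, e.g. to drive a warning. */
    static boolean needsNormalization(String text) {
        return !Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT

        OutputNormalizer editor = new OutputNormalizer(false);
        OutputNormalizer server = new OutputNormalizer(true);

        // The editor composes the pair into one code point; the server leaves it alone.
        System.out.println(editor.prepareOutput(decomposed).length());
        System.out.println(server.prepareOutput(decomposed).length());
        System.out.println(needsNormalization(decomposed));
    }
}
```

A local editor or other sender would construct this with serverMode set to false and normalize its outputs only; a Web component in the Charmod sense would set it to true and at most use needsNormalization to generate a warning.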
The other thing is that normalization should really be performed by transcoders automatically: one effect of Charmod will be that transcoder developers can be expected to move over to generating normalized characters rather than trying to round-trip combining characters. That will largely reduce early normalization to a matter of generating warnings.

Cheers
Rick Jelliffe
(Invited expert, W3C I18n IG, views my own)