

> The point is that normalization is expensive, and it may be 
> too expensive to do at all in small systems.  Therefore, the 
> W3C's choice (expressed in the Character Model) is to have 
> senders normalize, and receivers check for normalization.  In 
> this way documents are normalized once at creation (or 
> publication) time, rather than every time a document is 
> received; this conserves net-wide cycles, since checking is 
> cheaper than normalizing.

While this policy makes sense, its translation into rules for software
components is unfortunately full of absurdities. The Character Model [1]
bans text-processing software from doing normalization [2], so senders
are going to have a tough job meeting the requirement to normalize their
text: they won't be able to find any text-processing software that does
the job for them.
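
To make the sender/receiver split in the quoted policy concrete, here is
a minimal sketch in Python. It assumes NFC as the normalization form and
uses the standard library's unicodedata module (is_normalized needs
Python 3.8 or later). The function names send and receive are
hypothetical, chosen only for illustration; nothing here comes from the
Character Model itself.

```python
import unicodedata

FORM = "NFC"  # assumption: the normalization form both sides agree on


def send(text: str) -> str:
    """Sender side: normalize once, at creation or publication time."""
    return unicodedata.normalize(FORM, text)


def receive(text: str) -> str:
    """Receiver side: check only; reject suspect text rather than repair it."""
    if not unicodedata.is_normalized(FORM, text):
        raise ValueError("text is not normalized to " + FORM)
    return text
```

The receiver never calls normalize, which is the constraint in [2]. The
only place normalize appears is in the sender, and that is exactly the
kind of component the rule seems to leave no room for.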


[1] http://www.w3.org/TR/charmod/

[2] Section 4.4: "A text processing component .... must not normalize
suspect text".

Michael Kay

