XML-DEV Mailing List Archive
Re: Unicode normalization in XML 1.1
Lars Marius Garshol scripsit:

> - clearly, documents that are not normalized are still well-formed,
>   so if the application is to have any guarantees here the processor
>   must do normalization before passing on the information,

Not so. A processor in normalization-check mode will report non-normalized
input, so the application may make up its mind whether or not to accept it.

> - the text says that "XML processors must not transform the input to
>   be in fully normalized form." This seems to say that processors are
>   not allowed to do the transformation.

Correct.

> Wouldn't it be far better if the application could be certain
> that an XML 1.1 processor would provide normalized character data and
> to ignore the whole issue of how the document was encoded? After all,
> isn't the whole purpose of *having* XML parsers to insulate
> applications from worries about the lexical details of documents?

The point is that normalization is expensive, and it may be too expensive
to do at all in small systems. Therefore, the W3C's choice (expressed in
the Character Model) is to have senders normalize, and receivers check for
normalization. In this way documents are normalized once at creation (or
publication) time, rather than every time a document is received; this
conserves net-wide cycles, since checking is cheaper than normalizing.

> In other words, why not rewrite this so that processors are required
> to normalize character data?

Forcibly normalizing incoming documents can spoof signature schemes, and
can also render documents well-formed that were not well-formed before
(e.g. if a start-tag uses A WITH ACUTE and the end-tag uses A followed by
COMBINING ACUTE).

http://www.w3.org/TR/charmod/#sec-Normalization goes into more detail.

-- 
John Cowan   http://www.ccil.org/~cowan   cowan@c...
To say that Bilbo's breath was taken away is no description at all.
There are no words left to express his staggerment, since Men changed
the language that they learned of elves in the days when all the world
was wonderful. --_The Hobbit_
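
The check-versus-normalize distinction and the start-tag/end-tag example in
the message above can be sketched roughly as follows. This is only an
illustration, not how any XML 1.1 processor is implemented: it uses Python's
standard unicodedata module (is_normalized needs Python 3.8 or later), it
treats NFC as a stand-in for the Character Model's stricter notion of "fully
normalized", and the names check_normalized, start_tag_name and end_tag_name
are hypothetical.

```python
import unicodedata


def check_normalized(text: str, form: str = "NFC") -> bool:
    """Report whether text is already in the given normalization form.

    Hypothetical helper: it only checks and never rewrites the input,
    mirroring the "receivers check, senders normalize" division of labour.
    NFC is an assumption standing in for "fully normalized".
    """
    return unicodedata.is_normalized(form, text)


# Two spellings of the same visible letter.
precomposed = "\u00C1"   # LATIN CAPITAL LETTER A WITH ACUTE
decomposed = "A\u0301"   # A followed by COMBINING ACUTE ACCENT

# A checking receiver can flag the second spelling without altering it.
precomposed_ok = check_normalized(precomposed)
decomposed_ok = check_normalized(decomposed)

# Suppose a start-tag uses one spelling and the end-tag uses the other.
start_tag_name = precomposed
end_tag_name = decomposed

# As sent, the names are different code point sequences, so the tags
# do not match and the document is not well-formed.
names_match_as_sent = start_tag_name == end_tag_name

# If a receiver forcibly normalized first, the names would become equal,
# silently turning a non-well-formed document into a well-formed one.
names_match_if_normalized = (
    unicodedata.normalize("NFC", start_tag_name)
    == unicodedata.normalize("NFC", end_tag_name)
)

print(precomposed_ok, decomposed_ok)
print(names_match_as_sent, names_match_if_normalized)
```

The same mismatch is why forced normalization interferes with signatures: a
digest computed over the sender's code points no longer corresponds to the
code points the application would see after the receiver rewrote them.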