
Re: Unicode normalization in XML 1.1


Lars Marius Garshol scripsit:

>  - clearly, documents that are not normalized are still well-formed,
>    so if the application is to have any guarantees here the processor
>    must do normalization before passing on the information,

Not so.  A processor in normalization-check mode will report non-normalized
input, so the application may make up its mind whether or not to accept it.
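
A minimal sketch of "report, don't repair" might look like the following. It assumes Python's standard unicodedata module and uses plain NFC as a stand-in for XML 1.1's stricter "fully normalized" condition, which also constrains how markup constructs and references may begin. The names check_normalization and accept are hypothetical and are not any parser's API.

```python
import unicodedata

def check_normalization(text: str) -> list:
    """Report, but never repair, text that is not in NFC.

    NFC is only an approximation of XML 1.1 "fully normalized" text.
    Hypothetical helper for illustration, not a real parser API.
    """
    problems = []
    if not unicodedata.is_normalized("NFC", text):
        problems.append("input is not in Unicode Normalization Form C")
    return problems

def accept(text: str, strict: bool) -> str:
    """Application policy: the caller decides what to do with the report."""
    problems = check_normalization(text)
    if problems and strict:
        raise ValueError("; ".join(problems))
    return text  # returned unchanged either way; the check never transforms it

doc_ok = "<caf\u00e9>x</caf\u00e9>"     # precomposed E WITH ACUTE
doc_bad = "<cafe\u0301>x</cafe\u0301>"  # E followed by COMBINING ACUTE
print(check_normalization(doc_ok))   # []
print(check_normalization(doc_bad))  # ['input is not in Unicode Normalization Form C']
print(accept(doc_bad, strict=False) == doc_bad)  # True
```

A lenient application passes strict=False and gets the document back untouched along with the knowledge that it failed the check. A strict one rejects it.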

>  - the text says that "XML processors must not transform the input to
>    be in fully normalized form." This seems to say that processors are
>    not allowed to do the transformation.

Correct.

> Wouldn't it be far better if the application could be certain
> that an XML 1.1 processor would provide normalized character data and
> to ignore the whole issue of how the document was encoded? After all,
> isn't the whole purpose of *having* XML parsers to insulate
> applications from worries about the lexical details of documents?

The point is that normalization is expensive, and it may be too expensive
to do at all in small systems.  Therefore, the W3C's choice (expressed
in the Character Model) is to have senders normalize, and receivers check
for normalization.  In this way documents are normalized once at creation
(or publication) time, rather than every time a document is received; this
conserves net-wide cycles, since checking is cheaper than normalizing.
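
The sender/receiver split can be sketched as a pair of functions. This sketch assumes a UTF-8 transport and again treats NFC as the normalization form. publish and receive are hypothetical names used only for illustration.

```python
import unicodedata

def publish(text: str) -> bytes:
    # Sender: pay the cost of normalizing once, at creation or publication time.
    return unicodedata.normalize("NFC", text).encode("utf-8")

def receive(data: bytes) -> str:
    # Receiver: verify only; the document is never rewritten on arrival.
    text = data.decode("utf-8")
    if not unicodedata.is_normalized("NFC", text):
        raise ValueError("received document is not NFC-normalized")
    return text

source = "<name>Jose\u0301</name>"  # E + COMBINING ACUTE, as an author might type it
wire = publish(source)
print(receive(wire) == "<name>Jos\u00e9</name>")  # True
```

If a non-conforming sender skips publish, receive reports the problem instead of silently fixing it. The bytes the application sees are therefore the bytes that were sent.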

> In other words, why not rewrite this so that processors are required
> to normalize character data? 

Forcibly normalizing incoming documents can spoof signature schemes, and
can also render documents well-formed that were not well-formed before
(e.g. if a start-tag uses A WITH ACUTE and the end-tag uses A followed
by COMBINING ACUTE, the two names differ, yet become identical once
normalized).  http://www.w3.org/TR/charmod/#sec-Normalization
goes into more detail.
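
Both hazards can be demonstrated with Python's standard library, assuming its expat-based xml.etree.ElementTree parser. A plain SHA-256 digest stands in for a real signature here. That is an assumption for illustration; actual XML signature schemes involve canonicalization and keys.

```python
import hashlib
import unicodedata
import xml.etree.ElementTree as ET

# Start-tag name is U+00E1 (A WITH ACUTE); end-tag name is "a" + U+0301 (COMBINING ACUTE).
raw = "<\u00e1>text</a\u0301>"

def is_well_formed(doc: str) -> bool:
    """True if ElementTree parses the document without error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

forced = unicodedata.normalize("NFC", raw)  # what a force-normalizing processor would see

print(is_well_formed(raw))     # False: the tag names differ code point by code point
print(is_well_formed(forced))  # True: both names are now U+00E1

# Stand-in for a signature: a digest over the original bytes no longer matches.
digest_before = hashlib.sha256(raw.encode("utf-8")).hexdigest()
digest_after = hashlib.sha256(forced.encode("utf-8")).hexdigest()
print(digest_before == digest_after)  # False
```

The rejected input becomes acceptable only because the processor rewrote it. Any integrity check computed over the bytes as sent also fails to match the bytes as processed.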

-- 
John Cowan           http://www.ccil.org/~cowan              cowan@c...
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_
