
Re: Some comments on the 1.1 draft


On Wed, Dec 19, 2001 at 06:02:55PM +1100, Rick Jelliffe wrote:
> One big advantage of disallowing control characters from XML documents
> and silly characters from XML names is that it catches most common encoding
> errors.
> 
> For example, the very common problem of data labelled ISO 8859-1 containing
> a 0x85 byte (for the Euro character).
...
> And that lies at the heart of the matter: if we allow control characters
> and silly name characters, we won't actually increase the number of
> characters that can be reliable sent: we will just make non-ASCII 
> characters suspect and unreliable.  
> 
> Cheers
> Rick Jelliffe

To separate the two issues - I have no opinion on name characters.
PCDATA, however, is different. I read through your entire post twice
and must admit I still don't quite understand what your point is
exactly. I *think* you might be saying "it's good to specify the
encoding because that way it's possible to make sure characters
not valid in that encoding are rejected." (My reading of the XML spec
is that 0x85 is legal in the Unicode character set - that is, it's
not marked as UNUSED in the good old SGML jargon.)
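
A minimal Python sketch of that reading, assuming the Char production
of XML 1.0 (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD,
#x10000-#x10FFFF). The function names are hypothetical and not part of
the thread. It shows that a 0x85 byte labelled ISO 8859-1 decodes to
U+0085, a C1 control that the production nevertheless permits, while
most C0 controls are rejected.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp falls inside the XML 1.0 Char ranges."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_c1_control(cp: int) -> bool:
    """True if cp is in the C1 control block U+0080..U+009F."""
    return 0x80 <= cp <= 0x9F


# ISO 8859-1 maps every byte to the code point of the same value,
# so the byte 0x85 becomes the character U+0085.
decoded = b"\x85".decode("iso-8859-1")
cp = ord(decoded)

print(hex(cp), is_c1_control(cp), is_xml10_char(cp))
print(is_xml10_char(0x01))  # a C0 control outside the permitted set
```

A stricter parser could additionally flag anything for which
is_c1_control() is true, which is roughly the kind of check the quoted
argument relies on to catch mislabelled data.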

If this is your point, then would it be possible to define a new
encoding which permitted the full range of Unicode characters
(including control characters which are valid in Unicode)?
Would that address your issues?

But I must admit that I do not understand why allowing control
characters in PCDATA results in "we won't actually increase the number
of characters that can be reliable sent: we will just make non-ASCII 
characters suspect and unreliable." It may make translation between
different character sets harder, but hey - how do I turn
Unicode-encoded Chinese into plain ASCII? My point is that not permitting
a small number of characters does not solve all such problems.
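
To make the transcoding point concrete, here is a small Python sketch
(the helper name can_encode is hypothetical). It only illustrates that
converting to a narrower character set fails for many characters
whether or not a handful of control characters are excluded.

```python
def can_encode(text: str, encoding: str) -> bool:
    """True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


chinese = "\u4e2d\u6587"   # two CJK ideographs
c1_control = "\u0085"      # the control character discussed above

print(can_encode(chinese, "ascii"))          # not representable
print(can_encode(c1_control, "ascii"))       # not representable either
print(can_encode(c1_control, "iso-8859-1"))  # representable as byte 0x85

# A lossy fallback substitutes a placeholder for each failing character.
print(chinese.encode("ascii", errors="replace"))
```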

Or have I missed the whole point (I have jumped in late into this
discussion) - in which case sorry for muddying the waters.

If you are only talking about name characters (element names, attribute
names etc), then that is a different matter.

But I think it's wrong to put too much trust in XML to protect
against data corruption. This seems (to me) to be a poor rationale
for omitting a small, select number of characters. As I said, I
may have missed your point, but so far you have not made
a convincing argument to me (for PCDATA). Whether I count - well, that is
another matter! :-)

Alan
