Re: Some comments on the 1.1 draft
On Wed, Dec 19, 2001 at 06:02:55PM +1100, Rick Jelliffe wrote:
> One big advantage of disallowing control characters from XML documents
> and silly characters from XML names is that it catches most common encoding
> errors.
>
> For example, the very common problem of data labelled ISO 8859-1 containing
> a 0x85 byte (for the Euro character).
...
> And that lies at the heart of the matter: if we allow control characters
> and silly name characters, we won't actually increase the number of
> characters that can be reliable sent: we will just make non-ASCII
> characters suspect and unreliable.
>
> Cheers
> Rick Jelliffe

To separate the two issues: I have no opinion on name characters. PCDATA, however, is different.

I read through your entire post twice and must admit I still don't quite understand exactly what your point is. I *think* you might be saying "it's good to specify the encoding because that way it's possible to make sure characters not valid in that encoding are rejected." (My reading of the XML spec is that 0x85 is legal in the Unicode character set - that is, it's not marked as UNUSED in the good old SGML jargon.)

If this is your point, then would it be possible to define a new encoding which permitted the full range of Unicode characters (including control characters which are valid in Unicode)? Would that address your issues?

But I must admit that I do not understand why allowing control characters in PCDATA results in "we won't actually increase the number of characters that can be reliable sent: we will just make non-ASCII characters suspect and unreliable." It may make translation between different character sets harder, but hey - how do I turn Unicode-encoded Chinese into plain ASCII? My point is that not permitting a small number of characters does not solve all such problems.

Or have I missed the whole point (I have jumped into this discussion late)? In which case, sorry for muddying the waters.
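
On the 0x85 point above, here is a minimal sketch in Python of how that byte decodes and whether the result is a legal XML 1.0 character. The helper name `is_xml10_char` is hypothetical; its ranges are copied from the Char production of the XML 1.0 Recommendation. The Windows-1252 comparison is an assumption about where such mislabelled bytes typically come from, not something stated in the thread.

```python
import unicodedata


def is_xml10_char(cp: int) -> bool:
    """Hypothetical helper: True if the code point matches XML 1.0's Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


raw = b"\x85"

# Decoded as labelled (ISO 8859-1): the byte becomes U+0085, a C1 control character.
as_latin1 = raw.decode("iso-8859-1")

# Decoded as Windows-1252 (assumed likely source encoding): a printable character.
as_cp1252 = raw.decode("cp1252")

for label, ch in (("iso-8859-1", as_latin1), ("cp1252", as_cp1252)):
    print(
        label,
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        "legal in XML 1.0" if is_xml10_char(ord(ch)) else "not legal in XML 1.0",
    )
```

Under the XML 1.0 production, U+0085 falls inside the #x20-#xD7FF range, so a parser accepts it. Only a rule that excludes the C1 controls would turn this particular mislabelling into a detectable error.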

If you are only talking about name characters (element names, attribute names, etc.), then that is a different matter. But I think it's wrong to put too much trust in XML to protect against data corruption. This seems (to me) to be a poor rationale for omitting a small, select number of characters.

But as I said, I may have missed your point. So far, though, you have not made a convincing argument to me (for PCDATA). Whether I count - well, that is another matter! :-)

Alan