Re: Some comments on the 1.1 draft
On Wednesday 19 December 2001 02:24 am, Alan Kent wrote:
> To separate the two issues - I have no opinion on name characters.
> PCDATA however is different. I read through your entire post twice
> and must admit I still don't quite understand what your point is
> exactly. I *think* you might be saying "it's good to specify the
> encoding because that way it's possible to make sure characters
> not valid in that encoding are rejected." (My reading of the XML spec
> is that 0x85 is legal in the Unicode character set - that is, it's
> not marked as UNUSED in the good old SGML jargon.)
>
> If this is your point, then would it be possible to define a new
> encoding which permitted the full range of Unicode characters
> (including control characters which are valid in Unicode)?
> Would that address your issues?

The point is that characters != bytes != encoding. If you start allowing control characters (which are somewhat debatable *as* characters in the first place), it becomes very easy to abuse the power and to have application-specific uses of embedded encodings. This is effectively what Mr. Rhys from MS wanted: the ability to store arbitrary binary streams inside XML-encoded data.

The problem is that XML is *text*. It is made from *characters*, and arbitrary binary strings have no place in it. Once you change that, you have essentially ruined XML as a textual markup language.

People could say that NUL et al. are still *characters* and so would be fine, even in UTF-8-encoded documents, but I bet they'd be rather unhappy to find their binary streams changing if I saved the document as UTF-16. The point here is that these things are unreliable.
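To make the "characters != bytes != encoding" point concrete, here is a minimal Python sketch. The helper names `encoded_forms` and `roundtrip_bytes` are made up for illustration and are not part of any XML tool; the sketch only relies on Python's built-in codecs. It shows that the single character U+0085 has a different byte representation in each encoding, and that a byte stream smuggled in as control characters no longer has the same bytes once the text is re-saved as UTF-16.

```python
def encoded_forms(text):
    """Return the byte sequences for the same characters in several encodings."""
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16-be": text.encode("utf-16-be"),
        "utf-16-le": text.encode("utf-16-le"),
    }


def roundtrip_bytes(raw, src_encoding, dst_encoding):
    """Read raw bytes as characters, then re-save those characters in another encoding."""
    return raw.decode(src_encoding).encode(dst_encoding)


# One character (U+0085), three different byte sequences.
nel = "\u0085"
forms = encoded_forms(nel)
for name, data in forms.items():
    print(name, data.hex())

# A hypothetical "binary payload" stored as control characters in a UTF-8 document.
payload = bytes([0x00, 0x01, 0x02])
resaved = roundtrip_bytes(payload, "utf-8", "utf-16-be")
print(payload.hex(), "->", resaved.hex())
```

The characters survive the re-save, but the bytes do not, which is exactly why a byte-oriented payload cannot be trusted to ride along inside character data.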