At 10:34 AM 22/03/02 +0000, Michael Kay wrote:
>I don't want to dumb XML down. But we do sometimes need to store data (e.g.
>WebDAV property values) which can potentially contain characters that are
>not permitted in XML. In fact, it's very unlikely that a WebDAV property
>value will contain such a character, but the software still needs to allow
>for the possibility.
>
>I don't personally see any good reason why C0 (and C1) characters shouldn't
>be permitted XML characters

I am very strongly against this, and not just for the excellent statistical reasons that Rick raised.

XML's greatest strength is as an interchange format; as such it offers a degree of cross-system interoperability that nothing else quite achieves in my experience. The interoperability is partly due to the fact that the content consists of Unicode characters, which have widely agreed on semantics as documented in Unicode and ISO 10646.

However, the C0 controls do *not* have such widely agreed on semantics (what do ETX and EOD mean to you today?). And in general binary data is less interoperable than textual data. Thus it has no place in XML. If you need to interchange binary data (and we all do) that's fine, but don't claim doing so is interoperable and don't try to dress it up in XML clothes unless you're willing to base64 it or otherwise clearly mark it as an opaque blob.

The fact that the C1 characters are currently allowed in XML is simply a design error. I'd love to fix it but it's probably too late.

Finally, the notion that allowing C0 & C1 chars helps with binary data packing seems kind of bogus to me anyhow - in any case you're going to have to filter to deal with U+0000 not to mention "<" and "&", right? Wouldn't it be about the same amount of work, and a lot cleaner, just to throw this stuff into base64?

-Tim
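A minimal sketch of the base64 route suggested above, in Python. It is illustrative only: the <prop> element, its encoding attribute, and the function names are assumptions made up for this example and do not come from the post or from WebDAV. The character class follows the Char production of XML 1.0, which admits only tab, LF and CR from the C0 range.

```python
import base64
import re
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 Char production: #x9 | #xA | #xD |
# [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
_NOT_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml_safe(text: str) -> bool:
    """True if every character of text may appear in an XML 1.0 document."""
    return _NOT_XML_CHAR.search(text) is None


def encode_property(value: bytes) -> str:
    """Serialize value as a hypothetical <prop> element.

    Text that is valid UTF-8 and XML-safe is stored as escaped character
    data; everything else is base64-encoded and marked as an opaque blob.
    """
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None and is_xml_safe(text):
        # "<" and "&" still need escaping even in the plain-text case.
        return "<prop>" + escape(text) + "</prop>"
    payload = base64.b64encode(value).decode("ascii")
    return '<prop encoding="base64">' + payload + "</prop>"
```

With this sketch, encode_property(b"a<b&c") yields escaped text, while a value containing U+0000 or ETX falls through to the base64 branch, so the receiver can tell a blob from text by the attribute alone.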