Re: Quick Review of XML 1.1 Candidate Recommendation
From: "Tim Bray" <tbray@t...>

> My problem is that XML has de facto been a significant step forward for
> interoperability between heterogeneous systems, and this seems like a
> step backward. At the moment, we can say confidently that XML markup
> exposes logical structure unambiguously, and the content is text, which
> means a sequence of unicode characters, and the characters have the
> semantics that Unicode says they have. This is fine for characters such
> as 'a' or ∫ (the integral sign), but the range � -  is
> another kettle of fish. By my reading, none of the characters in the
> ranges 0-#x7, #xb, #xe-#x1a have any agreed-upon semantics de jure or de
> facto (let's go down to the mall and do some ).

Starting with about Unicode 3.0, the U+0080-U+009F code points are occupied by the ISO C1 controls, unless specifically overridden; XML 1.0 and XML 1.1 do not specifically override them. See http://www.unicode.org/unicode/uni2book/ch13.pdf s.13.1

XML 1.1 is intended to cope with Unicode 3.n, and the new fixing of the C1 controls is one of those things. So the backwards compatibility issue is really one that springs from Unicode, not from XML IMHO. It was pretty sus (or a convenient hack) to use the C1 code points before.

Tim's point about needing to follow the Unicode semantics is well-made and important, but I think the XML 1.1 draft *does* do this. The semantics of a text stream are that a control character appearing in it is a control character that should be interpreted or stripped or used. A control character that is desired to be part of the data content (rightly or wrongly) should never be sent directly: it is a mistake of XML 1.0 to allow direct C1 characters.

Ultimately, it comes down to a model of layering.
I believe the layering is

    applications and data stores
    ------------------------------------------------------------------
    Infoset data (can include controls, but not null)
    ------------------------------------------------------------------
    XML, which must be compatible with "textual" text/* MIME
    ------------------------------------------------------------------
    text data being sent as a data stream, by some system using controls
    ------------------------------------------------------------------
    packets
    ------------------------------------------------------------------

That is more the kind of old telnet/modem-ish model that the RFCs have underlying them, and XML 1.1 supports this better than XML 1.0 does.

The second prong that Tim raises is that in XML "the content is text" (i.e. not binary), by which he is suggesting that non-text data should not be serialized as XML directly but first encoded using, say, Bin64 notation. Unfortunately, this currently requires some kind of schema processing and some kind of PSVI to extract the string: a lot of overhead for a little feature. And the WXS Bin64 has the problem that there is no standard way to say what the data is after it is decoded: what is its notation or MIME type? So Bin64 can only be used with private conventions anyway.

As Richard comments, arbitrary binary data still cannot be sent, because the U+0000 character NULL is not available in numeric character references. If we have no objection to Bin64-encoded data content, I don't see the problem with control characters sent as NCRs: both are textual and opaque.

> And furthermore, the reason why our friends at Microsoft & IBM et al
> want this is so they can take filthy dirty data out of database fields
> and wrap XML tags around it and claim interoperability, which is pretty
> questionable. -Tim

As long as it is represented as text, why are the controls (when sanitized) any less filthy than the PUA characters?
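
To make the comparison between controls sent as NCRs and Bin64 content concrete, here is a minimal Python sketch. The helper names are hypothetical, and the ranges are taken from the RestrictedChar production of the final XML 1.1 Recommendation, which may differ in detail from the Candidate Recommendation under discussion. It only handles control characters; escaping of "&" and "<" is left out.

```python
import base64

# Assumption: ranges follow RestrictedChar in the final XML 1.1 Recommendation.
# Note that U+0085 (NEL) is deliberately not in the list.
RESTRICTED_RANGES = [(0x1, 0x8), (0xB, 0xC), (0xE, 0x1F), (0x7F, 0x84), (0x86, 0x9F)]


def is_restricted(cp: int) -> bool:
    """True if the code point may appear in XML 1.1 only as a character reference."""
    return any(lo <= cp <= hi for lo, hi in RESTRICTED_RANGES)


def escape_controls(text: str) -> str:
    """Hypothetical helper: rewrite restricted controls as hexadecimal NCRs."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # U+0000 is excluded even as a character reference.
            raise ValueError("U+0000 cannot be represented in XML 1.1")
        out.append(f"&#x{cp:X};" if is_restricted(cp) else ch)
    return "".join(out)


def to_base64(data: bytes) -> str:
    """Hypothetical helper: the Bin64 alternative, which can carry any byte."""
    return base64.b64encode(data).decode("ascii")
```

Either way the receiver sees only ordinary text characters on the wire. The difference is that the NCR form stays readable around the escaped controls and fails on NULL, while the Bin64 form accepts any byte but says nothing about what the decoded bytes are.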

I am all in favour of making XML more comprehensive and more "textual" as a notation (in the terminology of the RFC for MIME types for text/*), and when this is still safe (no nulls), seems to fit into Internet layers more, is more mainstream SGML-ish, *and* improves robustness no end (better encoding detection), it is a pretty credible package.

Cheers
Rick Jelliffe