Quick Review of XML 1.1 Candidate Recommendation
I think this XML 1.1 version is a big step forward from previous versions: the XML Core WG has considerably toned down their initial features, to the point where XML 1.1 may now well be better than XML 1.0.

1) Normalization

Normalization is definitely a good thing. There should be more of it, especially by other people :-) But currently we are not well served by normalization libraries. I use a stripped-down version of ICU4J in a product for normalization, but the off-the-shelf jars currently distributed for ICU4J are about 10 Meg. Unrealistically big.

So the XML 1.1 approach of saying normalization is good and may be checked for is probably the most realistic approach. It allows natural movement in a positive direction, like old underpants.

2) End-of-line handling

XML 1.1 takes the line of least resistance here as well: don't change the definition of spaces (which would then have to propagate through other specs and technologies that use XML tokens or the S production), but allow a couple more line-end characters. I have implemented this in a product, and it really is trivial to put in.

So the XML 1.1 approach does no harm but opens the door for people who say they need NELs.

3) Characters

XML 1.1's new character production is, I think, a real step forward for XML. It allows more kinds of characters to be sent, and so improves XML for data exchange. But it also disallows controls from being sent directly (numeric character references must be used instead), which takes a good stand that XML is a textual format: a control character sent in the data stream *is* a control character and not data content.

The main reason I think this new character rule is a big step forward is that (as argued in http://www.topologi.com/public/XML_Naming_Rules.html ) the control characters, especially the C1 controls U+0080-U+009F, are excellent for detecting encoding-labelling errors (robustness).
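To make the robustness argument concrete, here is a minimal Python sketch, not anything taken from the spec or from a real parser. It assumes a document that was written in Windows-1252 but labelled ISO-8859-1, so the euro sign (byte 0x80) decodes to the C1 control U+0080. The helper name find_c1_controls and its allow_nel flag are made up for illustration; skipping NEL (U+0085) reflects my understanding that XML 1.1 accepts it as a line end.

```python
def find_c1_controls(text, allow_nel=True):
    """Return (offset, code point) pairs for literal C1 controls in decoded text.

    Hypothetical helper: a parser applying a rule like XML 1.1's could treat
    any hit as a sign that the declared encoding is wrong.
    """
    hits = []
    for offset, ch in enumerate(text):
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            # Assumption: NEL (U+0085) is tolerated as a line-end character.
            if allow_nel and cp == 0x85:
                continue
            hits.append((offset, cp))
    return hits


# A price written in Windows-1252, where the euro sign is the single byte 0x80.
raw = "<price>\u20ac10</price>".encode("cp1252")

# Decoded with the wrong label, the euro sign becomes the C1 control U+0080.
mislabelled = raw.decode("iso-8859-1")

# Decoded with the right label, it is U+20AC and nothing is flagged.
correct = raw.decode("cp1252")

suspect = find_c1_controls(mislabelled)
clean = find_c1_controls(correct)
```

Here suspect holds one hit at the position of the euro sign and clean is empty. Note that the error sits in data content, not in markup, which is the case the XML 1.0 name rules could not catch.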
XML 1.0 provided meagre but useful encoding-labelling error detection, but the XML 1.1 rules will work on non-ASCII data, not just non-ASCII markup. See also the sidebar "How could XML 1.1 help?" in the Euro article at http://www.xml.com/pub/a/2002/09/18/euroxml.html for more info.

So the XML 1.1 character rules are a step forward for coverage, robustness and XML as textual.

4) Name Characters

XML 1.1's new name rules stink, but not as much as they used to, and not so much that I couldn't get used to them. The objections I had raised to the previous draft rules were:

* They reduced encoding-error detection: but the new Character rules do this better, so that objection has been met.
* They cannot be justified by being "Unicode-version independent" because normalization-checking is Unicode-version dependent anyway: but the weaker normalization-checking requirement makes this objection lose force.
* They would allow line-breaks in bad places: the latest draft removes many breaking characters (i.e., the space characters in the early U+2000s and the ideographic space). I would prefer it went further...
* The earlier drafts did not pay adequate attention to XML as being textual: the new control rules and the new whitespace rules for naming meet this objection.
* The initial drafts seemed to downplay the importance of the basic readability (not to be confused with comprehensibility!) of XML documents: since the April draft they put in Appendix B, and based it on Unicode character classes rather than enumeration, which I think is a better approach. But a stricter application of these guidelines would have been better. On the other hand, specs such as XML Schemas reference XML 1.0, so they provide a nice bit of inertia to prevent crazy characters. And checking that names are nice might be better done by another layer, such as a schema tool or editor, anyway.
* Some characters simply do not have any pronunciation or common name in the language in which they are used: symbol characters and math characters, for example. Consequently, they represent a real barrier for accessibility (for programmers with impaired eyesight, for example): speech synthesizers will typically remove unknown characters. I think there is a strong difference between allowing an Ethiopian character (which could be pronounced) and a dingbat in XML Names: the former affords communication when used appropriately, the latter blocks communication.

So this XML 1.1 goes some way in meeting my previous objections.

5) Versions

It was not at all clear from previous drafts whether XML 1.1 required a new infoset. It seems now that while it changes the WF-ness of a document, it does not change the XML infoset or require new Infoset specs. This is a good thing, because it reduces cascading effects through XML-land.

So this XML 1.1 seems to meet my previous objection.

All in all, I congratulate the XML Core WG on this XML 1.1 draft, and all the sensible compromises in it. It will be interesting to see whether it takes off.

Cheers
Rick Jelliffe
Topologi, Pty. Ltd.

P.S. I think the reason that U+0000 is not allowed as an XML character is that the standard C libraries (and maybe other libraries) cannot allow nulls in strings. It is a sensible rule IMHO.
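The point in the P.S. about C strings can be seen from Python's standard ctypes module. This is only an illustrative sketch: the payload bytes are made up, and the .value attribute of the buffer stands in for what NUL-terminated C string functions would see.

```python
import ctypes

# Five bytes of data with a NUL in the middle.
payload = b"ab\x00cd"

# Copy the bytes into a C char array (ctypes appends a terminating NUL).
buf = ctypes.create_string_buffer(payload)

# Read as a NUL-terminated C string: everything after the first NUL is lost.
as_c_string = buf.value

# The underlying memory still holds all the bytes plus the terminator.
all_bytes = buf.raw
```

Here as_c_string is just b"ab" while all_bytes still contains the full payload, so code built on C string conventions would silently truncate content containing U+0000.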