Re: The Rising Sun: How XML Binary Restored the Fortunes

> Not all of the benefits of XML derive from its basis in text. Some of
> the benefits derive from its paranoia. Everything is checked every time.
> If a process is generating bad data whether through malice,
> incompetence, bugs, line noise, spec misinterpretation, disk corruption,
> cosmic rays, or a dozen other reasons, we find out very quickly.

On the other hand, to some extent the redundancy in XML is required precisely because it is a text format: programmatically generated data can have unit tests to detect ill-formedness, but as soon as you can type and cut-and-paste your documents it becomes more likely than not that there will be a well-formedness (WF) error. Furthermore, some binary formats that use indexes or trees are not susceptible to the class of errors caused by a missing end-tag, for example.

> Binary formats are no more fundamentally resistant to corruption than
> text based formats are. Indeed the ones being proposed are less
> resistant because they are compressed and therefore less redundant.
> While error correction can certainly be added to binary formats (CDs do
> this, for example) I've yet to notice anyone proposing this for NOT XML.

But the redundancy in XML qua text, to the extent that it exists, is not enough to allow error correction either. Indeed, in XML 1.0 the unavailable code points are only enough to detect some WF problems: XML 1.1 is more systematic and preferable there, at the minor cost that people who have done stupid and dangerous things like using the non-whitespace C0 or C1 code points in XML 1.0 have their follies exposed by their only indirect availability in XML 1.1.

Actually, I am not sure it is logically consistent to praise XML for its use of redundancy without being impelled by the same argument to favour XML 1.1, which is objectively or systematically better in this regard.

Also, I don't understand the point about text-based formats being more resistant to corruption because of redundancy.
For example, when some hardware fault injects errors at random, the smaller the format, the fewer errors there will be in absolute terms.

> The goal of NOT XML seems to be size and speed at all costs, including
> the cost of transparency and disaster recovery. At least with real XML,
> when something goes horribly wrong with critical data, a human can
> probably fix the mistakes and recover most of the information. With a
> binary format, that's going to be much harder to do, if it's even
> possible.

There is a needing-to-shove-one's-head-up-one's-arsehole-to-prevent-embarrassment aspect too. One reason XML Schemas is type-based is that this was expected to allow certain efficiencies; standards or technologies which can actually make these marvelous efficiencies materialize (XQuery, Binary XML, type-based linking) need to descend from heaven at regular intervals to improve XML Schemas' bang-per-buck. "Build the text field and they will come!" If standards to use the PSVI or schemas are not forthcoming or don't work, then the complexity or poor fit of XML Schemas will not be as excusable (as I hope it will be at some time in the future). The big players need to leverage the PSVI before it leverages them, IYKWIM.

For a blog on Fast Infoset, see http://www.oreillynet.com/pub/wlg/6206

For Fast Infoset, why not see it as ASN.1 becoming more XML-infrastructure compatible rather than XML going binary?

I honestly don't see what is wrong with well-thought-out alternative approaches to the same problem, in particular where they have very different characteristics. Plurality is healthy. Protesting that an Infoset-carrying binary format will have different properties than XML is rather the point of the exercise: a format with exactly the same properties would be a futile competitor rather than a complement.

The people who want binary infosets may well be bit-crazed losers who don't understand their problems and want to stuff up our world as well, but then again they may not: the most charitable thing is not to be a nanny but to say "Spread your wings and fly my eaglet child" or "Go hang yourselves": let them make a binary infoset standard and see in practice what solutions it is good for.

There is a strong streak of puritanism (Thou shalt only have one way to do anything) that is counter-productive. All engineering involves measuring and understanding the characteristics of a technique or material, to allow repeated projects with known performance characteristics; what is important isn't that the world contains only perfect technologies, but that we know what their strengths and weaknesses are, when to use them, and how to influence our local standards bodies in positive directions to get broad and pragmatic coverage of our different use cases.

The Binary Infoset issue is small compared to the larger one that has crippled fundamental standards at the W3C: the chaotic development of DTD-replacing layers (xml:include, xml:base, xml:id, XLink, XML Schemas) without having a corresponding dependable processing sequence like that of DTDs.

Cheers
Rick Jelliffe