
Re: Postel's law, exceptions



On Jan 13, 2004, at 7:26 PM, Julian Reschke wrote:


It was mis-specified (actually it wasn't specified at all, and as it wasn't UTF-8 it should have been).

OK, so the insistence on the encoding declaration being correct is the "draconian" bit here. Thanks.

I'll not comment on the rest because it seems to say that because of recent advances, we don't need a well-defined markup syntax. Somehow I doubt this is true :-)

Not my argument. I'm saying that well-defined markup syntax is basically for machine-machine communication (although obviously the übergeeks on this list can hand-author it), so machines are going to be doing the work to produce it from human-authored slop. One still needs good markup specs to define the template of the stuff that the machine creates, and to allow the de-soupification to be done only once in a processing pipeline.

The debate about Postel seems a bit pointless, since there's no way that ordinary humans are going to be trained to be conservative in what they produce, and they will insist on being liberal in what they consume. The only alternative to despair seems to be to automate the drudgery, ideally in the authoring tool, but more realistically in a downstream filter. (Actually, the RSS/Atom debate about this seems to be over whose job it is to de-soupify: the syndicator or the aggregator.)

There will be cases where one must insist that no dumb machine "fix" the inputs, such as Tim Bray's example of the ill-formed stock transaction message. I suspect there will be thousands of times more cases, however, where it's more like the mismatch between the encoding declaration and the character set in Sam Ruby's example, and machines can be trusted to do the right thing.
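To make the distinction concrete, here is a minimal sketch of the kind of downstream filter I have in mind for the encoding case. It is not what any real aggregator does, and it is not a reconstruction of Sam Ruby's example; the function name `lenient_decode`, the `strict` flag, and the choice of windows-1252 as the fallback guess are all assumptions made for illustration. With `strict=True` it refuses to "fix" anything, which is the behaviour one would want for something like the stock transaction message.

```python
import re

# Hypothetical helper for illustration only; the names and the fallback
# order are assumptions, not taken from any particular tool.
DECL_RE = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def lenient_decode(raw, fallbacks=("utf-8", "windows-1252"), strict=False):
    """Decode an XML byte string, optionally repairing a wrong encoding.

    Returns (text, encoding_used, repaired). With strict=True only the
    declared encoding (or the UTF-8 default) is tried, so bad input is
    rejected rather than guessed at.
    """
    match = DECL_RE.match(raw)
    # XML without an encoding declaration (and without a BOM) is UTF-8.
    declared = match.group(1).decode("ascii") if match else "utf-8"

    candidates = [declared]
    if not strict:
        candidates += [f for f in fallbacks if f.lower() != declared.lower()]

    for enc in candidates:
        try:
            text = raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong bytes for this encoding, or an unknown encoding name.
            continue
        return text, enc, enc.lower() != declared.lower()

    raise ValueError("input cannot be decoded with the permitted encodings")


# A feed with no declaration whose bytes are not valid UTF-8.
feed = b'<?xml version="1.0"?><title>caf\xe9</title>'
text, used, repaired = lenient_decode(feed)
print(used, repaired)
```

The filter reports whether it had to guess, so a later stage in the pipeline can decide whether a repaired document is acceptable for its purpose.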

A year ago, I probably would have disagreed, but I've seen how an utterly stupid statistical tool (SpamBayes) has liberated me from spam with a grand total of 1 known false positive (and that was a legitimate message that sounded exactly like a spam, something like "the information you requested is at such-and-such a URL") out of tens of thousands of spams. Dave Raggett's tidy is another example of a fairly dumb program fixing a lot of tag soup with minimal damage to actual content structure. For that matter, Google and the next-generation stuff such as Vivisimo do an awfully good job of making "judgements" from tag soup, using inferred metadata rather than hand-authored metadata. I don't think it requires strong AI to make a really good guess at markup, especially in highly regular content such as weblogs and news feeds.
