Re: patterns vs. identifiers
Mike Champion wrote:
>
> ...
>
> I may well be over-optimistic; I'm trying to put together some code
> to explore the issue. For what it's worth, my suspicion that there
> *is* a lot one could do with fairly simple heuristics was strengthened
> by reading http://www.paulgraham.com/spam.html (a discussion of
> spam filtering):

There is a big difference between depending on context and depending on context *heuristically*. Every programming language uses context. Very few (!) use heuristics.

> " A few simple rules will take a big bite out of your incoming spam.
> Merely looking for the word "click" will catch 79.7% of the emails in
> my spam corpus, with only 1.2% false positives."

That's fine because the price of a wrongly classified email slipping through is so low. That is rarely the case in other computer science applications.

> Also check out Eugene Kuznetzov's article in XML Journal on
> XML-aware network equipment http://www.sys-con.com/xml/articleprint.cfm?id=459
> In discussing the challenge of recognizing a specific XML
> vocabulary and routing messages in that vocabulary to a specialized
> processor, he says "the same device could send messages in a particular
> XML vocabulary to the server capable of processing them, or it could
> send separate XML-RPC and SOAP messages. The routing rules are specified
> using either proprietary pattern-matching languages or a limited subset of XPath."

But there is nothing heuristic about using XPath! XPaths are precise matching expressions.

> ...
> Also, I really hate to mention this :-) but think of the "wonderful" job
> that browsers do in making sense out of hideously invalid HTML.

Once again, the cost of getting things wrong is low.

> ... Is there
> any reason to think that that level of creative hackery can't or won't
> be applied to the challenge of making sense out of business messages
> in XML, some of which will come from buggy software, some of which will be
> human edited, some of which will come from organizations that support
> newer versions of some spec than the receiver does, some will be
> generated by software that interprets the ambiguities in the spec differently
> from the receiver, some of which will come from software that "embraces and
> extends" the spec .... ad nauseam? A "draconian" error handling policy
> just won't be any more viable than it would have been in Netscape 1.0.

I disagree. The cost of getting things wrong is too high. The cost of coding the heuristics is too high. What percentage of the RDF out there is non-well-formed? What percentage of XML-RPC messages do not conform to the standard (modulo bugs in the standard like "ASCII")?

--
"When I walk on the floor for the final execution, I'll wear a denim suit. I'll walk in there like Willie Nelson, John Wayne, Will Smith -- Men in Black -- James Brown. Maybe do a Michael Jackson moonwalk." Congressman James Traficant.