
Re: patterns vs. identifiers


Mike Champion wrote:
> 
> ...
> 
> I may well be over-optimistic; I'm trying to put together some code
> to explore the issue.  For what it's worth, my suspicion that there
> *is* a lot one could do with  fairly simple heuristics was strengthened
> by reading http://www.paulgraham.com/spam.html  (a discussion of
> spam filtering):

There is a big difference between depending on context and depending on
context *heuristically*. Every programming language uses context. Very
few (!) use heuristics.

> " A few simple rules will take a big bite out of your incoming spam.
> Merely looking for the word "click" will catch 79.7% of the emails in
> my spam corpus, with only 1.2% false positives."

That's fine because the price of a wrongly classified email slipping
through is so low. That is rarely the case in other computer science
applications.

> Also check out Eugene Kuznetzov's article in XML Journal on
> XML-aware network equipment http://www.sys-con.com/xml/articleprint.cfm?id=459
> In discussing the challenge of recognizing a specific XML
> vocabulary and routing messages in that vocabulary to a specialized
> processor, he says "the same device could send messages in a particular
>  XML vocabulary to the server capable of processing them, or it could
> send separate XML-RPC and SOAP messages. The routing rules are specified
> using either proprietary pattern-matching languages or a limited subset of XPath."

But there is nothing heuristic about using XPath! XPaths are precise
matching expressions.
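
To make that concrete, here is a rough sketch of vocabulary routing done
with exact matches rather than guesses. It uses the limited XPath subset
in Python's xml.etree.ElementTree; the function name route() and the
labels it returns are my own inventions for illustration, not anything
from the article.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def route(message: str) -> str:
    """Return a routing label for a message (hypothetical labels).

    ET.fromstring raises ParseError if the message is not well-formed,
    so nothing malformed is ever classified.
    """
    root = ET.fromstring(message)

    # XML-RPC: root element is methodCall with a methodName child.
    if root.tag == "methodCall" and root.find("./methodName") is not None:
        return "xml-rpc"

    # SOAP: root element is Envelope in the SOAP namespace, with a Body child.
    if (root.tag == f"{{{SOAP_NS}}}Envelope"
            and root.find(f"./{{{SOAP_NS}}}Body") is not None):
        return "soap"

    return "unknown"
```

Either a message matches a rule or it does not; there are no scores and
no thresholds to tune.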

>...
> Also, I really hate to mention this :-) but think of the "wonderful" job
> that browsers do in making sense out of hideously invalid HTML. 

Once again, the cost of getting things wrong is low.

> ... Is there
> any reason to think that that level of creative hackery can't or won't
> be applied to the challenge of making sense out of business messages
> in XML, some of which will come from buggy software, some of which will be
> human edited, some of which will come from organizations that support
> newer versions of some spec than the receiver does, some will be
> generated by software that interprets the ambiguities in the spec differently
> from the receiver, some of which will come from software that "embraces and
> extends" the spec .... ad nauseam?  A "draconian" error handling policy
> just won't be any more viable than it would have been in Netscape 1.0.

I disagree. The cost of getting things wrong is too high. The cost of
coding the heuristics is too high. What percentage of the RDF out there
is not well-formed? What percentage of XML-RPC messages do not conform to
the standard (modulo bugs in the standard like "ASCII")?
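
For anyone who wants to measure the first of those percentages, a
draconian check is only a few lines. This is a sketch, assuming the
documents are already in memory as strings; the helper names
is_well_formed() and well_formed_fraction() are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def is_well_formed(document: str) -> bool:
    """True if the parser accepts the document, False on any parse error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def well_formed_fraction(documents: Iterable[str]) -> float:
    """Fraction of documents that parse; 0.0 for an empty collection."""
    docs = list(documents)
    if not docs:
        return 0.0
    return sum(is_well_formed(d) for d in docs) / len(docs)
```

Note that this only tests well-formedness. Conformance to the XML-RPC
spec would need a further check against the vocabulary itself.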

-- 
"When I walk on the floor for the final execution, I'll wear a denim 
suit. I'll walk in there like Willie Nelson, John Wayne, Will Smith 
-- Men in Black -- James Brown. Maybe do a Michael Jackson moonwalk."
Congressman James Traficant.
