By coincidence I've been writing up a semi-refutation of Cory's
'metacrap' piece, hopefully ready in a day or so. Semi-refutation
because while I agree with most of his observations, they take a
blinkered, hobbled view of metadata, and as a result I believe the
general conclusions to be way off the mark.

The factor I think has most relevance to your post (though I've not
read the links yet) is that it's not an either/or situation. I
personally believe that the web will start getting *really* useful
when the explicit (semweb) and implicit (Google) approaches meet.

A question - do you think Google takes note of the titles of the
documents it indexes?

Cheers,
Danny.

> -----Original Message-----
> From: Mike Champion [mailto:mc@x...]
> Sent: 24 April 2003 03:10
> To: xml-dev@l...
> Subject: Statistical vs "semantic web" approaches to making sense
> of the Net
>
> There was an interesting conjunction of articles on the ACM
> "technews" page
> [http://www.acm.org/technews/current/homepage.html] -- one on "AI"
> approaches to spam filtering
> http://www.nwfusion.com/news/tech/2003/0414techupdate.html and the
> other on the Semantic Web
> http://www.computerworld.com/news/2003/story/0,11280,80479,00.html.
>
> What struck me is that the "AI" approach (I'll guess it makes heavy
> use of pattern matching and statistical techniques such as Bayesian
> inference) works with raw text whose meaning the authors are
> deliberately trying to obfuscate to get past "keyword" spam
> filters, while the Semantic Web approach seems to require explicit,
> honest markup. Given the "metacrap" argument about semantic
> metadata (http://www.well.com/~doctorow/metacrap.htm), I suspect
> that in general the only way we're going to see a "Semantic Web" is
> for statistical/pattern-matching software to create the semantic
> markup and metadata. That is, if such tools can make useful
> inferences today about spam that pretends to be something else,
> they should be very useful in making inferences tomorrow about text
> written by people who try to say what they mean.
>
> This raises a question, for me anyway: if it will take a "better
> Google than Google" (or perhaps an "Autonomy meets RDF") that uses
> Bayesian or similar statistical techniques to create the markup
> that the Semantic Web will exploit, what's the point of the
> semantic markup? Why won't people just use the "intelligent"
> software directly? Wearing my "XML database guy" hat, I hope the
> answer is that it will be much more efficient and
> programmer-friendly to query databases generated by the 'bots,
> containing markup and metadata, to find the information one needs.
> But I must admit that 5-6 years ago I thought the world would need
> standardized, widely deployed XML markup before we could get the
> quality of searches that Google allows today using only raw HTML
> and the PageRank heuristic algorithm.
>
> So, anyone care to pick holes in my assumptions or reasoning? If
> one does accept the hypothesis that it will take smart software to
> produce the markup that the Semantic Web will exploit, what *is*
> the case for believing that it will be ontology-based logical
> inference engines rather than statistically-based heuristic search
> engines that people will be using in 5-10 years? Or is this a false
> dichotomy? Or is the "metacrap" argument wrong, and people really
> can be persuaded to create honest, accurate, self-aware, etc.
> metadata and semantic markup?
>
> [please note that my employer, and many colleagues at W3C, may have
> a very different take on this, and please don't blame anyone but me
> for this blather!]
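
P.S. To make the statistical side concrete, here's a toy Bayesian
spam scorer of the sort Mike guesses at, in the spirit of Paul
Graham's "A Plan for Spam". A minimal sketch in Python only - the
token probabilities are invented for illustration, where a real
filter would estimate them from a corpus of ham and spam:

    import math

    # Hypothetical P(token | spam) and P(token | ham), made up for
    # this example rather than trained from real mail.
    p_spam = {"viagra": 0.90, "free": 0.60, "meeting": 0.05}
    p_ham  = {"viagra": 0.01, "free": 0.30, "meeting": 0.60}

    def spam_probability(tokens, prior_spam=0.5):
        # Combine per-token evidence under the "naive" independence
        # assumption, working in log space to avoid underflow.
        log_spam = math.log(prior_spam)
        log_ham = math.log(1.0 - prior_spam)
        for t in tokens:
            if t in p_spam:
                log_spam += math.log(p_spam[t])
                log_ham += math.log(p_ham[t])
        odds = math.exp(log_spam - log_ham)
        return odds / (1.0 + odds)

    print(spam_probability(["free", "viagra"]))  # ~0.99, spammy
    print(spam_probability(["meeting"]))         # ~0.08, hammy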
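
And for the implicit (Google) side, the PageRank heuristic is at
heart a power iteration over the link graph. Again just a sketch, on
a made-up four-page web with the textbook damping factor, not
Google's actual implementation:

    # Minimal PageRank power iteration on a hypothetical four-page
    # web. Each page starts with equal rank and repeatedly shares
    # its rank across its outbound links.
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }

    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            # Base rank from the random-jump term, plus damped
            # contributions from each inbound link.
            new = {p: (1.0 - damping) / n for p in pages}
            for page, outs in links.items():
                share = rank[page] / len(outs)
                for target in outs:
                    new[target] += damping * share
            rank = new
        return rank

    print(pagerank(links))

The point of putting the two side by side: neither needs the
author's cooperation, which is exactly the property the metacrap
argument says explicit metadata lacks.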