By coincidence I've been writing up a semi-refutation of Cory's
'metacrap' piece, hopefully ready in a day or so. Semi-refutation
because while I agree with most of his observations, they take a
blinkered, hobbled view of metadata, and as a result I believe the
general conclusions to be way off the mark.

The factor I think has most relevance to your post (though I've not
read the links yet) is that it's not an either/or situation. I
personally believe that the web will start getting *really* useful
when the explicit (semweb) and implicit (Google) approaches meet.

A question - do you think Google takes note of the titles of the
documents it indexes?

Cheers,
Danny.

> -----Original Message-----
> From: Mike Champion [mailto:mc@x...]
> Sent: 24 April 2003 03:10
> To: xml-dev@l...
> Subject: Statistical vs "semantic web" approaches to making sense
> of the Net
>
> There was an interesting conjunction of articles on the ACM
> "technews" page
> [http://www.acm.org/technews/current/homepage.html] -- one on "AI"
> approaches to spam filtering
> http://www.nwfusion.com/news/tech/2003/0414techupdate.html and the
> other on the Semantic Web
> http://www.computerworld.com/news/2003/story/0,11280,80479,00.html.
>
> What struck me is that the "AI" approach (I'll guess it makes heavy
> use of pattern matching and statistical techniques such as Bayesian
> inference) works with raw text whose meaning the authors are
> deliberately trying to obfuscate to get past "keyword" spam
> filters, while the Semantic Web approach seems to require explicit,
> honest markup. Given the "metacrap" argument about semantic
> metadata (http://www.well.com/~doctorow/metacrap.htm), I suspect
> that in general the only way we're going to see a "Semantic Web" is
> for statistical/pattern-matching software to create the semantic
> markup and metadata. That is, if such tools can make useful
> inferences today about spam that pretends to be something else,
> they should be very useful in making inferences tomorrow about text
> written by people who try to say what they mean.
>
> This raises a question, for me anyway: if it will take a "better
> Google than Google" (or perhaps an "Autonomy meets RDF") that uses
> Bayesian or similar statistical techniques to create the markup
> that the Semantic Web will exploit, what's the point of the
> semantic markup? Why won't people just use the "intelligent"
> software directly? Wearing my "XML database guy" hat, I hope the
> answer is that it will be much more efficient and
> programmer-friendly to query databases generated by the 'bots,
> containing markup and metadata, to find the information one needs.
> But I must admit that 5-6 years ago I thought the world would need
> standardized, widely deployed XML markup before we could get the
> quality of searches that Google allows today using only raw HTML
> and the PageRank heuristic algorithm.
>
> So, anyone care to pick holes in my assumptions or reasoning? If
> one does accept the hypothesis that it will take smart software to
> produce the markup that the Semantic Web will exploit, what *is*
> the case for believing that it will be ontology-based logical
> inference engines rather than statistically-based heuristic search
> engines that people will be using in 5-10 years? Or is this a false
> dichotomy? Or is the "metacrap" argument wrong, and people really
> can be persuaded to create honest, accurate, self-aware, etc.
> metadata and semantic markup?
>
> [please note that my employer, and many colleagues at W3C, may have
> a very different take on this, and please don't blame anyone but me
> for this blather!]
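
P.S. To make the statistical side concrete, here's a toy Bayesian
spam scorer of the sort Mike guesses at, in the spirit of Paul
Graham's "A Plan for Spam". A minimal sketch in Python only - the
token probabilities are invented for illustration, where a real
filter would estimate them from a corpus of ham and spam:

    import math

    # Hypothetical P(token | spam) and P(token | ham), made up for
    # this example rather than trained from real mail.
    p_spam = {"viagra": 0.90, "free": 0.60, "meeting": 0.05}
    p_ham  = {"viagra": 0.01, "free": 0.30, "meeting": 0.60}

    def spam_probability(tokens, prior_spam=0.5):
        # Combine per-token evidence under the "naive" independence
        # assumption, working in log space to avoid underflow.
        log_spam = math.log(prior_spam)
        log_ham = math.log(1.0 - prior_spam)
        for t in tokens:
            if t in p_spam:
                log_spam += math.log(p_spam[t])
                log_ham += math.log(p_ham[t])
        odds = math.exp(log_spam - log_ham)
        return odds / (1.0 + odds)

    print(spam_probability(["free", "viagra"]))  # ~0.99, spammy
    print(spam_probability(["meeting"]))         # ~0.08, hammy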
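
And for the implicit (Google) side, the PageRank heuristic is at
heart a power iteration over the link graph. Again just a sketch, on
a made-up four-page web with the textbook damping factor, not
Google's actual implementation:

    # Minimal PageRank power iteration on a hypothetical four-page
    # web. Each page starts with equal rank and repeatedly shares
    # its rank across its outbound links.
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }

    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            # Base rank from the random-jump term, plus damped
            # contributions from each inbound link.
            new = {p: (1.0 - damping) / n for p in pages}
            for page, outs in links.items():
                share = rank[page] / len(outs)
                for target in outs:
                    new[target] += damping * share
            rank = new
        return rank

    print(pagerank(links))

The point of putting the two side by side: neither needs the
author's cooperation, which is exactly the property the metacrap
argument says explicit metadata lacks.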