Mike,

I think what you're seeing here is that current approaches to the description of data are not very human-friendly. In the data-description world, we are trying to express the very rich concept of 'semantics' using the blunt instruments of 'metadata' and 'resource description'. What I mean is that, as human beings, we can quickly ascribe a large, deep and varied set of semantics to any particular spoken sentence, item or situation. Computers, on the other hand, understand essentially no semantics at all, only syntax.

The use of metadata and RDF to describe data is an (albeit small) intermediate step toward improving the chances that computers will be able to ascribe semantics to words or concepts. At this point, humans have to do all of the work of description for the computer (by providing adequate metadata and markup). Artificial Intelligence techniques may eventually ease this process, but right now computers are best at processing large amounts of data, without much semantic input, very quickly; statistically-based searches are therefore likely to remain our best effort. Over time, more subtle methods will emerge. Until they do, on the continuum of interaction between humans and computers, humans will be doing most of the teaching, and computers will mostly just be sitting there dumbly, waiting to be told how they should interpret a particular piece of data (particularly if the power switch is off). Which, as the 'metacrap' article points out, is not particularly attractive to most humans :)

So, xml-dev'ers should probably consider themselves on the forefront of a pioneering effort to teach computers about semantics.
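As an aside, the statistically-based approach discussed here (and the Bayesian spam filtering in Mike's quoted post below) can be sketched in a few lines: a toy naive-Bayes classifier that scores text purely from word counts, with no semantic markup at all. This is only an illustration of the general technique, not any particular filter's implementation, and the training data below is invented:

```python
# Toy naive-Bayes text classifier: counts word frequencies in labelled
# examples, then scores new text by log-probabilities with add-one
# (Laplace) smoothing. All training data is made up for illustration.
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label) pairs. Returns per-label word counts
    and per-label document counts."""
    counts = {}
    totals = Counter()
    for text, label in docs:
        c = counts.setdefault(label, Counter())
        for word in text.lower().split():
            c[word] += 1
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Return the label with the highest naive-Bayes log score."""
    vocab = {w for c in counts.values() for w in c}
    n_docs = sum(totals.values())
    best, best_score = None, float("-inf")
    for label, c in counts.items():
        score = math.log(totals[label] / n_docs)  # prior P(label)
        n_words = sum(c.values())
        for word in text.lower().split():
            # P(word | label) with add-one smoothing; unseen words
            # get a small but nonzero probability
            score += math.log((c[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

training = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
counts, totals = train(training)
print(classify("buy cheap pills", counts, totals))  # -> spam
```

Note that the classifier never needs to be told what "pills" or "agenda" mean; the statistics do all the work, which is exactly the contrast with explicitly authored semantic markup being discussed.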
An effort which, given the paucity and quality of the tools available to do this work, should be applauded by all human-kind (or not) ;)

- JohnK

On Wednesday, Apr 23, 2003, at 21:09 US/Eastern, Mike Champion wrote:

> There was an interesting conjunction of articles on the ACM "technews"
> page [http://www.acm.org/technews/current/homepage.html] -- one on
> "AI" approaches to spam filtering
> http://www.nwfusion.com/news/tech/2003/0414techupdate.html and the
> other on the Semantic Web
> http://www.computerworld.com/news/2003/story/0,11280,80479,00.html.
>
> What struck me is that the "AI" approach (I'll guess it makes heavy
> use of pattern matching and statistical techniques such as Bayesian
> inference) is working with raw text whose meaning the authors are
> deliberately trying to obfuscate to get past "keyword" spam filters,
> while the Semantic Web approach seems to require explicit, honest
> markup. Given the "metacrap" argument about semantic metadata
> (http://www.well.com/~doctorow/metacrap.htm), I suspect that in
> general the only way we're going to see a "Semantic Web" is for
> statistical/pattern-matching software to create the semantic markup
> and metadata. That is, if such tools can make useful inferences today
> about spam that pretends to be something else, they should be very
> useful in making inferences tomorrow about text written by people who
> try to say what they mean.
>
> This raises a question, for me anyway: if it will take a "better
> Google than Google" (or perhaps an "Autonomy meets RDF") that uses
> Bayesian or similar statistical techniques to create the markup that
> the Semantic Web will exploit, what's the point of the semantic
> markup? Why won't people just use the "intelligent" software
> directly? Wearing my "XML database guy" hat, I hope that the answer
> is that it will be much more efficient and programmer-friendly to
> query databases generated by the 'bots, containing markup and
> metadata, to find the information one needs.
> But I must admit that 5-6 years ago I thought the world would need
> standardized, widely deployed XML markup before we could get the
> quality of searches that Google allows today using only raw HTML and
> the PageRank heuristic algorithm.
>
> So, anyone care to pick holes in my assumptions or reasoning? If one
> does accept the hypothesis that it will take smart software to
> produce the markup that the Semantic Web will exploit, what *is* the
> case for believing that it will be ontology-based logical inference
> engines rather than statistically-based heuristic search engines that
> people will be using in 5-10 years? Or is this a false dichotomy? Or
> is the "metacrap" argument wrong, and people really can be persuaded
> to create honest, accurate, self-aware, etc. metadata and semantic
> markup?
>
> [please note that my employer, and many colleagues at W3C, may have a
> very different take on this, and please don't blame anyone but me for
> this blather!]

-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>