RE: Success factors for the Web and Semantic Web
> Ignoring metadata and working with the raw link topology is
> driven by the assumptions that referrers (ie. human authors) will
> more often than not make relevant links; that if several things
> are all linked to from the same place they're quite likely to be
> related in some way; and that links will cluster naturally around
> relatively distinct topics of interest rather than degenerating
> into mush. Notice that we can say all of this without once having
> to worry about what any of the stuff actually _means_.
>
> We also get something else, not for free, but at least tractably.
> To pick up the theme of another thread, link topologies are
> essentially global, whereas link metadata is typically local. Where
> the former is the result of the activities of numerous, mutually
> oblivious authors with overlapping areas of knowledge, interest
> and expertise, the latter is typically the product of individuals
> or small groups with particular, partial interests. Making
> metadata global in any useful way requires massive coordinated
> intellectual and political effort (Simon's already raised some
> doubts about whether or not we should consider that an
> unqualified good). Global link topology just needs a warehouse
> full of servers and a ludicrous amount of bandwidth.

(And some decent algorithms.) Though it may be painful to the logic/symbol crowd, the number crunchers are definitely winning the scalability argument. The other point is that beating a corpus to death with statistics and machine learning algorithms is known to work reasonably well. We're hypothesizing that an annotated corpus plus inference will work: it's very much a grand experiment. It would be a riot though if we created all this metadata just to have it processed statistically :)

Joking aside, hybrid systems make a lot of sense: I'd love to see Google crunch metadata instead of melonballing web pages.
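For the curious, the kind of pure link-topology number crunching being alluded to can be sketched in a few lines. Below is a toy PageRank-style power iteration; the graph, damping factor, and iteration count are my own illustrative assumptions, not anything from this thread or from Google's actual implementation.

```python
# Toy PageRank-style power iteration: rank pages purely from the
# link graph, with no notion of what any page *means*.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Base rank every page receives regardless of in-links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Each page shares its rank equally among its out-links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # Dangling page: spread its rank across all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical four-page "web": every page ultimately links to "c".
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
# "c" comes out on top simply because the topology funnels links to it.
```

The point the post is making falls out directly: the ranking needs only referrer behaviour in aggregate, never any metadata about page content.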
> The assumptions behind this approach seem pretty plausible a
> priori, and both Google and my long-time favourite domain-
> specific search engine, ResearchIndex (aka CiteSeer), seem to
> back them up. Then again, I've always used bibliographies as my
> primary research tool, so maybe I'm biased.

Some of the over-your-shoulder agent/browser-assistant type research made a similar assumption a few years ago. That is, we already have a semantic web: at some point some human linked two documents together, for a reason. We just don't know how to reverse engineer that intent very well, hence the perceived need for metadata. Indeed any corpus can be assumed to have semantic links, since links aren't usually randomly distributed any more than words are. Maybe the idea of a non-semantic link is a nonsense.

This assumption may be less valid now that many pages and links are machine generated: or maybe it's even more valid. I'll stop short of discussing machine intentionality :)

-Bill

-----
Bill de hÓra : InterX : bdehora@i...