
RE: Success factors for the Web and Semantic Web

  • From: Bill dehOra <BdehOra@i...>
  • To: Miles Sabin <MSabin@i...>, xml-dev@l...
  • Date: Tue, 02 Jan 2001 11:14:26 +0000

> Ignoring metadata and working with the raw link topology is
> driven by the assumptions that referrers (ie. human authors) will
> more often than not make relevant links; that if several things
> are all linked to from the same place they're quite likely to be
> related in some way; and that links will cluster naturally around 
> relatively distinct topics of interest rather than degenerating 
> into mush. Notice that we can say all of this without once having
> to worry about what any of the stuff actually _means_.
> 
> We also get something else, not for free, but at least tractably.
> To pick up the theme of another thread, link topologies are
> essentially global, whereas link metadata is typically local. Where
> the former is the result of the activities of numerous, mutually
> oblivious authors with overlapping areas of knowledge, interest 
> and expertise, the latter is typically the product of individuals 
> or small groups with particular, partial, interests. Making 
> metadata global in any useful way requires massive coordinated 
> intellectual and political effort (Simon's already raised some 
> doubts about whether or not we should consider that an 
> unqualified good). Global link topology just needs a warehouse 
> full of servers and a ludicrous amount of bandwidth.

(And some decent algorithms)
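
As a rough illustration of what one such algorithm might look like, here is a
minimal sketch of co-citation counting: two pages are treated as probably
related when the same referrer links to both, with no attention paid to what
either page means. The function name, the page names and the dict-of-lists
input format are all assumptions made up for this example; this is not how
Google or ResearchIndex actually work.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(links):
    """Count how often each pair of pages is linked to from the same referrer.

    `links` is a hypothetical input format: a dict mapping a referring page
    to the list of pages it links to.
    """
    counts = Counter()
    for targets in links.values():
        # Deduplicate and sort so each unordered pair gets one canonical key.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts


# Invented toy link graph, for illustration only.
links = {
    "p1": ["xml-spec", "rdf-primer", "xslt-tutorial"],
    "p2": ["xml-spec", "rdf-primer"],
    "p3": ["rdf-primer", "cooking-blog"],
}

counts = cocitation_counts(links)
strongest_pair, strength = counts.most_common(1)[0]
```

Pairs that are co-linked from many independent referrers float to the top,
which is the "links cluster around topics" assumption in its crudest form.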

Though it may be painful to the logic/symbol crowd, the number crunchers are
definitely winning the scalability argument. The other point is that beating
a corpus to death with statistics and machine learning algorithms is known
to work reasonably well. We're hypothesizing that an annotated corpus plus
inference will work: it's very much a grand experiment.

It would be a riot though if we create all this metadata just to have it
processed statistically :) Joking aside, hybrid systems make a lot of sense:
I'd love to see Google crunch metadata instead of melonballing web pages.


> The assumptions behind this approach seem pretty plausible a 
> priori, and both Google and my long-time favourite domain-
> specific search engine, ResearchIndex (aka CiteSeer)[1] seem to 
> back them up. Then again, I've always used bibliographies as my 
> primary research tool, so maybe I'm biased.

Some of the over-your-shoulder agent/browser assistant type research
made a similar assumption a few years ago. That is, we already have a
semantic web: at some point some human linked some two documents together,
for a reason. We just don't know how to reverse engineer that intent very
well, hence the perceived need for metadata. Indeed any corpus can be
assumed to have semantic links, since links aren't usually randomly
distributed any more than words are. Maybe the idea of a non-semantic link
is a nonsense. This assumption may be less valid now that many pages and
links are machine generated. Or maybe it's even more valid; I stop at
discussing machine intentionality :)

-Bill

-----
Bill de hÓra  :  InterX  :  bdehora@i...

