RE: Statistical vs "semantic web" approaches to making sense of
martin@h... <martin@h...> wrote:
>
> On Thu, 24 Apr 2003, Danny Ayers wrote:
>
> > By coincidence I've been writing up a semi-refutation of Cory's
> > 'metacrap' piece, hopefully ready in a day or so.
>
> i'd be interested to see that. my initial reaction to this
> piece was 'crap'! can't help it, but i think it should be
> obvious that all his arguments apply equally well to data as
> they do to metadata.
>
> there seems to be an underlying view that anything done by a
> machine - set-top boxes for TV stats or google for metadata -
> is almost by definition better and more reliable than
> anything produced by a human.

I realize I'm close to slipping down the rabbit hole here, but given the way you posed this statement I can't resist playing devil's advocate for a moment: if you're dealing with a random representative of the masses, then it's probably true that Google-type information is more reliable. Just because Martha down the street tells you that the best TV on the market is a SuperSuchAndSuch "because her cousin Freddy has a 60-inch in his double-wide" doesn't make it any more true than finding the same opinion on some random Web site. However, Google finding what it considers the best authority on TVs is far more likely to get you a true evaluation of which TV is the best.

The fact is that the Web is the first searchable, cross-referenced repository of millions of opinions, and as such it is reasonable to expect that there are ways to reliably sample and weight those opinions. This has little to do with semantics or XML (though certainly something to do with linking); it's more a case of finding algorithms that can judge authority in some way or other (more on that in a moment). It almost seems that all the metadata in the world won't really change the way something like Google works, except that the types of links one can make will become greater in number, and perhaps some types of links may prove better than others (e.g. RSS vs. XPointer, to stretch things a little).

> "Google can derive statistics about the number of Web-authors
> who believe that that page is important enough to link to,
> and hence make extremely reliable guesses about how reputable
> the information on that page is."
>
> really? my friend freddy's got a website with links to the most
> unreliable sites on the web. how does that affect google's
> 'reputability' scoring?

Probably not one iota; your friend Freddy is not likely to be considered an authority by Google. The evaluation of links is recursive and global, so someone else has to also value Freddy's opinion before Google will let him influence things. Yes, Freddy and Martha could collude to point to each other's sites, but a local island of links still won't have much effect on the global evaluation.

> maybe the number of links to a page is a measure of exactly
> that and nothing else - but do feel free to make any assumptions
> you want about why those links are there. personally i don't
> tend to see google's search results as a reputability grading
> at all, and i wouldn't recommend that anyone does ("it's
> true, i found it on google!").

Of course not; TV news anchors and newspaper editors are the true font of all knowledge...

> ultimately, if you care about the information that you
> publish, then you care about the metainformation. and yes,
> it's generally much easier to find web pages that have
> meaningful titles.
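For anyone curious what "recursive and global" means concretely, the idea behind Google's link evaluation is essentially PageRank: a page's score is fed by the scores of the pages linking to it, iterated until it stabilizes. Here's a minimal sketch (a toy graph with made-up page names, not Google's actual implementation) showing why a Freddy/Martha mutual-link island doesn't outrank a page that the rest of the graph genuinely points to:

```python
# Toy PageRank: rank flows along links, iterated to a fixed point.
# A "local island" (freddy <-> martha linking only to each other)
# receives no rank from the wider graph, so it cannot overtake a
# page that many other pages link to.

DAMPING = 0.85      # standard damping factor from the PageRank paper
ITERATIONS = 50     # enough for this tiny graph to converge

def pagerank(links):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(ITERATIONS):
        # every page keeps a small "teleport" share of rank...
        new = {p: (1 - DAMPING) / len(pages) for p in pages}
        # ...and passes the rest along its outgoing links
        for page, outlinks in links.items():
            if outlinks:
                share = DAMPING * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
        rank = new
    return rank

# Hypothetical web: a, b, c all endorse "authority";
# freddy and martha only endorse each other.
web = {
    "a": ["authority"],
    "b": ["authority"],
    "c": ["authority"],
    "authority": ["a"],
    "freddy": ["martha"],
    "martha": ["freddy"],
}
ranks = pagerank(web)
```

Running this, "authority" ends up with the highest score: Freddy and Martha recycle only their own small teleport share, while "authority" accumulates rank from every other corner of the graph. Colluding locally just moves a fixed, small amount of rank in a circle.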