Re: Is google a conceptual graph engine?
My comments:

On Mon, 6 Oct 2003, Didier PH Martin wrote:

> Now if we are using xlink, some additional information can be added
> <a xlink:type="simple" xlink:href="index.html" xlink:role="partOf">XML
> Guide</a>
> Since the source is currentDoc and the destination index.html, then the
> conceptual graph for this statement is:
> [currentDoc]->(partOf)->[index.html] --- currentDoc is part of index.html
> Which could make sense if we consider that the first document represents the
> cover page or that it is the domain's table of contents (most of the time,
> the document associated to the domain is also a table of contents linking to
> the other documents). Based on this premise, documents are organized as a
> hierarchy and the document associated to the domain is the root.

Note also that we need to use plain English words in the role; otherwise whatever little semantics we have is lost. A search engine will probably treat your partOf as a single special keyword. For example, consider a document that Google indexes as:

    I am partOf xml-dev

If you search for "part of xml-dev", that document will not be returned. Google also uses the "anchor context" of links to determine the importance of pages.

> Now the problem is, for any classification agent that in other to satisfy
> mercantile appetites (or simply to pay the monthly bills) some people
> knowing that agent are using the role to establish relationship between two
> documents will play with the system in order to get a good ranking. Some
> would reply, let's then get rid of these search engines and let's create
> autonomous agents that will travel the web to collect relevant documents. No
> problems, How long will it take for such agent to cover enough of the web to
> collect significant documents. What are your guaranties that all links will
> honestly report (by will or simply by error) their relationship with other
> documents to your agent? Your agent travel agenda may be dependent on these
> relationship types....
>
> Hummm, definitively, the semantic web is not a simple affair... As some of
> our social problems are rooted in our nature or prehistoric times, some
> problems which could potentially be a plague to the semantic web are rooted
> in today's web.

Cheating on the web and getting false importance is one of the things that Google cleverly avoids. This is how Google assigns importance to pages; it is the PageRank algorithm. Every web page starts with the same rank, say 1. Google then repeats the following for several iterations: in every iteration, the rank of each page becomes the sum of the ranks of the pages that point to it (in the published algorithm, each linking page's rank is first divided by its number of outgoing links, and a damping factor is applied). For example, if your page is pointed to by Yahoo, then you get a high rank, but if it is pointed to only by, say, my home page, you get a lower rank. They do this for several iterations, and finally it converges (I think experiments as well as theoretical results show that the number of iterations needed is < 10, if I remember correctly, irrespective of the starting rank of every page).

The main thing is: it is difficult to get your page ranked highly by doing tricks.

Note: Has anyone seen a case where even PageRank can be fooled? I do not remember having seen anything.

best regards, murali.
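
P.S. The iteration described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not Google's actual implementation: the function name pagerank, the damping value 0.85, the fixed count of 50 iterations, and the toy graph (portal, homepage, page_a, page_b, and so on) are all made up for illustration. Ranks start at 1/n rather than 1 so that they sum to 1, and pages without outgoing links spread their rank evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively rank pages.

    links maps each page name to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)

    # Every page starts with the same rank.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Base share that every page receives regardless of its inlinks.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its rank among the pages it points to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "portal" is linked from three pages,
# "homepage" is linked from none.
web = {
    "site1": ["portal"],
    "site2": ["portal"],
    "site3": ["portal"],
    "portal": ["page_a"],
    "homepage": ["page_b"],
    "page_a": [],
    "page_b": [],
}
ranks = pagerank(web)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 4))
```

In this toy graph page_a, which is linked from the well-linked portal, ends up ranked above page_b, which is linked only from a page nobody points to. That is the Yahoo versus home page effect described above.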