XML-DEV Mailing List Archive: RE: Is google a conceptual graph engine?
Hi Murali,

Murali said:
Note also that we need to use English words in role, otherwise whatever little semantics we have is lost. Your partOf will probably be considered by a search engine as one special keyword.

Didier replies:
Of course, Murali, and in fact most search engines do not process XLink. I gave that as an example of an "explicit" ontology created from links between documents, as opposed to a "tacit" ontology created from links between documents. If we add some semantics to a link (i.e. through a role or relationship typing), then we add more information to the link. This is something we do not have today in currently published documents: relationships are not typed.

Let's say that we live in a totally honest world. You publish a document on the web; I publish a document on the web. We start to build a certain view of the world by linking our documents. However, for an external observer, the kind of relationship between these two documents is not that obvious. Yes, a human reader of average intelligence can infer the type of relationship between the two documents, but a totally dumb machine named a computer will struggle to figure it out. However, if we add extra information about the type of link/relationship, we give the machine something that could help it build an ontology from these two documents, especially if we can associate each of these documents with a theme.
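To make the difference between a tacit and an explicit link concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Link` class, the `explicit_statements` function, the example.org URLs, and the convention that a role describes how the linked-to document relates to the linking one. It only shows that a machine can read a statement off a typed link, while an untyped link gives it nothing to read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    source: str                  # URL of the document that contains the link
    target: str                  # URL the link points to
    role: Optional[str] = None   # relationship type; None for a plain untyped link


def explicit_statements(links, themes):
    """Return (target theme, role, source theme) triples for typed links only.

    Assumed convention: the role says how the linked-to document relates
    to the linking document.
    """
    statements = []
    for link in links:
        if link.role is None:
            continue  # untyped link: the relationship stays tacit
        if link.source in themes and link.target in themes:
            statements.append((themes[link.target], link.role, themes[link.source]))
    return statements


# Hypothetical documents and themes, for illustration only.
themes = {
    "http://example.org/doc1.html": "ontology",
    "http://example.org/doc2.html": "model theory",
}
links = [
    # A plain link, as published today: no relationship type.
    Link("http://example.org/doc2.html", "http://example.org/doc1.html"),
    # The same link with a role attached.
    Link("http://example.org/doc2.html", "http://example.org/doc1.html", "is_partOf"),
]

print(explicit_statements(links, themes))
# prints [('ontology', 'is_partOf', 'model theory')]
```

The untyped link contributes nothing to the output; only the typed one yields a statement. Whether that statement is true is a separate question.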
Again, let's suppose that these two documents help our dumb machine to figure out what is going on by including some statements like:

Document 1:

  <rdf:description about="self">
    <author>Didier PH Martin</author>
    <theme>ontology</theme>
  </rdf:description>

Document 2:

  <rdf:description about="self">
    <author>Murali Mani</author>
    <theme>model theory</theme>
  </rdf:description>

If you now include a link in your document such as this one:

  <a xlink:type="simple"
     xlink:href="http://adomain.com/ontology.html"
     xlink:role="is_partOf">more info...</a>

then you add a new statement to your document, and you create a new kind of statement:

  [ontology]->(is_partOf)->[model theory]

That assertion may be true or false, but nonetheless it will reflect a view of the world, or an assertion about a world. The relationship is explicit and specified in your document. Otherwise it is tacit, and deduced from the content and from the algorithm used to classify the content.

Today, we can say that the ontology created from links between documents encoded either in HTML (SGML based) or XHTML (XML based) is a tacit ontology. This tacit ontology cannot easily be discovered from the tags or the content of the marked-up documents.

The more I think about this whole issue, the more I think that ontologies as specified by W3C can work only in some domains, and only if tools that make them simple and easy to use are available; for example, for some types of transactions, or for international transactions, where more formal definitions are required to facilitate the exchanges. This won't be the internet per se, but rather intranets or extranets, networks more limited than what we call the "web".

You also mentioned something about page rank. You are right, the series converges quite rapidly (ref: http://www.iprcom.com/papers/pagerank/). However, I totally disagree with you on the claim that Google's page rank cannot be cheated.
Just go to:

  http://www2.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=weapon+of+mass+destruction

and look at the first link. Now try to figure out how a joke about such a serious subject can be ranked so high (yes, the trick is no secret).

The actual page rank algorithm, without theme or concept clusters (a la Teoma), is simply a political statement corresponding to a vote. It's a little bit like the high-grade sausage ads of a couple of years ago: everybody likes it because everybody eats it, and everybody eats it because everybody likes it. Said differently, a document is deemed important if a lot of important people say so. The algorithm doesn't say whether the assertion made by a document is true, false, serious or a joke, just that a lot of important people voted that it is important. The previous example shows that this algorithm can be fooled by a group.

Cheers
Didier PH Martin
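P.S. To make the "vote" argument concrete, here is a minimal sketch of page rank as a plain power iteration, in Python. It is a simplified textbook-style version, not Google's production ranking. The `pagerank` function, the 0.85 damping factor, the page names and the tiny link graph are all assumptions made up for illustration. The sketch shows two things: the iteration settles quickly, and a group of pages that all link to the same target can lift it above a page that was ranked higher before.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=200):
    """Simplified page rank by power iteration.

    links maps each page to the list of pages it links to.
    Returns (rank per page, number of iterations used).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Every page starts each round with the same small baseline.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank, iteration


# Hypothetical mini-web: two hubs vote for a serious page; nobody votes for the joke.
honest_web = {
    "hub1": ["serious"],
    "hub2": ["serious"],
    "serious": ["hub1", "hub2"],
    "joke": [],
}

# The same web after a group of 20 pages agrees to link to the joke.
gamed_web = dict(honest_web)
for i in range(20):
    gamed_web[f"fan{i}"] = ["joke"]

before, steps_before = pagerank(honest_web)
after, steps_after = pagerank(gamed_web)

print("before:", before["serious"] > before["joke"])  # serious outranks joke
print("after: ", after["joke"] > after["serious"])    # joke outranks serious
```

Nothing in the computation looks at what the pages say. Only the votes count, which is why a coordinated group can move the result.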