
RE: Data mining the semantic web? (was RE: Semantic Web)

  • To: Mike Champion <mc@x...>
  • Subject: RE: Data mining the semantic web? (was RE: Semantic Web)
  • From: jborden@a...
  • Date: Wed, 31 Jul 2002 14:42:01 +0000
  • Cc: xml-dev@l...



> >Basically it seems to me that way Google has approached the web is as a
> >giant problem in Bayesian Analysis, and that this method has been
> >relatively successful(at least more successful than other methods have
> >been).
> Hmmm ... then maybe ontologies could help seed the process with "prior
> probabilities" or something?  

This is my exact research interest. The problem with Bayesian/statistical/Markov chain analysis is that if the "search space" is unbounded, then the process may take an amount of time approaching infinity to resolve. The trick would seem to be to provide "local context" or, as you say, an ontology seeding the process. This would be done in an iterative fashion. For example, we can use the "oneOf" mechanism to define a _Class_ as being composed of a given number of _Individuals_, e.g. as determined statistically. One might then equate two Classes, or use a classifier to find the equivalence of two Classes: one determined by statistically derived individual membership, the other as being part of a deep hierarchy (ontology). This might go round and round, with the output of each statistical stage being fed into a subsequent logical classification stage, etc. It might work. On the other hand, I might just be wasting my time.
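
A minimal Python sketch of that round-and-round loop, under loud assumptions: the "statistical stage" is plain co-occurrence counting, the "logical classification stage" is replaced by a Jaccard-overlap match against hand-written classes (a stand-in for a real description-logic classifier, not one), and every name in it (`enumerate_class`, `best_match`, `refine`, the toy ontology) is hypothetical. It only illustrates how an enumerated, oneOf-style class could be matched to an ontology class, and how that match could then bound the next statistical pass.

```python
from collections import Counter


def enumerate_class(documents, seed, scope=None, min_count=2):
    """Statistical stage: build a oneOf-style class as the set of terms that
    co-occur with `seed` in at least `min_count` documents. If `scope` is
    given, only terms inside it are counted (this bounds the search space)."""
    counts = Counter()
    for doc in documents:
        terms = set(doc.lower().split())
        if seed in terms:
            candidates = terms - {seed}
            if scope is not None:
                candidates &= scope
            counts.update(candidates)
    return frozenset(t for t, c in counts.items() if c >= min_count)


def jaccard(a, b):
    """Overlap between two sets of individuals, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(members, ontology):
    """Stand-in for the logical stage: pick the ontology class whose listed
    individuals overlap most with the statistically derived class."""
    label = max(ontology, key=lambda name: jaccard(members, ontology[name]))
    return label, jaccard(members, ontology[label])


def refine(documents, seed, ontology, rounds=3):
    """Alternate the two stages until the matched class stops changing."""
    scope, label, members = None, None, frozenset()
    for _ in range(rounds):
        members = enumerate_class(documents, seed, scope=scope)
        label, _score = best_match(members, ontology)
        if ontology[label] == scope:
            break  # fixed point: the ontology class no longer changes
        scope = ontology[label]
    return label, members


# Toy data, invented for illustration only.
docs = [
    "madonna painting renaissance church",
    "madonna church icon painting",
    "madonna album tour pop",
    "madonna icon church",
]
toy_ontology = {
    "ReligiousArt": frozenset({"painting", "church", "icon", "fresco"}),
    "PopMusic": frozenset({"album", "tour", "pop", "single"}),
}
label, members = refine(docs, "madonna", toy_ontology)
print(label, sorted(members))
```

Whether such a loop converges on anything useful for real data is exactly the open question; the sketch only shows the shape of the iteration.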

> ... Or maybe, let people specify somehow
> that the search should be constrained by the sense that words are used
> in some vocabulary/ontology , i.e.,  if I'm looking for information
> about "madonna" I mean the religious personage rather than the pop
> singer, so I somehow tell Google to use the "christianity" vocabulary
> rather than the "pop culture" vocabulary.

Yeah, basically. Google could then devote more of its servers to parsing the words next to "madonna" in the desired sense of the word.

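
As a rough illustration of "parsing words next to 'madonna'", here is a small Python sketch that ranks documents by how many terms from a chosen vocabulary appear within a few words of the query term. The function names (`sense_score`, `search`), the window size, and the two tiny vocabularies are all assumptions made for the example; this says nothing about how Google actually works.

```python
def sense_score(text, query, vocabulary, window=5):
    """Count vocabulary terms within `window` tokens of each occurrence of
    `query`. Tokenisation is a naive whitespace split with no punctuation
    handling, which is enough for a sketch."""
    tokens = text.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        if tok == query:
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            score += sum(1 for t in before + after if t in vocabulary)
    return score


def search(documents, query, vocabulary, window=5):
    """Return documents that mention `query` near at least one vocabulary
    term, best match first."""
    scored = [(sense_score(d, query, vocabulary, window), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for s, d in scored if s > 0]


# Toy vocabularies and documents, invented for illustration only.
christianity = {"child", "fresco", "chapel", "saint"}
pop_culture = {"album", "tour", "single"}
docs = [
    "the madonna and child fresco in the chapel",
    "madonna released a new album and tour",
    "a chapel fresco without the name",
]
print(search(docs, "madonna", christianity))
print(search(docs, "madonna", pop_culture))
```

Swapping the vocabulary is the only thing that changes between the two searches, which is the point of letting the user name the sense up front.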
> I should get back to work, sigh, but this subject fascinates me.
> I heard about SNOMED and the questions that healthcare professionals
> would like to use it  to answer a couple of years ago. I'd  been
> thinking of it as a database query problem ... sortof a join
> of the clinical data  with the vocabulary data.  Jonathan Borden
> has helped me see that this could also be seen as a "semantic
> web", and this thread has made it clear that the question of 
> how to combine vocabulary/taxonomy/ontology information to 
> inform web searches or XML queries is wide open for R&D.

There are several folks in WebOnt who are very much interested in integrating XQuery and OWL (you might look for some papers that Peter Patel-Schneider has written with Jerome Simeon on this topic). Much of the point of using an ontology, certainly in the historic sense, has been to provide structure for a freewheeling stream of natural language text. Think of the current Web as that free-form text stream, and then using OWL to structure queries may start to make more sense.

Jonathan (who is at work :-))

