XML-DEV Mailing List Archive
Re: Something altogether different?
> I believe we can use vector-space model only when the document collection
> is "homogeneous" in some manner.. and has repetitive words etc.
>
> Also note -- vector space model, you have to obtain rank of documents in
> real-time given a query.

Cohen's '99 WHIRL paper discusses the ranking heuristics, storing similarities instead of computing them in real time, and the use of views to persist information about the highest-scoring answers:

"Fortunately, in most cases, it is not necessary to compute all answers to a query, as only the high-scoring answers will be of interest. WHIRL's inference algorithms are thus designed to find a few good answers to a query, without generating all possible answers. The operations most commonly performed by a user (or program) interacting with WHIRL are to define and r-materialize views. To r-materialize a view, WHIRL finds the "r" highest-scoring ground atoms "a" associated with a view, and stores those facts in the EDB (extensional database) for later use."

> For other metrics such as say pagerank, rank of documents can be
> pre-computed, and we can use better algorithms based on this property.

In the "Recommending Music by Crawling The Web" paper, Cohen and Fan researched music preferences by spidering the web and using four different scoring algorithms: popularity, K-nearest neighbor, weighted majority and an extended direct Bayesian prediction.

In a 1998 paper, Cohen, Schapire and Singer discussed the use of a preference function when determining ranking (excerpt below):

http://citeseer.ist.psu.edu/cache/papers/cs/17244/http:zSzzSzdnkweb.denken.or.jpzSzboostingzSzpaperszSzCohSchSin98.pdf/cohen98learning.pdf

Learning to Order Things

There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order, given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another.
We outline a two-stage approach in which one first learns by conventional means a preference function, of the form PREF ... Nevertheless, we describe a simple greedy algorithm that is guaranteed to find a good approximation. We then discuss an on-line learning algorithm, based on the "Hedge" algorithm, for finding a good linear combination of ranking "experts."
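
To make the second stage in that abstract a bit more concrete, here is a minimal Python sketch of a greedy ordering step as I understand it: each item gets a "potential" (how strongly it is preferred over the rest, minus how strongly the rest are preferred over it), and the item with the highest potential is ranked next. The names `greedy_order`, `pref_from_scores` and the toy `SCORES` table are my own illustrations, not taken from the paper, and the sketch assumes distinct, hashable items and a preference function returning values in [0, 1].

```python
from typing import Callable, Hashable, Iterable, List


def greedy_order(items: Iterable[Hashable],
                 pref: Callable[[Hashable, Hashable], float]) -> List[Hashable]:
    """Order items so that u tends to precede v when pref(u, v) is high."""
    remaining = list(items)  # assumed to contain no duplicates
    # Potential of v: preference for v over the others,
    # minus preference for the others over v.
    potential = {
        v: sum(pref(v, u) - pref(u, v) for u in remaining if u != v)
        for v in remaining
    }
    ordering = []
    while remaining:
        # Rank the item with the highest potential next
        # (ties go to the earliest item in the input order).
        best = max(remaining, key=lambda v: potential[v])
        ordering.append(best)
        remaining.remove(best)
        del potential[best]
        # Take the removed item's contribution out of the other potentials.
        for v in remaining:
            potential[v] -= pref(v, best) - pref(best, v)
    return ordering


# Hypothetical toy preference function built from made-up scores.
SCORES = {"a": 3.0, "b": 1.0, "c": 2.0}


def pref_from_scores(u: str, v: str) -> float:
    """Return 1.0 if u should be ranked ahead of v, 0.5 on a tie, else 0.0."""
    if SCORES[u] > SCORES[v]:
        return 1.0
    if SCORES[u] == SCORES[v]:
        return 0.5
    return 0.0


ranking = greedy_order(["a", "b", "c"], pref_from_scores)
```

With a consistent preference function like this toy one, the greedy pass simply recovers the score order. The interesting case is a learned preference function with cycles or conflicts, where no ordering agrees with every pairwise judgment and the greedy pass only approximates the best one.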