Re: Something altogether different?
On Mon, Apr 25, 2005 at 03:49:35PM -0500, Bullard, Claude L (Len) wrote:
[...]
> So where we do understand how the vector model
> works for text analysis,

If you mean the cosine vector similarity model espoused by the late Dr. Gerard Salton and others, I think what we know is that it was an interesting theory that supported a lot of useful research, but it has a number of practical difficulties. I don't know how Dr. Cohen (cited earlier by Steve DeRose) has dealt with them.

Difficulties include the fact that humans attribute significance (in English) to word order, and also use collocation of terms to help with sense disambiguation.

Another difficulty with earlier systems such as SMART was that sufficiently large documents contained all the terms; using markup to do term weighting for individual sections (or even paragraphs) can be a significant win in some environments.

In the extract, Cohen mentions that term weighting can be "surprisingly effective" and goes on to say that

> One advantage of this "vector space" representation is that the
> similarity of two documents can be easily computed.

Sometimes the thing that's easy to implement gets far enough of the way that it doesn't seem worth implementing anything better.

The use of fuzzy logic (is this a derivative of Zadeh?) is also interesting.

Liam

--
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/
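For readers unfamiliar with the cosine vector model discussed in this thread, here is a minimal Python sketch. It assumes a plain bag-of-words representation with raw term counts (no stemming, stop words, or tf-idf weighting). The helper names term_vector, cosine_similarity and best_section_score are hypothetical and are not taken from SMART or from Cohen's work. The sketch also illustrates two of the difficulties raised in the thread. Word order is discarded, so "dog bites man" and "man bites dog" look identical. Scoring the best-matching section, rather than the whole document, keeps a short relevant section from being diluted by unrelated text elsewhere. Taking the maximum over sections is an assumed, deliberately simple stand-in for markup-based section weighting.

```python
import math
from collections import Counter


def term_vector(text):
    # Hypothetical helper: bag-of-words vector of raw term counts.
    # Word order is thrown away here, which is one of the model's weaknesses.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|), computed over the shared terms only.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def best_section_score(query, sections):
    # Hypothetical section-level scoring: compare the query against each
    # marked-up section separately and keep the best match, instead of
    # comparing it against one large vector for the whole document.
    q = term_vector(query)
    return max((cosine_similarity(q, term_vector(s)) for s in sections), default=0.0)


if __name__ == "__main__":
    # Same terms in a different order: the model cannot tell these apart.
    print(cosine_similarity(term_vector("dog bites man"), term_vector("man bites dog")))
    sections = ["markup languages and xml", "cooking pasta at home"]
    whole = cosine_similarity(term_vector("xml markup"), term_vector(" ".join(sections)))
    print(whole, best_section_score("xml markup", sections))
```

With these inputs the first comparison scores (up to floating-point rounding) 1.0, and the best single section scores higher than the concatenated document does, because the unrelated section only adds length to the whole-document vector.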