
Re: Something altogether different?


On Mon, Apr 25, 2005 at 03:49:35PM -0500, Bullard, Claude L (Len) wrote:
[...]
> So where we do understand how the vector model 
> works for text analysis,
If you mean the cosine vector similarity model espoused by
the late Dr. Gerald Salton and others, I think what we know
is that it was an interesting theory that supported a lot of
useful research, but has a number of practical difficulties.

I don't know how Dr Cohen (cited earlier by Steve DeRose) has
dealt with them.  Difficulties include the fact that humans
attribute significance (in English) to word order, and also
use collocation of terms to help with sense disambiguation.
Another difficulty with earlier systems like SMART was that
sufficiently large documents contained all the terms -- use of
markup to do term weighting for individual sections (or even
paragraphs) can be a significant win in some environments.
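As a rough illustration of the point about large documents, here is a minimal sketch in Python that builds one term vector per marked-up section instead of one for the whole document. The element names (doc, section, p), the id attributes, the sample text, and the helper names are all hypothetical, and the naive whitespace tokenization is an assumption; this is not how SMART or any particular system works.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; element names and ids are assumptions.
DOC = """<doc>
  <section id="s1"><p>vector models for text retrieval</p></section>
  <section id="s2"><p>fuzzy logic and set membership</p></section>
  <section id="s3"><p>markup for text sections</p></section>
</doc>"""

def section_vectors(xml_text, tag="section"):
    """Return a term-frequency Counter for each element named `tag`."""
    root = ET.fromstring(xml_text)
    vectors = {}
    for sec in root.iter(tag):
        # Naive tokenization: lowercase and split on whitespace.
        words = " ".join(sec.itertext()).lower().split()
        vectors[sec.get("id")] = Counter(words)
    return vectors

def sections_matching(vectors, query_terms):
    """Ids of sections that contain every query term."""
    return [sid for sid, vec in vectors.items()
            if all(vec[term] > 0 for term in query_terms)]

vectors = section_vectors(DOC)
# Merging all sections gives the whole-document vector.
whole_doc = sum(vectors.values(), Counter())

query = ["fuzzy", "markup"]
print(all(whole_doc[term] > 0 for term in query))     # whole document has both terms
print(sections_matching(vectors, query))              # but no single section does
print(sections_matching(vectors, ["text", "markup"]))
```

The whole document matches the query "fuzzy markup" only because the two terms occur in unrelated sections; scoping the vectors to sections removes that false match.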

In the extract, Cohen mentions that term weighting can be
"surprisingly effective" and goes on to say that
> One advantage of this "vector space" representation is that the
> similarity of two documents can be easily computed.

Sometimes the thing that's easy to implement gets you far enough
along that it doesn't seem worth implementing anything better.
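To show how little code the similarity computation Cohen mentions needs, here is a minimal sketch in Python of cosine similarity over raw term-frequency vectors. The function names, the sample sentences, and the whitespace tokenization are assumptions for illustration; real systems normally add term weighting (for example tf-idf), which is omitted here.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (naive lowercase whitespace split)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # A Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty document as similar to nothing
    return dot / (norm_a * norm_b)

d1 = term_vector("dog bites man")
d2 = term_vector("man bites dog")
d3 = term_vector("stock prices fall")

print(cosine_similarity(d1, d2))  # same words, different order
print(cosine_similarity(d1, d3))  # no shared terms
```

The first pair scores as essentially identical (up to floating-point rounding) even though the two sentences mean different things, which is the word-order difficulty in miniature. The second pair shares no terms and scores zero.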

The use of fuzzy logic (is this a derivative of Zadeh?) is also
interesting.

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/
