
Re: Something altogether different?


Murali Mani wrote:

> One disadvantage of term-based weighting or vector space model is the
> well-known example cited in the Google's original paper (rather sales
> pitch??) --
>
> A document with only the words "Bill Clinton [expletive deleted]"; as opposed to the
> actual white house page was considered more important for the query "Bill
> Clinton" (when Clinton was the president)
>
> I believe we can use vector-space model only when the document collection
> is "homogeneous" in some manner.. and has repetitive words etc.
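To make the quoted objection concrete, here is a minimal sketch of plain term-frequency cosine scoring. The two documents are made-up stand-ins (not the pages from the Google paper), and the whitespace tokenizer and the names `tf_vector` and `cosine` are assumptions for illustration only. It shows why a very short document that contains little besides the query terms can outscore a longer, more authoritative page.

```python
import math
from collections import Counter

def tf_vector(text):
    # Crude tokenizer (assumption): lowercase and split on whitespace.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = tf_vector("bill clinton")

# Hypothetical documents, invented for this sketch.
short_doc = tf_vector("bill clinton rocks")
long_doc = tf_vector(
    "the white house welcome page for president bill clinton "
    "and his administration"
)

score_short = cosine(query, short_doc)
score_long = cosine(query, long_doc)

# Length normalization rewards the three-word page.
print(score_short > score_long)
```

Real engines add IDF weighting and other signals, but length normalization alone is enough to reproduce the effect on this toy pair.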

Google is apparently looking at a noun clustering scheme.
http://news.zdnet.com/2100-9588_22-5605127.html?tag=nl.e539

Norvig highlighted a research paper written by a Google employee last year
regarding a classification engine the company is testing. The technology can
parse a proper noun or compound nouns into several categories in order to
deliver clustered results, for example. For a query on "ATM," or asynchronous
transfer mode, the engine would be able to use the terms "such as" on Web pages
indexed with the term to discover that it can be linked to the expression
"high-speed networks." As a result, a search for high-speed networks might pull
up a cluster on ATM.
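
The article does not say how the engine works internally, so the following is only a rough sketch of the "such as" idea under my own assumptions: a regular expression that treats the two words before "such as" as the category and the single word after it as the member. The sample pages and the names `SUCH_AS` and `extract_links` are hypothetical and not taken from Google's paper.

```python
import re
from collections import defaultdict

# Assumption: the category is exactly two words, the member exactly one.
SUCH_AS = re.compile(
    r"(\w[\w-]*\s+\w[\w-]*)\s+such as\s+(\w[\w-]*)", re.IGNORECASE
)

def extract_links(pages):
    # Map each category phrase to the set of terms it was seen introducing.
    links = defaultdict(set)
    for text in pages:
        for category, term in SUCH_AS.findall(text):
            links[category.lower()].add(term)
    return links

# Hypothetical page snippets, invented for this sketch.
pages = [
    "Carriers deploy high-speed networks such as ATM in their backbones.",
    "Banks install cash machines such as ATM kiosks.",
]

links = extract_links(pages)

# A search for the category can now pull up the terms linked to it.
print(sorted(links["high-speed networks"]))
```

The same term can land under more than one category ("high-speed networks" and "cash machines" here), which is presumably what lets an engine present separate clusters for an ambiguous query like "ATM".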



