XML-DEV Mailing List Archive
Re: Something altogether different?
Murali Mani wrote:

> One disadvantage of term-based weighting or vector space model is the
> well-known example cited in the Google's original paper (rather sales
> pitch??) --
>
> A document with only the words "Bill Clinton [expletive deleted]"; as opposed to the
> actual white house page was considered more important for the query "Bill
> Clinton" (when Clinton was the president)
>
> I believe we can use vector-space model only when the document collection
> is "homogeneous" in some manner.. and has repetitive words etc.

Google is apparently looking at a noun clustering scheme.

http://news.zdnet.com/2100-9588_22-5605127.html?tag=nl.e539

Norvig highlighted a research paper written by a Google employee last year regarding a classification engine the company is testing. The technology can parse a proper noun or compound nouns into several categories in order to deliver clustered results, for example.

For a query on "ATM," or asynchronous transfer mode, the engine would be able to use the terms "such as" on Web pages indexed with the term to discover that it can be linked to the expression "high-speed networks." As a result, a search for high-speed networks might pull up a cluster on ATM.
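
To make the "such as" idea concrete, here is a rough Python sketch. It is only a guess at the general technique, not the engine the article describes: the page snippets are made up, and the regular expression, `such_as_links`, `cluster_for`, and `categories_of` are hypothetical names invented for illustration.

```python
import re
from collections import defaultdict

# Made-up page snippets (assumption: not taken from any real index).
PAGES = [
    "Carriers deploy high-speed networks such as ATM and Frame Relay.",
    "High-speed networks such as ATM carry voice and data.",
    "Cash machines such as ATM kiosks are common in banks.",
]

# "<category> such as <instance>": the category is the one or two words
# before "such as"; the instance is the first capitalised token after it.
PATTERN = re.compile(r"((?:[\w-]+\s+)?[\w-]+)\s+such as\s+([A-Z][\w-]*)")


def such_as_links(pages):
    """Map each lower-cased category phrase to the instances seen after 'such as'."""
    links = defaultdict(set)
    for text in pages:
        for category, instance in PATTERN.findall(text):
            links[category.lower()].add(instance)
    return links


def cluster_for(query, links):
    """Instances linked to a category query, e.g. 'high-speed networks'."""
    return sorted(links.get(query.lower(), set()))


def categories_of(term, links):
    """Categories a term was filed under, one candidate cluster per category."""
    return sorted(cat for cat, instances in links.items() if term in instances)


LINKS = such_as_links(PAGES)
print(cluster_for("high-speed networks", LINKS))
print(categories_of("ATM", LINKS))
```

With these snippets, a query for "high-speed networks" pulls up ATM, and ATM itself falls into two categories (networks and cash machines), which is the kind of split a clustered result page could show. A real system would need proper noun-phrase detection rather than a two-word window.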