Re: XML Search Engine
Tim Bray writes:

> What I said was:
> 1. I have not seen any research which demonstrates that word proximity
>    achieves better results than character proximity based on any
>    well-known IR metric.
> 2. Doing word proximity at all is a *very* hard problem in the languages
>    used by a large majority of the world's population.

I think that there might be a disconnect here. What we're talking
about is minimal-semantic-unit proximity -- for some
languages/contexts, the minimal semantic unit will always be a single
grapheme, and for others, it will be a cluster of one or more
graphemes.

This type of clustering is critical for search engines, which often
(usually?) provide inverse indexes only for minimal semantic units,
not for all graphemes. The argument, then, is that proximity testing
should be done by counting the units that were indexed, which may or
may not be single graphemes.

All the best,

David

--
David Megginson david@m...
http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
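The point about counting indexed units rather than graphemes can be sketched roughly as follows. This is only an illustration, not any particular engine's design: the function names `build_index` and `within` are hypothetical, whitespace splitting stands in for real word segmentation, and Python code points stand in for graphemes (a simplification, since one grapheme may span several code points).

```python
from collections import defaultdict


def build_index(units):
    """Map each indexed unit to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, unit in enumerate(units):
        index[unit].append(pos)
    return index


def within(index, a, b, max_distance):
    """True if some occurrences of a and b are at most max_distance units apart."""
    return any(
        abs(pa - pb) <= max_distance
        for pa in index.get(a, ())
        for pb in index.get(b, ())
    )


# Cluster-based units: each whitespace-delimited word is one indexed unit.
words = "proximity testing should count indexed units".split()
word_index = build_index(words)

# Single-character units: each character is its own indexed unit
# (assumption: one code point per grapheme in this sample text).
chars = list("検索エンジン")
char_index = build_index(chars)

print(within(word_index, "proximity", "count", 3))  # distance measured in words
print(within(char_index, "検", "ジ", 4))             # distance measured in characters
```

The same `within` test serves both cases because the distance is always measured in whatever units the index was built from, never in raw characters of the source text.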