Re: XML Search Engine
>Hi all,
>Can anyone tell me where the difference lies in implementing a search
>engine for HTML and a search engine for XML.

The main difference is that in HTML the tagging is almost useless for localising the query, whereas in XML it is potentially very valuable. Many search engines support field-oriented queries, e.g. find "Ireland" as a surname. With the right input filter for XML, it becomes possible to map XML elements to the fields understood by the search engine, which makes such queries feasible. That is not the case for HTML.

Switching threads, I am a little surprised by Tim's remarks on word proximity versus character proximity. Confining our attention to European languages (as most search engines do), word proximity searching is a common feature of the high-end search engines, whereas character proximity is hardly found outside basic desktop tools like grep. Apart from anything else, once you've done the word normalisation (mapping different linguistic forms or spellings of the same word to a single form), character proximity is meaningless, because the character offsets of the original text no longer correspond to the normalised terms.

In the older boolean engines word proximity is used rather mechanistically; in the newer engines it is used more subtly, as part of a statistical or linguistic approach to relevance ranking. Either way it is an established feature of the scene, and it is not there on a whim: the search algorithms used are based on extensive research and benchmarking of relevance and recall scores.

An interesting comparison of web search engines is at http://www.netstrider.com/search/features.html ; this asserts that all the well-known web search engines other than Lycos use word proximity matching. (It is a good survey, in spite of the fact that it fails to distinguish the effectiveness of the query matcher from the effectiveness of the web crawler.)

Mike Kay

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
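
As a rough illustration of the two points in the message above (mapping XML elements to search fields, and word proximity measured after normalisation), here is a minimal Python sketch. The element names, the `FIELD_MAP` table, the sample documents, and all function names are hypothetical, and lower-casing merely stands in for real linguistic normalisation; it is not meant to describe how any particular search engine works.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical "input filter": XML element name -> search-engine field name.
FIELD_MAP = {"surname": "surname", "country": "country", "note": "text"}

# Hypothetical sample documents.
DOCS = [
    "<person><surname>Ireland</surname><country>France</country>"
    "<note>XML search engine design</note></person>",
    "<person><surname>Murphy</surname><country>Ireland</country>"
    "<note>search for a fast XML engine</note></person>",
]


def normalise(word):
    """Crude stand-in for word normalisation: lower-casing only."""
    return word.lower()


def tokens(text):
    """Split text into normalised words."""
    return [normalise(w) for w in re.findall(r"\w+", text)]


def build_index(xml_docs):
    """Return {(field, word): {doc_id: [word positions]}}.

    Positions are word offsets within one element, so they restart
    for every element (a simplification).
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, xml_text in enumerate(xml_docs):
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            field = FIELD_MAP.get(elem.tag)
            if field is None or not elem.text:
                continue  # element is not mapped to any field
            for pos, word in enumerate(tokens(elem.text)):
                index[(field, word)][doc_id].append(pos)
    return index


def search_field(index, field, word):
    """Field-oriented query: ids of documents with `word` in `field`."""
    return sorted(index.get((field, normalise(word)), {}))


def near(index, field, word_a, word_b, max_gap):
    """Word-proximity query: both words in `field`, at most max_gap words apart."""
    a = index.get((field, normalise(word_a)), {})
    b = index.get((field, normalise(word_b)), {})
    hits = []
    for doc_id in a.keys() & b.keys():
        if any(abs(i - j) <= max_gap for i in a[doc_id] for j in b[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


index = build_index(DOCS)
print(search_field(index, "surname", "Ireland"))      # [0]
print(search_field(index, "country", "Ireland"))      # [1]
print(near(index, "text", "search", "engine", 1))     # [0]
print(near(index, "text", "search", "engine", 5))     # [0, 1]
```

The index stores word positions rather than character offsets, so proximity is still well defined after normalisation changes the length or spelling of a word. The same index built from HTML would have no reliable element-to-field mapping to key on.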