
Re: XML Search Engine

  • From: "Michael Kay" <M.H.Kay@e...>
  • To: <xml-dev@i...>
  • Date: Thu, 5 Nov 1998 12:27:12 -0000

>Hi all,
>Can anyone tell me where the difference lies in implementing a search
>engine for HTML and a search engine for XML.


The main difference is that in HTML the tagging is almost useless in
localising the query, whereas in XML it is potentially very valuable. Many
search engines support field-oriented query, e.g. find "Ireland" as a
surname; with the right input filter for XML it becomes possible to map XML
elements to the fields understood by the search engine, making such queries
a feasible proposition, which is not the case for HTML.
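As a rough illustration of the input-filter idea, here is a minimal sketch in Python. The element names, the field names, and the ELEMENT_TO_FIELD mapping are all hypothetical, and the in-memory index is only a stand-in for whatever fielded index a real search engine maintains.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical mapping from XML element names to search-engine field names.
ELEMENT_TO_FIELD = {"surname": "surname", "birthplace": "place"}


def index_document(doc_id, xml_text, index):
    """Add the words of each mapped element to a (field, word) -> {doc_id} index."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        field = ELEMENT_TO_FIELD.get(elem.tag)
        if field is None:
            continue  # elements with no mapping are not indexed as fields
        for word in "".join(elem.itertext()).lower().split():
            index[(field, word)].add(doc_id)


def search(index, field, word):
    """Return the sorted ids of documents containing `word` in `field`."""
    return sorted(index.get((field, word.lower()), set()))


index = defaultdict(set)
index_document(
    1,
    "<person><surname>Ireland</surname><birthplace>Leeds</birthplace></person>",
    index,
)
index_document(
    2,
    "<person><surname>Kay</surname><birthplace>Ireland</birthplace></person>",
    index,
)

print(search(index, "surname", "Ireland"))  # only the document where it is a surname
print(search(index, "place", "Ireland"))    # only the document where it is a place
```

The same word lands in different fields depending on the enclosing element, which is exactly the distinction that HTML tagging rarely makes available.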

Switching threads, I am a little surprised by Tim's remarks on word
proximity versus character proximity. Confining our attention to European
languages (as most search engines do), word proximity searching is a common
feature of the high-end search engines, whereas character proximity is
hardly found outside basic desktop tools like grep. Apart from anything
else, once you've done the word normalisation (normalising different
linguistic forms or spellings of the same word), character proximity is
meaningless. In the older boolean engines word proximity is used rather
mechanistically; in the newer engines it is used more subtly as part of a
statistical or linguistic approach to relevance ranking, but either way it
is an established feature of the scene, and it is not there on whim: the
search algorithms used are based on extensive research and benchmarking of
relevance and recall scores.
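To make the point about normalisation concrete, here is a small Python sketch of the mechanistic, boolean-style form of word proximity matching. The normalise function is a deliberately crude assumption (lowercasing and stripping a plural "s"), standing in for the linguistic normalisation a real engine would apply; within_words and its parameters are illustrative names only.

```python
import re


def normalise(word):
    """Crude stand-in for linguistic normalisation: lowercase, strip a plural 's'."""
    w = word.lower()
    return w[:-1] if len(w) > 3 and w.endswith("s") else w


def tokenise(text):
    """Split text into normalised word tokens."""
    return [normalise(w) for w in re.findall(r"\w+", text)]


def within_words(text, a, b, max_distance):
    """True if normalised forms of a and b occur within max_distance words."""
    tokens = tokenise(text)
    target_a, target_b = normalise(a), normalise(b)
    pos_a = [i for i, t in enumerate(tokens) if t == target_a]
    pos_b = [i for i, t in enumerate(tokens) if t == target_b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


text = "Search engines for XML documents"
print(within_words(text, "engine", "document", 3))
print(within_words(text, "engine", "document", 2))
```

Distance here is counted in token positions, so it is unaffected by "engines" being one character longer than "engine"; a character offset measured on the original text would no longer correspond to anything in the normalised form.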

An interesting comparison of web search engines is at
http://www.netstrider.com/search/features.html ; this asserts that all the
well-known web search engines other than Lycos use word proximity matching.
(A good survey in spite of the fact that it fails to distinguish the
effectiveness of the query matcher from the effectiveness of the web
crawler.)

Mike Kay


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

