
SAX, DOM, and Search Engines (was Re: xml parser)

  • From: <david@m...>
  • To: <xml-dev@i...>
  • Date: Wed, 4 Nov 1998 17:32:41 -0500 (EST)

Tim Bray writes:

 > At 10:55 AM 11/4/98 -0000, Michael Kay wrote:
 > >My immediate answer to this is yes, all the information you need for a
 > >search engine is available via the SAX or DOM interface offered by many
 > >parsers.
 > 
 > I disagree.  Few parsers track byte offsets or other locational info in
 > the file, and I think you need that to do basic things like proximity
 > and phrase search.

I disagree.  While byte offsets might be useful for other purposes,
they would be inappropriate for proximity and phrase searches -- for
those, you need to track the relative positions of words, not their
absolute positions.  Consider the following example:

  <p>WORD1 &x; WORD2</p>

Is WORD1 close to WORD2?  It's only five bytes away (assuming an 8-bit
encoding), but might be separated by 20,000 words, depending on what
&x; expands to.  SAX and the DOM do give you enough information to
determine the relative positions of words.
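A minimal sketch of the point above, using Python's standard xml.sax module purely for illustration. The handler class, the helper function, and the three-word replacement text for &x; are all assumptions made up for this example. The handler buffers the character data that the parser reports after entity expansion, then numbers the words in order.

```python
import xml.sax


class WordPositions(xml.sax.ContentHandler):
    """Record the ordinal position of every word reported by the parser."""

    def __init__(self):
        super().__init__()
        self._chunks = []      # character data, possibly delivered in pieces
        self.positions = {}    # word -> list of word indexes

    def characters(self, content):
        # The parser may split text at entity boundaries, so buffer it
        # and tokenise once the whole document has been seen.
        self._chunks.append(content)

    def endDocument(self):
        words = "".join(self._chunks).split()
        for index, word in enumerate(words):
            self.positions.setdefault(word, []).append(index)


def word_positions(xml_text):
    handler = WordPositions()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.positions


# Hypothetical document: &x; is declared with three words of replacement text.
DOC = """<?xml version="1.0"?>
<!DOCTYPE p [
  <!ENTITY x "alpha beta gamma">
]>
<p>WORD1 &x; WORD2</p>"""

positions = word_positions(DOC)
word_distance = positions["WORD2"][0] - positions["WORD1"][0]
byte_gap = DOC.index("WORD2") - (DOC.index("WORD1") + len("WORD1"))

print(byte_gap, word_distance)  # prints: 5 4
```

The gap in the raw source stays at five characters however long the replacement text is, while the word distance grows with it.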

Byte offsets would be helpful for displaying context around a match,
but there would be no 100% reliable way to format that context without
starting from the top of the document, in which case an XPointer (also
derivable from SAX or DOM) might be more helpful unless you want the
search engine to display raw XML markup for the context.
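
As a rough illustration of deriving a locator from SAX events, the sketch below counts element children while parsing and reports a path such as /1/2/1 for each element whose own text contains a search term. The path notation is only in the style of an XPointer child sequence, not full XPointer syntax, and the class name, function name, and sample document are hypothetical.

```python
import xml.sax


class LocatorHandler(xml.sax.ContentHandler):
    """Collect child-sequence style paths of elements whose text has a term."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self._path = []      # child index of each currently open element
        self._counts = [0]   # element children seen so far at each depth
        self._text = []      # buffered direct text of each open element
        self.matches = []

    def startElement(self, name, attrs):
        self._counts[-1] += 1
        self._path.append(self._counts[-1])
        self._counts.append(0)
        self._text.append([])

    def characters(self, content):
        if self._text:
            self._text[-1].append(content)

    def endElement(self, name):
        # Check the element's own text only, not that of its descendants.
        if self.target in "".join(self._text.pop()):
            self.matches.append("/" + "/".join(str(i) for i in self._path))
        self._path.pop()
        self._counts.pop()


def locate(xml_text, target):
    handler = LocatorHandler(target)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matches


SAMPLE = "<doc><p>one</p><p>two <b>needle</b></p></doc>"
print(locate(SAMPLE, "needle"))  # prints: ['/1/2/1']
```

A search front end could store such a path with each hit and later walk the document to the element to format its context, instead of slicing raw markup at a byte offset.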


All the best,


David

-- 
David Megginson                 david@m...
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

