
Re: indexing and querying XML (not XQuery)


* Robert Koberg <rob@k...> [2005-08-23 09:06]:
> Hi,
> 
> Someone on the Lucene user's list posted a link to this paper:

> http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html

> that talks about indexing and searching XML documents. I have been doing 
> something similar for a while (3 years, I think) but it is specific to 
> our configuration/content which probably doesn't have wider 
> applicability. I have also found it to be:

> "a fast, reliable XML search engine, which has exceeded our expectations 
> in terms of flexibility and low development cost."

> I was thinking the article would be of interest to many people here. I 
> was also wondering about your thoughts on this method of dealing with 
> XML. I have not looked in depth at XQuery, and I am wondering what 
> strengths/benefits XQuery would have over using something like Lucene to 
> index/query XML.

> It would be interesting to see what folk from this list would come up 
> with if they put their brains to work on ways to handle 
> indexing/searching with something like Lucene.

    Len was in a thread a while back, on Web 2.0, where I posited
    the notion of a REST interface to full-text search of syndicated
    feeds or blogs.

    While we're at it, Len, did you think about that any further?

    Reading through the article, the thing that strikes me is that
    full-text search of an XML document depends so much on the
    structure of the document. If that document can be divided into
    chapters, messages, articles, pages, etc., then it's best to
    create a full-text index with application-specific documents.
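
    As a rough sketch of that idea (not the method from the paper,
    and not Lucene itself), the Python below splits one XML file
    into per-chapter index documents and builds a toy inverted
    index over them. The sample markup, the "chapter" unit path and
    the function names are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; any structure with repeatable units would do.
SAMPLE = """<book>
  <chapter id="c1"><title>Indexing</title><para>Lucene builds an inverted index.</para></chapter>
  <chapter id="c2"><title>Querying</title><para>XQuery walks the tree structure.</para></chapter>
</book>"""


def split_into_units(xml_text, unit_path):
    """Yield (unit_id, text) for each element matched by unit_path."""
    root = ET.fromstring(xml_text)
    for n, elem in enumerate(root.findall(unit_path)):
        unit_id = elem.get("id", f"unit-{n}")
        # Flatten the unit's subtree to plain text, dropping whitespace-only nodes.
        text = " ".join(t.strip() for t in elem.itertext() if t.strip())
        yield unit_id, text


def build_index(units):
    """Map each lowercased token to the set of unit ids that contain it."""
    index = defaultdict(set)
    for unit_id, text in units:
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(unit_id)
    return index


index = build_index(split_into_units(SAMPLE, "chapter"))
print(sorted(index["lucene"]))
print(sorted(index["xquery"]))
```

    A hit then points at a chapter rather than at the whole file,
    which is the granularity a reader of the results would want.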

    So, perhaps, the scalable solution is a full-text engine that
    is fed XML documents and a full-text indexing schema.

    The existing schema languages like to atomize documents, while a
    full-text
    indexing schema might group their elements into
    concepts, like paths, links, articles, and clues for ranking
    articles based on conditions specified in XPath.
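
    One way such an indexing schema could look, purely as a
    hypothetical sketch: a small table that names the unit to
    index, the fields to pull out of it, and ranking clues, each
    given as a path. The schema layout, the field names and the
    simplified, namespace-free feed are invented for this example,
    and the paths use only the limited XPath subset that Python's
    ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free feed used only to exercise the schema.
FEED = """<feed>
  <entry>
    <title>Lucene and XML</title>
    <link href="http://example.org/1"/>
    <category term="xml"/>
    <summary>Indexing XML with Lucene.</summary>
  </entry>
  <entry>
    <title>Weekend notes</title>
    <link href="http://example.org/2"/>
    <summary>Nothing technical.</summary>
  </entry>
</feed>"""

# Hypothetical indexing schema: every name and key here is an assumption.
SCHEMA = {
    "unit": ".//entry",  # each match becomes one index document
    "fields": {          # field name -> (path relative to unit, attribute or None)
        "title": ("title", None),
        "link": ("link", "href"),
        "body": ("summary", None),
    },
    "boosts": [          # (condition path relative to unit, ranking multiplier)
        ("category[@term='xml']", 2.0),
    ],
}


def apply_schema(xml_text, schema):
    """Return one dict per unit, with extracted fields and a boost value."""
    root = ET.fromstring(xml_text)
    docs = []
    for elem in root.findall(schema["unit"]):
        doc = {"boost": 1.0}
        for name, (path, attr) in schema["fields"].items():
            node = elem.find(path)
            if node is None:
                doc[name] = ""
            elif attr is not None:
                doc[name] = node.get(attr, "")
            else:
                doc[name] = "".join(node.itertext()).strip()
        # A ranking clue applies when its condition path matches inside the unit.
        for condition, factor in schema["boosts"]:
            if elem.find(condition) is not None:
                doc["boost"] *= factor
        docs.append(doc)
    return docs


docs = apply_schema(FEED, SCHEMA)
for doc in docs:
    print(doc["title"], doc["link"], doc["boost"])
```

    The resulting dictionaries could then be handed to whatever
    full-text engine is in use, with the boost value feeding its
    ranking.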

    I've wanted to explore the use of Lucene in my document object
    model, so I'd like to hear more about this.

--
Alan Gutierrez - alan@e...
    - http://engrm.com/blogometer/index.html
    - http://engrm.com/blogometer/rss.2.0.xml
