
Re: xml search engine?

  • From: "Martin Bryan" <mtbryan@s...>
  • To: <xml-dev@x...>
  • Date: Tue, 4 Apr 2000 09:08:25 +0100

It strikes me that this debate is missing something.

The advantage of XML queries over other forms of query is that you can use context to identify the subset of information within a document that you need to search to find a meaningful result. Instead of looking at every indexed occurrence of a term, you only need to look at the subset that is "associated" with a given context. This should, hopefully, reduce the level of information overload we all suffer from at present.

The key to efficiency is going to be the mapping between the semantics of the context-determining elements (remembering that we are talking about a chain of ancestors for most elements) and the terms used in a natural language (or near natural language) query. Unless there is a close match between query semantics and markup semantics, the results of the query will be meaningless.

The first question that needs to be asked is "how do users identify the contexts in which data is likely to be meaningful?" Take the example used in another thread: "find a SPEECH whose SPEAKER contains 'hamlet'". What happens if I coded my text as <Hamlet>To be or not to be</Hamlet>? How do I know that the tag name identifies the speaker of a speech? Yet it obviously does; that's the whole intention of the tag. OK, so it's a non-generalized DTD, which is bad practice. But what about <Part role="Hamlet">To be or not to be</Part>? Again, how do I relate the tag to the query?
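
To make the mismatch concrete, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It runs the "SPEECH whose SPEAKER contains 'hamlet'" query against three encodings of the same line. The SPEECH/SPEAKER/LINE document, the helper name speeches_by and the labels are my own illustration, not taken from any particular DTD.

```python
import xml.etree.ElementTree as ET

# Three hypothetical encodings of the same line of text.
# Only the first uses the vocabulary that the query assumes.
DOCS = {
    "generic": "<SPEECH><SPEAKER>Hamlet</SPEAKER>"
               "<LINE>To be or not to be</LINE></SPEECH>",
    "tag-as-speaker": "<Hamlet>To be or not to be</Hamlet>",
    "role-attribute": '<Part role="Hamlet">To be or not to be</Part>',
}

def speeches_by(root, name):
    """Return SPEECH elements with a SPEAKER child containing `name`."""
    hits = []
    for speech in root.iter("SPEECH"):  # iter() also visits the root itself
        for speaker in speech.findall("SPEAKER"):
            if name.lower() in (speaker.text or "").lower():
                hits.append(speech)
                break
    return hits

# Count the matches found in each encoding.
results = {
    label: len(speeches_by(ET.fromstring(xml), "hamlet"))
    for label, xml in DOCS.items()
}
for label, count in results.items():
    print(label, count)
```

Only the "generic" encoding yields a match. The other two carry the same information, but the query has no way of knowing that a tag name or a role attribute identifies the speaker.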

Structured queries can only be generated accurately from knowledge of the DTD that governs the content being queried. Len Bullard hit the nail on the head. The first port of call is the namespace. The second is the DTD/schema for that namespace, and the third is the contents of elements coded using a specific element within the DTD. Queries need to be based on the elements defined for a particular namespace.

So let's try to write queries based on this, something like:

Find me occurrences of the phrase "ABC DEF" within elements whose parents contain "ELEMENT-X" or "Attribute-Y" within "Namespace-Z"

Indexing for such a query will need to be based on a combination of Namespace, Context and Contents. Omitting any one of these components will make it impossible to search servers efficiently. Mapping between the contexts in different namespaces is, I feel, going to be beyond what current systems will be able to provide. But we need to consider how it might be possible to do this.
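
As a rough sketch of what such an index might look like, the Python below keys every word on a (namespace, context, content) triple, where the context is the chain of ancestor element names. The function names, the document and the "doc-1" identifier are hypothetical. For brevity it indexes single words from an element's direct text only, and it ignores attributes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag

def build_index(xml_text, doc_id, index=None):
    """Index each word under the key (namespace, ancestor path, word)."""
    index = defaultdict(set) if index is None else index

    def walk(elem, path):
        ns, local = split_tag(elem.tag)
        here = path + (local,)  # chain of ancestors down to this element
        for word in (elem.text or "").lower().split():
            index[(ns, here, word)].add(doc_id)
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), ())
    return index

# Hypothetical document in the namespace used by the query example.
doc = '<play xmlns="www.mysource.com"><Part role="Hamlet">ABC DEF</Part></play>'
index = build_index(doc, "doc-1")
print(index[("www.mysource.com", ("play", "Part"), "abc")])
```

Drop any one part of the key and the lookup degrades into a scan: without the namespace, unrelated vocabularies collide, and without the path every occurrence of the word has to be examined.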

The real key is how we break the query down into something that can be exchanged between servers. Something along the lines of:

<query>
<namespace>www.mysource.com</namespace>
<element>Part</element>
<attribute name="role">Hamlet</attribute>
<content>ABC DEF</content>
</query>

might let different servers select the data with different tools and engines.
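
One way a receiving server might act on such a message is sketched below in Python with xml.etree.ElementTree. This is only an illustration under my own assumptions: the parse_query and run_query helpers and the sample document are hypothetical, the content test is a plain substring match, and a real engine would consult an index rather than walk the tree.

```python
import xml.etree.ElementTree as ET

QUERY = """<query>
<namespace>www.mysource.com</namespace>
<element>Part</element>
<attribute name="role">Hamlet</attribute>
<content>ABC DEF</content>
</query>"""

def parse_query(text):
    """Read the exchange format into a plain dict."""
    q = ET.fromstring(text)
    attr = q.find("attribute")
    return {
        "namespace": q.findtext("namespace", default=""),
        "element": q.findtext("element"),
        "attribute": None if attr is None else (attr.get("name"), attr.text or ""),
        "content": q.findtext("content", default=""),
    }

def run_query(query, xml_text):
    """Return elements matching namespace, element, attribute and phrase."""
    ns = query["namespace"]
    # ElementTree names namespaced elements as '{namespace}local'.
    wanted = "{%s}%s" % (ns, query["element"]) if ns else query["element"]
    hits = []
    for elem in ET.fromstring(xml_text).iter(wanted):
        if query["attribute"] is not None:
            name, value = query["attribute"]
            if elem.get(name) != value:
                continue
        if query["content"] in "".join(elem.itertext()):
            hits.append(elem)
    return hits

# Hypothetical document held by the receiving server.
DOC = """<play xmlns="www.mysource.com">
<Part role="Hamlet">ABC DEF GHI</Part>
<Part role="Ophelia">ABC DEF</Part>
</play>"""

hits = run_query(parse_query(QUERY), DOC)
print([h.get("role") for h in hits])
```

The point of the exchange format is that run_query is replaceable: another server could answer the same message from a database or a full-text engine, provided it honours the same four fields.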

Martin Bryan


