RE: When Searching With Google

  • To: 'Murali Mani' <mani@C...>, xml-dev@l...
  • Subject: RE: When Searching With Google
  • From: "Bullard, Claude L (Len)" <clbullar@i...>
  • Date: Mon, 8 Dec 2003 09:57:37 -0600

None of which does the average user understand. 
The question is what would they pay in terms 
of learning curve or subscription costs for a 
search engine that behaves exactly as they 
think it should.

The model by which they choose terms and the model 
that produces the results are at variance; two systems 
are contending for the same resource.  The engine makes its best 
guess, and then the user starts searching in 
the results.  Ok.  The human is smart enough 
one assumes to recognize what they are looking 
for if they find it in the results.  On the 
other hand, ascribing importance to the order 
of the results, the Google numbers, or the 
negative space (results not returned) is, at 
best, a superstitious endeavor as long as 
the model they used to pick the initial 
terms and the model by which those terms 
are used to select results are not the same. 

Multiple systems contending for the same resource 
is a working definition of non-linearity, or 
unpredictable correlation.  This is the well-known 
mental ontology contending with the search ontology 
problem.

Then there is the further problem of source vetting.
Are authors doing high quality credentialed work?

Note that Michael Kay did not write that first 
bit below.  I did.  You removed my name and 
left Michael's.   Now what does Google do with 
that?  Possibly nothing, but a human might, and 
the human is likely to be wrong unless they follow 
the thread back to pick up the source.  Now we have not 
only the mystery of Google's algorithms, but 
the vagaries of human authoring habits.  That 
is why credentialed sources would be of value 
as part of a search filter.  Let's say I am a 
university professor and I want my students to 
use the web to do research.  How should I interpret 
their results if their sources are uncredentialed?

The simple interface can lead to amplified error.  
The complex interface can lead to high costs and reduce the scale 
of use.  But is it better to swap scale for reliable results?

len

Also, I heard recently that Google is making the search results adaptive
to the user by means of some heuristics - probably domain or something..??

In short, I heard that if I search for the key words "w1 w2 ..." and
someone else searches for the same set of key words, google might give
different ranked results - in other words, user perceives the results
ranking as non-deterministic.

I am not sure if that is true actually.. can someone confirm this??

Google uses a lot of proprietary heuristics for fine-tuning the search
results ranking, on top of techniques that are well known in the
literature, such as tf-idf (which gives greater weight to a term that
occurs infrequently in the corpus), etc...
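
To make the tf-idf idea concrete, here is a minimal sketch in Python. It
is only the textbook weighting, not a description of what Google does. The
toy corpus, the function names (idf, tf_idf_score, rank) and the smoothed
idf formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def idf(term, docs):
    """Smoothed inverse document frequency: rarer terms score higher."""
    df = sum(1 for doc in docs if term in doc)  # documents containing the term
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def tf_idf_score(query_terms, doc, docs):
    """Sum of (term frequency in doc) * idf over the query terms."""
    if not doc:
        return 0.0
    counts = Counter(doc)
    return sum((counts[t] / len(doc)) * idf(t, docs) for t in query_terms)


def rank(query_terms, docs):
    """Return document indices ordered from best to worst score."""
    scores = [tf_idf_score(query_terms, doc, docs) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical three-document corpus, already split into words.
corpus = [
    "xml schema validation with xml tools".split(),
    "google search ranking heuristics".split(),
    "xml search engines and ranking".split(),
]

if __name__ == "__main__":
    # "google" appears in one document, "xml" in two, so "google" weighs more.
    print(idf("google", corpus), idf("xml", corpus))
    print(rank(["xml", "ranking"], corpus))
```

Even in this toy version the ranking depends on corpus statistics the
user never sees, which is the mismatch between the user's model and the
engine's model that this thread is about.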

anyways, best, murali.

On Mon, 8 Dec 2003, Michael Kay wrote:

> > That is why I wondered if it picked up on the topic
> > word or phrase.  That is likely what they are after.
> > The other words are qualifiers, at least, that is
> > how I use it.  I was questioning the Google strategy
> > because I realized I have a mental model of how it
> > works, and that is how I select and enter search
> > terms.  It is probably not the right mental model
> > but the interface doesn't make it clear, and as a
> > result, its filtering strategy is opaque.  The user
> > does the best they can.
>
> Most modern search engines give greater weight to a term the more
> infrequent it is in the corpus. Most also weight terms according to
> where and how often they appear in the source document, and some also
> recognize when adjacent words in the query constitute a noun phrase.
> What google does is anyone's guess.
>
> Michael Kay

-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>
