
Advanced text searching vs XML??? (was Re: Note from theTroll)

10/28/2002 2:38:19 PM, tblanchard@m... wrote:

>
>The current solution is to use a free form text indexer like verity, 
>autonomy, or the google appliance to handle resumes and other 
>documents, and relational db for structured info.  Text indexers based 
>on interesting fuzzy match and bayesian techniques are rapidly reducing 
>the requirement for markup in document management I think.  Google is 
>an excellent example (and now you can get it in a box).

Hmm, this sounds like a more interesting topic than sectarian squabbles
over the hermeneutics of the XSLT spec.  <hint, hint>

On one hand, one major value of XML to me (employee of XML DBMS vendor)
is to avoid the necessity of separating the "text" from the "structured
info."  While XML DBs don't have the advanced fuzzy/Bayesian capabilities
of high-end text DBs (yet!), they do have the ability to query for
text matches IN THE CONTEXT OF the structure.  Given a certain amount
of predictability about the tagging of a resume, one could look for 
people with actual EXPERIENCE with some technology combination (Java
on Linux, for example) rather than just "Java" and "Linux" mentioned
somewhere near each other or whatever.  
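
To make the distinction concrete, here is a minimal sketch in Python. It
assumes a hypothetical resume vocabulary (resume, experience, job, skills,
interests) invented purely for illustration; it is not taken from any
resume standard or from any particular XML DBMS. The function names
flat_match and experience_match are likewise made up. The first treats the
document as a bag of words, the second requires all the terms to occur
inside a single job entry under experience.

```python
import xml.etree.ElementTree as ET

# Two hypothetical resumes using an invented tag vocabulary.
RESUME_A = """
<resume>
  <name>Candidate A</name>
  <experience>
    <job>
      <title>Developer</title>
      <skills>Java on Linux servers</skills>
    </job>
  </experience>
  <interests>Hiking</interests>
</resume>
""".strip()

RESUME_B = """
<resume>
  <name>Candidate B</name>
  <experience>
    <job>
      <title>Sysadmin</title>
      <skills>Perl on Linux</skills>
    </job>
  </experience>
  <interests>Hoping to learn Java someday</interests>
</resume>
""".strip()


def flat_match(xml_text, terms):
    """Keyword search that ignores structure: every term appears somewhere."""
    text = "".join(ET.fromstring(xml_text).itertext()).lower()
    return all(term.lower() in text for term in terms)


def experience_match(xml_text, terms):
    """Structure-aware search: every term appears within one <job> entry."""
    root = ET.fromstring(xml_text)
    for job in root.findall("./experience/job"):
        job_text = "".join(job.itertext()).lower()
        if all(term.lower() in job_text for term in terms):
            return True
    return False


if __name__ == "__main__":
    terms = ["Java", "Linux"]
    for label, doc in (("A", RESUME_A), ("B", RESUME_B)):
        print(label, flat_match(doc, terms), experience_match(doc, terms))
```

Both resumes satisfy the flat search, but only Candidate A satisfies the
structure-aware one, because Candidate B mentions Java only under
interests.  The naive substring test stands in for whatever text matching
a real engine would provide.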

(I don't know if Verity can do this too with whatever knowledge of tags
that it has, but I sure can't figure out how to do it with Google!)

On the other hand, I must say that for me in daily life, Google allows
all sorts of useful queries that 5 years ago I thought would
require the widespread adoption of XML and XML-based format standards
(e.g., for resumes).  Certainly many of the claims/proposals of metadata
advocates 5 years ago look a bit shopworn in hindsight now that we
see how well Google does by ignoring all (most?) metadata other than the
linking patterns.  Likewise (playing troll and jumping out from 
under one of my favorite bridges) the Semantic Web vision seems a lot
less compelling after experiencing Google for a few years than it might
in a Google-less world.  Why invest in all that metadata when Google 
a) will ignore it anyway and b) does 80% of what the metadata would
allow with ZERO additional effort by web authors/developers?

Do others think that this trend will continue (for the Web, not for
aircraft maintenance manuals or public safety agencies, please!) ?
To what extent does putting heuristic smarts in the indexing/search
engine rather than structured tags in the text take us where we want to
go?  Or are we headed toward a local maximum that merely distracts
ordinary users from the need to learn markup/metadata and Google
from the need to support XML and/or RDF to achieve a more global
optimum?


