Advanced text searching vs XML??? (was Re: Note from theTroll)
10/28/2002 2:38:19 PM, tblanchard@m... wrote:
>
>The current solution is to use a free form text indexer like verity,
>autonomy, or the google appliance to handle resumes and other
>documents, and relational db for structured info. Text indexers based
>on interesting fuzzy match and bayesian techniques are rapidly reducing
>the requirement for markup in document management I think. Google is
>an excellent example (and now you can get it in a box).

Hmm, this sounds like a more interesting topic than sectarian squabbles over the hermeneutics of the XSLT spec. <hint, hint>

On one hand, one major value of XML to me (employee of an XML DBMS vendor) is to avoid the necessity of separating the "text" from the "structured info." While XML DBs don't have the advanced fuzzy/Bayesian capabilities of high-end text DBs (yet!), they do have the ability to query for text matches IN THE CONTEXT OF the structure. Given a certain amount of predictability about the tagging of a resume, one could look for people with actual EXPERIENCE with some technology combination (Java on Linux, for example) rather than just "Java" and "Linux" mentioned somewhere near each other or whatever. (I don't know if Verity can do this too with whatever knowledge of tags it has, but I sure can't figure out how to do it with Google!)

On the other hand, I must say that for me in daily life, Google allows all sorts of useful queries that 5 years ago I thought would require the widespread adoption of XML and XML-based format standards (e.g., for resumes). Certainly many of the claims/proposals of metadata advocates 5 years ago look a bit shopworn in hindsight now that we see how well Google does by ignoring all (most?) metadata other than the linking patterns.

Likewise (playing troll and jumping out from under one of my favorite bridges) the Semantic Web vision seems a lot less compelling after experiencing Google for a few years than it might in a Google-less world.
Why invest in all that metadata when Google a) will ignore it anyway and b) does 80% of what the metadata would allow with ZERO additional effort by web authors/developers?

Do others think that this trend will continue (for the Web, not for aircraft maintenance manuals or public safety agencies, please!)? To what extent does putting heuristic smarts in the indexing/search engine rather than structured tags in the text take us where we want to go? Or are we headed toward a local maximum that merely distracts ordinary users from the need to learn markup/metadata, and Google from the need to support XML and/or RDF, to achieve a more global optimum?
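
To make the "IN THE CONTEXT OF the structure" point above a little more concrete, here is a minimal sketch in Python using the standard library's ElementTree. The resume markup (the resume, experience, job, skill, platform and interests elements) and both function names are made up for illustration; they are not any real resume standard or any product's query interface. The sketch only contrasts a bag-of-words match with a match restricted to a single job entry under experience.

```python
import xml.etree.ElementTree as ET

# Hypothetical resume markup; the element names are assumptions for illustration.
RESUMES = """
<resumes>
  <resume name="A. Coder">
    <experience>
      <job><skill>Java</skill><platform>Linux</platform></job>
    </experience>
    <interests>Windows gaming</interests>
  </resume>
  <resume name="B. Writer">
    <experience>
      <job><skill>COBOL</skill><platform>MVS</platform></job>
    </experience>
    <interests>Reading about Java and Linux</interests>
  </resume>
</resumes>
"""

def keyword_match(root, words):
    """Free-text style: every word appears somewhere in the resume."""
    hits = []
    for resume in root.iter("resume"):
        text = " ".join(resume.itertext()).lower()
        if all(w.lower() in text for w in words):
            hits.append(resume.get("name"))
    return hits

def experience_match(root, skill, platform):
    """Structure-aware: skill and platform occur in the same job under experience."""
    hits = []
    for resume in root.iter("resume"):
        for job in resume.findall("./experience/job"):
            skills = {s.text for s in job.findall("skill")}
            platforms = {p.text for p in job.findall("platform")}
            if skill in skills and platform in platforms:
                hits.append(resume.get("name"))
                break  # one matching job is enough for this resume
    return hits

root = ET.fromstring(RESUMES)
print(keyword_match(root, ["Java", "Linux"]))     # both resumes mention the words
print(experience_match(root, "Java", "Linux"))    # only the one with a matching job
```

The keyword version returns both resumes, since "Java" and "Linux" show up in the second one's interests; the structure-aware version returns only the resume where the two occur together in one job entry. Of course this only works given the "certain amount of predictability about the tagging" mentioned above.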