Regular Expression search
John Snelson jsnelson at sleepycat.com
Thu Dec 15 18:00:37 PST 2005

Hi Martin,

I've used the MarkLogic database. However, the custom cts:contains()
function is not quite the same as the full regular expressions accepted
by fn:matches(), is it? Does MarkLogic use its indexes to optimise
fn:matches()?

John

Jason Hunter wrote:
> MarkLogic actually uses indexes for wildcard queries. For example, the
> original poster's questions about finding things starting with
> "MyNameIs" could be solved efficiently using a query like this:
>
> //(subTagA|subTagB)[starts-with(., "MyNameIs")]
>
> That should execute efficiently against a large data set if the
> character indexes are enabled. If the poster instead wanted any word
> token to start with that sequence of characters (rather than the element
> itself), he could use the MarkLogic function cts:contains() and the *
> wildcard:
>
> //(subTagA|subTagB)[cts:contains(., "MyNameIs*")]
>
> The cts:* functions operate on tokens rather than simple character
> sequences, providing search engine style features. You can see the
> difference in the previously discussed query to find the token "Name".
> Using standard XQuery you write this:
>
> //*[contains(., "Name")]
>
> But this matches "xName" and "Nameste". When I search for "foo" I don't
> want to find "food"! Using cts:contains() you match just word tokens:
>
> //*[cts:contains(., "Name")]
>
> The tokens are broken at index time according to language rules, and you
> have the option at query time to specify stemming rules (should Names
> and Naming match?), case sensitivity (is "name" ok?), thesaurus (what
> about "nom de plume"?), and so on.
>
> It's fun stuff. I wrote about this in longer form at:
> http://idealliance.org/proceedings/xtech05/papers/02-04-01/
>
> -jh-
> _______________________________________________
> http://xquery.com/mailman/listinfo/talk

--
John Snelson, Berkeley DB XML Engineer
Sleepycat Software, Inc
http://www.sleepycat.com

Contracted to Sleepycat through Parthenon Computing Ltd
http://blog.parthcomp.com/dbxml
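
For readers following the thread, the difference between substring, token, and anchored-regex matching can be roughly sketched in standard XQuery 1.0, with no vendor extensions. The sample data below is made up for illustration, and fn:tokenize() on non-word characters is only a crude stand-in for cts:contains(): it does no language-aware tokenization, stemming, or thesaurus lookup, and says nothing about whether a given engine would use an index for any of these predicates.

```xquery
xquery version "1.0";

(: Hypothetical sample data; the element names follow the thread,
   the contents are invented for this sketch. :)
let $doc :=
  <doc>
    <subTagA>MyNameIsJohn</subTagA>
    <subTagB>xName</subTagB>
    <subTagA>Name of the rose</subTagA>
  </doc>
let $leaves := $doc/*
return
  <result>
    <!-- Substring test: true wherever "Name" occurs as a character sequence. -->
    <substring>{ $leaves[contains(., "Name")] }</substring>
    <!-- Rough token test: split on runs of non-word characters,
         then require one token to equal "Name" exactly. -->
    <token>{ $leaves[tokenize(., "\W+") = "Name"] }</token>
    <!-- Anchored regular expression, comparable to starts-with(). -->
    <regex>{ $leaves[matches(., "^MyNameIs")] }</regex>
  </result>
```

With this data the substring predicate accepts all three elements, the token predicate should accept only "Name of the rose", and the anchored regex should accept only "MyNameIsJohn", which is the "foo" versus "food" distinction described in the quoted message.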






