
Regular Expression search

John Snelson jsnelson at sleepycat.com
Thu Dec 15 18:00:37 PST 2005


Hi Martin,

I've used the MarkLogic database. However, the custom cts:contains() 
function is not quite the same as the full regular expressions accepted 
by fn:matches(), is it?

Does MarkLogic use its indexes to optimise fn:matches()?
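
To make the distinction concrete, here is a minimal sketch in Python (not XQuery, and not MarkLogic's implementation) of the three kinds of matching discussed in this thread: plain substring tests like contains(), token matching with a trailing * wildcard in the spirit of cts:contains(), and full regular expressions like fn:matches(). The function names and the naive tokenizer are assumptions made for illustration; a real engine tokenizes according to language rules and consults its indexes.

```python
import re


def substring_match(text, needle):
    """Character-level test, analogous in spirit to contains()."""
    return needle in text


def token_match(text, pattern):
    """Word-token test with an optional trailing '*' wildcard.

    Hypothetical stand-in for token-based search: the text is split
    into word tokens with a naive regex, then each whole token is
    compared against the pattern.
    """
    tokens = re.findall(r"\w+", text)
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return any(token.startswith(prefix) for token in tokens)
    return pattern in tokens


def regex_match(text, regex):
    """Full regular-expression test, analogous in spirit to fn:matches()."""
    return re.search(regex, text) is not None


if __name__ == "__main__":
    sample = "xName and Nameste"
    print(substring_match(sample, "Name"))  # substring hit inside other words
    print(token_match(sample, "Name"))      # no whole token equals "Name"
    print(token_match("say MyNameIsBob now", "MyNameIs*"))
    print(regex_match("MyNameIs42", r"^MyNameIs\d+$"))
```

The last call shows the gap I'm asking about: a trailing wildcard can say "starts with MyNameIs", but it cannot say "followed only by digits", which a regular expression can.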

John

Jason Hunter wrote:
> MarkLogic actually uses indexes for wildcard queries.  For example, the 
> original poster's questions about finding things starting with 
> "MyNameIs" could be solved efficiently using a query like this:
> 
> //(subTagA|subTagB)[starts-with(., "MyNameIs")]
> 
> That should execute efficiently against a large data set if the 
> character indexes are enabled.  If the poster instead wanted any word 
> token to start with that sequence of characters (rather than the element 
> itself), he could use the MarkLogic function cts:contains() and the * 
> wildcard:
> 
> //(subTagA|subTagB)[cts:contains(., "MyNameIs*")]
> 
> The cts:* functions operate on tokens rather than simple character 
> sequences, providing search engine style features.  You can see the 
> difference in the previously discussed query to find the token "Name". 
> Using standard XQuery you write this:
> 
> //*[contains(., "Name")]
> 
> But this matches "xName" and "Nameste".  When I search for "foo" I don't 
> want to find "food"!  Using cts:contains() you match just word tokens:
> 
> //*[cts:contains(., "Name")]
> 
> The tokens are broken at index time according to language rules, and you 
> have the option at query time to specify stemming rules (should Names 
> and Naming match?), case sensitivity (is "name" ok?), thesaurus (what 
> about "nom de plume"?), and so on.
> 
> It's fun stuff.  I wrote about this in longer form at:
> http://idealliance.org/proceedings/xtech05/papers/02-04-01/
> 
> -jh-
> _______________________________________________
> http://xquery.com/mailman/listinfo/talk


-- 
John Snelson, Berkeley DB XML Engineer
Sleepycat Software, Inc
http://www.sleepycat.com

Contracted to Sleepycat through Parthenon Computing Ltd
http://blog.parthcomp.com/dbxml

