[XQuery Talk Mailing List Archive Home] [By Date] [By Thread] [By Subject] [By Author] [Recent Entries] [Reply To This Message]

Regular Expression search

John Snelson jsnelson at sleepycat.com
Thu Dec 15 18:12:35 PST 2005


expression lookup
Martin Probst wrote:
>>It's not impossible - although I don't know of any XML database that 
>>does use indexes for regular expressions at the moment.
> 
> Are you sure? It's probably possible for simple cases (e.g. "Foo|Bar"),
> but for general regular expressions? How would you do that?

Sleepycat's Berkeley DB XML has what it calls a "substring" index which 
is currently used to optimise fn:contains(), fn:starts-with() and 
fn:ends-with(). This works by splitting the content down into sequential 
three character segments, ie:

"abccccb" is split into "abc", "bcc", "ccc", "ccb"

This type of index could be used to optimise regular expression. If you 
define a regular expression to match the string above, it might look 
like this:

"abc+b"

 From this regular expression, you can see that the keys you need to 
look up in the container are:

"abc" & ("bcb" | ("bcc" & "ccb") | ("bcc" & "ccc" & "ccb"))

Where "&" is set intersect of the index results, and "|" is set union. 
So a simple combination of index lookups will give you a candidate set 
for matching the regular expression. Of course, you will need to double 
check that the candidate set /actually/ fits the regular expression, 
after doing the index lookups.

> Apart from that, if you need regular expressions to search your XML,
> there's probably a major problem with your XML design ;-)

Search and querying are very different. Search is basically for 
document-centric XML (like XHTML), where as querying is for data-centric 
XML (like invoices, etc). If you're using regular expressions for 
data-centric XML, then I'd say you have a design flaw - but not if you 
are using them for document-centric XML.

John

-- 
John Snelson, Berkeley DB XML Engineer
Sleepycat Software, Inc
http://www.sleepycat.com

Contracted to Sleepycat through Parthenon Computing Ltd
http://blog.parthcomp.com/dbxml


PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2007 All Rights Reserved.