
Regular Expression search

Martin Probst martin at x-hive.com
Fri Dec 16 11:05:37 PST 2005


> > Are you sure? It's probably possible for simple cases (e.g. "Foo|Bar"),
> > but for general regular expressions? How would you do that?
> 
> Sleepycat's Berkeley DB XML has what it calls a "substring" index which 
> is currently used to optimise fn:contains(), fn:starts-with() and 
> fn:ends-with(). This works by splitting the content down into sequential 
> three character segments, ie:
> 
> "abccccb" is split into "abc", "bcc", "ccc", "ccb"
> 
> This type of index could be used to optimise regular expressions. If you 
> define a regular expression to match the string above, it might look 
> like this:
> 
> "abc+b"
> 
> From this regular expression, you can see that the keys you need to 
> look up in the container are:
> 
> "abc" & ("bcb" | ("bcc" & "ccb") | ("bcc" & "ccc" & "ccb"))

Interesting, though the fully general case for regular expressions is
probably just out of reach for this kind of index.
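
A minimal Python sketch of the key lookup described in the quoted message.
It is not Berkeley DB XML's actual implementation. The function names
(trigrams, build_index, candidates), the sample documents and the tuple
encoding of the key expression are all assumptions made for illustration.
It also shows that the index can only narrow the candidate set: a document
may contain every required key and still not match, so the regular
expression has to be run on the candidates afterwards.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Split text into overlapping three-character segments (repeats kept)."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def build_index(docs):
    """Map each trigram to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def candidates(index, query):
    """Evaluate a nested ("and" | "or", part, ...) key expression."""
    if isinstance(query, str):
        return set(index.get(query, set()))
    op, *parts = query
    sets = [candidates(index, part) for part in parts]
    return set.intersection(*sets) if op == "and" else set.union(*sets)


# Hypothetical sample documents, keyed by id.
DOCS = {1: "abccccb", 2: "xxabcbxx", 3: "abd", 4: "ccb abc bcc"}

# "abc" & ("bcb" | ("bcc" & "ccb") | ("bcc" & "ccc" & "ccb"))
QUERY = ("and", "abc",
         ("or", "bcb",
          ("and", "bcc", "ccb"),
          ("and", "bcc", "ccc", "ccb")))

if __name__ == "__main__":
    found = candidates(build_index(DOCS), QUERY)
    print(sorted(found))  # [1, 2, 4]
    # Document 4 holds all the keys but not in sequence, so verify.
    print(sorted(d for d in found if re.search(r"abc+b", DOCS[d])))  # [1, 2]
```

Deriving the key expression from an arbitrary regular expression is the
hard part, and this sketch does not attempt it.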

> > Apart from that, if you need regular expressions to search your XML,
> > there's probably a major problem with your XML design ;-)
> 
> Search and querying are very different. Search is basically for 
> document-centric XML (like XHTML), whereas querying is for data-centric 
> XML (like invoices, etc). If you're using regular expressions for 
> data-centric XML, then I'd say you have a design flaw - but not if you 
> are using them for document-centric XML.

Yes, that's right. But if you're "searching" in the document-centric
sense, then what you want is probably a full-text index operating on
tokens (as Jason posted) rather than a regular expression specification
of the content. I tend to think of regular expressions as structured
search for unstructured content ;-)
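
For contrast, a rough Python sketch of a token-based full-text index. The
tokenizer, the function names (tokenize, build_token_index, search_all)
and the sample documents are assumptions for illustration; this is not
how any particular XML database implements it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_token_index(docs):
    """Map each token to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all(index, phrase):
    """Return ids of documents that contain every token of the phrase."""
    tokens = tokenize(phrase)
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


# Hypothetical sample documents, keyed by id.
TEXT_DOCS = {
    1: "Regular expressions in XQuery",
    2: "Full text search for XML documents",
    3: "XQuery full text",
}

if __name__ == "__main__":
    idx = build_token_index(TEXT_DOCS)
    print(sorted(search_all(idx, "full text")))   # [2, 3]
    print(sorted(search_all(idx, "expression")))  # [] (no stemming here)
```

Lookups here are exact matches on whole tokens, which is why such an index
is cheap to query but cannot answer pattern questions inside a token.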

Martin


