Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform s

Cart

XML Editor - Download a Free Trial >

See What's New >

Buy Now >

[Home] [By Thread] [By Date] [Recent Entries]

Subject: Re: why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$'))
From: Liam R E Quin <liam@xxxxxx>
Date: Wed, 13 Jul 2011 19:12:30 -0400

O
> Another interesting article is this one describing some of the
> optimizations performed by the regex engine in Google Chrome:
> http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html
> 
> This mentions another trick used by some regex implementations.  In
> their example "Sun|Mon", their engine recognises that a match for this
> expression always contains "n" in the third character, and so rather
> than testing for a match at each index in the string (which was the
> problem with the example given) they first scan the string to find "n"
> characters and only try to apply the regex starting two characters
> preceding one.

This is similar to the Boyer-Moore algorithm that Perl (and probably
pcre) also uses - originally it was just for constant strings. There was
a fun race to get the fastest "grep" in the early 1990s.

The difficulty with a lot of these optimizations is that capturing
groups can make them harder - and in Perl people use capturing groups
all the time of course. You can't turn (font|foot) into fo(nt|ot), and
although you can turn it into (fo(?:nt|ot)) you may then lose the
benefit.

Since the article Mike Key mentioned was written in 2007, Perl's regexps
seem to have got a lot faster. But even in the early 1990s I remember
converting sed scripts into perl and getting huge speedup, even though
Perl was using recursive back-tracking. The BM-style delta tables
probably accounted for most of that.

Liam


-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/

Current Thread
Re: why matches($title,'.?(\.\|,)\s$')) can perform so much worse than matches($title,'(\.\|,)\s$')), (continued)* Michael Kay - 13 Jul 2011 14:14:04 -0000 Oliver Hallam - 13 Jul 2011 17:46:37 -0000 Michael Kay - 13 Jul 2011 18:17:55 -0000 Oliver Hallam - 13 Jul 2011 18:56:39 -0000 Liam R E Quin - 13 Jul 2011 23:13:08 -0000 <=

<- Previous	Index	Next ->
Re: why matches($title,'.?(\, Oliver Hallam*	Thread	format-number/bankers' roundi, daniel whitney
Re: why matches($title,'.?(\, Oliver Hallam*	Date	format-dateTime and time zone, Birnbaum, David J
	Month

XML Editor - Download a 15 Day Free Trial Now >

See What's New in Stylus Studio >

Buy Stylus Studio - XML Editor - Now >