[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Re: Backtracking and eternal loops caused by regular
Michael Kay wrote:
(b) The specs have nothing to say about performance. I understand. Sounds reasonable and logical. (c) Saxon relies entirely on the regex engines in the underlying platform I cannot speak for .NET, but in the thread on the Saxon list I go deeper into this. It's not just about a spurious optimization feat or mismatch, it is about quite a common case. (d) I was under the impression that regular expression evaluation will In technical terms, you are right, however, not if you include the definition the astronomers have given about time: it ends some day. Jeffrey Friedl gives a calculation on page 227 of his 2nd ed book that a 50 char string with a bad regular expression, may run for a years. Here's the same with Java: 25 char: 2.5 seconds 26 char: 5.3 seconds 27 char: 10.7 seconds 28 char: 21.4 seconds 35 char: 2739 seconds 40 char: 24.4 hours 50 char: almost 3 years 85 char: 97 * 10^9 years (97 billion years) Meaning that if I apply such a regex to a corpus of over 85 characters, the process will stop long after the the whole universe has died out (current thesis: it will last about 30 billion years at most). Which is pretty long ;) Calculation of this performance penalty: 2^(stringlen) * (time-one-char-match) Java (Sun) is just a little behind with adding this optimization, because really, there is no need in testing any string more than string-len times, usually much less even. -- Abel Braaksma
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|