Subject: Re: Complex Regex takes 201 steps in regex buddy but runs forever in Analyze-String
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Mon, 31 Jan 2011 20:12:10 +0100
On 31 January 2011 19:59, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
> The parentheses '(' and ')' do not match well in <xsl:variable
> name="CompleteListIdentificationRegex" >. Please check.
>

Oops - disregard the following part of my email, please.
Sorry
-W

> But one evil subpattern is this (with spaces inserted for readability):
>
>    ( ( «[^»¤]+» | \s+ | §[^§¤]+§ ){0,255})
>
> This will try many combinations of zero to 255 repetitions of "one or
> more whitespace characters".
>
> Cleaner is
>     (\s+|( «[^»¤]+»|§[^§¤]+§){0,255})
>
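
If the nested-quantifier concern quoted above does apply, one way to remove the ambiguity is to let each iteration of the repeated group consume exactly one whitespace character, so that a run of spaces can be matched in only one way. The stylesheet below is a minimal sketch under stated assumptions, not a confirmed fix: the variable names `item`, `filler` and `listRegex` are hypothetical, the `{0,255}` and `{0,200}` bounds are assumed to be unnecessary and are replaced by `*`, and it has not been tested against oXygen 12.1 or the Saxon version it bundles.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:output method="xml" indent="no"/>

  <!-- One list item, written exactly as in the original regex. -->
  <xsl:variable name="item" select="'¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤'"/>

  <!-- Hypothetical filler between items: \s instead of \s+, so every
       whitespace character is consumed by exactly one iteration.
       Assumption: the {0,255} bound is not needed, hence the plain *. -->
  <xsl:variable name="filler" select="'(«[^»¤]+»|\s|§[^§¤]+§)*'"/>

  <!-- item, then any number of (filler, item) pairs; group 1 is the whole run.
       The regex cannot match a zero-length string because an item is required. -->
  <xsl:variable name="listRegex"
      select="concat('(', $item, '(', $filler, $item, ')*)')"/>

  <xsl:template match="unknown[exists(text())]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:analyze-string select="string(.)" regex="{$listRegex}">
        <xsl:matching-substring>
          <xsl:text>¤COMPLETELIST¤</xsl:text>
          <xsl:value-of select="regex-group(1)"/>
          <xsl:text>¤/COMPLETELIST¤</xsl:text>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Each alternative in the filler starts with a different character («, whitespace or §), so the engine never has two ways to split the same text between iterations. Whether that is what keeps the original from terminating is not established in this thread.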


> -W
>
> On 31 January 2011 19:40, Alex Muir <alex.g.muir@xxxxxxxxx> wrote:
>> Hi,
>>
>> With the following code:
>> ------------------------------
>>
>> <?xml version="1.0"?>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>   xmlns:saxon="http://saxon.sf.net/"
>>   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>   version="2.0" exclude-result-prefixes="#all">
>>   <xsl:output method="xml" indent="no"/>
>>
>>
>>   <xsl:template match="unknown[exists(text())]">
>>     <xsl:copy>
>>       <xsl:copy-of select="@*"/>
>>
>>       <xsl:call-template name="CompleteListAnalyze">
>>         <xsl:with-param name="content" select="text()"/>
>>       </xsl:call-template>
>>
>>     </xsl:copy>
>>   </xsl:template>
>>
>>
>>   <xsl:template name="CompleteListAnalyze">
>>     <xsl:param name="content"/>
>>
>>     <xsl:variable name="CompleteListIdentificationRegex" >
>>       <xsl:text>((¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)(((«[^»¤]+»|\s+|§[^§¤]+§){0,255})(¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)){0,200})</xsl:text>
>>     </xsl:variable>
>>
>>     <xsl:analyze-string select="$content"
>> regex="{$CompleteListIdentificationRegex}">
>>       <xsl:matching-substring>
>>         <xsl:text>¤COMPLETELIST POSITION="</xsl:text>
>>         <xsl:value-of select="position()"/>
>>         <xsl:text>" PLACEMENT=""¤</xsl:text>
>>         <xsl:value-of select="regex-group(1)"/>
>>         <xsl:text>¤/COMPLETELIST¤</xsl:text>
>>       </xsl:matching-substring>
>>       <xsl:non-matching-substring>
>>         <xsl:value-of select="."/>
>>       </xsl:non-matching-substring>
>>     </xsl:analyze-string>
>>   </xsl:template>
>>
>> </xsl:stylesheet>
>>
>>
>> And the following input file:
>> ----------------------------------
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <doc>
>>     <unknown>¤LISTITEM BULLET="15" TITLE="TEXT TEXT TEXT TEXT"
>> TYPE="SNLI"¤«§HL§FONT size="2" id="H13211"»15«/§HL§FONT»«/§HL§TD»
>>    «§HL§TD id="H13213"»«/§HL§TD» «/§HL§TR» «§HL§TR id="H13215"»«§HL§TD
>> id="H13216"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom" id="H13218"»
>>      «§HL§TD id="H13220"»«/§HL§TD»         «§HL§TD colspan="2"
>> id="H13222"»«§HL§FONT size="2" id="H13223"»TEXT TEXT TEXT
>> TEXT«/§HL§FONT»¤/LISTITEM¤«/TD»         «TD id="H13225"»«/TD»
>> «TD id="H13227"»«/TD»         «TD id="H13229"»«/TD»         «TD
>> id="H13231"»«/TD»         «TD align="right" id="H13233"»¤LISTITEM
>> BULLET="16" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"¤«§HL§FONT size="2"
>> id="H13234"»16«/§HL§FONT»«/§HL§TD»         «§HL§TD
>> id="H13236"»«/§HL§TD» «/§HL§TR» «§HL§TR id="H13238"»«§HL§TD
>> id="H13239"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom" id="H13241"»
>>      «§HL§TD id="H13243"»«/§HL§TD»         «§HL§TD colspan="2"
>> id="H13245"»«§HL§FONT size="2" id="H13246"»TEXT TEXT TEXT TEXT TEXT
>> «/§HL§FONT»¤/LISTITEM¤«/TD»         «TD id="H13248"»«/TD»         «TD
>> id="H13250"»«/TD»         «TD id="H13252"»«/TD»         «TD
>> id="H13254"»«/TD»         «TD align="right" id="H13256"»¤LISTITEM
>> BULLET="17" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"¤«§HL§FONT size="2"
>> id="H13257"»17«/§HL§FONT»«/§HL§TD»         «§HL§TD
>> id="H13259"»«/§HL§TD» «/§HL§TR» «§HL§TR id="H13261"»«§HL§TD
>> id="H13262"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom" id="H13264"»
>>      «§HL§TD id="H13266"»«/§HL§TD»         «§HL§TD colspan="2"
>> id="H13268"»«§HL§FONT size="2" id="H13269"»TEXT TEXT TEXT TEXT TEXT
>> «/§HL§FONT»¤/LISTITEM¤</unknown>
>> </doc>
>>
>> The regex held in the variable CompleteListIdentificationRegex runs
>> fine in RegexBuddy on the same input, executing to completion in 201
>> steps. It essentially just identifies all the content within the
>> above <unknown> element.
>>
>> However, the equivalent xsl:analyze-string, run in oXygen 12.1, keeps
>> running on the same input and never stops.
>>
>> Any ideas?
>>
>> I have been working on it for 4 hours without much progress, other
>> than reducing the number of execution steps in RegexBuddy by 40.
>>
>> Thanks Much
>>
>>
>> --
>> Alex
>> -----
>> Currently:
>> Freelance Software Engineer 6+ yrs exp
>>
>> Previously:
>> https://sites.google.com/a/utg.edu.gm/alex/
>>
>>
>> A Bafila, is two rivers flowing together as one:
>> http://www.facebook.com/pages/Bafila/125611807494851
