Subject: Complex Regex takes 201 steps in regex buddy but runs forever in Analyze-String
From: Alex Muir <alex.g.muir@xxxxxxxxx>
Date: Mon, 31 Jan 2011 18:40:18 +0000
Hi,

With the following code:
------------------------------

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:saxon="http://saxon.sf.net/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  version="2.0" exclude-result-prefixes="#all">
  <xsl:output method="xml" indent="no"/>


  <xsl:template match="unknown[exists(text())]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>

      <xsl:call-template name="CompleteListAnalyze">
        <xsl:with-param name="content" select="text()"/>
      </xsl:call-template>

    </xsl:copy>
  </xsl:template>


  <xsl:template name="CompleteListAnalyze">
    <xsl:param name="content"/>

    <xsl:variable name="CompleteListIdentificationRegex" >
      <xsl:text>((B$LISTITEM[^B$]+B$[^B$]+B$/LISTITEMB$)(((B+[^B;B$]+B;|\s+|B'[^B'B$]+B'){0,255})(B$LISTITEM[^B$]+B$[^B$]+B$/LISTITEMB$)){0,200})</xsl:text>
    </xsl:variable>

    <xsl:analyze-string select="$content"
      regex="{$CompleteListIdentificationRegex}">
      <xsl:matching-substring>
        <xsl:text>B$COMPLETELIST POSITION="</xsl:text>
        <xsl:value-of select="position()"/>
        <xsl:text>" PLACEMENT=""B$</xsl:text>
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>B$/COMPLETELISTB$</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>


And the following input file:
----------------------------------

<?xml version="1.0" encoding="UTF-8"?>
<doc>
    <unknown>B$LISTITEM BULLET="15" TITLE="TEXT TEXT TEXT TEXT"
TYPE="SNLI"B$B+B'HLB'FONT size="2"
id="H13211"B;15B+/B'HLB'FONTB;B+/B'HLB'TDB;
   B+B'HLB'TD id="H13213"B;B+/B'HLB'TDB; B+/B'HLB'TRB; B+B'HLB'TR
id="H13215"B;B+B'HLB'TD
id="H13216"B; B+/B'HLB'TDB;B+/B'HLB'TRB; B+B'HLB'TR valign="bottom"
id="H13218"B;
      B+B'HLB'TD id="H13220"B;B+/B'HLB'TDB;         B+B'HLB'TD colspan="2"
id="H13222"B;B+B'HLB'FONT size="2" id="H13223"B;TEXT TEXT TEXT
TEXTB+/B'HLB'FONTB;B$/LISTITEMB$B+/TDB;         B+TD id="H13225"B;B+/TDB;
B+TD id="H13227"B;B+/TDB;         B+TD id="H13229"B;B+/TDB;         B+TD
id="H13231"B;B+/TDB;         B+TD align="right" id="H13233"B;B$LISTITEM
BULLET="16" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"B$B+B'HLB'FONT size="2"
id="H13234"B;16B+/B'HLB'FONTB;B+/B'HLB'TDB;         B+B'HLB'TD
id="H13236"B;B+/B'HLB'TDB; B+/B'HLB'TRB; B+B'HLB'TR id="H13238"B;B+B'HLB'TD
id="H13239"B; B+/B'HLB'TDB;B+/B'HLB'TRB; B+B'HLB'TR valign="bottom"
id="H13241"B;
      B+B'HLB'TD id="H13243"B;B+/B'HLB'TDB;         B+B'HLB'TD colspan="2"
id="H13245"B;B+B'HLB'FONT size="2" id="H13246"B;TEXT TEXT TEXT TEXT TEXT
B+/B'HLB'FONTB;B$/LISTITEMB$B+/TDB;         B+TD id="H13248"B;B+/TDB;
B+TD
id="H13250"B;B+/TDB;         B+TD id="H13252"B;B+/TDB;         B+TD
id="H13254"B;B+/TDB;         B+TD align="right" id="H13256"B;B$LISTITEM
BULLET="17" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"B$B+B'HLB'FONT size="2"
id="H13257"B;17B+/B'HLB'FONTB;B+/B'HLB'TDB;         B+B'HLB'TD
id="H13259"B;B+/B'HLB'TDB; B+/B'HLB'TRB; B+B'HLB'TR id="H13261"B;B+B'HLB'TD
id="H13262"B; B+/B'HLB'TDB;B+/B'HLB'TRB; B+B'HLB'TR valign="bottom"
id="H13264"B;
      B+B'HLB'TD id="H13266"B;B+/B'HLB'TDB;         B+B'HLB'TD colspan="2"
id="H13268"B;B+B'HLB'FONT size="2" id="H13269"B;TEXT TEXT TEXT TEXT TEXT
B+/B'HLB'FONTB;B$/LISTITEMB$</unknown>
</doc>

The regex held in the variable CompleteListIdentificationRegex runs
fine in RegexBuddy on the same input, executing to completion in 201
steps. It essentially just matches all of the content within the
<unknown> element above.

However, the equivalent xsl:analyze-string, run in oXygen 12.1 on the
same input, keeps running and never finishes.
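
A reduced test along the following lines might help narrow it down. It is
only a sketch: the parameter names innerMax and outerMax are hypothetical,
the delimiter tokens are copied exactly as they appear in the stylesheet
above, and the regex is rebuilt with fewer capturing groups than the
original. The idea is to run it with small bounds first and then raise
them, to see whether the run time depends on the {0,255} and {0,200}
repetitions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  version="2.0" exclude-result-prefixes="#all">
  <xsl:output method="text"/>

  <!-- Hypothetical parameters: upper bounds of the two nested repetitions.
       The defaults are the values used in the original regex. -->
  <xsl:param name="innerMax" as="xs:integer" select="255"/>
  <xsl:param name="outerMax" as="xs:integer" select="200"/>

  <!-- One list item, copied from the original regex. -->
  <xsl:variable name="item">
    <xsl:text>B$LISTITEM[^B$]+B$[^B$]+B$/LISTITEMB$</xsl:text>
  </xsl:variable>

  <!-- The separator alternatives allowed between two list items. -->
  <xsl:variable name="sep">
    <xsl:text>B+[^B;B$]+B;|\s+|B'[^B'B$]+B'</xsl:text>
  </xsl:variable>

  <!-- Same overall shape as the original: ITEM ((SEP){0,n} ITEM){0,m} -->
  <xsl:variable name="regex" select="concat(
      '(', $item, '((', $sep, '){0,', $innerMax, '}', $item, '){0,', $outerMax, '})')"/>

  <xsl:template match="/">
    <xsl:for-each select="//unknown/text()">
      <xsl:analyze-string select="." regex="{$regex}">
        <xsl:matching-substring>
          <!-- Report each match by length only, to keep the output short. -->
          <xsl:text>match of length </xsl:text>
          <xsl:value-of select="string-length(.)"/>
          <xsl:text>&#10;</xsl:text>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Running this with innerMax and outerMax both set to 1, and then with
progressively larger values, should show whether the time grows with the
bounds or whether the problem is elsewhere.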

Any ideas?

I have been working on it for 4 hours without much progress, other than
reducing the number of execution steps in RegexBuddy by 40.

Thanks Much


--
Alex
-----
Currently:
Freelance Software Engineer 6+ yrs exp

Previously:
https://sites.google.com/a/utg.edu.gm/alex/


A Bafila, is two rivers flowing together as one:
http://www.facebook.com/pages/Bafila/125611807494851
