Subject: Re: String contains a regex and then junk ... how to remove the junk?
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 16 Dec 2024 13:41:25 -0000
A good case for Invisible XML, though sadly we don't have it integrated into
Saxon yet.

The first step here is finding a matching closing paren. The second step is
dealing with backslash-escaped parens.

For the first step, I would use xsl:iterate, iterating over the characters of
the string (in 4.0 use the fn:characters function; in 3.0 use
string-to-codepoints). Maintain a variable $depth over the iteration:
increment it on a left paren, decrement it on a right paren, and break out of
the iteration when the depth returns to zero.
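
A minimal sketch of that first step, assuming XSLT 3.0. The function name
f:balanced-prefix and the namespace URI bound to the f prefix are made up for
illustration, and escaped parens are deliberately not handled here:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Copy everything that has no more specific template. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Hypothetical helper: returns $string up to and including the paren
       that brings the nesting depth back to zero. If the parens never
       balance, $string is returned unchanged. -->
  <xsl:function name="f:balanced-prefix" as="xs:string">
    <xsl:param name="string" as="xs:string"/>
    <xsl:iterate select="string-to-codepoints($string)">
      <xsl:param name="depth" as="xs:integer" select="0"/>
      <xsl:on-completion>
        <xsl:sequence select="$string"/>
      </xsl:on-completion>
      <!-- Codepoint 40 is '(' and 41 is ')'. -->
      <xsl:variable name="new-depth" as="xs:integer" select="
          if (. eq 40) then $depth + 1
          else if (. eq 41) then $depth - 1
          else $depth"/>
      <xsl:choose>
        <xsl:when test=". eq 41 and $new-depth eq 0">
          <!-- position() is the index of the closing paren. -->
          <xsl:break select="substring($string, 1, position())"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:next-iteration>
            <xsl:with-param name="depth" select="$new-depth"/>
          </xsl:next-iteration>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:iterate>
  </xsl:function>

  <xsl:template match="REG_EXP">
    <REG_EXP>
      <xsl:value-of select="f:balanced-prefix(string(.))"/>
    </REG_EXP>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the first sample in the question, this should keep
(([\W\w]{1,80})?) and drop the trailing " <INFO>". Note that it counts every
paren, including any that appear inside a character class such as [()].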

Then handling backslashes is just an extra bit of logic: in your xsl:iterate,
define a second variable that indicates whether the immediately preceding
character is a backslash (or rather, an unescaped backslash) and avoid
recognizing parens if it is.
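
A sketch of how that second variable might look, again assuming XSLT 3.0. The
function name f:balanced-prefix-esc, the parameter name $escaped, and the
namespace URI are illustrative assumptions, not anything defined by Saxon or
the spec:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: like a plain depth counter, but a paren that
       follows an unescaped backslash is not counted. Returns $string
       unchanged if the parens never balance. -->
  <xsl:function name="f:balanced-prefix-esc" as="xs:string">
    <xsl:param name="string" as="xs:string"/>
    <xsl:iterate select="string-to-codepoints($string)">
      <xsl:param name="depth" as="xs:integer" select="0"/>
      <!-- True when the preceding character was an unescaped backslash. -->
      <xsl:param name="escaped" as="xs:boolean" select="false()"/>
      <xsl:on-completion>
        <xsl:sequence select="$string"/>
      </xsl:on-completion>
      <!-- Codepoint 40 is '(' and 41 is ')'. -->
      <xsl:variable name="new-depth" as="xs:integer" select="
          if ($escaped) then $depth
          else if (. eq 40) then $depth + 1
          else if (. eq 41) then $depth - 1
          else $depth"/>
      <xsl:choose>
        <xsl:when test="not($escaped) and . eq 41 and $new-depth eq 0">
          <xsl:break select="substring($string, 1, position())"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:next-iteration>
            <xsl:with-param name="depth" select="$new-depth"/>
            <!-- Codepoint 92 is '\'. A backslash that is itself escaped
                 does not escape the character after it. -->
            <xsl:with-param name="escaped" select=". eq 92 and not($escaped)"/>
          </xsl:next-iteration>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:iterate>
  </xsl:function>

</xsl:stylesheet>
```

With this logic the string (a\)b) junk should come back as (a\)b), because the
escaped right paren leaves the depth untouched. Parens inside a character
class are still counted, so a regex containing something like [(] would need
further handling.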

Michael Kay
Saxonica



> On 16 Dec 2024, at 13:24, Roger L Costello costello@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Folks,
>
> I want to convert this:
>
> <REG_EXP>(([\W\w]{1,80})?) &lt;INFO&gt;</REG_EXP>
>
> to this:
>
> <REG_EXP>(([\W\w]{1,80})?)</REG_EXP>
>
> Convert this:
>
> <REG_EXP>([A-Z]{2}[0-9A-Z ]{0,13}) &lt;ARF ID&gt;</REG_EXP>
>
> to this:
>
> <REG_EXP>([A-Z]{2}[0-9A-Z ]{0,13})</REG_EXP>
>
> I want to remove the junk that follows the regex.
>
> I wrote a recursive function to do this. See below. Is there a simpler way to do it?
>
> -------------------------------------
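> <xsl:function name="f:get-regex">
>    <xsl:param name="string"/>
>    <xsl:choose>
>        <xsl:when test="substring($string,1,1) ne '('">
>            <xsl:message>Error! Expecting the regex to start with left paren</xsl:message>
>        </xsl:when>
>        <xsl:otherwise>
>            <!-- The helper returns the prefix starting at character 1, so no extra '(' is prepended -->
>            <xsl:value-of select="f:get-regex-helper($string,2,1)"/>
>        </xsl:otherwise>
>    </xsl:choose>
> </xsl:function>
>
> <xsl:function name="f:get-regex-helper">
>    <xsl:param name="string"/>
>    <xsl:param name="index"/>
>    <xsl:param name="count-left-parens-to-match"/>
>    <xsl:choose>
>        <!-- All parens matched: return everything up to the last one consumed -->
>        <xsl:when test="$count-left-parens-to-match eq 0">
>            <xsl:value-of select="substring($string,1,$index - 1)"/>
>        </xsl:when>
>        <!-- Ran off the end without balancing: stop recursing and return the input -->
>        <xsl:when test="$index gt string-length($string)">
>            <xsl:value-of select="$string"/>
>        </xsl:when>
>        <!-- A nested left paren adds one more right paren to find -->
>        <xsl:when test="substring($string,$index,1) eq '('">
>            <xsl:value-of select="f:get-regex-helper($string,$index+1,$count-left-parens-to-match + 1)"/>
>        </xsl:when>
>        <xsl:when test="substring($string,$index,1) eq ')'">
>            <xsl:value-of select="f:get-regex-helper($string,$index+1,$count-left-parens-to-match - 1)"/>
>        </xsl:when>
>        <xsl:otherwise>
>            <xsl:value-of select="f:get-regex-helper($string,$index+1,$count-left-parens-to-match)"/>
>        </xsl:otherwise>
>    </xsl:choose>
> </xsl:function>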
