
Re: Special characters in regex expression

Subject: Re: Special characters in regex expression
From: "Wolfgang Laun wolfgang.laun@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 24 Jul 2014 04:46:53 -0000
On 23/07/2014, Michael Dykman mdykman@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> It is my understanding that Java's regular expression builtin emulates
> 'pcre' pretty closely.

Perl 5 has, over time, added a number of features that aren't
available in Java. XPath's regex dialect is, in turn, close to a subset
of Java's.

>
> To escape characters that have special meaning in a regular
> expression, putting them in a character class (using the square bracket
> notation) generally works
>
> i.e., if you want to match a question mark at the beginning of a line,
> use:  "^[?].*$"

Thus, regex="(\.|\!|\?)(?!\)|\.|\d|\w)" (ignoring the lack of look-ahead)
would be better rewritten as

     regex="[.!?](?![).\d\w])" <!-- not valid -->

It is possible to select groups within the matching substring:

     regex="([.!?])([^).\d\w])"

Thus, in this simple case it is possible to use regex-group(1) and
regex-group(2) to get the two characters individually, and insert nodes
as required.
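
For illustration, here is a minimal sketch of how the two groups might be
used in Gabor's template. It assumes XSLT 2.0 and keeps the TEI namespace
and the seg element from the original post; the identity template is an
assumption added only to make the stylesheet complete.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:TEI="http://www.tei-c.org/ns/1.0">

  <!-- Assumption: an identity template copies everything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="TEI:p//text()[not(parent::TEI:note | parent::TEI:hi | parent::TEI:date)]">
    <!-- Group 1: the punctuation mark. Group 2: one following character
         that is not ')', '.', a digit or a word character. -->
    <xsl:analyze-string select="." regex="([.!?])([^).\d\w])">
      <xsl:matching-substring>
        <!-- Wrap only the punctuation mark in a seg element. -->
        <xsl:element name="seg" namespace="http://www.tei-c.org/ns/1.0">
          <xsl:value-of select="regex-group(1)"/>
        </xsl:element>
        <!-- Emit the following character unchanged. -->
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Unlike a look-ahead, group 2 consumes the character it matches, so a
punctuation mark at the very end of a text node is not matched at all.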

I am not sure what Gabor expects to happen with, e.g., "...??..." or
"...!!...", which are matched by this regex.

-W

>
> On Wed, Jul 23, 2014 at 3:55 PM, mike@xxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> The exclamation mark is not a special character in XPath regular
>> expressions, and therefore does not need to be (and must not be) escaped.
>>
>> Negative lookaheads are not supported in the XPath regular expression
>> dialect.
>>
>> You can't assume that all regular expression dialects are the same.
>>
>> Michael Kay
>>
>> Saxonica
>>
>>
>>
>>> Dear All,
>>>
>>> I am using xsl:analyze-string to retrieve and replace punctuation,
>>> however, I got the following error:
>>>
>>>  Error in regular expression: net.sf.saxon.trans.XPathException: Syntax
>>> error at char 6 in regular expression: Escape character '!' not allowed.
>>>
>>> How should I escape and match '?' and '!' ? I am also using a negative
>>> look-ahead, why isn't that working?
>>>
>>> Here is a sample from my code, thanks,
>>>
>>> Gabor
>>>
>>>
>>> <xsl:template match="//TEI:p//text()[ not
>>>         ((parent::TEI:note)|(parent::TEI:hi)|(parent::TEI:date))]">
>>>  <xsl:analyze-string select="." regex="(\.|\!|\?)(?!\)|\.|\d|\w)">
>>>
>>>             <xsl:matching-substring>
>>>
>>>                 <xsl:element name="seg"
>>> namespace="http://www.tei-c.org/ns/1.0"><xsl:value-of
>>> select="."/></xsl:element>
>>>            </xsl:matching-substring>
>>>             <xsl:non-matching-substring>
>>>                 <xsl:value-of select="."/>
>>>             </xsl:non-matching-substring>
>>>         </xsl:analyze-string>
>>>
>>
>
>
>
> --
>  - michael dykman
>  - mdykman@xxxxxxxxx
>
>  May the Source be with you.
