Re: XPath 2.0 Regex misunderstanding

Subject: Re: XPath 2.0 Regex misunderstanding
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Fri, 19 Jan 2007 20:58:56 +0100
cknell@xxxxxxxxxx wrote:
I have a date element:

example

<DATE>11/01/2006</DATE>

I'm trying to write an XPath 2.0 Regex to winnow some of the more obvious date format errors. I have tried for about a half-hour, and I admit to being stumped.

I have some trouble understanding what your "passing" and "failing" are about. However, if you are trying to remove the "more obvious date format errors", I believe your "matches(...)" needs to become a "not(matches(...))", since your regular expression describes what to include, not what to exclude.


That said, you can try the following (assuming American dates: MM/DD/YYYY) for matching any date, disallowing years > 2006 and allowing the format 1/2/2006:

<xsl:variable name="dates">
<DATE>07/18/2006</DATE>
<DATE>07/12/2006</DATE>
<DATE>09/25/2006</DATE>
<DATE>10/24/2006</DATE>
<DATE>10/18/2006</DATE>
<DATE>10/10/2006</DATE>
<DATE>1/2/2006</DATE>
<!-- false dates -->
<DATE>22/12/2006</DATE>
<DATE>00/10/2000</DATE>
<DATE>01/32/2006</DATE>
<DATE>10/10/2007</DATE>
<DATE>12/12/20006</DATE>
</xsl:variable>


<xsl:variable name="date-regex">^(
   0?[1-9]|     <!-- 01-09 and 1-9 -->
   1[0-2]       <!-- 10, 11, 12 -->
   )/(
   0?[1-9]|     <!-- 01-09 and 1-9-->
   [1-2]\d|     <!-- 10-29 -->
   3[01]        <!-- 30, 31 -->
   )/(
   1\d{3}|      <!-- 1000-1999 -->
   200[0-6]     <!-- 2000-2006 -->
   )$
</xsl:variable>

<xsl:for-each select="$dates/DATE">
<xsl:value-of select="concat(., ': ')" />
<!-- add normalize-space, because of a bug
in saxon prior to 8.0.0.4 with leading space -->
<xsl:value-of select="matches(.,
normalize-space($date-regex), 'x')" />
<xsl:text>&#xA;</xsl:text>
</xsl:for-each>


This outputs:
07/18/2006: true
07/12/2006: true
09/25/2006: true
10/24/2006: true
10/18/2006: true
10/10/2006: true
1/2/2006: true
22/12/2006: false
00/10/2000: false
01/32/2006: false
10/10/2007: false
12/12/20006: false
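
For anyone without an XSLT 2.0 processor at hand, the same check can be sketched in Python. This is only an illustration under assumptions: Python's re module with re.VERBOSE stands in for matches() with the 'x' flag, and the names DATE_REGEX, is_plausible_date and samples are made up for this example.

```python
import re

# Same alternatives as the date-regex variable; re.VERBOSE lets the
# pattern carry whitespace and comments, much like the XPath 'x' flag.
DATE_REGEX = re.compile(r"""
    ^(
      0?[1-9] |     # 01-09 and 1-9
      1[0-2]        # 10, 11, 12
    )/(
      0?[1-9] |     # 01-09 and 1-9
      [1-2]\d |     # 10-29
      3[01]         # 30, 31
    )/(
      1\d{3} |      # 1000-1999
      200[0-6]      # 2000-2006
    )$
""", re.VERBOSE)


def is_plausible_date(text: str) -> bool:
    """True if text looks like MM/DD/YYYY with a year up to 2006."""
    return DATE_REGEX.search(text) is not None


samples = [
    "07/18/2006", "07/12/2006", "09/25/2006", "10/24/2006",
    "10/18/2006", "10/10/2006", "1/2/2006",
    # false dates
    "22/12/2006", "00/10/2000", "01/32/2006", "10/10/2007", "12/12/20006",
]

for sample in samples:
    print(f"{sample}: {str(is_plausible_date(sample)).lower()}")
```

Keep in mind that this is a format check only: a string such as 02/31/2006 still passes, because the regex knows nothing about month lengths.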


Here is the relevant part of the template:


<xsl:when test="matches(DATE,'[0-1][0-2]/[0-3][0-9]/2006')"><bad-date /></xsl:when>

What your statement says is: output a "bad-date" node when:
1) the month is one of (00, 01, 02, 10, 11, 12)
2) the day is in the range (00, 01, ... 09, 10, 11, ... 19, 20, 21, ... 29, 30, 31, ... 39)
3) the year is 2006.


Well, I don't know much about your calendar system, but I can hardly believe you consider a date such as "00/39/2006" correct, so here's part of your problem. Note also that the pattern is not anchored with ^ and $, so it matches anywhere inside the string. I know from my own experience that regexing numeric values is a tricky business (the trick is: think strings, not numbers).
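
To see what the original pattern really accepts, here is a small sketch in Python. It assumes that Python's re.search behaves like XPath's matches() for this simple pattern (both look for a match anywhere in the string); the names ORIGINAL and flagged are made up for the example.

```python
import re

# The pattern from the original template, unchanged and unanchored.
ORIGINAL = re.compile(r"[0-1][0-2]/[0-3][0-9]/2006")


def flagged(text: str) -> bool:
    """True if the original test would emit <bad-date/> for this value."""
    return ORIGINAL.search(text) is not None


for value in ["00/39/2006", "10/24/2006", "09/25/2006", "x12/12/20069"]:
    print(value, flagged(value))
```

The nonsense date 00/39/2006 and the padded string x12/12/20069 are both matched, while the perfectly ordinary 09/25/2006 is not, because 9 falls outside [0-2].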

For an article I have wanted to write for a long time, but still haven't, I created a function that helps in regexing numeric values. It simply outputs the right regex for you if you give it a number (as a string) and a start position of 0:

my:regex-from-number('376', 0)
will give:
[0-2]\d{2}|
3[0-6]\d|
37[0-5]|
376|\d{2}

It requires some getting used to, but I recall that Jeffrey Friedl named this "enrolling the number", or something similar. For small numbers you can easily do it by hand, but it is still hard for many mere mortals. The function is optimized for repeated digits (like 2006). The output regex works perfectly. A few notes (if you plan to use it):

|\d{2}
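Leave out this part if you require a fixed number of digits, i.e. 034 and 009. By default, shorter values such as 34 are allowed as well. Note that, as generated, \d{2} covers only two-digit values; to also allow one-digit values such as 9 you would need \d{1,2}.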


376
The input number. Repeating the number is not necessary for making a bullet proof regular expression, but it made me feel good. The larger the maximum number you need to match, the easier it gets putting it there: you see instantly what number is being matched.
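
The effect of the |\d{2} branch can be checked quickly. The sketch below uses Python's re.fullmatch as a stand-in for an anchored matches() call; the two patterns are the generated lines joined together, and the names WITH_SHORT, FIXED_WIDTH and accepts are invented for this example.

```python
import re

# Generated regex for 376, with and without the trailing |\d{2} branch.
WITH_SHORT = re.compile(r"[0-2]\d{2}|3[0-6]\d|37[0-5]|376|\d{2}")
FIXED_WIDTH = re.compile(r"[0-2]\d{2}|3[0-6]\d|37[0-5]|376")


def accepts(pattern: re.Pattern, text: str) -> bool:
    """True if the whole string matches, i.e. the anchored case."""
    return pattern.fullmatch(text) is not None


for text in ["034", "34", "9", "376", "377"]:
    print(text, accepts(WITH_SHORT, text), accepts(FIXED_WIDTH, text))
```

Only "34" changes between the two variants; "9" is rejected by both, and "377" is rejected because it exceeds the maximum.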


The rest speaks for itself, I believe, but call in anytime if you want some additional help. The expressions at the top of this message were taken from the output of this function to make sure I got them right; I only made them a bit more readable.

<xsl:function name="my:regex-from-number">
  <xsl:param name="number" />
  <xsl:param name="pos" />
  <xsl:variable name="digit1" select="substring($number, $pos, 1)" />
  <xsl:variable name="digit2" select="substring($number, $pos + 1, 1)" />
  <xsl:variable name="len" select="string-length($number)" />
  <xsl:value-of select="
    if ($len = $pos)
    then concat
    (
      $digit1,
      '|\d',
      if ($pos - 1 le 1) then ''
      else concat('{', $pos - 1, '}')
    )
    else
      if ($digit2 = '0')
      then concat
      (
        $digit1,
        my:regex-from-number($number, $pos + 1)
      )
      else concat
      (
        $digit1,
        if (xs:integer($digit2) - 1 = 0) then '0'
        else concat('[0-', xs:integer($digit2) - 1, ']'),
        if ($pos + 1 = $len) then '|'
        else
          if ($len - $pos - 1 = 1) then '\d|'
          else concat('\d{', $len - $pos - 1, '}|'),
        '&#xA;', substring($number, 1, $pos),
        my:regex-from-number($number, $pos + 1)
      )" />
</xsl:function>
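
If it helps to follow the recursion outside XSLT, here is a line-by-line Python port of the function. Treat it as a sketch and not as a reference implementation: the name regex_from_number is made up, and the slicing is my assumption about how to mimic XPath's 1-based substring(), which yields an empty string for position 0.

```python
import re


def regex_from_number(number: str, pos: int = 0) -> str:
    """Build a regex matching zero-padded numbers up to `number`.

    `pos` is a 1-based digit position, as in the XSLT; start with 0.
    """
    length = len(number)
    # XPath substring($number, $pos, 1): empty when $pos is 0
    digit1 = number[pos - 1:pos] if pos >= 1 else ""
    # XPath substring($number, $pos + 1, 1)
    digit2 = number[pos:pos + 1]

    if pos == length:
        # Last digit: emit it, then the branch for shorter numbers
        shorter = "" if pos - 1 <= 1 else "{%d}" % (pos - 1)
        return digit1 + r"|\d" + shorter
    if digit2 == "0":
        # Nothing smaller than 0 at this position, so no new alternative
        return digit1 + regex_from_number(number, pos + 1)

    below = int(digit2) - 1
    lower = "0" if below == 0 else "[0-%d]" % below
    remaining = length - pos - 1  # digits left after digit2
    if remaining == 0:
        tail = "|"
    elif remaining == 1:
        tail = r"\d|"
    else:
        tail = r"\d{%d}|" % remaining
    # Start the next alternative with the prefix matched so far
    return (digit1 + lower + tail + "\n" + number[:pos]
            + regex_from_number(number, pos + 1))


generated = regex_from_number("376")
print(generated)

# re.VERBOSE ignores the newlines between the alternatives
pattern = re.compile(generated, re.VERBOSE)
accepted = [n for n in range(1000) if pattern.fullmatch("%03d" % n)]
print(accepted == list(range(377)))
```

The last two lines check the result by brute force: of all three-digit strings from 000 to 999, exactly those up to 376 should be accepted.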

