
RE: XML Schema regex

  • From: Robin Cover <robin@i...>
  • To: "Bullard, Claude L (Len)" <clbullar@i...>
  • Date: Thu, 16 Aug 2001 08:54:59 -0500 (CDT)

regex dates
Of possible interest WRT Len's question on regexes:

"Regular expressions for checking dates"
By Eric Howland and David Niergarth
In Markup Languages: Theory and Practice
ISSN 1099-6621
http://mitpress.mit.edu/MLANG
Volume 2, Issue 2 (Spring 2000), pages 126-132
WRT MLTP Contest on writing the shortest correct
  regular expressions for dates...

* Date-checking regular expression that catches all bad dates: 245 characters long
* Same regular expression without \D characters: 277 characters
* Two regular expressions, the first of which must match to ensure that
  the expression is well formed and the second of which must not match to catch
  all the bad numbers: 184 characters total

Below we present several regular expressions for checking dates
including leap years. These expressions were inspired by the article
by C.M.  Sperberg-McQueen in Markup Languages (**see [Sperberg-McQueen
1999]).  Specifically they were inspired by the challenge at the end
of the article (and the date on which that challenge expires) to
shorten the long regular expression generated by lex.

** http://xml.coverpages.org/mltpTOC14.html#MLTP-14Sperberg

The regular expression offered here is, unfortunately, not
deterministic, but it is more than an order of magnitude shorter than
the regular expression generated by lex. The expression is the inverse
of the lex expression and actually finds incorrect dates rather than
correct dates. The proposed expression uses the \D convention of Perl
and Python to detect characters that are not digits and the
.{1,8} notation to indicate a string of one to eight characters. Note
that a somewhat longer (and arguably less readable) version of the
regular expression is also included in case you find those conventions
distasteful.

Also note that about 40% of this expression is dedicated to finding
poorly formed dates (dates not in the nnnn-nn-nn format, where n is a
digit).  This implies that a much shorter total expression is
possible if one allows two passes (one pass to ensure that the
potential date is well-formed and a second pass to detect incorrect
dates). The percentage saved by using two passes is even larger when
the Python conventions are not allowed. Using two passes is, however,
a less aesthetically pleasing response to the challenge.

The expression is a series of tests for errors OR'ed
together. Perhaps the easiest way to understand this expression is to
see how it is built up from the various types of possible errors. This
approach turns out to be effective, but it is hard to guarantee that
all possible errors have been found.
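
As an illustration of the two-pass approach and of building the
second pass from OR'ed error tests, here is a minimal Python sketch.
It is not the expression from the article, which is elided in this
post. The names WELL_FORMED, NOT_DIV4, BAD_DATE and is_good_date are
made up for this example, and it assumes Gregorian leap-year rules
applied to every year from 0000 to 9999.

```python
import re

# Pass 1: must match, otherwise the string is not in nnnn-nn-nn form.
WELL_FORMED = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)

# Two-digit numbers that are NOT divisible by 4 (used for non-leap years).
NOT_DIV4 = r"(?:[02468][1235679]|[13579][01345789])"

# Pass 2: must NOT match. Each alternative is one class of error.
# It is only meaningful for strings that already passed pass 1.
BAD_DATE = re.compile(
    r"-(?:00|1[3-9]|[2-9]\d)-"       # month 00 or 13-99
    r"|-(?:00|3[2-9]|[4-9]\d)$"      # day 00 or 32-99
    r"|-(?:0[2469]|11)-31$"          # day 31 in a month that has none
    r"|-02-30$"                      # February 30
    # February 29 in a year not divisible by 4, or in a century
    # year not divisible by 400.
    r"|^(?:\d\d" + NOT_DIV4 + r"|" + NOT_DIV4 + r"00)-02-29$",
    re.ASCII,
)


def is_good_date(text):
    """Return True if text is a well-formed and valid nnnn-nn-nn date."""
    return WELL_FORMED.fullmatch(text) is not None and not BAD_DATE.search(text)
```

For example, is_good_date("2000-02-29") is accepted, while
"1900-02-29", "2001-04-31" and "2001-1-01" are rejected. As the text
says, nothing in this construction proves that every class of error
has been listed; that is what exhaustive testing is for.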

Because the challenge specifies a well-defined (and enforceable) format
for the input to be tested, it is possible to exhaustively test for
errors. A Python program has been created (the second listing below)
that exhaustively tests all dates of the form nnnn-nn-nn (where n is a
digit) using both algorithmic and regular-expression-based tests. A
comparison of the results from these two methods exposes any errors in
the regular expression and guarantees that the regular expression is
as accurate as the algorithm.
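
The authors' listing is elided in this post, so the following is only
a rough sketch of the comparison idea, not their program. The name
whole_re is borrowed from the description further down, but the
pattern assigned to it here is a stand-in written for this example
(it matches valid dates rather than invalid ones). The helper names
are hypothetical, and the leap-year rule assumes the Gregorian
calendar for all years 0000-9999.

```python
import re


def valid_by_algorithm(year, month, day):
    """Validate a date with ordinary calendar arithmetic."""
    if not 1 <= month <= 12:
        return False
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    month_lengths = [31, 29 if leap else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    return 1 <= day <= month_lengths[month - 1]


# Two-digit numbers divisible by 4 (including 00).
DIV4 = r"(?:[02468][048]|[13579][26])"

# Stand-in expression under test; substitute another one to check it.
whole_re = re.compile(
    r"\d{4}-(?:"
    r"(?:0[13578]|1[02])-(?:0[1-9]|[12]\d|3[01])"   # 31-day months
    r"|(?:0[469]|11)-(?:0[1-9]|[12]\d|30)"          # 30-day months
    r"|02-(?:0[1-9]|1\d|2[0-8])"                    # February 01-28
    r")"
    # February 29: year divisible by 4 but not ending in 00,
    # or a century year divisible by 400.
    r"|(?:\d\d(?:0[48]|[2468][048]|[13579][26])|" + DIV4 + r"00)-02-29",
    re.ASCII,
)


def valid_by_regex(text):
    return whole_re.fullmatch(text) is not None


def find_disagreements(years=range(10000)):
    """Return every nnnn-nn-nn string on which the two methods differ."""
    disagreements = []
    for year in years:
        for month in range(100):
            for day in range(100):
                text = f"{year:04d}-{month:02d}-{day:02d}"
                if valid_by_regex(text) != valid_by_algorithm(year, month, day):
                    disagreements.append(text)
    return disagreements
```

Calling find_disagreements() with no argument walks all 100 million
strings, which can take a long time in plain Python; passing a
smaller iterable of years, such as [1900, 2000, 2001, 2004], gives a
quick spot check. An empty result means the expression agrees with
the algorithm on every string tested.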

[...]

A Python program to check date-checking regular expressions:
A program to check all possible numeric dates in the form
nnnn-nn-nn. It compares two methods of validating such dates,
one based on a regular expression and one based on an algorithm.
Substituting a different regular expression for whole_re would
allow it to be checked for accuracy.

[...]

On Thu, 16 Aug 2001, Bullard, Claude L (Len) wrote:

> Does anyone know of a repository of more-or-less
> reusable regexes, e.g., international phone numbers,
> file paths, etc., for XML Schema?

Best wishes,

Robin Cover
XML Cover Pages
http://xml.coverpages.org/


