[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: Regular expression functions (Was: Re: comments on

Subject: Re: Regular expression functions (Was: Re: comments on December F&O draft)
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Sat, 12 Jan 2002 11:18:01 -0800 (PST)
regular.grammar download
> But anyway, I like the LEX/YACC idea, and think it might be the way to
> go. You basically create a grammar for the structure, using regular
> expressions to represent the component parts.

Hi Jeni,

YACC is a parser generator for LALR(1) grammars (a subset of context-free grammars),
which describe languages much more powerful than what regular expressions can

> Something along the lines of:
>   $row      => ($frac)|($sqrt)|($expr)
>   $frac     =>  \\frac\{($row)\}\{($row)\}
>   $sqrt     =>  \\sqrt\{($row)\}
>   $expr     => ($times)|(($operand)?(\s*($operator)\s*($operand))+)
>   $times    => ($operand){2,}
>   $operand  => ($row)|($sup)|($number)|($ident)
>   $sup      => ($operand)\^($operand)
>   $operator => \\pm|\-
>   $number   => -?[0-9]+(\.[0-9]+)?
>   $ident    => [a-z][a-z0-9]*

This is not a regular grammar but a context-free grammar (although it is not
formally defined and I have to guess which exactly is the set of terminal and
non-terminal symbols, as well as which is the starting symbol).

It seems that $row, $frac, $sqrt, $expr and $operand are the non-terminal symbols.
In this case this is a context-free grammar that does not describe a regular
language (a language that can be described/generated by a set of regular
expressions). This is so, because in a set of rules describing a regular language
the rules can only have terminal symbols and then one variable. The rule for $expr
is in clear violation of this principle.

Therefore, the language described by the above grammar cannot be matched by a set of
regular expressions.

> These assignments could be done with xsl:regexp elements, which would
> work like variable-binding elements, but would be used purely for
> identifying regular expressions. For example:

This cannot be done with regular expressions -- see above.

> <xsl:regexp name="row" select="'($frac)|($sqrt)|($expr)'" />
> <xsl:regexp name="frac" select="'\\frac\{($row)\}\{($row)\}'" />
> ...
> Then you'd have a xsl:match instruction which would select an
> expression that it turned into a string, and have a regexp attribute
> which held a regular expression. For example:
>   <xsl:match select="'\frac{-b \pm \sqrt{b^2 -4ac}}{2a}'"
>              regexp="($row)">
>     ...
>   </xsl:match>
> Within the xsl:match, a current-match() function would give you access
> to a tree that represents the matched portions of the string. In this
> case it would look something like the following (but with less
> whitespace, obviously):

[long tree snipped]

> (Of course it would look different with a different grammar.)
> I am fairly convinced that it's possible to create an implementation
> to do this, based on the fact that there are plenty of lexers out
> there that do roughly the same thing.

As shown above, what would be needed is not a "lexer" but a parser for CF grammars.

In case a parser generator is used, it has to be fed with the rules of the grammar.
It then generates a table that guides the parsing process. This table typically
shows the action to be taken and the next state of the parser, if it is in a given
state, the top of the stack contains a given symbol and the input contains a given
terminal symbol. This table-driven parser will need a lexical analizer to call when
each next non-terminal from the input is needed.

You're suggesting this to be done dynamically at transformation time.

Not that this is impossible, but this is something much more complex and less
efficient than matching regular expressions.

Dimitre Novatchev.

Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

Current Thread


Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
First Name
Last Name
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.