Stylus Studio XML Editor

Regular Expressions

A regular expression R is a sequence of characters that denotes a set of strings L(R). When used to constrain a lexical space, a regular expression R asserts that only strings in L(R) are valid literals for values of that type.

A regular expression is composed from zero or more branches, separated by | characters.

Regular Expression
regExp   ::=   branch ( '|' branch )*

For all branches S, and for all regular expressions T, valid regular expressions R are:

  R                Denoting the set of strings L(R) containing:
  (empty string)   the set containing just the empty string
  S                all strings in L(S)
  S|T              all strings in L(S) and all strings in L(T)
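
As an illustration of the table above, the following Python sketch checks membership in L(S|T). It assumes that Python's re module, which implements a different regular expression dialect, agrees with this grammar for plain alternation. It uses re.fullmatch because a literal is valid only when the whole string is in L(R). The helper name in_language is hypothetical.

```python
import re

def in_language(pattern: str, literal: str) -> bool:
    """Return True if the whole literal is in L(pattern).

    Assumption: Python's re dialect stands in for the grammar described here.
    """
    return re.fullmatch(pattern, literal) is not None

# L(cat|dog) is the union of L(cat) and L(dog).
print(in_language("cat|dog", "cat"))     # True
print(in_language("cat|dog", "dog"))     # True
print(in_language("cat|dog", "catdog"))  # False: not in either branch

# An empty branch contributes the empty string.
print(in_language("cat|", ""))           # True
```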

A branch consists of zero or more pieces, concatenated together.

Branch
branch   ::=   piece*

For all pieces S, and for all branches T, valid branches R are:

  R     Denoting the set of strings L(R) containing:
  S     all strings in L(S)
  ST    all strings st with s in L(S) and t in L(T)
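
Concatenation can be sketched by modelling each language as a finite Python set of strings. The function name concat and the sample sets are hypothetical and serve only to illustrate the ST row above.

```python
def concat(l_s: set, l_t: set) -> set:
    """L(ST): every string st with s in L(S) and t in L(T)."""
    return {s + t for s in l_s for t in l_t}

L_S = {"a", "b"}
L_T = {"x", "yz"}
print(sorted(concat(L_S, L_T)))  # ['ax', 'ayz', 'bx', 'byz']

# A branch with zero pieces denotes {""}, which leaves the other operand unchanged.
print(concat(L_S, {""}) == L_S)  # True
```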

A piece is an atom, possibly followed by a quantifier.

Piece
piece   ::=   atom quantifier?

For all atoms S and non-negative integers n, m such that n <= m, valid pieces R are:

  R         Denoting the set of strings L(R) containing:
  S         all strings in L(S)
  S?        the empty string, and all strings in L(S)
  S*        all strings in L(S?) and all strings st with s in L(S*) and t in L(S) (all concatenations of zero or more strings from L(S))
  S+        all strings st with s in L(S) and t in L(S*) (all concatenations of one or more strings from L(S))
  S{n,m}    all strings st with s in L(S) and t in L(S{n-1,m-1}) (all sequences of at least n, and at most m, strings from L(S))
  S{n}      all strings in L(S{n,n}) (all sequences of exactly n strings from L(S))
  S{n,}     all strings in L(S{n}S*) (all sequences of at least n strings from L(S))
  S{0,m}    all strings st with s in L(S?) and t in L(S{0,m-1}) (all sequences of at most m strings from L(S))
  S{0,0}    the set containing only the empty string
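
The recursive rows for S{n,m}, S{0,m} and S{0,0} can be followed step by step in a small Python sketch that models L(S) as a finite set of strings. The function names are hypothetical. The unbounded forms S*, S+ and S{n,} are left out because their languages are infinite whenever L(S) contains a non-empty string.

```python
def concat(l_s: set, l_t: set) -> set:
    """All strings st with s in l_s and t in l_t."""
    return {s + t for s in l_s for t in l_t}

def optional(l_s: set) -> set:
    """L(S?): the empty string, and all strings in L(S)."""
    return {""} | l_s

def repeat(l_s: set, n: int, m: int) -> set:
    """L(S{n,m}) for 0 <= n <= m, built from the recursive table rows."""
    if not 0 <= n <= m:
        raise ValueError("requires 0 <= n <= m")
    if m == 0:
        return {""}                                         # S{0,0}
    if n == 0:
        return concat(optional(l_s), repeat(l_s, 0, m - 1))  # S{0,m}
    return concat(l_s, repeat(l_s, n - 1, m - 1))            # S{n,m}

print(sorted(repeat({"a"}, 1, 3)))   # ['a', 'aa', 'aaa']
print(sorted(repeat({"ab"}, 2, 2)))  # ['abab']
```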
NOTE: 

The regular expression language in the Perl Programming Language [Perl] does not include a quantifier of the form S{,m}, since it is logically equivalent to S{0,m}. We have, therefore, left this logical possibility out of the regular expression language defined by this specification. We welcome further input from implementors and schema authors on this issue.

A quantifier is one of ?, *, +, {n,m} or {n,}, which have the meanings defined in the table above.

Quantifier
quantifier   ::=   [?*+] | ( '{' quantity '}' )
quantity     ::=   quantRange | quantMin | QuantExact
quantRange   ::=   QuantExact ',' QuantExact
quantMin     ::=   QuantExact ','
QuantExact   ::=   [0-9]+
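
The quantifier productions are small enough to sketch as a hand-written recognizer. The Python function below is a hypothetical helper, not part of any schema processor. It returns the pair (minimum, maximum), with None standing for "no upper bound". It checks syntax only, so the constraint n <= m from the piece table is left to the caller.

```python
import re
from typing import Optional, Tuple

# '{' quantity '}' where quantity is QuantExact, quantMin or quantRange.
_BRACED = re.compile(r"\{([0-9]+)(,([0-9]+)?)?\}")

def parse_quantifier(text: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (min, max) for a quantifier, or None if text is not one."""
    simple = {"?": (0, 1), "*": (0, None), "+": (1, None)}
    if text in simple:
        return simple[text]
    match = _BRACED.fullmatch(text)
    if match is None:
        return None
    low = int(match.group(1))
    if match.group(2) is None:   # QuantExact: {n}
        return (low, low)
    if match.group(3) is None:   # quantMin: {n,}
        return (low, None)
    return (low, int(match.group(3)))  # quantRange: {n,m}

print(parse_quantifier("{2,5}"))  # (2, 5)
print(parse_quantifier("{4,}"))   # (4, None)
print(parse_quantifier("{,5}"))   # None: this form is not in the grammar
```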

An atom is either a normal character, a character class, or a parenthesized regular expression.

Atom
atom   ::=   Char | charClass | ( '(' regExp ')' )

For all normal characters c, character classes C, and regular expressions S, valid atoms R are:

  R      Denoting the set of strings L(R) containing:
  c      the single string consisting only of c
  C      all strings in L(C)
  (S)    all strings in L(S)
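
Because a quantifier attaches to a single atom, parentheses change what is repeated. The Python sketch below illustrates the difference. It assumes Python's re module behaves like this grammar for these simple patterns, and the helper name in_language is hypothetical.

```python
import re

def in_language(pattern: str, literal: str) -> bool:
    """Return True if the whole literal is in L(pattern)."""
    return re.fullmatch(pattern, literal) is not None

# In "ab+" the quantifier applies to the atom "b" only.
print(in_language("ab+", "abb"))     # True
print(in_language("ab+", "abab"))    # False

# In "(ab)+" the parenthesized expression is a single atom.
print(in_language("(ab)+", "abab"))  # True
print(in_language("(ab)+", "abb"))   # False
```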

A metacharacter is either ., \, ?, *, +, {, }, (, ), [ or ]. These characters have special meanings in regular expressions, but can be escaped to form atoms that denote the sets of strings containing only themselves, i.e., an escaped metacharacter behaves like a normal character.

A normal character is any XML character that is not a metacharacter. In regular expressions, a normal character is an atom that denotes the singleton set of strings containing only itself.

Normal Character
Char   ::=   [^.\?*+()|#x5B#x5D]
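
The escaping rule can be sketched as a helper that backslash-escapes each metacharacter in a literal string. The name escape_literal is hypothetical. The set below follows the metacharacter list above and, as an assumption, also includes |, since the Char production excludes it from the normal characters. Python's re module is used only to check the result and is assumed to treat these escapes the same way.

```python
import re

# Metacharacters listed above, plus "|" (assumption: Char excludes it).
METACHARACTERS = set(".\\?*+{}()[]|")

def escape_literal(text: str) -> str:
    """Backslash-escape every metacharacter so text denotes only itself."""
    return "".join("\\" + ch if ch in METACHARACTERS else ch for ch in text)

pattern = escape_literal("1+1=2?")
print(pattern)                                       # 1\+1=2\?
print(re.fullmatch(pattern, "1+1=2?") is not None)   # True
print(re.fullmatch("1+1=2?", "1+1=2?") is not None)  # False: + and ? act as quantifiers
```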

Note that a normal character can be represented either as itself, or with a [character reference].