[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Re: Need an XPath expression which returns all xs:patt
"Willem Van Lishout willemvanlishout@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> writes: > Is this even possible, theoretically speaking? As soon as you start > using lookaheads, square brackets, and so on, your patterns will > likely fail. I don't think regex can parse regex. Good observation, quite correct. Since the language of regular expressions in XSD (and XPath) is context free, not a regular language, it cannot be recognized in full by a regular expression. There are, I think, two ways to deal with that fact: 1 In the general case, if we limit the maximum nesting depth of a context-free language to some finite number, the language becomes regular. So we cannot write a regular expression to match all and only the legal regular expressions of XSD or XPath, but we *can* write one that recognizes all and only the ones with at most one level of nesting for parentheses or square brackets. We can also write one that recognizes all and only the expressions with at most two, or three, or any positive n, levels of nesting. (The result is not guaranteed to be pretty.) For any finite set of inputs, there will be some maximum nesting level observed; if you have written your regular expression to handle nesting that deep, your regex will produce neither false positives nor false negatives for that input. How comforting this is depends on how costly a false positive or negative would be, and how costly it would be to parse the regular expressions in question using something like ixml. (Hmm. Good idea for a sample ixml grammar!) 2 In this case, we can observe that the requirement is only to detect patterns which allow strings of unbounded length, and thus regular expressions which use quantifiers like *, +, or {n,} on any base expression. We can assume that all inputs are syntactically well formed, so we do not need to distinguish well-formed from ill-formed expressions. Nesting of parentheses does not matter and can be ignored. Nesting of square brackets turns out also not to matter, since the characters *, +, {, and } are effectively escaped at every possible nesting level in a character class expression, and once the first right bracket is encountered, no character other than right bracket can be encountered until all open square bracket pairs have been closed. So: no, theoretically speaking, it's not possible to recognize regular expressions with regular expressions. But practically speaking, *for this problem* (a) an approximation is possible, and in fact better than that (b) a correct solution is possible, because of patterns in the language in question which we can see and capture. -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|