Re: Processing two documents, which order?
This is about the definition of a character class, written with '[' and ']': a set of characters that matches a single character if that character is in the set. A class is defined as a union of single characters and ranges, optionally with another class subtracted from that union. Within [ and ], some characters have special meaning and must be escaped (using '\') in order to be taken literally. The hyphen ('-') defines ranges of characters, but it is also the operator for set subtraction. A class may be nested inside a class, but only as the second operand of a subtraction.

Looking at [a-z\--\-\-:], we see

  a-z    ... a range of 26 characters
  \--\-  ... a range of a single character, '-'
  \-     ... a single character, '-'
  :      ... a single character, ':'

The simpler class [a-z\--:] (no spaces are permitted, and neither is a backslash in front of the colon, because it's not a valid single character escape) has been analyzed by Liam; it is the union of two ranges, comprising the lowercase letters, the digits, the period, the solidus, the hyphen and the colon.

Among other issues, this thread deals with the question of finding certain words that are delimited by anything except a hyphen: if "left" and "hand" should be found, "left-hand" should not, unless it is itself included in the list of words. Thus, it is sufficient to include the hyphen in the set of characters to match for a word:

  regex="[a-z][a-z\-]+"

(The colon is more difficult.)

-W

On 9 April 2011 17:55, Liam R E Quin <liam@xxxxxx> wrote:
> On Sat, 2011-04-09 at 08:20 +0100, Dave Pawson wrote:
>
>> I want to say any lc character, AND not( : | -)
>
> since : and - are not lowercase characters, just "any lowercase letter"
> would work... or by AND do you mean "followed by"?
>
>> <xsl:analyze-string select="." regex="[a-z][a-z\--\-\-:]+">
>> works. But I don't know how.
>
> [a-z] is a lower case letter (in ASCII...)
> [a-z \- - \:] allows any character in two ranges:
> (1) a .. z
> (2) - .. :
> using the default collation/sorting sequence, this gives (consulting an
> ASCII or Unicode chart)
> - . / 0123456789 :
>
> This therefore matches pastry:36-little-pigs but not flat:pan_cake
>>
>> [a-z-[p]] excepts p from the range a-z
>> Is this connected with my misunderstanding?
>
> It might be, but there are no nested square brackets in your example.
> The stylesheet you appended had the range --- in it, rather than --: by
> the way.
>
> Note that we are using here XPath 2 regular expressions, not Java ones.
> They are very close (and both are more or less subsets of Perl regular
> expressions, which are much more powerful).
>
> Liam
>
>
> --
> Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
> Pictures from old books: http://www.fromoldbooks.org/
> Occasional blog: http://www.barefootliam.org/
> The barefoot typographer
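
As a rough cross-check of the classes discussed in this thread, here is a minimal sketch that lists which ASCII characters each class accepts. It uses Python's re module as a stand-in, which is an assumption on my part: the thread is about XPath 2 regular expressions, and the two dialects differ (Python has no class subtraction such as [a-z-[p]], for instance), although for these particular classes I would expect them to agree. The names members, LIAM and WORD are hypothetical, and the "left-hand" sample sentence is made up for illustration.

```python
import re


def members(char_class):
    """Return the ASCII characters accepted by a bracketed character class."""
    single = re.compile(char_class)
    return "".join(chr(c) for c in range(128) if single.fullmatch(chr(c)))


# The regex from the quoted stylesheet: lowercase letters, '-' and ':' only.
ORIGINAL_CLASS = r"[a-z\--\-\-:]"

# The simpler class analyzed by Liam: a..z plus the range '-' .. ':'.
SIMPLER_CLASS = r"[a-z\--:]"

# Liam's class in context (assumption: Python re, not an XPath 2 processor).
LIAM = re.compile(r"[a-z][a-z\--:]+")

# The hyphen as an ordinary member, so "left-hand" stays a single word.
WORD = re.compile(r"[a-z][a-z\-]+")


if __name__ == "__main__":
    print(members(ORIGINAL_CLASS))
    print(members(SIMPLER_CLASS))
    print(bool(LIAM.fullmatch("pastry:36-little-pigs")))
    print(bool(LIAM.fullmatch("flat:pan_cake")))
    print(WORD.findall("left-hand turn, left and hand"))
```

Under that assumption, the first class accepts only '-', ':' and a..z, while the second also lets in '.', '/' and the digits, which is why pastry:36-little-pigs matches as a whole and flat:pan_cake (with its underscore) does not. The last line reports "left-hand" as one word, without also reporting the "left" and "hand" inside it.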