Subject: Re: Testing for upper and lower case
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 03 Nov 2011 23:09:50 +0000
On 03/11/2011 16:23, Houghton,Andrew wrote:
Your string-to-codepoints example only works for ASCII upper/lower case letters. It fails to recognize composed and decomposed diacritical characters such as an uppercase A combined with a grave (U+00C0), an acute (U+00C1), a circumflex (U+00C2), etc. Yes, you could detect these too with additional logic, but matches() with a character class of \p{Ll}, \p{Lu}, \p{Lt} handles all the messy details of Unicode.
Andy.
If you want to handle both composed and decomposed characters then it's
probably safest to use normalize-unicode() before using matches().
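
As a minimal sketch of that suggestion (the sample string and the 'NFC' form are assumptions chosen for illustration, not something from the original question), the expression below builds a decomposed "A" followed by a combining grave accent and tests it before and after normalization:

```xpath
(: Decomposed input: U+0041 (A) followed by U+0300 (combining grave). :)
(: Without normalization the combining mark is category Mn, not Lu,    :)
(: so an "all uppercase letters" test on the raw string fails.         :)
matches(codepoints-to-string((65, 768)), '^\p{Lu}+$'),

(: After NFC normalization the pair composes to U+00C0, which is Lu. :)
matches(normalize-unicode(codepoints-to-string((65, 768)), 'NFC'), '^\p{Lu}+$')
```

The first test should come out false and the second true. NFC only composes sequences that have a precomposed form in Unicode, so a pattern that must also accept other base-plus-mark combinations may need to allow \p{M} after each letter.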
Michael Kay
Saxonica