Re: [XSLT2.0] xsl:analyze-string@regex syntax too limi
Thanks, good find. The only problem now is that this issue needs to be
addressed in java.util.regex.

Colin Paul Adams wrote:
>>>>>> "Gunther" == Gunther Schadow <gunther@xxxxxxxxxxxxxxxxxxxxxx> writes:
>
>     Gunther> The boundary matcher matches a zero-width substring
>     Gunther> between a character matching the character class
>     Gunther> [A-Za-z_0-9] and a character matching the character class
>     Gunther> [^A-Za-z_0-9] or vice versa. </quote>
>
>     Gunther> This is pretty clear. It may not make the
>     Gunther> internationalization people very happy because I can't do
>     Gunther> word-boundary matches on Hindi text. That's a true
>     Gunther> concern.
>
> So address it. Unicode report TR18 says (for Level 1 support):
>
> RL1.4 Simple Word Boundaries
> To meet this requirement, an implementation shall extend the word
> boundary mechanism so that:
>
>    1. The class of <word_character> includes all the Alphabetic values
>       from the Unicode character database, from UnicodeData.txt [UData].
>       See also Annex C: Compatibility Properties.
>    2. Non-spacing marks are never divided from their base characters,
>       and otherwise ignored in locating boundaries.
>
> Level 2 provides more general support for word boundaries between
> arbitrary Unicode characters which may override this behavior.
>
> Level 1 support should certainly be met.

-- 
Gunther Schadow, M.D., Ph.D.                    gschadow@xxxxxxxxxxxxxxx
Associate Professor           Indiana University School of Informatics
Regenstrief Institute, Inc.      Indiana University School of Medicine
tel:1(317)630-7960                       http://aurora.regenstrief.org
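For illustration only, here is a rough Java sketch of what a word boundary along the lines of RL1.4 could look like when spelled out with explicit lookarounds instead of relying on `\b`. The class name `SimpleWordBoundary`, its methods, and the exact choice of word-character class (Alphabetic, combining marks, decimal digits, underscore) are assumptions made for this example; they are not taken from TR18 verbatim, nor from any XSLT processor. It assumes a Java version whose java.util.regex supports `\p{IsAlphabetic}` (Java 7 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class SimpleWordBoundary {

    // Assumed word-character class: Unicode Alphabetic, any combining mark
    // (so marks stay attached to their base character), decimal digits, '_'.
    static final String WORD = "[\\p{IsAlphabetic}\\p{M}\\p{Nd}_]";

    // Zero-width position with a word character on exactly one side.
    static final Pattern BOUNDARY = Pattern.compile(
            "(?<!" + WORD + ")(?=" + WORD + ")"      // start of a word
            + "|(?<=" + WORD + ")(?!" + WORD + ")"); // end of a word

    static final Pattern WORD_RUN = Pattern.compile(WORD + "+");

    // Returns the character offsets of all boundaries in the text.
    static List<Integer> boundaries(String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = BOUNDARY.matcher(text);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    // Returns the maximal runs of word characters in the text.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD_RUN.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two Hindi words ("namaste duniya") written with Unicode escapes
        // so the source file does not depend on its encoding.
        String hindi = "\u0928\u092E\u0938\u094D\u0924\u0947 "
                     + "\u0926\u0941\u0928\u093F\u092F\u093E";
        System.out.println(boundaries(hindi));
        System.out.println(words(hindi).size());
    }
}
```

With this class the virama and vowel signs in the sample text count as word characters, so boundaries only appear around the space and at the two ends of the string. This covers the "never divided from their base characters" part of RL1.4 in a simplified way; it does not attempt the Level 2 behaviour.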