XSL-LIST Mailing List Archive
RE: Flattening characters to plain latin
> My verdict: If the 'lt' of Michael was on purpose, I still
> want to grant him the "Best Original Software Snippet Based
> On Any XXX* Language" ;-)

I think the original problem wasn't especially well specified, and I was well aware that retaining all the characters below 127 while losing those above was a pretty crude cutoff. In the light of that, the decision whether to keep or lose 127 itself is neither here nor there.

Almost certainly a better solution is to discard only the characters in particular Unicode groups, which should be possible to achieve using replace() with appropriately selected regular expressions.

The basic idea I was trying to propose was using normalize-unicode() to translate into decomposed normal form and then discarding modifier characters, and I think that's basically a sound approach. In fact a better solution might be

    replace(normalize-unicode($in, 'NFKD'), '\p{Mn}', '')

but I'm sure that could be improved further.

Michael Kay
http://www.saxonica.com/
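For anyone who wants to try the decompose-then-strip idea outside an XSLT 2.0 processor, here is a minimal Python sketch of the same approach. It is only an analogue, not the XPath expression itself: the function name flatten_to_plain_latin is made up for illustration, and it assumes that "modifier characters" means Unicode general category Mn (nonspacing marks).

```python
import unicodedata


def flatten_to_plain_latin(text: str) -> str:
    """Hypothetical helper: decompose to NFKD, then drop nonspacing marks."""
    # NFKD splits e.g. "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT,
    # and also expands compatibility characters such as the "fi" ligature.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character whose general category is not Mn.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(flatten_to_plain_latin("Crème brûlée à la carte"))
# prints: Creme brulee a la carte
```

Letters that have no decomposition, such as "ø" or "ß", pass through unchanged, so this is a different cutoff from the earlier "keep everything below 127" rule rather than a replacement for it.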