
RE: Flattening characters to plain latin

Subject: RE: Flattening characters to plain latin
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Sat, 17 Feb 2007 17:22:31 -0000
> My verdict: If the 'lt' of Michael was on purpose, I still 
> want to grant him the "Best Original Software Snippet Based 
> On Any XXX* Language" ;-)

I think the original problem wasn't especially well specified, and I was
well aware that retaining all the characters below 127 while losing those
above was a pretty crude cutoff. In the light of that, the decision whether
to keep or lose 127 itself is neither here nor there. Almost certainly a
better solution is to discard only the characters in particular
Unicode groups, which should be possible to achieve using replace() with
appropriately selected regular expressions. The basic idea I was trying to
propose was using normalize-unicode to translate into decomposed normal form
and then discarding modifier characters, and I think that's basically a
sound approach.

In fact a better solution might be

replace(normalize-unicode($in, 'NFKD'), '\p{Mn}', '')

but I'm sure that could be improved further.

Michael Kay
http://www.saxonica.com/
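
The normalize-then-strip idea described above can be sketched as an XSLT 2.0 stylesheet function. This is only an illustration under stated assumptions: the namespace urn:example:flatten, the function name f:flatten, the template name main and the sample string are all hypothetical, and the processor is assumed to support the NFKD normalization form (the Functions and Operators spec requires only NFC, so this may not hold everywhere).

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:flatten"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: decompose to NFKD, then delete every
       non-spacing mark (Unicode category Mn). Lowercase \p matches
       characters IN the category; uppercase \P would match everything
       outside it. -->
  <xsl:function name="f:flatten" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence
        select="replace(normalize-unicode($in, 'NFKD'), '\p{Mn}', '')"/>
  </xsl:function>

  <!-- Sample call; the literal is "Creme brulee" with accents, written
       as character references (U+00E8, U+00FB, U+00E9) to avoid any
       dependence on the file encoding. -->
  <xsl:template match="/" name="main">
    <xsl:value-of select="f:flatten('Cr&#xE8;me br&#xFB;l&#xE9;e')"/>
  </xsl:template>

</xsl:stylesheet>
```

For the sample string the result is Creme brulee: each accented letter decomposes into a base letter plus a combining mark, and the mark is then removed. Characters that have no decomposition, such as U+00F8 (o with stroke) or U+00DF (sharp s), pass through unchanged, so a translate() or lookup step would still be needed if those must also be mapped to plain Latin letters.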
