RE: XSLT script to report Unicode characters and code
I wrote a transformation that uses unparsed-text() and regex processing to create an XML version of the Unicode database; once you've got that, you can easily look up what code block a particular character falls into because it's part of the data for each character. (Well, most of the characters. Some of the non-BMP entries share a single entry for a large group of characters, which needs a bit of care.)

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: David Sewell [mailto:dsewell@xxxxxxxxxxxx]
> Sent: 29 May 2008 20:45
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: XSLT script to report Unicode characters and code blocks in file?
>
> I'm working on a simple XSLT 2.0 script to list all distinct
> Unicode characters used in a file. That part of the script
> takes very few lines, thanks to distinct-values(),
> codepoints-to-string(), and string-to-codepoints().
>
> However, I'd also like to group the output by code block:
>
> http://www.fileformat.info/info/unicode/block/index.htm
>
> Best way I can see to do it is to write a local function that
> tests the codepoint value and uses lots and lots of
> <xsl:when> case tests to determine which block the character
> falls into. Not hard but a bit tedious. Has anyone invented
> this wheel already?
>
> DS
>
> --
> David Sewell, Editorial and Technical Manager
> ROTUNDA, The University of Virginia Press
> PO Box 801079, Charlottesville, VA 22904-4318 USA
> Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
> Email: dsewell@xxxxxxxxxxxx Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/
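
For anyone finding this thread later, here is a minimal XSLT 2.0 sketch of the general idea. It is not the transformation described in the reply, which was not posted. It assumes a local copy of the Unicode Character Database file Blocks.txt sits next to the stylesheet (lines of the form `0000..007F; Basic Latin`), and it reads block ranges from that file instead of building a full XML version of the database. The `f:` namespace, the `blocks-uri` parameter, the function names, the output element names and the `No_Block` fallback label are all made up for illustration. It also only looks at the text nodes of the input document, not attribute values or markup.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local"
    exclude-result-prefixes="xs f">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: Blocks.txt is in the same directory as this stylesheet. -->
  <xsl:param name="blocks-uri" select="'Blocks.txt'"/>

  <!-- Convert a hexadecimal string such as '007F' to an integer. -->
  <xsl:function name="f:hex" as="xs:integer">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="
      if ($s = '') then 0
      else 16 * f:hex(substring($s, 1, string-length($s) - 1))
           + string-length(substring-before('0123456789ABCDEF',
               upper-case(substring($s, string-length($s)))))"/>
  </xsl:function>

  <!-- Parse lines like '0000..007F; Basic Latin' into block elements.
       Comment lines and blank lines do not match and are skipped. -->
  <xsl:variable name="blocks" as="element(block)*">
    <xsl:analyze-string select="unparsed-text($blocks-uri)"
        regex="^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+)$" flags="m">
      <xsl:matching-substring>
        <block start="{f:hex(regex-group(1))}"
               end="{f:hex(regex-group(2))}"
               name="{normalize-space(regex-group(3))}"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>

  <!-- Name of the block containing a codepoint, or a fallback label. -->
  <xsl:function name="f:block-name" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:variable name="hit"
        select="$blocks[xs:integer(@start) le $cp and xs:integer(@end) ge $cp][1]"/>
    <xsl:sequence select="if ($hit) then string($hit/@name) else 'No_Block'"/>
  </xsl:function>

  <!-- List the distinct characters in the text of the input, grouped by block. -->
  <xsl:template match="/">
    <xsl:variable name="cps" as="xs:integer*"
        select="distinct-values(string-to-codepoints(string(.)))"/>
    <report>
      <xsl:for-each-group select="$cps" group-by="f:block-name(.)">
        <block name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <xsl:sort select="."/>
            <char cp="{.}">
              <xsl:value-of select="codepoints-to-string(.)"/>
            </char>
          </xsl:for-each>
        </block>
      </xsl:for-each-group>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Because blocks are stored as ranges, this sidesteps the grouped non-BMP entries mentioned in the reply, but it gives you only the block name and none of the per-character data. The linear scan over `$blocks` for each distinct codepoint is fine for a report like this; for very large inputs you might prefer a sorted lookup.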