Re: XSLT function for title capitalization?
On Mon, 2018-04-09 at 20:52 +0000, David Sewell dsewell@xxxxxxxxxxxx wrote:
> Wondering if anyone has a serviceable function (preferably in XSLT 2/3
> but v1 is fine if it works) that takes a string as input and returns it
> with title capitalization according to English-language editorial
> practice (for example, Chicago Manual of Style).

I'd use replace() probably, rather than tokenizing, so as to change as little as possible & facilitate regression tests.

Some test cases should include
* words that do and don't change at the start and at the end of input;
* words like o'clock and don't that include apostrophes, both as ' and as ’ (it doesn't matter whether they are input as entities, literally, or as numeric character references, as they all end up the same after XML parsing);
* hyphenated proper names like Rees-Mogg;
* exceptions like Ladies-in-Waiting;
* punctuation such as em dashes, quotes, commas, semicolons.

Unfortunately XSLT doesn't give us Perl's wonderful e modifier on substitution, and neither does XQuery (where it'd be more useful), but XSLT does give us xsl:analyze-string.

I'd start with David Carlisle's approach, add a lot of test cases, and fix the regexp to be something more like (\w)(\w*(?:'\w+)?) maybe.

An alternative is to replace (\w)'(\w) with $1E$2 everywhere, where E is some Unicode upper-case letter or sequence of letters that definitely doesn't occur in your input, and change it back at the end.

In XSLT 1 i'd cry for a while and then write something recursive that split its input, using translate() and substring-before() to find where to split.

For https://words.fromoldbooks.org/Chalmers-Biography/ i use Perl, as the input isn't well-formed XML at first, with a table of manual overrides, but there are fewer than 10,000 entries i think. Once it's in XML, my script/Makefile for conversion does use XSLT, taking 46 seconds to process 43MBytes of XML into 9771 separate XML files with Saxon.
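To make the replace-rather-than-tokenize idea concrete, here is a minimal sketch in Python (not XSLT), where re.sub with a callback plays the role of Perl's e modifier. In XSLT 2 the same logic would sit inside xsl:analyze-string. The names title_case, SMALL_WORDS and WORD are hypothetical, and the small-word list is an illustrative assumption, not the Chicago Manual of Style rule set. The regexp is the one suggested above, widened to accept the curly apostrophe as well.

```python
import re

# Assumption: a deliberately short, illustrative list of words that stay
# lower-case in the middle of a title. A real style guide needs more rules.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "in",
               "nor", "of", "on", "or", "the", "to"}

# A word is a first character plus the rest, optionally including one
# apostrophe part (straight ' or curly U+2019), so "don't" is one match.
WORD = re.compile(r"(\w)(\w*(?:['\u2019]\w+)?)")


def title_case(text: str) -> str:
    """Capitalize word-initial letters in place, leaving all else untouched."""
    matches = list(WORD.finditer(text))
    if not matches:
        return text
    # The first and last words are capitalized even if they are small words.
    first_start = matches[0].start()
    last_start = matches[-1].start()

    def fix(m: re.Match) -> str:
        word = m.group(0)
        at_edge = m.start() in (first_start, last_start)
        if word.lower() in SMALL_WORDS and not at_edge:
            return word.lower()
        # Upper-case only the first character; keep the rest as typed,
        # so internal capitals such as "McDonald" survive.
        return m.group(1).upper() + m.group(2)

    # Everything between matches (hyphens, dashes, quotes, spaces) is
    # copied through unchanged by re.sub.
    return WORD.sub(fix, text)
```

Because only matched words are rewritten, punctuation and spacing pass through byte-for-byte, which is what keeps regression tests simple. The hyphen is not a word character, so each part of a hyphenated name is handled as its own word, and "in" in ladies-in-waiting falls out of the small-word rule rather than needing a special case.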
Liam -- Liam Quin, W3C, http://www.w3.org/People/Quin/ Staff contact for Verifiable Claims WG, SVG WG, XQuery WG Improving Web Advertising: https://www.w3.org/community/web-adv/ Personal: awesome vintage art: http://www.fromoldbooks.org/