Recently I needed to merge several back-of-the-book indexes that were marked up
in XML. After experimenting a bit, I decided that given an appropriate
collation, the following sequence of (XSLT 3.0) xsl:sort instructions was an
adequate approximation of what indexers call the "letter-by-letter" style of
alphabetizing:
<xsl:sort select="replace(., '[\s\p{P}-[(,]]', '') ! replace(., ',.*|\(.*','')"/>
<xsl:sort select="matches(., '^[^(]+,')"/>
<xsl:sort select="replace(., '[\s\p{P}-[(,]]+', '')"/>

If anyone wants to test it out with the examples that are used in the Chicago Manual of Style to illustrate the system, I've put the full script, data, and relevant chunk of the CMS up here: http://lister.ei.virginia.edu/~drs2n/alpha/ . (Suggested refinements/improvements would be welcome.)

I spent a bit of time trying to figure out how one might implement the word-by-word system (described at the above URL) using xsl:sort, but I'm not sure it's possible--it seems that word-by-word would require a full-blown recursive sorting routine. I'm happy to be proven wrong, though, by anyone who has tackled this before or is cleverer than I am about such things.

David

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell@xxxxxxxxxxxx
Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
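
For anyone who wants to try the three xsl:sort instructions above without downloading the full script, here is a minimal sketch of how they might sit in a complete stylesheet. The index/entry element names and the UCA collation URI are assumptions made for illustration, not necessarily what the script at the URL above uses; substitute your own markup and whatever case-insensitive collation your processor supports.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input (element names are assumed):
     <index>
       <entry>New, Arthur</entry>
       <entry>newsboy</entry>
       <entry>NEW (Neighbors Ever Watchful)</entry>
       <entry>news, lamentable</entry>
       <entry>New Deal</entry>
       <entry>newborn</entry>
     </index>
-->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed collation: UCA at secondary strength, so that case
       differences are ignored. Any case-insensitive collation that
       the processor recognizes could be used instead. -->
  <xsl:variable name="coll"
      select="'http://www.w3.org/2013/collation/UCA?strength=secondary'"/>

  <xsl:template match="/index">
    <index>
      <xsl:for-each select="entry">
        <!-- Key 1: drop whitespace and punctuation other than "(" and ",",
             then keep only what precedes the first comma or parenthesis. -->
        <xsl:sort collation="{$coll}"
            select="replace(., '[\s\p{P}-[(,]]', '') ! replace(., ',.*|\(.*', '')"/>
        <!-- Key 2: false (no comma before any parenthesis) sorts before
             true, so a parenthesized heading precedes an inverted one. -->
        <xsl:sort select="matches(., '^[^(]+,')"/>
        <!-- Key 3: the whole entry with whitespace and punctuation
             (other than "(" and ",") removed. -->
        <xsl:sort collation="{$coll}"
            select="replace(., '[\s\p{P}-[(,]]+', '')"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </index>
  </xsl:template>

</xsl:stylesheet>
```

With the sample entries in the comment and a case-insensitive collation, the three keys should order them as: NEW (Neighbors Ever Watchful); New, Arthur; newborn; New Deal; news, lamentable; newsboy. The first two entries tie on key 1 ("NEW" and "New"), and key 2 is what places the parenthesized heading ahead of the inverted name.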