RE: Re: text() word lists
On Mon, 9 Feb 2004 David.Pawson@xxxxxxxxxxx wrote:

> I said:
> Is it possible to remove all numbers too?
> Or is that a part of the lexicographers toolset?

It can be (I'm reliably informed by a linguist sitting a few desks away), in that someone might be analysing the text of (say) a motoring magazine. "The A1-M1 link road" (for UK readers) or "a V6 Engine...or I could have had a V8", where any comparisons don't make sense without the numbers.

So what is the best way to parameterise these to allow turning on/off the removal of numbers? And while we're at it, turning on/off the removal of hyphens or other possibly-word-forming characters?

> <xsl:template match="/">
>   <frequencies>
>     <xsl:for-each-group group-by="." select="
>       for $w in tokenize(string(.), '[\s.?!,)(]+')[.] return lower-case($w)">
>       <xsl:sort select="count(current-group())" order="descending"/>
>       <xsl:analyze-string select="current-grouping-key()" regex="[0-9]+">
>         <xsl:non-matching-substring>
>           <word><xsl:value-of select="current-grouping-key(), ' - ',
>             count(current-group())"/></word>
>         </xsl:non-matching-substring>
>         <xsl:matching-substring/>
>       </xsl:analyze-string>
>     </xsl:for-each-group>
>   </frequencies>
>
> </xsl:template>
>
> Seems to work nicely.
> Thanks Michael, very useful.
>
> regards DaveP

---
Dr James Cummings, Oxford Text Archive, University of Oxford
James.Cummings at ota.ahds.ac.uk
http://users.ox.ac.uk/~jamesc/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
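
For what it's worth, one possible way to parameterise the quoted template is sketched below. This is only a sketch under stated assumptions, not anything proposed in the thread: the parameter names strip-numbers and split-on-hyphens and the variable separators are made up for illustration, and "removing numbers" is taken to mean dropping tokens made up entirely of digits, so that "a1-m1" and "v6" survive. How the parameter values are supplied depends on the XSLT 2.0 processor in use.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical switches; the names are illustrative only -->
  <xsl:param name="strip-numbers" as="xs:boolean" select="true()"/>
  <xsl:param name="split-on-hyphens" as="xs:boolean" select="false()"/>

  <!-- Token separators: the original character class, optionally plus the hyphen -->
  <xsl:variable name="separators" as="xs:string"
      select="if ($split-on-hyphens)
              then '[\s.?!,)(\-]+'
              else '[\s.?!,)(]+'"/>

  <xsl:template match="/">
    <frequencies>
      <!-- [.] drops empty tokens; the second predicate drops all-digit
           tokens, but only when strip-numbers is true -->
      <xsl:for-each-group group-by="." select="
          for $w in tokenize(string(.), $separators)[.]
                    [not($strip-numbers and matches(., '^[0-9]+$'))]
          return lower-case($w)">
        <xsl:sort select="count(current-group())" order="descending"/>
        <word>
          <xsl:value-of separator=""
              select="current-grouping-key(), ' - ', count(current-group())"/>
        </word>
      </xsl:for-each-group>
    </frequencies>
  </xsl:template>

</xsl:stylesheet>
```

Filtering the tokens before grouping, rather than running xsl:analyze-string over each grouping key afterwards, means a mixed token such as "v6" is counted once under its own key. Other possibly-word-forming characters could be handled the same way, with one boolean parameter per character and a correspondingly extended separator expression.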