Re: text() word lists
> Hi there,
>
> I'm sure this is a faq, and I've checked the faq and archive.
> I swear I remember someone asking about it, but I couldn't
> find it, so here goes.
>
> I want to take an XML file of unknown elements and create
> a word frequency list / word list. Now, an entry on sorting
> in the xslt faq says this is just what xslt is bad at. (And
> I'm sure there are some that would say 'just go use perl',
> but let's say I want to do it in xslt (1 or 2).)
>
> XSLT2 makes the tokenization of strings much easier, so
> assuming I'm using that, if I have:
>
> <foo>
> <blort> This is a <wibble>Test</wibble>, only a test!</blort>
> <blort> This really is a <wibble>great big test</wibble>, only a test!
> </blort>
> </foo>
>
> I don't know that foo|wibble|blort will be the element names.
>
> But I want to produce both:
>
> a -- 4
> test -- 4
> only -- 2
> is -- 2
> this -- 2
> big -- 1
> great -- 1
> really -- 1
>
> Which (unless I've missed something) should be
> a case-insensitive list grouped by frequency
> sorted alphabetically within this, and ignoring
> punctuation.
>
> But also:
>
> a -- 4
> big -- 1
> great -- 1
> is -- 2
> only -- 2
> really -- 1
> test -- 4
> this -- 2
>
> Which is the same list, but not grouped
> by frequency.
>
> Suggestions? Solutions?
>
> Many thanks for any help,
> -James
> ---
> Dr James Cummings, Oxford Text Archive, University of Oxford
> James.Cummings at ota.ahds.ac.uk http://users.ox.ac.uk/~jamesc/
Using FXSL and Saxon 7, one would write the stylesheet below. (I intended
this to be essentially an XSLT 1.0 solution, until I realized that XSLT 1.0
does not allow references to variables in xsl:key, so it needs a small
change to work in XSLT 1.0.)
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
 >

  <xsl:import href="strSplit-to-Words.xsl"/>

  <!-- Index every word by its lower-cased value, so that
       "Test" and "test" fall under the same key -->
  <xsl:key name="kWordByVal" match="word"
           use="translate(., $vUpper, $vLower)"/>

  <xsl:variable name="vUpper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="vLower" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- Split the text of the whole document into word elements -->
    <xsl:variable name="vwordNodes">
      <xsl:call-template name="str-split-to-words">
        <xsl:with-param name="pStr" select="/"/>
        <xsl:with-param name="pDelimiters"
                        select="', &#9;&#10;!'"/>
      </xsl:call-template>
    </xsl:variable>

    <!-- Muenchian grouping: keep only the first word of each key group -->
    <xsl:for-each
         select="ext:node-set($vwordNodes)/*[normalize-space()]
                   [generate-id()
                   =
                    generate-id(key('kWordByVal',
                                    translate(., $vUpper, $vLower)
                                    )[1])
                   ]">
      <!-- Most frequent words first -->
      <xsl:sort select="count(key('kWordByVal',
                                  translate(., $vUpper, $vLower)
                                  )
                              )"
                data-type="number"
                order="descending" />

      <xsl:value-of
           select="concat('&#xA;',
                          translate(., $vUpper, $vLower),
                          ' - ',
                          count(key('kWordByVal',
                                    translate(., $vUpper, $vLower)
                                    )
                                )
                          )"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
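The change needed for XSLT 1.0 concerns only the key: XSLT 1.0 forbids variable references in the match and use attributes of xsl:key, so the two alphabet strings have to be written there as literals. A minimal sketch of the replacement declaration, assuming the rest of the stylesheet stays as it is (I have not run this against FXSL):

```xml
<!-- XSLT 1.0: no variable references inside xsl:key, so inline the literals -->
<xsl:key name="kWordByVal" match="word"
         use="translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                           'abcdefghijklmnopqrstuvwxyz')"/>
```

The key() calls inside the template can keep using $vUpper and $vLower, because the restriction applies only to the xsl:key declaration itself.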
When this transformation is applied to your source.xml:
<foo>
<blort> This is a <wibble>Test</wibble>, only a test!</blort>
<blort> This really is a <wibble>great big test</wibble>, only a test!
</blort>
</foo>
The wanted result is produced:
a - 4
test - 4
this - 2
is - 2
only - 2
really - 1
great - 1
big - 1
For the other output you just have to change the "select" attribute of
xsl:sort, so that it sorts on the word itself rather than on its count.
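The request also asked for alphabetical order within each frequency group, which a single sort key does not guarantee: words with equal counts stay in document order, as with "this, is, only" above. One way to get it is a second xsl:sort as a tie-breaker. This is a sketch of a fragment meant to replace the xsl:sort inside the xsl:for-each of the stylesheet above:

```xml
<!-- Primary key: frequency, highest first -->
<xsl:sort select="count(key('kWordByVal',
                            translate(., $vUpper, $vLower)))"
          data-type="number"
          order="descending"/>
<!-- Secondary key: the lower-cased word, in ascending text order -->
<xsl:sort select="translate(., $vUpper, $vLower)"/>
```

Keeping only the second xsl:sort gives the plain alphabetical list.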
Solving this kind of task is almost trivial using FXSL.
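Since the question allows XSLT 2.0, the same list can also be sketched without FXSL, using tokenize() and xsl:for-each-group. This assumes a processor that implements the final XSLT 2.0 Recommendation; the Saxon 7 releases implement earlier drafts, so details may differ there. The separator pattern \P{L}+ (any run of non-letters) is my choice for illustration and is not part of the original request:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Lower-case all text, split on non-letters, drop empty tokens,
         then group identical words together -->
    <xsl:for-each-group
         select="tokenize(lower-case(string(.)), '\P{L}+')[. ne '']"
         group-by=".">
      <!-- Most frequent first, alphabetical within equal counts -->
      <xsl:sort select="count(current-group())"
                data-type="number" order="descending"/>
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="concat(current-grouping-key(), ' -- ',
                                   count(current-group()), '&#xA;')"/>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Because grouping is on the token value itself, the element names in the source never matter, which matches the "unknown elements" requirement.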
Cheers,
Dimitre Novatchev
FXSL developer,
http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list