Re: grouping and word counting
Hi Marina,
One can use the string tokeniser from FXSL (the "str-split-to-words"
template) to obtain a list of words from a string and then count them.
Combined with the Muenchian method for grouping, this gives the following
solution.
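For readers new to the Muenchian method, the grouping step on its own can be
sketched roughly as below. This is only an illustration and not part of the
solution itself: the element names are those of the source.xml in this
thread, and the "count, then text" output format is made up for the example.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Index every MESSAGE by its string value -->
  <xsl:key name="kMsg" match="MESSAGE" use="."/>
  <xsl:template match="/">
    <!-- Keep only the first MESSAGE of each group of equal values -->
    <xsl:for-each
        select="/LOG/SENT/MESSAGE[generate-id()
                                  = generate-id(key('kMsg', .)[1])]">
      <!-- key('kMsg', .) returns the whole group for this value -->
      <xsl:value-of
          select="concat(count(key('kMsg', .)), ' ',
                         normalize-space(.), '&#xA;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the sample log this should print "1 hello Fred", "2 nice weather" and
"2 whats the latest news?", one per line. The full solution runs the same
idea twice: first over the messages, then over the resulting counts.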
This transformation:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ext="http://exslt.org/common"
    exclude-result-prefixes="ext">
  <!-- FXSL tokeniser: supplies the "str-split-to-words" template -->
  <xsl:import href="strSplit-to-Words.xsl"/>
  <xsl:output method="text"/>
  <!-- kMsg groups identical messages; kByCount groups the pass-1
       entries by group size -->
  <xsl:key name="kMsg" match="MESSAGE" use="."/>
  <xsl:key name="kByCount" match="m" use="@count"/>
  <xsl:template match="/">
    <!-- Pass 1: one <m> per distinct message, with its group size -->
    <xsl:variable name="vPass1">
      <xsl:for-each
          select="/*/*/MESSAGE[generate-id()
                               = generate-id(key('kMsg', .)[1])]">
        <xsl:sort select="count(key('kMsg', .))" data-type="number"/>
        <m count="{count(key('kMsg', .))}" text="{.}"/>
      </xsl:for-each>
    </xsl:variable>
    <!-- Pass 2: one output line per distinct group size -->
    <xsl:for-each
        select="ext:node-set($vPass1)/m
                  [generate-id()
                   = generate-id(key('kByCount', @count)[1])]">
      <!-- Order the lines by group size -->
      <xsl:sort select="@count" data-type="number"/>
      <!-- All message texts of this group size, space-separated -->
      <xsl:variable name="vAllText">
        <xsl:for-each select="key('kByCount', @count)">
          <xsl:value-of select="concat(' ', @text, ' ')"/>
        </xsl:for-each>
      </xsl:variable>
      <!-- Tokenise the combined text into <word> elements -->
      <xsl:variable name="vrtfWords">
        <xsl:call-template name="str-split-to-words">
          <xsl:with-param name="pStr" select="$vAllText"/>
          <xsl:with-param name="pDelimiters" select="' '"/>
        </xsl:call-template>
      </xsl:variable>
      <!-- Average number of words per message of this group size -->
      <xsl:variable name="vAvWords"
          select="(count(ext:node-set($vrtfWords)/word) - 1)
                  div
                  count(key('kByCount', @count))"/>
      <!-- Columns: group size, number of messages, average words -->
      <xsl:value-of
          select="concat(@count, ' ',
                         count(key('kByCount', @count)), ' ',
                         $vAvWords, '&#xA;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
when applied to your source.xml:
<LOG>
  <SENT>
    <USER> 12345 </USER>
    <LOCATION> 55555 </LOCATION>
    <TARGET> 1 </TARGET>
    <TARGET_LOCATION> 23222 </TARGET_LOCATION>
    <MESSAGE> hello Fred </MESSAGE>
  </SENT>
  <SENT>
    <USER> 77777 </USER>
    <LOCATION> 76666 </LOCATION>
    <TARGET> 3 </TARGET>
    <TARGET_LOCATION> 34444 </TARGET_LOCATION>
    <MESSAGE> nice weather </MESSAGE>
  </SENT>
  <SENT>
    <USER> 77777 </USER>
    <LOCATION> 76666 </LOCATION>
    <TARGET> 4 </TARGET>
    <TARGET_LOCATION> 67777 </TARGET_LOCATION>
    <MESSAGE> nice weather </MESSAGE>
  </SENT>
  <SENT>
    <USER> 33333 </USER>
    <LOCATION> 12666 </LOCATION>
    <TARGET> 8 </TARGET>
    <TARGET_LOCATION> 98765 </TARGET_LOCATION>
    <MESSAGE> whats the latest news? </MESSAGE>
  </SENT>
  <SENT>
    <USER> 33333 </USER>
    <LOCATION> 12666 </LOCATION>
    <TARGET> 9 </TARGET>
    <TARGET_LOCATION> 46578 </TARGET_LOCATION>
    <MESSAGE> whats the latest news? </MESSAGE>
  </SENT>
</LOG>
produces the wanted result:
1 1 2
2 2 3
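If importing FXSL is not an option, the word count for a single message can
also be approximated with plain XPath 1.0 string functions. The following is
a minimal sketch under that assumption, not the FXSL tokeniser: it treats
any run of whitespace as one separator and counts separators plus one. The
variable names vNorm and vWords are made up for the example.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="/LOG/SENT/MESSAGE">
      <!-- Trim both ends and collapse inner runs of whitespace -->
      <xsl:variable name="vNorm" select="normalize-space(.)"/>
      <!-- Words = single spaces left + 1; the boolean factor turns
           the result into 0 for an empty message -->
      <xsl:variable name="vWords"
          select="(string-length($vNorm)
                   - string-length(translate($vNorm, ' ', ''))
                   + 1) * boolean($vNorm)"/>
      <xsl:value-of select="concat($vWords, ' ', $vNorm, '&#xA;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the sample log this should report 2 words for "hello Fred" and
"nice weather" and 4 words for "whats the latest news?". Punctuation stays
attached to its word, so "news?" counts as one word.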
Hope this helped.
=====
Cheers,
Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL
"marina" <marina777uk@xxxxxxxxx> wrote in message
news:20030719075801.60127.qmail@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Hi,
>
> I have an XML document that contains messages sent by
> people to one another. Many of these messages in the
> <MESSAGE> tags are repeated as they are sent by one
> person to many others.
>
> XML Snippet:
> --------------------------------------------------
> <LOG>
> <SENT>
> <USER> 12345 </USER>
> <LOCATION> 55555 </LOCATION>
> <TARGET> 1 </TARGET>
> <TARGET_LOCATION> 23222 </TARGET_LOCATION>
> <MESSAGE> hello Fred </MESSAGE>
> </SENT>
> <SENT>
> <USER> 77777 </USER>
> <LOCATION> 76666 </LOCATION>
> <TARGET> 3 </TARGET>
> <TARGET_LOCATION> 34444 </TARGET_LOCATION>
> <MESSAGE> nice weather </MESSAGE>
> </SENT>
> <SENT>
> <USER> 77777 </USER>
> <LOCATION> 76666 </LOCATION>
> <TARGET> 4 </TARGET>
> <TARGET_LOCATION> 67777 </TARGET_LOCATION>
> <MESSAGE> nice weather </MESSAGE>
> </SENT>
> <SENT>
> <USER> 33333 </USER>
> <LOCATION> 12666 </LOCATION>
> <TARGET> 8 </TARGET>
> <TARGET_LOCATION> 98765 </TARGET_LOCATION>
> <MESSAGE> whats the latest news? </MESSAGE>
> </SENT>
> <SENT>
> <USER> 33333 </USER>
> <LOCATION> 12666 </LOCATION>
> <TARGET> 9 </TARGET>
> <TARGET_LOCATION> 46578 </TARGET_LOCATION>
> <MESSAGE> whats the latest news? </MESSAGE>
> </SENT>
> </LOG>
> --------------------------------------------------
> What I need to do is:-
>
> 1) Find out how many messages over all were sent to 1,
> 2, 3 etc people.
>
> As a duplicated message will always follow the
> original, i.e. be the next <MESSAGE> tag of the
> following sibling node, I'm thinking that the
> stylesheet would start with the first message and keep
> comparing siblings until it found one that was
> different. Then it would just add the previous number
> of sibling nodes? ( I probably need to use keys?)
>
> 2) For each of the total messages per group size,
> calculate the average number of words. No idea on this
> one I'm afraid!
>
> So the desired output from the snippet above would be:
>
> Group Size   Number of Messages   Av Number Words
> 1            1                    2
> 2            2                    3
> (up to say 20)
>
> Many thanks in advance for any help,
>
> Marina
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list