Subject: word count for each category
Author: c doniat
Date: 28 Nov 2007 03:17 AM
Originally Posted: 27 Nov 2007 09:41 AM

I'm a newbie in XML/XQuery. I have the XML file below and I am trying to extract some data from it: I would like a list of the unique category names and, for each category, the number of words in it, in this form:

<category name="general" wordcount="36" />

Please help me or point me to a tutorial/book. Thanks in advance.


<?xml version="1.0"?>
<word category="general" name="alcohol" id="156"/>
<word category="risk factor" name="alcohol consumption" id="156"/>
<word category="general" name="amount" id="156"/>
<word category="compound" name="androgen" id="156"/>
<word category="living being" name="animal" id="156"/>
<word category="general" name="association" id="156"/>
<word category="pathology" name="breast cancer" id="156"/>
<word category="process" name="carcinogenesis" id="156"/>
<word category="living organism" name="cell" id="156"/>
<word category="general" name="characteristic" id="156"/>
<word category="general" name="damage" id="156"/>
<word category="general" name="data" id="156"/>
<word category="general" name="decade" id="156"/>
<word category="nucleic acid" name="dna" id="156"/>
<word category="general" name="effect" id="156"/>
<word category="general" name="evidence" id="156"/>
<word category="general" name="factor" id="156"/>
<word category="general" name="finding" id="156"/>
<word category="gland" name="gland susceptibility" id="156"/>
<word category="general" name="habit" id="156"/>
<word category="treatment" name="hormone replacement therapy" id="156"/>
<word category="general" name="information" id="156"/>
<word category="general" name="insight" id="156"/>
<word category="process" name="intake" id="156"/>
<word category="process" name="interaction" id="156"/>
<word category="process" name="investigation" id="156"/>
<word category="general" name="level" id="156"/>
<word category="general" name="magnitude" id="156"/>
<word category="general" name="majority" id="156"/>
<word category="general" name="mechanism" id="156"/>
<word category="general" name="potential" id="156"/>
<word category="unknown" name="processe" id="156"/>
<word category="general" name="progress" id="156"/>
<word category="general" name="reference" id="156"/>
<word category="general" name="review" id="156"/>
<word category="general" name="risk" id="156"/>
<word category="general" name="status" id="156"/>
<word category="general" name="study" id="156"/>
<word category="effect" name="susceptibility" id="156"/>
<word category="receptor protein" name="tumor hormone receptor" id="156"/>
<word category="process" name="understanding" id="156"/>
<word category="unknown" name="use" id="156"/>
<word category="living being" name="women" id="156"/>
<word category="general" name="aberration" id="203"/>
<word category="substance" name="acid" id="203"/>
<word category="nuclear receptor" name="acid receptor" id="203"/>
<word category="general" name="alpha" id="203"/>
<word category="cancer" name="apl" id="203"/>
<word category="acid" name="atra" id="203"/>
<word category="general" name="case" id="203"/>
<word category="macromolecule" name="chromosome" id="203"/>
<word category="general" name="finding" id="203"/>
<word category="gene" name="fish" id="203"/>
<word category="general" name="fluorescent" id="203"/>
<word category="process" name="formation" id="203"/>
<word category="process" name="fusion" id="203"/>
<word category="heridity unit" name="gene" id="203"/>
<word category="characteristic" name="karyotype" id="203"/>
<word category="cancer" name="leukemia" id="203"/>
<word category="process" name="leukemogenesis" id="203"/>
<word category="general" name="patient" id="203"/>
<word category="unclassified" name="pml" id="203"/>
<word category="technique" name="polymerase chain reaction" id="203"/>
<word category="unclassified" name="rara" id="203"/>
<word category="process" name="reinduction therapy" id="203"/>
<word category="unclassified" name="relapse" id="203"/>
<word category="process" name="remission induction" id="203"/>
<word category="general" name="report" id="203"/>
<word category="general" name="response" id="203"/>
<word category="general" name="responsiveness" id="203"/>
<word category="general" name="role" id="203"/>
<word category="process" name="situ hybridization" id="203"/>
<word category="general" name="study" id="203"/>
<word category="process" name="therapy" id="203"/>
<word category="unclassified" name="tran" id="203"/>
<word category="enzyme" name="transcriptase" id="203"/>
<word category="process" name="translocation" id="203"/>
I finally found a solution (the word elements are wrapped in a wordset root element, which is not shown above):

(: one <category> element per distinct category, with its word count :)
for $cat in distinct-values(wordset/word/@category)
return <category name="{$cat}"
                 wordcount="{count(wordset/word[@category = $cat])}"/>

It might help someone else.
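
For readers whose processor supports XQuery 3.0 or later, the same per-category count can probably be written with a `group by` clause instead of `distinct-values`. This is only a sketch under assumptions: the `<wordset>` wrapper is inferred from the path `wordset/word` used above, and the three inline `<word>` elements are a small excerpt of the data chosen for illustration, not the full file.

```xquery
xquery version "3.0";

(: Assumption: the <word> elements sit inside a <wordset> root element.
   The inline sample is a three-word excerpt of the data in the question. :)
let $doc :=
  <wordset>
    <word category="general" name="alcohol" id="156"/>
    <word category="process" name="intake" id="156"/>
    <word category="general" name="amount" id="156"/>
  </wordset>
for $w in $doc/word
(: after grouping, $w holds every <word> that shares the same category :)
group by $cat := $w/@category
order by $cat
return <category name="{$cat}" wordcount="{count($w)}"/>
```

With `group by`, the words are partitioned in one pass over the data rather than being re-filtered with `[@category = $cat]` once per category, which may matter for larger files. To run it against the real file, the inline `$doc` would be replaced by the document's own root element.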

