|
[XQuery Talk Mailing List Archive Home] [By Date] [By Thread] [By Subject] [By Author] [Recent Entries] [Reply To This Message] Count of Distinct elements performance problemMichael Kay mhk at mhk.me.ukTue Aug 22 20:19:51 PDT 2006
A performance question like this can only be answered with respect to a specific product. There aren't many XQuery engines that will handle an 8Gb file, so you're doing quite well. You might find that a multi-pass approach works faster: (a) extract the values (b) sort them (c) use a recursive scan to do positional grouping - depends on your product supporting tail call optimization That's likely to have O(n*log(n)) performance rather than O(n^2). Michael Kay http://www.saxonica.com/ > -----Original Message----- > From: http://xquery.com/mailman/listinfo/talk > [mailto:http://xquery.com/mailman/listinfo/talk] On Behalf Of Kusunam, Srinivas > Sent: 22 August 2006 18:36 > To: http://xquery.com/mailman/listinfo/talk > Subject: Count of Distinct elements performance problem > > I am trying to find count of distinct elements (Model Year). > Here is my XQuery. It takes 4 hrs to get the count from 8GB > file. There are around 3000 distinct Model years in this file. > > let $mdoc := doc('input.xml')/Body > let $sourModelYEAR := $mdoc/Title/ModelYear return <Elements> > <Element name="ModelYear"> > <Distribution> > { > for $dvalue in fn:distinct-values($sourModelYEAR) > let $eachcount := count($mdoc/Title[ModelYear=$dvalue]) > return > <distribution> > <value>{ $dvalue }</value> > <count>{ $eachcount }</count> > </distribution> > } > </Distribution> > </Element> > </Elements> > > This Query seems to loop through the document for each value i.e. > overall 3000 times. I know this should be easily achievable > if we have Group-by in XQuery. Do any XQuery engine supports > (custom) Group-By now? > Or is there any other way to make this query efficient? > > Where as if I add one more element to find the pattern of the > data it finishes the job within 40 minutes? Why is this odd behavior? > > let $mdoc := doc('input.xml')/Body > let $sourModelYEAR := $mdoc/Title/ModelYear return <Elements> > <Element name="ModelYear"> > <Distribution> > { > for $dvalue in fn:distinct-values($sourModelYEAR) > let $eachcount := count($mdoc/Title[ModelYear=$dvalue]) > return > <distribution> > <value>{ $dvalue }</value> > <count>{ $eachcount }</count> > </distribution> > } > </Distribution> > <PatternDistribution> > { > for $phonenum in > distinct-values($sourModelYEAR/translate(., > '0123456789','9999999999')) > return > <pattern> > <type>{ $phonenum }</type> > <count>{count($sourModelYEAR[translate(., > '0123456789', '9999999999') eq $phonenum])}</count> > </pattern> > } > </PatternDistribution> > </Element> > </Elements> > > > Thanks, > Srini > ***************************************************************** > This message has originated from RLPTechnologies, > 26955 Northwestern Highway, Southfield, MI 48033. > > RLPTechnologies sends various types of email communications. > If this email message concerns the potential licensing of an > RLPT product or service, and you do not wish to receive > further emails regarding Polk products, forward this email to > http://xquery.com/mailman/listinfo/talk with the word "remove" in the subject line. > > The email and any files transmitted with it are confidential > and intended solely for the individual or entity to whom they > are addressed. > > If you have received this email in error, please delete this > message and notify the Polk System Administrator at > http://xquery.com/mailman/listinfo/talk > ***************************************************************** > > > _______________________________________________ > http://xquery.com/mailman/listinfo/talk > http://xquery.com/mailman/listinfo/talk
|
Purchase Stylus Studio Online Today!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|






