

Subject: distinct-values() optimization, sorting by frequency
From: "James Cummings" <cummings.james@xxxxxxxxx>
Date: Fri, 8 Feb 2008 14:27:56 +0000
Hiya,

I'm wondering what the best way is to optimize a transformation based
on distinct-values(). What I'm basically doing is:
======
<xsl:variable name="docs" select="collection('../../working/xml/files.xml')"/>

<xsl:template name="main">
  <xsl:variable name="persNames" select="$docs//tei:text//tei:persName"/>
  <xsl:variable name="norm-persNames"
    select="$persNames/normalize-space(lower-case(.))"/>
  <xsl:variable name="distinct-persNames"
    select="distinct-values($norm-persNames)"/>
  <!-- I realize that I could be more specific on the $persNames
       variable, but doing so doesn't seem to affect speed much at all. -->
  <div type="main">

    <!-- Some overall counts -->
    <div><head>Overall Counts</head>
      <list type="unordered">
        <item>Number of <gi>persName</gi> elements total:
          <xsl:value-of select="count($persNames)"/></item>
        <item>Number of <gi>persName</gi> elements which have a @key
          attribute total: <xsl:value-of select="count($persNames[@key])"/></item>
        <item>Number of distinct <gi>persName</gi> values total:
          <xsl:value-of select="count($distinct-persNames)"/></item>
      </list>
    </div>

    <!-- An Alphabetical List -->
    <div><head>Alphabetical List</head>
      <list type="unordered">
        <xsl:for-each select="$distinct-persNames">
          <xsl:sort select="."/>
          <xsl:variable name="current-name" select="."/>
          <xsl:variable name="count-distinct-current-name"
            select="count($persNames[normalize-space(lower-case(.)) = $current-name])"/>
          <item><xsl:value-of select="concat($current-name,
              '  --  ', $count-distinct-current-name)"/></item>
        </xsl:for-each>
      </list>
    </div>

    <!-- A Frequency Sorted List -->
    <div>
      <head>Frequency List</head>
      <list type="unordered">
        <xsl:for-each select="$distinct-persNames">
          <!-- Inside the predicate, "." is the tei:persName being tested,
               so current() is needed to refer to the distinct name being sorted. -->
          <xsl:sort select="count($persNames[normalize-space(lower-case(.))
            = current()])"/>
          <!-- I think it is this sort statement which slows things down,
               since I have to compute the same count twice. -->
          <xsl:variable name="current-name" select="."/>
          <xsl:variable name="count-distinct-current-name"
            select="count($persNames[normalize-space(lower-case(.))
            = $current-name])"/>
          <item><xsl:value-of select="concat($count-distinct-current-name,
              '  --  ', $current-name)"/></item>
        </xsl:for-each>
      </list>
    </div>
  </div>
</xsl:template>
======
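
One possible optimization for the counting is offered here as an untested sketch, not a confirmed fix. It indexes the names with xsl:key, so each count becomes an index lookup instead of a filter over every tei:persName in the collection. The key name persName-by-norm is hypothetical. The tei prefix is assumed to be bound as in the stylesheet above, and the xsl:for-each is assumed to sit inside the same template as $distinct-persNames. Because key() only searches the document that contains its third argument, the lookup is repeated for each document that collection() returns.

======
<!-- Hypothetical key: indexes every tei:persName inside tei:text by
     its lower-cased, whitespace-normalized content. xsl:key must be
     a top-level element (a child of xsl:stylesheet). -->
<xsl:key name="persName-by-norm" match="tei:text//tei:persName"
         use="normalize-space(lower-case(.))"/>

<!-- Alphabetical list: same output as before, but each count is a
     key lookup rather than a filter over all of $persNames. -->
<xsl:for-each select="$distinct-persNames">
  <xsl:sort select="."/>
  <xsl:variable name="current-name" select="."/>
  <!-- key() searches only the document containing its third
       argument, so query each document in the collection and
       count all the matches together. -->
  <xsl:variable name="count-distinct-current-name"
    select="count(for $d in $docs
                  return key('persName-by-norm', $current-name, $d))"/>
  <item><xsl:value-of select="concat($current-name,
      '  --  ', $count-distinct-current-name)"/></item>
</xsl:for-each>
======

Processors typically build a key's index once per document and reuse it for later lookups. That reuse is where any speed-up would come from, but it has not been measured here.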

I think the real slowdown comes in the second xsl:for-each, where I
want to sort the distinct values by frequency with:
<xsl:sort select="count($persNames[normalize-space(lower-case(.)) = current()])"/>
I need this count once for the sort, and then I have to compute it
again for the output inside the <item> element. I'm obviously not allowed
an xsl:variable between the xsl:for-each and the xsl:sort... but I have a
feeling I'm missing some clever optimization here.
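
One way to avoid computing the count twice is shown below as an untested sketch. XSLT 2.0's xsl:for-each-group can build the groups in a single pass over $persNames. Within each group, count(current-group()) is the frequency, and it serves both as the sort key and in the output, with no rescan of $persNames. The sketch assumes it replaces the frequency-list xsl:for-each inside the same template, where $persNames is in scope. The descending order (most frequent first) is an assumption about the wanted ordering. Remove order="descending" to keep the original ascending order.

======
<!-- Frequency list: group the persName elements by their normalized
     form once; each group's size is that name's frequency. -->
<list type="unordered">
  <xsl:for-each-group select="$persNames"
      group-by="normalize-space(lower-case(.))">
    <!-- When sorting groups, current-group() is the group being
         sorted; count() returns an xs:integer, so the sort is numeric. -->
    <xsl:sort select="count(current-group())" order="descending"/>
    <!-- Tie-breaker: alphabetical by normalized name. The context item
         here is the first persName of the group. -->
    <xsl:sort select="normalize-space(lower-case(.))"/>
    <item><xsl:value-of select="concat(count(current-group()),
        '  --  ', current-grouping-key())"/></item>
  </xsl:for-each-group>
</list>
======

The alphabetical list could presumably be built the same way, sorting on the normalized name alone. The count for each name would then come from count(current-group()) as well.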

Although this is for a pre-generated transformation, it currently
takes a *hugely* long time, and I'm thinking I must be able to
optimize it somehow.

Any suggestions appreciated,

-James
