Subject: Re: grouping (was: if or template?)
From: "Steve Muench" <smuench@xxxxxxxxxxxxx>
Date: Tue, 9 May 2000 01:10:53 -0700
| ><xsl:key name="tid" match="tracker-id" use="."/>
| >
| ><xsl:for-each
| >select="//tracker-id[generate-id(.)=generate-id(key('tid',.)[1])]">
| >
| >I hope Steve will forgive me for announcing this discovery 
| >before he does,
| >I'm quite excited by it because it gives much better performance.
| 
| All it does to me is make me scratch my head!
| Steve/Mike, would you give us the idiot's view on this please,
| what's happening? *Why* does it provide the unique tracker-id please?

When you're doing grouping, you basically want to select
exactly one of each unique thing.

   //tracker-id

would select all tracker-id elements in the document.

Declaring a 'tid' key like:

  <xsl:key name="tid" match="tracker-id" use="."/>

means that the key('tid','tidvalue') function looks up all
tracker-id elements whose value is 'tidvalue'. In order to support
this lookup, the processor will be keeping a list in memory like this:

"tid" Key lookup Table
======================
tracker-id   Ref to tracker-id elements
  value      having that value
-----------  --------------------------
  abc123     node(109),node(344),node(496)
  def456     node(15)
  hij332     node(89),node(101)

Where the notation node(nnn) means "the node whose node-id is nnn"
as returned by generate-id(). To be concrete, the processor
is likely keeping some kind of Hashtable with the tracker-id
*value* as the hash key, and a node-list as the hash value.

  //tracker-id[generate-id(.)=generate-id(key('tid',.)[1])]

selects all tracker-id elements in the document having
a node-id equal to the node-id of the first node in the
"key lookup table's" list of nodes having the current 
tracker-id.

Said more simply, it selects the first tracker-id element
for each unique tracker-id value.

Or even more simply, it selects one tracker-id element per distinct
value, which gives you the list of distinct tracker-id values.
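
Put into a complete stylesheet, that expression might be used roughly
like this. This is only a minimal sketch: it assumes the tracker-id
elements can appear anywhere in the input, and the output element
names "distinct-ids" and "id" are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every tracker-id element by its own string value -->
  <xsl:key name="tid" match="tracker-id" use="."/>

  <xsl:template match="/">
    <!-- "distinct-ids" and "id" are hypothetical output element names -->
    <distinct-ids>
      <!-- Keep only the tracker-id that is first in its key group -->
      <xsl:for-each
          select="//tracker-id[generate-id(.) = generate-id(key('tid', .)[1])]">
        <!-- count = number of tracker-id elements sharing this value -->
        <id count="{count(key('tid', .))}">
          <xsl:value-of select="."/>
        </id>
      </xsl:for-each>
    </distinct-ids>
  </xsl:template>
</xsl:stylesheet>
```

For a document matching the lookup table above, the count attribute
would simply be the length of each row's node list.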

Here's an example.

Take the "Task.xml" File below...

<Tasks>
   <Task><Desc>Task1</Desc><Owner>Steve</Owner></Task>
   <Task><Desc>Task2</Desc><Owner>Mike</Owner></Task>
   <Task><Desc>Task3</Desc><Owner>Dave</Owner></Task>
   <Task><Desc>Task4</Desc><Owner>Steve</Owner></Task>
   <Task><Desc>Task5</Desc><Owner>Mike</Owner></Task>
   <Task><Desc>Task9</Desc><Owner>Mike</Owner></Task>
</Tasks>

The stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes"/>
  <xsl:key name="xxx" match="/Tasks/Task/Owner" use="."/>
  <xsl:template match="/">
  <Tasks>
    <xsl:for-each select="/Tasks/Task/Owner[generate-id(.)=generate-id(key('xxx',.)[1])]">
      <xsl:sort select="."/>
      <Owner name="{.}">
        <xsl:for-each select="key('xxx',.)/..">
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </Owner>
    </xsl:for-each>
  </Tasks>
  </xsl:template>
</xsl:stylesheet>

Produces a sorted, grouped list of tasks by owner and
is much faster than the equivalent "scan-my-preceding" 
approach...

<?xml version = '1.0' encoding = 'UTF-8'?>
<Tasks>
   <Owner name="Dave">
      <Task>
         <Desc>Task3</Desc>
         <Owner>Dave</Owner>
      </Task>
   </Owner>
   <Owner name="Mike">
      <Task>
         <Desc>Task2</Desc>
         <Owner>Mike</Owner>
      </Task>
      <Task>
         <Desc>Task5</Desc>
         <Owner>Mike</Owner>
      </Task>
      <Task>
         <Desc>Task9</Desc>
         <Owner>Mike</Owner>
      </Task>
   </Owner>
   <Owner name="Steve">
      <Task>
         <Desc>Task1</Desc>
         <Owner>Steve</Owner>
      </Task>
      <Task>
         <Desc>Task4</Desc>
         <Owner>Steve</Owner>
      </Task>
   </Owner>
</Tasks>
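
Since key('xxx',.) hands back the whole group for the current owner,
the same lookup can also feed an aggregate such as count(). The
following is a sketch along those lines, not part of the stylesheet
above; the "Summary" element and the "tasks" attribute are names
invented for this illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes"/>

  <!-- Owner elements indexed by their text, as in the grouping stylesheet -->
  <xsl:key name="xxx" match="/Tasks/Task/Owner" use="."/>

  <xsl:template match="/">
    <!-- "Summary" and "tasks" are hypothetical names for this sketch -->
    <Summary>
      <xsl:for-each
          select="/Tasks/Task/Owner[generate-id(.)=generate-id(key('xxx',.)[1])]">
        <xsl:sort select="."/>
        <!-- key('xxx',.) is the whole group, so count() is the group size -->
        <Owner name="{.}" tasks="{count(key('xxx',.))}"/>
      </xsl:for-each>
    </Summary>
  </xsl:template>
</xsl:stylesheet>
```

Run against the Task.xml above, this should report 1 task for Dave,
3 for Mike and 2 for Steve.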

For testing, here is a slower.xsl stylesheet that
does the same job without using the key() technique:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Slower.xsl -->
  <xsl:output indent="yes"/>
  <xsl:template match="/">
  <Tasks>
    <xsl:for-each
        select="/Tasks/Task[not(preceding-sibling::Task/Owner=./Owner)]/Owner">
      <xsl:sort select="."/>
      <xsl:variable name="owner" select="."/>
      <Owner name="{.}">
        <xsl:for-each select="/Tasks/Task[Owner = $owner ]">
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </Owner>
    </xsl:for-each>
  </Tasks>
  </xsl:template>
</xsl:stylesheet>

As you scale up the size of the Task.xml input file, the performance
difference can be dramatic. Try copy/pasting the elements
in the Task.xml above to create a couple of thousand <Task>
elements to give it a spin...
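
If copy/pasting by hand gets tedious, one possible shortcut is a small
generator stylesheet run against the original Task.xml. This is a
sketch, not something from the original post: the "copies" parameter
and the "repeat" template are made-up names, and the default of 400
copies (2400 <Task> elements for the six-task file) is an arbitrary
choice. The template splits the count in half on each call so the
recursion stays shallow.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes"/>

  <!-- Hypothetical parameter: how many times to repeat the original tasks -->
  <xsl:param name="copies" select="400"/>

  <xsl:template match="/Tasks">
    <Tasks>
      <xsl:call-template name="repeat">
        <xsl:with-param name="n" select="$copies"/>
      </xsl:call-template>
    </Tasks>
  </xsl:template>

  <!-- Emits the original Task elements $n times, halving $n per call -->
  <xsl:template name="repeat">
    <xsl:param name="n"/>
    <xsl:choose>
      <!-- Nothing to emit for zero or negative counts -->
      <xsl:when test="$n &lt;= 0"/>
      <xsl:when test="$n = 1">
        <xsl:copy-of select="/Tasks/Task"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="half" select="floor($n div 2)"/>
        <xsl:call-template name="repeat">
          <xsl:with-param name="n" select="$half"/>
        </xsl:call-template>
        <xsl:call-template name="repeat">
          <xsl:with-param name="n" select="$n - $half"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Save the result as a new input file and run both stylesheets against
it to compare timings.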

______________________________________________________________
Steve Muench, Lead XML Evangelist & Consulting Product Manager
Business Components for Java & XSQL Servlet Development Teams
Oracle Rep to the W3C XSL Working Group


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

