
Re: Calculating groups of repeating elements

Subject: Re: Calculating groups of repeating elements
From: Quinn Dombrowski <qdombrow@xxxxxxxx>
Date: Wed, 10 Dec 2008 18:42:07 -0600
Hi Ken,

Thanks for looking at this! Sorry for not being as clear as I could've been about what I'm looking for. For the example data set, I'm trying to automatically generate output something like this:

Aa + C + Qqq: 2 places (1, 3)
Aa + C: 3 places (1, 2, 3)
Aa + Zz: 2 places (2, 3)
C + Qqq: 2 places (1, 3)

So it lists all the groups of 2+ words that appear together in 2+ places. This list is sorted by the length of the group (3 words is the largest group size that occurs in 2+ places in the sample data), but it'd be nice to also be able to sort by number of places:

Aa + C: 3 places (1, 2, 3)
Aa + Zz: 2 places (2, 3)
etc.
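One way to produce a listing like this in XSLT 2.0, offered as a sketch rather than anything from the thread: instead of trying every combination of words, start from the words each pair of places has in common, then keep intersecting those groups with each other until no new group appears, so groups shared only by three or more places are found as well. It assumes an XSLT 2.0 processor such as Saxon; the `my:` namespace URI, the `my:*` function names and the `sort-by` parameter are hypothetical. Each group is stored as its sorted words joined by `' + '`, which assumes no word itself contains `' + '`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch (not from the thread): list every group of 2+ words shared by
     2+ places. The my:* names and the sort-by parameter are hypothetical. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- 'length' (default): sort by words, then places; 'places': the reverse -->
  <xsl:param name="sort-by" as="xs:string" select="'length'"/>

  <!-- Canonical form of a word set: distinct, sorted, joined by ' + ';
       returns nothing when fewer than two words remain -->
  <xsl:function name="my:group" as="xs:string?">
    <xsl:param name="words" as="xs:string*"/>
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort select="distinct-values($words)">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:if test="count($sorted) ge 2">
      <xsl:sequence select="string-join($sorted, ' + ')"/>
    </xsl:if>
  </xsl:function>

  <!-- The individual words of a canonical group string -->
  <xsl:function name="my:words" as="xs:string*">
    <xsl:param name="group" as="xs:string"/>
    <xsl:sequence select="tokenize($group, ' \+ ')"/>
  </xsl:function>

  <!-- The places whose word list contains every word of the group -->
  <xsl:function name="my:places" as="element(place)*">
    <xsl:param name="places" as="element(place)*"/>
    <xsl:param name="group" as="xs:string"/>
    <xsl:sequence select="$places[every $w in my:words($group)
                                  satisfies words/word = $w]"/>
  </xsl:function>

  <!-- Add the overlaps of every pair of groups until nothing new appears;
       this catches groups shared only by three or more places -->
  <xsl:function name="my:close" as="xs:string*">
    <xsl:param name="groups" as="xs:string*"/>
    <xsl:variable name="next" as="xs:string*" select="distinct-values(($groups,
        for $i in 1 to count($groups), $j in ($i + 1) to count($groups)
        return my:group(my:words($groups[$i])[. = my:words($groups[$j])])))"/>
    <xsl:sequence select="if (count($next) eq count($groups)) then $groups
                          else my:close($next)"/>
  </xsl:function>

  <xsl:template match="/atlas">
    <xsl:variable name="places" select="place"/>
    <!-- Seed: the words shared by each pair of places, then close the set -->
    <xsl:variable name="groups" as="xs:string*" select="my:close(distinct-values(
        for $i in 1 to count($places), $j in ($i + 1) to count($places)
        return my:group($places[$i]/words/word[. = $places[$j]/words/word]/string())))"/>
    <xsl:for-each select="$groups">
      <xsl:sort select="if ($sort-by eq 'places') then count(my:places($places, .))
                        else count(my:words(.))" order="descending"/>
      <xsl:sort select="if ($sort-by eq 'places') then count(my:words(.))
                        else count(my:places($places, .))" order="descending"/>
      <xsl:variable name="hits" select="my:places($places, .)"/>
      <xsl:value-of select="concat(., ': ', count($hits), ' places (',
                            string-join($hits/place_number/string(), ', '), ')')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the sample data this should print three groups: Aa + C + Qqq (places 1, 3) and Aa + C + Zz (places 2, 3), in either order, followed by Aa + C (places 1, 2, 3); setting `sort-by` to `'places'` puts Aa + C first. Aa + C + Zz appears because places 2 and 3 share those three words. Each group is reported only in its largest form, so sub-combinations such as Aa + Zz or C + Qqq, which occur in exactly the same places as a larger group, are not repeated; listing every subset as well would grow exponentially with 75+ words. The pairwise seeding is quadratic in the number of places (about 31,000 pairs for 250 places) and the closing step can be slower still, so this is untested at that scale.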

I was using intersect to get the places with Aa AND C AND Qqq (for the first two words: <xsl:value-of select="count(atlas/place/place_number[../words/word='Aa'] intersect atlas/place/place_number[../words/word='C'])"/>), but got overwhelmed by the number of ways I'd have to plug in all the different words to go through the data exhaustively: the real data has 250+ places and 75+ words.
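The chained intersect can be generalized with a small function that takes the list of words as a parameter, so any number of words can be checked with one call instead of one intersect per word. A minimal sketch, assuming XSLT 2.0; `my:places-with-all` and its namespace URI are hypothetical names. A place qualifies when every word in the list appears among its words.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: the place_number of every place whose word list
       contains all of $words; one call replaces a chain of intersects -->
  <xsl:function name="my:places-with-all" as="element(place_number)*">
    <xsl:param name="atlas" as="element(atlas)"/>
    <xsl:param name="words" as="xs:string*"/>
    <xsl:sequence select="$atlas/place[every $w in $words
                                       satisfies words/word = $w]/place_number"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- The Aa AND C AND Qqq question as a single call -->
    <xsl:variable name="hits" select="my:places-with-all(atlas, ('Aa', 'C', 'Qqq'))"/>
    <xsl:value-of select="concat(count($hits), ' places (',
                          string-join($hits/string(), ', '), ')')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

On the sample data this should print `2 places (1, 3)`, since place 2 has no Qqq. Note that an empty word list matches every place, because `every` over an empty sequence is true.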



G. Ken Holman wrote:
At 2008-12-10 14:15 -0600, Quinn Dombrowski wrote:
I'm trying to calculate all of the groups of 2+ elements (in the sample data below, words) that appear together in more than one place. Ideally, I'd like to be able to sort descending both by length of group (5-word groups, 4-word groups, etc.) and by number of places the groups occur in (100 places, 99 places, etc.). I also need to be able to list the place numbers where they occur.

You don't show how these places are to be listed, so I guessed.


I started doing it manually this way but the number of possible combinations quickly became too big a task:

<xsl:template match="/">
  <xsl:value-of select="count(atlas/place/place_number[../words/word='Aa'] intersect atlas/place/place_number[../words/word='C'])"/>
</xsl:template>
(adding more "intersect"s as necessary, and removing the "count" to see the place numbers)

Not sure where you are going with the intersects, so I approached this as a grouping problem.


Here's a sample of the data. Almost every word appears in multiple places, but each appears only once in the index, which I've used in other applications for matching to avoid re-calculating stats for the word over and over. Any help would be wonderful!

I hope the code below helps. I am a bit unclear on what you want, so my comments should reveal what I think you want.


. . . . . . . Ken


T:\ftemp>type quinn.xml
<atlas>
<place>
<place_number>1</place_number>
<words>
<word>Aa</word>
<word>C</word>
<word>Qqq</word>
</words>
</place>

<place>
<place_number>2</place_number>
<words>
<word>Aa</word>
<word>Bbbb</word>
<word>C</word>
<word>W</word>
<word>Zz</word>
</words>
</place>

<place>
<place_number>3</place_number>
<words>
<word>Aa</word>
<word>C</word>
<word>Bb</word>
<word>Qqq</word>
<word>Wwww</word>
<word>Zz</word>
</words>
</place>
</atlas>

T:\ftemp>type quinn.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output indent="yes"/>

<!--keep track for counting purposes-->
<xsl:key name="words" match="word" use="substring(.,1,1)"/>

<xsl:template match="atlas">
  <!--process the document element as is-->
  <xsl:next-match/>
  <!--add an index at the end-->
  <index>
    <!--basing the "underlying word" as the first character-->
    <xsl:for-each-group select="//word" group-by="substring(.,1,1)">
      <!--sort descending by the number of words in the group-->
      <xsl:sort select="count(key('words',substring(.,1,1)))"
                order="descending"/>
      <!--sort descending by the number of places for the word group-->
      <xsl:sort select="count(key('words',substring(.,1,1))/../..)"
                order="descending"/>
      <!--create the index entry for the word group-->
      <index_entry>
        <!--embed some diagnostics-->
        <xsl:comment select="current-grouping-key(),'=',
                             'Words:',count(current-group()),
                             'Places:',count(current-group()/../..)"/>
        <xsl:text>
</xsl:text>
        <!--what underlying word are we at?-->
        <underlying_word>
          <xsl:value-of select="current-grouping-key()"/>
        </underlying_word>
        <!--which words are related?-->
        <xsl:for-each-group select="current-group()" group-by=".">
          <word><xsl:value-of select="."/></word>
        </xsl:for-each-group>
        <!--where are these words used?-->
        <places>
          <xsl:for-each select="current-group()/../..">
            <place><xsl:value-of select="place_number"/></place>
          </xsl:for-each>
        </places>
      </index_entry>
    </xsl:for-each-group>
  </index>
</xsl:template>

<xsl:template match="@*|node()"><!--identity for all other nodes-->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
T:\ftemp>call xslt2 quinn.xml quinn.xsl quinn.out

T:\ftemp>type quinn.out
<?xml version="1.0" encoding="UTF-8"?>
<atlas>
   <place>
      <place_number>1</place_number>
      <words>
         <word>Aa</word>
         <word>C</word>
         <word>Qqq</word>
      </words>
   </place>

   <place>
      <place_number>2</place_number>
      <words>
         <word>Aa</word>
         <word>Bbbb</word>
         <word>C</word>
         <word>W</word>
         <word>Zz</word>
      </words>
   </place>

   <place>
      <place_number>3</place_number>
      <words>
         <word>Aa</word>
         <word>C</word>
         <word>Bb</word>
         <word>Qqq</word>
         <word>Wwww</word>
         <word>Zz</word>
      </words>
   </place>
</atlas>
<index>
   <index_entry><!--A = Words: 3 Places: 3-->
<underlying_word>A</underlying_word>
      <word>Aa</word>
      <places>
         <place>1</place>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--C = Words: 3 Places: 3-->
<underlying_word>C</underlying_word>
      <word>C</word>
      <places>
         <place>1</place>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--Q = Words: 2 Places: 2-->
<underlying_word>Q</underlying_word>
      <word>Qqq</word>
      <places>
         <place>1</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--B = Words: 2 Places: 2-->
<underlying_word>B</underlying_word>
      <word>Bbbb</word>
      <word>Bb</word>
      <places>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--W = Words: 2 Places: 2-->
<underlying_word>W</underlying_word>
      <word>W</word>
      <word>Wwww</word>
      <places>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--Z = Words: 2 Places: 2-->
<underlying_word>Z</underlying_word>
      <word>Zz</word>
      <places>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
</index>
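The index above treats words sharing a first character (Bb/Bbbb, W/Wwww) as one underlying word. If each spelling should instead count as its own word, as Quinn's listing appears to assume, a key on the exact word returns each word's places directly; that per-word lookup is also the building block for the group listing. A minimal sketch, assuming XSLT 2.0; the key name `place-by-word` is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical key: each place is indexed under every one of its words -->
  <xsl:key name="place-by-word" match="place" use="words/word"/>

  <xsl:template match="/">
    <!-- One entry per distinct spelling, most widely used words first -->
    <xsl:for-each-group select="atlas/place/words/word" group-by=".">
      <xsl:sort select="count(key('place-by-word', .))" order="descending"/>
      <xsl:variable name="hits" select="key('place-by-word', current-grouping-key())"/>
      <xsl:value-of select="concat(current-grouping-key(), ': ', count($hits),
                            ' places (', string-join($hits/place_number, ', '), ')')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

On the sample data, Aa and C should come first with three places each, followed by Qqq and Zz with two; the remaining words occur in one place each.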


--
Upcoming XSLT/XSL-FO, UBL and code list hands-on training classes:
Sydney, AU 2009-01/02; Brussels, BE 2009-03; Prague, CZ 2009-03
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video sample lesson: http://www.youtube.com/watch?v=PrNjJCh7Ppg
Video course overview: http://www.youtube.com/watch?v=VTiodiij6gE
G. Ken Holman mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/
Male Cancer Awareness Nov'07 http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal
