
Re: Calculating groups of repeating elements

Subject: Re: Calculating groups of repeating elements
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 11 Dec 2008 15:18:30 -0500
Hi,

At 12:58 PM 12/11/2008, Michael wrote:
It seems to me that if you want to collect groups of 2+ words
that appear in 2+ places, a useful first step would be to collect the
set of intersections of words occurring in every pairing of places.
This would be a large number, n(n-1)/2 for n places, but not the huge
power of 2 cited by Michael, and hence possibly a more direct route
to your goal.

Great! This looks like a much more useful approach to the problem!

Thanks ... I hope so.


BTW, since writing that it has also occurred to me that by declaring a key that would return places based on descendant word elements, one could speed up the generation of this set and avoid empty intersections. So:

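<!-- Index each place by the string value of every word it contains. -->
<xsl:key name="place-by-word" match="place" use=".//word"/>

<xsl:template match="atlas">
    <collection>
      <xsl:for-each select="place">
        <xsl:variable name="first" select="."/>
        <!-- Places sharing at least one word with $first; keep only those
             that precede it in document order, so each pair is visited once. -->
        <xsl:for-each select="key('place-by-word',.//word)[. &lt;&lt; $first]">
          <xsl:variable name="second" select="."/>
          <common_words>
            <xsl:copy-of select="$first/place_number, $second/place_number"/>
            <words>
              <!-- The intersection: words of $first that also occur in $second. -->
              <xsl:copy-of select="$first/words/word[.=$second/words/word]"/>
            </words>
          </common_words>
        </xsl:for-each>
      </xsl:for-each>
    </collection>
</xsl:template>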

(This requires testing, of course.)
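
For anyone who wants to try it, here is a small made-up input of the shape the template assumes: an 'atlas' holding 'place' elements, each with a 'place_number' and a 'words' wrapper around its 'word' children. The values are invented for illustration and are not the OP's data.

```xml
<atlas>
  <place>
    <place_number>1</place_number>
    <words><word>a</word><word>b</word><word>c</word></words>
  </place>
  <place>
    <place_number>2</place_number>
    <words><word>a</word><word>b</word><word>d</word></words>
  </place>
  <place>
    <!-- Shares no word with the others, so the key never pairs it. -->
    <place_number>3</place_number>
    <words><word>e</word></words>
  </place>
</atlas>
```

With the template wrapped in an XSLT 2.0 stylesheet, this should yield a single 'common_words' element pairing place 2 with place 1 and listing the words a and b. Place 3 produces nothing, which is the empty intersection the key is meant to avoid.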

While this isn't quite what you want, the results you want could be
derived by grouping these lists further, skipping pairings that
contain fewer than two 'word' elements, and collecting together those
that have the same sets (and thus represent sets of words that occur
in more than two places).
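
A rough, untested sketch of that second pass, assuming XSLT 2.0 and taking the 'collection' produced by the pairing template as its input. The function name my:word-set-key, its namespace, and the output element names (repeated_groups, group, places) are invented for the example, and the '|' separator assumes that no word contains that character.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:local"
    exclude-result-prefixes="xs my">

  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: a canonical string for a set of words
       (de-duplicated, sorted, joined), so equal sets get equal keys. -->
  <xsl:function name="my:word-set-key" as="xs:string">
    <xsl:param name="words" as="element(word)*"/>
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort select="distinct-values($words/string(.))">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:sequence select="string-join($sorted, '|')"/>
  </xsl:function>

  <xsl:template match="collection">
    <repeated_groups>
      <!-- Skip pairings with fewer than two words; group the rest by word set. -->
      <xsl:for-each-group select="common_words[count(words/word) ge 2]"
                          group-by="my:word-set-key(words/word)">
        <group>
          <places>
            <!-- Each distinct place number found in any pairing of this group. -->
            <xsl:for-each-group select="current-group()/place_number"
                                group-by="string(.)">
              <xsl:copy-of select="."/>
            </xsl:for-each-group>
          </places>
          <words>
            <!-- The word list of the first pairing stands for the whole group. -->
            <xsl:copy-of select="words/word"/>
          </words>
        </group>
      </xsl:for-each-group>
    </repeated_groups>
  </xsl:template>

</xsl:stylesheet>
```

Note that this merges only pairings whose word lists are identical. Lists that overlap without being equal stay in separate groups.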

Yes. But I think you must still generate the subsets, because if you have, say, three occurrences of (a,b,c) and two of (a,b,d), you have five occurrences of (a,b), which is interesting, if my understanding of the requirement is correct.

This is a good point; only the OP can say if it's in scope.


(Hm: could this be done by recursing to intersect among the intersections, dropping singleton cases along the way? The overload warning lamp in my brain is now starting to flash.)

This continues to be interesting.

Yes, it does.


Cheers,
Wendell



======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
