[XSL-LIST Mailing List Archive Home]
Subject: Re: Calculating groups of repeating elements
From: Quinn Dombrowski <qdombrow@xxxxxxxx>
Date: Thu, 11 Dec 2008 15:06:57 -0600
Thanks a ton Wendell, Michael L and Michael K! You've given me quite a
lot to chew on. I'm going to give it a shot on my real data set (a pile
of Cyrillic with extra diacritics and linguistic symbols) and let you
know how it goes.
Wendell Piez wrote:
Hi,
At 12:58 PM 12/11/2008, Michael wrote:
It seems to me that if you want to collect groups of 2+ words
that appear in 2+ places, a useful first step would be to collect the
set of intersections of words occurring in every pairing of places.
This would be a large number, n(n-1)/2 for n places, but not the huge
power of 2 cited by Michael, and hence possibly a more direct route
to your goal.
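A minimal XSLT 2.0 sketch of that first step. It assumes, as the stylesheet later in this message does, that each place under atlas holds a place_number and a words/word list; those element names are inferred from that code, not from the real data. Pairing each place only with its following siblings visits every unordered pair once, which is where the n(n-1)/2 figure comes from.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed input: atlas/place, each place holding a place_number
       and a words/word list. -->
  <xsl:template match="atlas">
    <collection>
      <xsl:for-each select="place">
        <xsl:variable name="first" select="."/>
        <!-- Pair each place only with the places after it, so every
             unordered pair is visited once: n(n-1)/2 pairs in total. -->
        <xsl:for-each select="following-sibling::place">
          <!-- current() is the second place of the pair. -->
          <xsl:variable name="common"
              select="$first/words/word[. = current()/words/word]"/>
          <!-- Emit nothing for pairs that share no words. -->
          <xsl:if test="exists($common)">
            <common_words>
              <xsl:copy-of select="$first/place_number, place_number"/>
              <words><xsl:copy-of select="$common"/></words>
            </common_words>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </collection>
  </xsl:template>
</xsl:stylesheet>
```

Pairs with nothing in common are dropped by the exists() test, but they are still visited; that cost is what the key-based refinement further down is meant to avoid.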
Great! This looks like a much more useful approach to the problem!
Thanks ... I hope so.
BTW, since writing that, it has also occurred to me that declaring a
key that returns places by the word elements they contain would
speed up the generation of this set and avoid empty intersections.
So:
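<!-- Index each place by the string value of every word it contains,
     so key('place-by-word', $words) returns only places sharing at
     least one word with $words. -->
<xsl:key name="place-by-word" match="place" use=".//word"/>

<xsl:template match="atlas">
  <collection>
    <xsl:for-each select="place">
      <xsl:variable name="first" select="."/>
      <!-- Look up only places sharing a word with $first, and keep those
           that precede it in document order so each pair appears once
           (the XPath 2.0 << operator, escaped as &lt;&lt; in an attribute). -->
      <xsl:for-each select="key('place-by-word', .//word)[. &lt;&lt; $first]">
        <xsl:variable name="second" select="."/>
        <common_words>
          <xsl:copy-of select="$first/place_number,
                               $second/place_number"/>
          <words>
            <!-- The words of $first that also occur in $second. -->
            <xsl:copy-of
                select="$first/words/word[. = $second/words/word]"/>
          </words>
        </common_words>
      </xsl:for-each>
    </xsl:for-each>
  </collection>
</xsl:template>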
(This requires testing, of course.)
While this isn't quite what you want, the results you want could be
derived by grouping these lists further: skipping pairings that
contain fewer than two 'word' elements, and collecting together those
that have the same sets (and thus represent sets of words that occur
in more than two places).
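A sketch of that grouping step in XSLT 2.0, meant to run over the collection of common_words elements produced by the pairing stylesheet (as a second pass, or over a variable holding that result). Each shared word set is reduced to a canonical string (distinct words, sorted, joined with '|'), and pairings are grouped on that string. The my: namespace URI and the function name my:set-key are hypothetical, and the '|' separator assumes no word contains that character.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: canonical key for a word set (distinct
       words, sorted, joined with '|'). -->
  <xsl:function name="my:set-key" as="xs:string">
    <xsl:param name="words" as="xs:string*"/>
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort select="distinct-values($words)">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:sequence select="string-join($sorted, '|')"/>
  </xsl:function>

  <!-- Input: the collection of common_words pairings. -->
  <xsl:template match="collection">
    <shared_sets>
      <!-- Skip pairings with fewer than two distinct shared words, then
           group the rest by the exact set of words they share. -->
      <xsl:for-each-group
          select="common_words[count(distinct-values(words/word)) ge 2]"
          group-by="my:set-key(words/word)">
        <xsl:sort select="current-grouping-key()"/>
        <shared_set>
          <!-- Every place that takes part in a pairing with this set. -->
          <xsl:for-each select="distinct-values(current-group()/place_number)">
            <place_number><xsl:value-of select="."/></place_number>
          </xsl:for-each>
          <!-- The word list of the group's first pairing. -->
          <xsl:copy-of select="words"/>
        </shared_set>
      </xsl:for-each-group>
    </shared_sets>
  </xsl:template>
</xsl:stylesheet>
```

Because it groups on the exact shared set, a pairing that shares (a,b,c) and one that shares only (a,b) land in different groups; that is the gap raised in the reply that follows.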
Yes. But I think you must still generate the subsets, because if you
have, say, three occurrences of (a,b,c) and two of (a,b,d), you have
five occurrences of (a,b), which is interesting, if my understanding of
the requirement is correct.
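To illustrate that point, a sketch that enumerates only the two-word subsets (so it stays quadratic in the vocabulary rather than exponential) and counts how many places contain both words, whatever else those places contain. It assumes the same atlas/place/words/word structure as the stylesheet above; larger subsets would need a different strategy. With three places holding (a,b,c) and two holding (a,b,d), the pair (a,b) comes out with places="5".

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="atlas">
    <xsl:variable name="places" select="place"/>
    <!-- Every distinct word in the atlas, sorted. -->
    <xsl:variable name="vocab" as="xs:string*">
      <xsl:perform-sort select="distinct-values(place/words/word)">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    <word_pairs>
      <!-- Each two-word subset (positions i < j in $vocab), counted by
           how many places contain both words. -->
      <xsl:for-each select="1 to count($vocab) - 1">
        <xsl:variable name="i" select="."/>
        <xsl:for-each select="$i + 1 to count($vocab)">
          <xsl:variable name="j" select="."/>
          <xsl:variable name="hits"
              select="$places[words/word = $vocab[$i]
                              and words/word = $vocab[$j]]"/>
          <!-- Report only pairs that occur in two or more places. -->
          <xsl:if test="count($hits) ge 2">
            <word_pair places="{count($hits)}">
              <word><xsl:value-of select="$vocab[$i]"/></word>
              <word><xsl:value-of select="$vocab[$j]"/></word>
            </word_pair>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </word_pairs>
  </xsl:template>
</xsl:stylesheet>
```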
This is a good point; only the OP can say if it's in scope.
(Hm: could this be done by recursing to intersect among the
intersections, dropping singleton cases along the way? The overload
warning lamp in my brain is now starting to flash.)
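One possible reading of that recursive idea, sketched in XSLT 2.0 under the same assumed atlas/place/words/word structure: represent each shared word set as a sorted, '|'-joined string, start from the pairwise intersections with two or more words, and keep intersecting those sets with each other (dropping one-word results) until no new set appears. Each surviving set is then counted against all places. The my: namespace URI and the function names are hypothetical, and each round is quadratic in the number of sets, so treat this as something to test against the real data rather than a tuned solution.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">

  <!-- A word set as a key string: distinct words, sorted, joined with
       '|' (assumes no word contains '|'). -->
  <xsl:function name="my:set-key" as="xs:string">
    <xsl:param name="words" as="xs:string*"/>
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort select="distinct-values($words)">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:sequence select="string-join($sorted, '|')"/>
  </xsl:function>

  <!-- Intersection of two keys; keeps the order of $a, so stays sorted. -->
  <xsl:function name="my:meet" as="xs:string">
    <xsl:param name="a" as="xs:string"/>
    <xsl:param name="b" as="xs:string"/>
    <xsl:variable name="bwords" select="tokenize($b, '\|')"/>
    <xsl:sequence select="string-join(tokenize($a, '\|')[. = $bwords], '|')"/>
  </xsl:function>

  <!-- Intersect every pair of sets, drop results with fewer than two
       words, and recurse until a round adds no new set. -->
  <xsl:function name="my:closure" as="xs:string*">
    <xsl:param name="sets" as="xs:string*"/>
    <xsl:variable name="next" as="xs:string*" select="distinct-values((
        $sets,
        for $i in 1 to count($sets), $j in $i + 1 to count($sets)
        return my:meet($sets[$i], $sets[$j])[count(tokenize(., '\|')) ge 2]))"/>
    <xsl:sequence select="if (count($next) eq count($sets)) then $sets
                          else my:closure($next)"/>
  </xsl:function>

  <xsl:template match="atlas">
    <xsl:variable name="places" select="place"/>
    <!-- Start from the pairwise intersections with two or more words. -->
    <xsl:variable name="pair-sets" as="xs:string*" select="distinct-values(
        for $i in 1 to count($places), $j in $i + 1 to count($places)
        return my:set-key($places[$i]/words/word[. = $places[$j]/words/word])
      )[count(tokenize(., '\|')) ge 2]"/>
    <shared_sets>
      <xsl:for-each select="my:closure($pair-sets)">
        <xsl:sort select="."/>
        <xsl:variable name="set" select="tokenize(., '\|')"/>
        <!-- Count every place that contains all words of this set. -->
        <shared_set places="{count($places[every $w in $set
                                            satisfies words/word = $w])}">
          <xsl:for-each select="$set">
            <word><xsl:value-of select="."/></word>
          </xsl:for-each>
        </shared_set>
      </xsl:for-each>
    </shared_sets>
  </xsl:template>
</xsl:stylesheet>
```

For the (a,b,c)/(a,b,d) example above, the pairwise step yields a|b|c, a|b|d and a|b; a further round adds nothing new, and the output reports (a,b) in 5 places, (a,b,c) in 3 and (a,b,d) in 2.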
This continues to be interesting.
Yes, it does.
Cheers,
Wendell
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================