
Subject: Re: Muenchian grouping help - removing 'duplicates' from a node-set
From: "W. Eliot Kimber" <eliot@xxxxxxxxxx>
Date: Thu, 09 Oct 2003 09:45:32 -0500
Laura@xxxxxxx wrote:

> I think the way to do this is via Muenchian grouping. I know what I need to
> do: group all the <text> elements by their text() content, and select only
> the first one in each group. But I've followed the guidelines on Jeni
> Tennison's XSLT pages and I can't seem to get my head around how keys
> actually work.

The way to do this is with what I call the "union trick". It took me a long time to figure out what was going on, and I finally realized that my barrier had been not fully understanding that the "|" operator is a set union, not a logical OR. [I was trying to understand the code Jeni Tennison had written to do back-of-the-book index processing for DocBook.]


What you do is take the current node and the first node of the current node's entry in the key table, and then construct a set from them using the union operator ("|"). If the result is a set of size one, the two nodes must be the same node, because two different nodes would give a set of size two. The key is that sets, by definition, always contain exactly one copy of each node. What counts is node identity, not content: two different <text> elements with the same text are still two nodes.
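The identity test can be modeled outside XSLT. This is a Python sketch, not XSLT: it assumes the standard-library xml.etree.ElementTree module, treats elements as nodes and a Python set as a node-set, and uses a hypothetical helper name, `union_size`.

```python
import xml.etree.ElementTree as ET

# Two distinct <text> elements that happen to have the same content.
doc = ET.fromstring("<doc><text>a</text><text>a</text></doc>")
first, second = doc.findall("text")

def union_size(*nodes):
    """Size of the union of the given nodes, mimicking XPath's '|'.

    Hypothetical helper: ElementTree elements hash and compare by
    identity, so a Python set, like a node-set, keeps exactly one
    copy of each distinct node.
    """
    return len(set(nodes))

print(union_size(first, first))   # same node twice -> 1
print(union_size(first, second))  # distinct nodes, equal text -> 2
```

The second call shows why the trick works: equal content is not enough, only the very same node collapses to a set of one.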

So, given this group spec:

<xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>
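Roughly, the processor turns that key declaration into a table from each key value to the matching nodes in document order. The Python sketch below models that table; `build_key` and `normalize_space` are hypothetical names, and the whitespace handling is an approximation of XPath 1.0 normalize-space().

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(s):
    # Approximates XPath normalize-space(): collapse runs of XML
    # whitespace (space, tab, CR, LF) to one space and trim the ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def build_key(root, match, use):
    """Hypothetical model of <xsl:key>: map each key value to the
    elements named `match`, kept in document order."""
    table = {}
    for elem in root.iter(match):  # iter() visits elements in document order
        table.setdefault(use(elem), []).append(elem)
    return table

doc = ET.fromstring(
    "<doc><text>red</text><text>blue</text><text> red </text></doc>"
)
text_by_content = build_key(
    doc, "text", lambda e: normalize_space("".join(e.itertext()))
)
for value, nodes in text_by_content.items():
    print(repr(value), len(nodes))  # 'red' 2, then 'blue' 1
```

Because " red " normalizes to "red", the first and third elements share one entry, and the first of them is listed first.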

You would do something like this:

<xsl:variable name="text-items"
  select="//text[count(.|key('text-by-content',
                             normalize-space(.))[1]) = 1]"/>
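As a cross-check, here is a Python model of that select expression, applied to the <text> elements the key matches. It is a sketch using ElementTree, not an XSLT processor; `content` and `select_text_items` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

def content(elem):
    # normalize-space(.) applied to the element's string value
    return re.sub(r"[ \t\r\n]+", " ", "".join(elem.itertext())).strip(" ")

def select_text_items(root):
    """Model of //text[count(.|key('text-by-content', normalize-space(.))[1]) = 1]."""
    # The key table: normalized content -> <text> elements in document order.
    key_table = {}
    for elem in root.iter("text"):
        key_table.setdefault(content(elem), []).append(elem)

    selected = []
    for node in root.iter("text"):              # //text
        first = key_table[content(node)][0]     # key(...)[1]
        if len({node, first}) == 1:             # count(.|key(...)[1]) = 1
            selected.append(node)
    return selected

doc = ET.fromstring(
    "<doc><text>red</text><text>blue</text><text> red </text>"
    "<text>green</text><text>blue</text></doc>"
)
print([content(t) for t in select_text_items(doc)])  # ['red', 'blue', 'green']
```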

Follow this from the inside out:

1. key('text-by-content', normalize-space(.))[1]

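For each <text> element selected by the "//text" expression, this looks up that element's entry in the key table (every <text> element with the same normalized content, in document order) and then selects the first item in that list, that is, the first instance of a given text value.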

2. ".|key(...)[1]"

This creates a set from the current node and the first node of the key table entry that contains the current node.

3. count(.|key(...)[1])

This gets the length of the set.

4. count(...) = 1

This returns true if the set contains exactly one node, meaning that the current <text> node is the first node in its key table entry. Such a node is selected and added to the result node-set; every later node with the same content yields a count of 2 and is filtered out.
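To watch steps 1 to 4 happen for each node, the Python sketch below records the intermediate values. It models the XPath with ElementTree (an assumption, not an XSLT run); `trace` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

def content(elem):
    # normalize-space(.) on the element's string value
    return re.sub(r"[ \t\r\n]+", " ", "".join(elem.itertext())).strip(" ")

def trace(root):
    """Record steps 1-4 for every <text> element (illustrative model)."""
    key_table = {}
    for elem in root.iter("text"):
        key_table.setdefault(content(elem), []).append(elem)

    rows = []
    for position, node in enumerate(root.iter("text"), start=1):
        first = key_table[content(node)][0]  # step 1: key(...)[1]
        union = {node, first}                # step 2: .|key(...)[1]
        size = len(union)                    # step 3: count(...)
        keep = size == 1                     # step 4: count(...) = 1
        rows.append((position, content(node), size, keep))
    return rows

doc = ET.fromstring("<doc><text>red</text><text>blue</text><text>red</text></doc>")
for row in trace(doc):
    print(row)
# (1, 'red', 1, True)
# (2, 'blue', 1, True)
# (3, 'red', 2, False)
```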

You can test the result by doing this:

<xsl:for-each select="$text-items">
  <xsl:message>[<xsl:value-of select="position()"/>] = '<xsl:value-of select="."/>'</xsl:message>
</xsl:for-each>
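One detail worth knowing when reading those messages: inside xsl:for-each, position() counts within the selected node-set, not within the source document. This Python sketch (ElementTree model, hypothetical names `select_text_items` and `messages`) produces the same kind of lines and shows that effect.

```python
import re
import xml.etree.ElementTree as ET

def content(elem):
    # normalize-space(.) applied to the element's string value
    return re.sub(r"[ \t\r\n]+", " ", "".join(elem.itertext())).strip(" ")

def select_text_items(root):
    # Muenchian selection, modeling the $text-items variable
    key_table = {}
    for elem in root.iter("text"):
        key_table.setdefault(content(elem), []).append(elem)
    return [n for n in root.iter("text") if len({n, key_table[content(n)][0]}) == 1]

def messages(items):
    """One "[n] = 'value'" line per node, like the for-each/xsl:message loop.

    n plays the role of position(): the 1-based position within the
    iterated node-set, not within the source document.
    """
    lines = []
    for position, node in enumerate(items, start=1):
        value = "".join(node.itertext())  # xsl:value-of select="." (unnormalized)
        lines.append(f"[{position}] = '{value}'")
    return lines

doc = ET.fromstring("<doc><text>red</text><text>red</text><text>blue</text></doc>")
for line in messages(select_text_items(doc)):
    print(line)
# [1] = 'red'
# [2] = 'blue'
```

"blue" is the third <text> in the document but is reported as [2], because it is the second node in $text-items.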


When doing this type of grouping work, I find it really useful to create a "debug" template that just constructs all the different groups and then reports them; this makes it easier to work out the details of the key specs and lookups. If you're doing sorting, it also makes it easy to test your collation rules.
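A debug report of that kind might look like the Python sketch below. It is a model, not an XSLT template: `report_groups` and its `sort_key` parameter are hypothetical, and Python's default string order is plain code-point order, which is not necessarily what an XSLT processor's xsl:sort uses.

```python
import re
import xml.etree.ElementTree as ET

def content(elem):
    # normalize-space(.) applied to the element's string value
    return re.sub(r"[ \t\r\n]+", " ", "".join(elem.itertext())).strip(" ")

def report_groups(root, sort_key=None):
    """Build every group and describe it, one line per group.

    Each line shows the key value, the group size, and the 1-based
    document positions of its <text> members. Groups are ordered with
    `sort_key` (default: code-point order) so a collation rule can be
    checked at a glance.
    """
    groups = {}
    for pos, elem in enumerate(root.iter("text"), start=1):
        groups.setdefault(content(elem), []).append(pos)
    lines = []
    for value in sorted(groups, key=sort_key):
        positions = ", ".join(str(p) for p in groups[value])
        lines.append(f"{value!r} ({len(groups[value])}): {positions}")
    return lines

doc = ET.fromstring(
    "<doc><text>red</text><text>Blue</text><text>red</text><text>apple</text></doc>"
)
print("\n".join(report_groups(doc)))                    # 'Blue' sorts before 'apple'
print("\n".join(report_groups(doc, sort_key=str.lower)))  # case-insensitive order
```

Comparing the two reports makes a collation mistake, such as upper-case values sorting ahead of everything else, obvious before it shows up in real output.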

Cheers,

Eliot


-- W. Eliot Kimber ISOGEN International, LLC eliot@xxxxxxxxxx www.isogen.com


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


