Subject: Re: Muenchian grouping help - removing 'duplicates' from a nodeset
From: "W. Eliot Kimber" <eliot@xxxxxxxxxx>
Date: Thu, 09 Oct 2003 09:45:32 -0500

Laura@xxxxxxx wrote:

> I think the way to do this is via Muenchian grouping. I know what I need to do: group all the <text> elements by their text() content; and select only the first one in each group. But I've followed the guidelines on Jeni Tennison's XSLT pages and I can't seem to get my head around how keys actually work.

The way to do this is with what I call the "union trick". It took me a long time to figure out what was going on; my barrier had been not fully understanding that the "|" operator is a set union, not a logical OR. [I was trying to understand the code Jeni Tennison had written to do back-of-the-book index processing for DocBook.]

What you do is get the current node and the first node of the current node's entry in the key table and then construct a set from them using the union operator ("|"). If the result is a set of size one, then the two nodes must be the same node, because if they were different nodes you'd get a set of size two. The key is that a node set, by definition, never contains the same node more than once.

So, given this group spec:

<xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>

You would do something like this:

<xsl:variable name="text-items"
  select="//text[count(. | key('text-by-content', normalize-space(.))[1]) = 1]"/>

Follow this from the inside out:

1. key('text-by-content', normalize-space(.))[1]

This looks up the key table entry for each <text> element selected by the "//text" pattern and then selects the first item in that entry, that is, the first <text> element (in document order) with a given normalized value.

2. ". | key(...)[1]"

This creates a set from the current node and the first node of the key table entry that contains the current node.

3. count(. | key(...)[1])

This gets the size of the set.

4. count(...) = 1

This returns true if the size of the set is 1, meaning that the current <text> node is the first node in its containing key table entry. This node will be selected and added to the result node set.
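
Putting the four steps together, a complete stylesheet might look something like the sketch below. The key and the select expression are the ones discussed above; the input shape (a <doc> root holding <text> children) and the output element names <unique> and <item> are assumptions made up for illustration, not anything from Laura's actual data.

```xml
<?xml version="1.0"?>
<!-- Sketch only. Assumes input such as:
     <doc><text>apple</text><text>pear</text><text> apple </text></doc>
     The names doc, unique and item are hypothetical. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every <text> element by its whitespace-normalized string value. -->
  <xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>

  <xsl:template match="/">
    <!-- Union trick: keep a <text> only when the set built from it and the
         first node of its own key entry has size 1 (they are the same node). -->
    <xsl:variable name="text-items"
        select="//text[count(. | key('text-by-content', normalize-space(.))[1]) = 1]"/>
    <unique>
      <xsl:for-each select="$text-items">
        <item><xsl:value-of select="normalize-space(.)"/></item>
      </xsl:for-each>
    </unique>
  </xsl:template>

</xsl:stylesheet>
```

With the assumed input, the second "apple" is dropped because it is not the first node in its key entry, so only one "apple" item and one "pear" item should come out. If the count() form is hard to read, comparing generate-id(.) with generate-id(key(...)[1]) is a commonly used equivalent test for node identity.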

You can test the result by doing this:

<xsl:for-each select="$text-items">
  <xsl:message>[<xsl:value-of select="position()"/>] = '<xsl:value-of select="."/>'</xsl:message>
</xsl:for-each>

When doing this type of grouping work, I find it really useful to create a "debug" template that just constructs all the different groups and then reports them. That makes it easier to work out the details of the key specs and lookups. If you're doing sorting, it also makes it easy to test your collation rules.
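
As a rough idea of what such a debug template could look like, here is a minimal sketch that reports each group and its size through xsl:message. The template name "debug-groups" and the message wording are hypothetical; the key is the same text-by-content key as above, and the xsl:sort is there only so that you can eyeball the sort order.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the template name debug-groups is hypothetical. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>

  <xsl:template match="/">
    <xsl:call-template name="debug-groups"/>
  </xsl:template>

  <!-- Visit the first <text> of each group, then report the whole group. -->
  <xsl:template name="debug-groups">
    <xsl:for-each
        select="//text[count(. | key('text-by-content', normalize-space(.))[1]) = 1]">
      <xsl:sort select="normalize-space(.)"/>
      <!-- All members of the current group, fetched from the key table. -->
      <xsl:variable name="members"
          select="key('text-by-content', normalize-space(.))"/>
      <xsl:message>
        <xsl:text>group [</xsl:text>
        <xsl:value-of select="position()"/>
        <xsl:text>] '</xsl:text>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>' has </xsl:text>
        <xsl:value-of select="count($members)"/>
        <xsl:text> member(s)</xsl:text>
      </xsl:message>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Where the messages end up depends on the processor; most command-line processors write them to standard error.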

Cheers,

Eliot


--
W. Eliot Kimber
ISOGEN International, LLC
eliot@xxxxxxxxxx
www.isogen.com

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
