
Re: Keys and select distinct - is that the solution ?

Subject: Re: Keys and select distinct - is that the solution ?
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 06 Jun 2006 15:06:54 -0400
Christian,

If I understand your requirements correctly,
solution 1 is nearly there; you just have to
de-duplicate your codes before you call the
key() function for their names. This could be
done either with a predicate on your select
expression (which would filter out all but the
first occurrence of each ManureTypeCode value in
the ManureTypeCollection), or with an explicit
xsl:if test inside the for-each.
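
A minimal sketch of the predicate option, assuming Christian's sample instance and namespace bindings. The second key, codes-by-value, is a hypothetical name that is not in the original stylesheet: it indexes the data codes by value so the predicate can keep only the first code node for each value (the Muenchian test). The extra [key('ManureType', .)] predicate, also an addition, drops codes such as 3 that have no entry in the lookup table, so the comma logic stays right.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:eih="http://rep.oio.dk/glrchr.dk/eih/xml/schemas/2005/03/01/"
   xmlns:gr="http://rep.oio.dk/glrchr.dk/goedningsregnskab/xml/schemas/2006/05/01/">

   <xsl:output method="text"/>

   <!-- Lookup table: each name indexed by the code stored beside it -->
   <xsl:key name="ManureType" match="gr:ManureTypeName"
            use="../gr:ManureTypeCode"/>

   <!-- Hypothetical second key: the data codes indexed by their own value
        (codes inside XImanureTypeStructure are not matched) -->
   <xsl:key name="codes-by-value"
            match="eih:ManureTypeStructure/gr:ManureTypeCode" use="."/>

   <xsl:template match="/">
      <xsl:apply-templates select="eih/eih:ManureTypeCollection"/>
   </xsl:template>

   <xsl:template match="eih:ManureTypeCollection">
      <!-- First predicate: keep a code only if it is the first code node
           with its value (the Muenchian test).
           Second predicate: keep it only if the lookup table has a name. -->
      <xsl:for-each select="eih:ManureTypeStructure/gr:ManureTypeCode
            [generate-id() = generate-id(key('codes-by-value', .)[1])]
            [key('ManureType', .)]">
         <xsl:value-of select="key('ManureType', .)"/>
         <xsl:if test="position() != last()">, </xsl:if>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

Against the sample instance this should list Ajle and then the code-4 name, each once, in order of first appearance, with no trailing comma. The xsl:if variant would move the same generate-id() test from the select into an xsl:if inside the for-each; the drawback is that position() and last() then still count the duplicate codes, which makes separators harder to place.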

Solution 2 relies on the key() function itself to
perform the de-duplication of values of
ManureTypeCode. Passing a given value to the
key() function multiple times is fine, since the
resulting set will only have single instances of
whatever nodes are returned (the ManureTypeName
referred to by that value of ManureTypeCode). I
don't see any reason why this shouldn't work just
fine here; in fact it's a fairly elegant approach to the problem.
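
To see that de-duplication happen, here is a small sketch that assumes the ManureType key and the eih/gr namespace declarations from Christian's stylesheet. It passes all seven code nodes to key() and counts what comes back.

```xml
<xsl:template match="eih:ManureTypeCollection">
   <!-- Seven code nodes: 5, 5, 4, 4, 3, 3, 3 -->
   <xsl:variable name="codes"
                 select="eih:ManureTypeStructure/gr:ManureTypeCode"/>
   <!-- key() looks up every value, but the node-set it returns holds
        each ManureTypeName at most once; code 3 matches nothing -->
   <xsl:variable name="names" select="key('ManureType', $codes)"/>
   <xsl:value-of select="count($codes)"/>   <!-- 7 -->
   <xsl:text> codes, </xsl:text>
   <xsl:value-of select="count($names)"/>   <!-- 2 -->
   <xsl:text> distinct names</xsl:text>
</xsl:template>
```

Because key() returns a node-set, the names come back in document order of the lookup table, not in the order the codes first appear in the data. That is why Method 2 lists the code-4 name before Ajle; if order of first appearance matters, the predicate approach is the one to use.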

Cheers,
Wendell

At 07:01 AM 6/6/2006, you wrote:
Hi Wendell, (and others)

Thank you very much for a very thorough answer. I think it starts to
fall into place.... However it would still be beneficial for me to go
through - as you suggest - a simple extract from the project.

As suggested, I've included a simple XML instance and XSL stylesheet.
The stylesheet consists of two template matches:

Method 1 seems to me to be the logical approach: match the
ManureTypeCollection and iterate over each
ManureTypeStructure/ManureTypeCode. For each code, use the key to look
up the corresponding ManureTypeName. The problem here is that the same
code is looked up and returned twice, when each name should be
returned only once.

In method 2 (a colleague's tip) the result is actually what I want:
the names are returned just once! But the approach doesn't seem right
to me. It seems to work the other way around, by first matching the
lookup names and returning them if a corresponding code is found.
That is not optimal, is it?

It would be great if someone could describe to me:

1. the best way of returning each ManureTypeName once (with comments
on methods 1 and 2)
2. a line-by-line description of the code, especially if it uses the
Muenchian method

Thanks a lot in advance!

- Christian



XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   version="1.0"
   xmlns:eih="http://rep.oio.dk/glrchr.dk/eih/xml/schemas/2005/03/01/"
   xmlns:gr="http://rep.oio.dk/glrchr.dk/goedningsregnskab/xml/schemas/2006/05/01/">
   <xsl:key name="ManureType" match="gr:ManureTypeName"
use="../gr:ManureTypeCode"/>


<xsl:template match="/"> root is matched! <xsl:apply-templates select="eih/eih:ManureTypeCollection"/> </xsl:template>

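   <!-- Only one of METHOD 1 and METHOD 2 should be active at a time:
        both templates match eih:ManureTypeCollection. -->
   <!-- METHOD 1 - writes each name twice, returns: AjleAjleFast gxdningFast gxdning -->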
   <xsl:template match="eih:ManureTypeCollection">
       <xsl:for-each select="eih:ManureTypeStructure/gr:ManureTypeCode">
           <xsl:value-of select="key('ManureType',node())"/>
       </xsl:for-each>
   </xsl:template>

   <!-- METHOD 2 - writes out each name once, as wanted, returns:
        Fast gxdning, Ajle -->
   <xsl:template match="eih:ManureTypeCollection">
           <xsl:for-each select="key('ManureType',
               eih:ManureTypeStructure/gr:ManureTypeCode)">
               <xsl:value-of select="."/>
               <!-- compare with the function last(), not the string 'last' -->
               <xsl:if test="position() != last()"><xsl:text>, </xsl:text></xsl:if>
           </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>



XML instance:
<?xml version="1.0" encoding="UTF-8"?>
<eih xmlns:eih="http://rep.oio.dk/glrchr.dk/eih/xml/schemas/2005/03/01/"
     xmlns:gr="http://rep.oio.dk/glrchr.dk/goedningsregnskab/xml/schemas/2006/05/01/">
   <!-- Codes and data  -->
   <eih:ManureTypeCollection>
       <eih:ManureTypeStructure>
           <gr:ManureTypeCode>5</gr:ManureTypeCode>
           <gr:ElementIdentifier>N</gr:ElementIdentifier>
           <gr:ElementQuantity>17.0</gr:ElementQuantity>
       </eih:ManureTypeStructure>
       <eih:ManureTypeStructure>
           <gr:ManureTypeCode>5</gr:ManureTypeCode>
           <gr:ElementIdentifier>P</gr:ElementIdentifier>
           <gr:ElementQuantity>0.6</gr:ElementQuantity>
       </eih:ManureTypeStructure>
       <eih:ManureTypeStructure>
           <gr:ManureTypeCode>4</gr:ManureTypeCode>
           <gr:ElementIdentifier>N</gr:ElementIdentifier>
           <gr:ElementQuantity>17.5</gr:ElementQuantity>
       </eih:ManureTypeStructure>
       <eih:ManureTypeStructure>
           <gr:ManureTypeCode>4</gr:ManureTypeCode>
           <gr:ElementIdentifier>P</gr:ElementIdentifier>
           <gr:ElementQuantity> 6.3</gr:ElementQuantity>
       </eih:ManureTypeStructure>
       <eih:ManureTypeStructure>
           <gr:ManureTypeCode>3</gr:ManureTypeCode>
           <gr:ElementIdentifier>N</gr:ElementIdentifier>
           <gr:ElementQuantity> 65.3</gr:ElementQuantity>
       </eih:ManureTypeStructure>
       <eih:ManureTypeStructure>
           <gr:ManureTypeCode>3</gr:ManureTypeCode>
           <gr:ElementIdentifier>P</gr:ElementIdentifier>
           <gr:ElementQuantity> 26.3</gr:ElementQuantity>
       </eih:ManureTypeStructure>
       <eih:ManureTypeStructure>
           <gr:ManureTypeCode>3</gr:ManureTypeCode>
           <gr:ElementIdentifier>P</gr:ElementIdentifier>
           <gr:ElementQuantity> 16.3</gr:ElementQuantity>
       </eih:ManureTypeStructure>
   </eih:ManureTypeCollection>

   <!-- look up information for the codes -->
   <eih:XImanureTypeCollection>
       <eih:XImanureTypeStructure>
           <gr:ManureTypeCode>4</gr:ManureTypeCode>
           <gr:ManureTypeName>Fast gxdning</gr:ManureTypeName>
       </eih:XImanureTypeStructure>
       <eih:XImanureTypeStructure>
           <gr:ManureTypeCode>5</gr:ManureTypeCode>
           <gr:ManureTypeName>Ajle</gr:ManureTypeName>
       </eih:XImanureTypeStructure>
   </eih:XImanureTypeCollection>
</eih>

On 6/5/06, Wendell Piez <wapiez@xxxxxxxxxxxxxxxx> wrote:
Hi Christian,

At 07:30 PM 6/2/2006, you wrote:
>I have now tried the solutions, but none of them works.

Actually, I kind of doubt that. :-> What you have tried is either an
attempt at solving the problem blind, posted by contributors (me) who
worked with a partial data set and partial problem description, or
attempts of your own at patching such code.

Believe me, "the solution" works just fine. You just haven't figured
out how to write it yet, and neither have we. This doesn't mean that
the solution is not known -- we've written it plenty of times
before, just not fitted for your particular problem (which we
nevertheless recognize as a member of the species).

>Actually I don't think I need to use the generic_id, do I?
>Because I don't need to make all the elements unique!!? As far as I
>can see, I only have to pick out all the distinct codes.

The generate-id() idiom I suggested is not for the purposes of
"making an element unique". It is merely a way of checking whether
one node is the same node as another node. Consider this document:

<a>
   <b>100</b>
   <b>100</b>
</a>

Are /a/b[1] and /a/b[2] the same node? No.

How does a stylesheet know this? It can't tell by comparing their
names: they're both named 'b'. Nor by comparing their values, which
are both '100'.

It would be possible to write a template that produced for each node
a unique identifier, which we could compare. For example, it could
generate for the first b node the identifier "/a/b[1]" and for the
second, "/a/b[2]". We could compare these strings to establish the
two nodes are not the same node.

Or, since generate-id() generates, for any node, an identifier that
is unique to the node, we could just use this function, and not have
to write that template.
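
A quick way to see the difference, as a sketch run against the two-b document above: comparing the nodes with = compares their string values, while comparing their generate-id() values compares their identities.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>
   <xsl:template match="/">
      <!-- Same name, same value: = compares string values -->
      <xsl:value-of select="/a/b[1] = /a/b[2]"/>   <!-- true -->
      <xsl:text>&#10;</xsl:text>
      <!-- ... but they are different nodes -->
      <xsl:value-of select="generate-id(/a/b[1]) = generate-id(/a/b[2])"/>   <!-- false -->
   </xsl:template>
</xsl:stylesheet>
```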

Or, there's another way to test whether these are the same. Say we have

<xsl:variable name="first-b" select="/descendant::b[1]"/>

<xsl:template match="b">
   <!-- the union . | $first-b has one member only if both are the same node -->
   <xsl:choose>
   <xsl:when test="count(.|$first-b)=1">This b is the first</xsl:when>
   <xsl:otherwise>This b is not the first</xsl:otherwise>
   </xsl:choose>
</xsl:template>

Using generate-id() instead, we could say

<xsl:template match="b">
   <xsl:choose>
   <xsl:when test="generate-id() = generate-id($first-b)">This b is the first</xsl:when>
   <xsl:otherwise>This b is not the first</xsl:otherwise>
   </xsl:choose>
</xsl:template>

which also works.

Either of these can be applied to solve the problem of "am I a unique
representative of a given group of nodes", which is part of the
grouping problem. (And David C is correct: yours is a grouping problem.)

>By doing that I do have to match on the content of the node, and not
>the element name, right!?

Actually you match on a node, not on its content or name.

We do match nodes *by* name. Indeed this is the normal way of doing
it. In XSLT 1.0 a template can't match on content alone: a pattern
always identifies nodes, though a predicate can filter them by
content (for example, match="b[.='100']").

>If I match on the content/text of the node
>couldn't I say something like take all the elements whose content is
>not in any preceding sibling's content?

You could match a node and test to see if its content appeared on a
preceding element or preceding-sibling element, yes. And indeed, that
is a solution available to us for grouping. But it is slow, and it
doesn't scale well even to medium-sized data sets.
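
For comparison, a sketch of the preceding-axis test described here. Each b checks every earlier b in the document, so the work grows roughly with the square of the number of b elements, which is why it does not scale.

```xml
<xsl:template match="b">
   <!-- Report a b only if no earlier b in the document has the same value.
        preceding::b walks back over every earlier b, for every b matched. -->
   <xsl:if test="not(preceding::b[. = current()])">
      <xsl:text>I'm a b; my content is </xsl:text>
      <xsl:apply-templates/>
   </xsl:if>
</xsl:template>
```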

It's much quicker to do something like

<xsl:template match="b">
   <xsl:variable name="bs-like-this"
     select="/descendant::b[.=current()]"/>
   <xsl:if test="generate-id()=generate-id($bs-like-this[1])">
     <xsl:text>I'm a b; my content is </xsl:text>
     <xsl:apply-templates/>
   </xsl:if>
</xsl:template>

Instead of using the painful traversal along the preceding axis, this
template works like this:

1. Bind to a variable all the 'b' nodes in the document whose content
    is the same as the b node matched
2. Test to see whether the b node matched is the first of the nodes
    bound to the variable; if it is, report its content

If we can do this, then grouping all the bs by content (*not* by
name) is as simple as processing all the bs bound to the variable in
step 2. This is a trivial tweak to what I just wrote above (which I
leave to you to figure out).
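
One possible version of that tweak, offered as a sketch: the group and member output elements and the member mode are hypothetical names chosen for illustration. The first b of each value processes the whole group; the separate mode keeps those b elements from being handled by the grouping template a second time.

```xml
<xsl:template match="b">
   <xsl:variable name="bs-like-this"
     select="/descendant::b[.=current()]"/>
   <!-- Only the first b with this value acts for its group ... -->
   <xsl:if test="generate-id()=generate-id($bs-like-this[1])">
     <group value="{.}">
       <!-- ... and processes every member of the group at once -->
       <xsl:apply-templates select="$bs-like-this" mode="member"/>
     </group>
   </xsl:if>
</xsl:template>

<xsl:template match="b" mode="member">
   <member><xsl:value-of select="."/></member>
</xsl:template>
```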

This is still slow, however, since for every b matched by the
template we have to assemble the set /descendant::b[.=current()],
which entails looking through the entire document. Accordingly, for
this we usually use keys (this was Steve Muench's contribution to the
method), since keys are pre-indexed and hence fast:

<xsl:variable name="bs-like-this" select="key('bs-by-value',.)"/>

which grabs those nodes without having to traverse the entire tree.

In this case the key 'bs-by-value' would index the 'b' nodes by their
content (value):

<xsl:key name="bs-by-value" match="b" use="."/>
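
Putting the key and the generate-id() test together gives the Muenchian method in full. A minimal sketch against the two-b document above; the report text is illustrative only:

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- Index every b element by its string value -->
   <xsl:key name="bs-by-value" match="b" use="."/>

   <xsl:template match="/">
      <xsl:apply-templates select="//b"/>
   </xsl:template>

   <xsl:template match="b">
      <!-- Indexed lookup instead of a scan of the whole document -->
      <xsl:variable name="bs-like-this" select="key('bs-by-value', .)"/>
      <!-- Only the first b (in document order) with this value reports -->
      <xsl:if test="generate-id() = generate-id($bs-like-this[1])">
         <xsl:value-of select="."/>
         <xsl:text> occurs </xsl:text>
         <xsl:value-of select="count($bs-like-this)"/>
         <xsl:text> time(s)&#10;</xsl:text>
      </xsl:if>
   </xsl:template>
</xsl:stylesheet>
```

For the two-b document this should print "100 occurs 2 time(s)" once, since only the first b passes the generate-id() test.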

If you really want to pursue a solution based on checking backwards
along the preceding:: axis, we can help with that. By pointing you to
the grouping solutions (which build on what I just showed you above),
we are trying to skip you past that point, since it's not the best
solution available.

If you need more help disentangling this, please feel free to post
again. But when you do, post your sample code again please, so we can
point the way using examples that make sense.

Good luck,
Wendell
