Subject: Re: Grouping elements that have at least one common value
From: "Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 26 Jun 2023 06:05:56 -0000
Hi all,

I went ahead with Martin's solution and have implemented all the business rules
around that "grouping".
It's fast, but I realized that it can generate duplicated groups on my big
file, which is quite a problem (some people in my company will have to
spend 60 days working on that output as an Excel file).

It's not that easy to reproduce, but it happens for example with this input:
<FORMS>
    <GRCHOIX>
        <CHOIX CODE="choix-10"/>
        <CHOIX CODE="choix-11"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-12"/>
        <CHOIX CODE="choix-14"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-12"/>
        <CHOIX CODE="choix-15"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-2"/>
        <CHOIX CODE="choix-8"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-3"/>
        <CHOIX CODE="choix-5"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-22"/>
        <CHOIX CODE="choix-3"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-10"/>
        <CHOIX CODE="choix-13"/>
        <CHOIX CODE="choix-18"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-11"/>
        <CHOIX CODE="choix-16"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-12"/>
        <CHOIX CODE="choix-16"/>
    </GRCHOIX>
</FORMS>

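In the output, the GRCHOIX "choix-10/choix-13/choix-18" is duplicated: it
appears once inside the first GROUP and again in a GROUP of its own. There is
also an empty GROUP at the end.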

<FORMS>
   <GROUP>
      <GRCHOIX>
         <CHOIX CODE="choix-10"/>
         <CHOIX CODE="choix-11"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-12"/>
         <CHOIX CODE="choix-14"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-12"/>
         <CHOIX CODE="choix-15"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-10"/>
         <CHOIX CODE="choix-13"/>
         <CHOIX CODE="choix-18"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-11"/>
         <CHOIX CODE="choix-16"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-12"/>
         <CHOIX CODE="choix-16"/>
      </GRCHOIX>
   </GROUP>
   <GROUP>
      <GRCHOIX>
         <CHOIX CODE="choix-2"/>
         <CHOIX CODE="choix-8"/>
      </GRCHOIX>
   </GROUP>
   <GROUP>
      <GRCHOIX>
         <CHOIX CODE="choix-3"/>
         <CHOIX CODE="choix-5"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-22"/>
         <CHOIX CODE="choix-3"/>
      </GRCHOIX>
   </GROUP>
   <GROUP>
      <GRCHOIX>
         <CHOIX CODE="choix-10"/>
         <CHOIX CODE="choix-13"/>
         <CHOIX CODE="choix-18"/>
      </GRCHOIX>
   </GROUP>
   <GROUP/>
</FORMS>
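
A quick way to spot this kind of problem on a big output file, as a minimal
Python sketch rather than part of the XSLT solution. The function name
find_group_problems and the reduced SAMPLE_OUTPUT are hypothetical, and the
check assumes that a GRCHOIX is identified by its sequence of CODE values
(two distinct input GRCHOIX with identical codes would be reported as
duplicates too).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Reduced version of the output above: the same GRCHOIX sits in two GROUPs,
# and the last GROUP is empty.
SAMPLE_OUTPUT = """
<FORMS>
  <GROUP>
    <GRCHOIX><CHOIX CODE="choix-10"/><CHOIX CODE="choix-11"/></GRCHOIX>
    <GRCHOIX><CHOIX CODE="choix-10"/><CHOIX CODE="choix-13"/><CHOIX CODE="choix-18"/></GRCHOIX>
  </GROUP>
  <GROUP>
    <GRCHOIX><CHOIX CODE="choix-10"/><CHOIX CODE="choix-13"/><CHOIX CODE="choix-18"/></GRCHOIX>
  </GROUP>
  <GROUP/>
</FORMS>
"""


def find_group_problems(xml_text):
    """Return (duplicates, empty_groups) for a FORMS/GROUP/GRCHOIX document.

    duplicates maps a tuple of CODE values to the 1-based positions of the
    GROUP elements that contain it, for every GRCHOIX seen more than once.
    empty_groups lists the 1-based positions of GROUPs with no GRCHOIX.
    """
    root = ET.fromstring(xml_text)
    seen = defaultdict(list)
    empty_groups = []
    for pos, group in enumerate(root.findall("GROUP"), start=1):
        members = group.findall("GRCHOIX")
        if not members:
            empty_groups.append(pos)
        for grchoix in members:
            key = tuple(c.get("CODE") for c in grchoix.findall("CHOIX"))
            seen[key].append(pos)
    duplicates = {key: where for key, where in seen.items() if len(where) > 1}
    return duplicates, empty_groups


duplicates, empty_groups = find_group_problems(SAMPLE_OUTPUT)
print(duplicates)    # {('choix-10', 'choix-13', 'choix-18'): [1, 2]}
print(empty_groups)  # [3]
```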

I tried to figure out why, but there is something I don't understand in the
algorithm:
- xsl:iterate iterates over the $groups elements.
- when it goes into the xsl:otherwise branch, it creates a <GROUP> in the output.
Does the "grouping" strategy depend on element order? Or can the same GROUP
element still be fed at the next iteration (unlike with xsl:for-each)?
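
For reference, here is what an order-independent grouping of the sample input
should give. This is a minimal Python sketch using union-find (connected
components), not Martin's XSLT and not a claim about how his stylesheet works;
the names parse_codes and group_indices are hypothetical. Two GRCHOIX end up
in the same group whenever they are linked by a chain of shared CODE values.

```python
import xml.etree.ElementTree as ET

INPUT = """
<FORMS>
  <GRCHOIX><CHOIX CODE="choix-10"/><CHOIX CODE="choix-11"/></GRCHOIX>
  <GRCHOIX><CHOIX CODE="choix-12"/><CHOIX CODE="choix-14"/></GRCHOIX>
  <GRCHOIX><CHOIX CODE="choix-12"/><CHOIX CODE="choix-15"/></GRCHOIX>
  <GRCHOIX><CHOIX CODE="choix-2"/><CHOIX CODE="choix-8"/></GRCHOIX>
  <GRCHOIX><CHOIX CODE="choix-3"/><CHOIX CODE="choix-5"/></GRCHOIX>
  <GRCHOIX><CHOIX CODE="choix-22"/><CHOIX CODE="choix-3"/></GRCHOIX>
  <GRCHOIX><CHOIX CODE="choix-10"/><CHOIX CODE="choix-13"/><CHOIX CODE="choix-18"/></GRCHOIX>
  <GRCHOIX><CHOIX CODE="choix-11"/><CHOIX CODE="choix-16"/></GRCHOIX>
  <GRCHOIX><CHOIX CODE="choix-12"/><CHOIX CODE="choix-16"/></GRCHOIX>
</FORMS>
"""


def parse_codes(xml_text):
    """Return one list of CODE values per GRCHOIX, in document order."""
    root = ET.fromstring(xml_text)
    return [[c.get("CODE") for c in g.findall("CHOIX")]
            for g in root.findall("GRCHOIX")]


def group_indices(items):
    """Partition item indices so that items sharing a code, directly or
    through a chain of other items, land in the same group."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # code -> index of the first item that carries it
    for i, codes in enumerate(items):
        for code in codes:
            if code in first_owner:
                a, b = find(first_owner[code]), find(i)
                if a != b:
                    # Keep the smaller index as root so groups stay in document order.
                    parent[max(a, b)] = min(a, b)
            else:
                first_owner[code] = i

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


print(group_indices(parse_codes(INPUT)))
# → [[0, 1, 2, 6, 7, 8], [3], [4, 5]]
```

With 0-based positions, that is three groups: the six GRCHOIX connected
through choix-10, choix-11, choix-12 and choix-16, then the choix-2/choix-8
one alone, then the two sharing choix-3. Each GRCHOIX belongs to exactly one
group whatever the input order, so a fourth GROUP repeating
choix-10/choix-13/choix-18 and an empty fifth GROUP are not expected.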

I also gave Michael's transitive closure algorithm a try (see next mail).

Cheers
Matthieu

On Mon, 19 Jun 2023 at 22:44, Joel Kalvesmaki director@xxxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Matthieu,
>
> Currently TAN is a static download, either through github or the
> website. Making it available through package repos is a future to-do
> item, as well as better organization into subpackages and breaking out
> dependencies. The license was designed to encourage other developers to
> develop their own variations on the code, as needed.
>
> A new function proposed for XPath 4.0, currently transitive-closure()
> (name under discussion, https://github.com/qt4cg/qtspecs/issues/554), is
> likely to make this task more tractable, and concisely expressed.
>
> Best wishes,
>
> jk
>
>
> On 2023-06-19 01:38, Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx wrote:
> > Hi Joel,
> >
> > Thanks for the link to the TAN library. I'm not sure I can use it for my
> > purpose, because it groups the text content of child nodes. But I
> > guess I could adapt my input or the function code.
> >
> > BTW it looks like TAN functions use a lot of other TAN functions,
> > which means I should get the whole TAN lib to make it work on my
> > project.
> >
> > How is it distributed? Fetching it over HTTP would probably work, but it's
> > not that safe when running on a company server that might not be
> > connected to the internet (or has proxy restrictions, for example). Is the
> > TAN library published as a Maven artifact or something like that?
> >
> > Anyway, Martin's solution works really well and performance is really
> > good, so I guess I will stay with this solution for my project.
> >
> > Thanks again Martin !
> >
> > Now I have to deal with business rules around this "grouping" :)
> >
> > Thank you all for your time,
> >
> > Cheers
> >
> > Matthieu
> >
> > On Fri, 16 Jun 2023 at 16:54, Joel Kalvesmaki director@xxxxxxxxxxxxx
> > <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >> Hi Matthieu,
> >>
> >> You may want to look at tan:group-elements-by-shared-node-values().
> >>
> >> Overview:
> >>
> >> https://textalign.net/release/TAN-2021/guidelines/xhtml/ch13s02.xhtml#function-group-elements-by-shared-node-values
> >>
> >> Code (starting line 272):
> >>
> >> https://github.com/textalign/TAN-2021/blob/master/functions/nodes/TAN-fn-nodes-standard.xsl
> >>
> >> Joel
> >>
> >> On 2023-06-16 05:09, Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx
> >> wrote:
> >>> Hi all,
> >>>
> >>> I need to group elements that have at least one common value :
> >>>
> >>> <FORMS>
> >>>
>
> --
> Joel Kalvesmaki
> Director, Text Alignment Network
> http://textalign.net
>
>
>

--
Matthieu Ricaud-Dussarget
+33 6.63.25.95.58
