select immediately following siblings with constraints

Play the video

Subject: select immediately following siblings with constraints?
From: "James A. Robinson" <jim.robinson@xxxxxxxxxxxx>
Date: Wed, 22 Feb 2006 23:23:14 -0800

Hello,

This is a very long message, and I apologize for that.  I've split
this question into a short-ish version followed by a very long
version.  I am using XSL 2.0 draft syntax (as implemented by Saxon).

The short version of my question: Is it possible to write an XPath
expression which selects the immediately adjacent following-siblings
which adhere to a specific criteria?  I'm using a combination of
xsl:if and xsl:for-each-group (with group-adjacent), but I am
wondering if there is a way to do it more directly.

Say I'm processing a list of elements and I want the 'a' elements
which are adjacent to my current context:

  <a id="1"/>
  <b id="2"/>
  <a id="3"/>
  <a id="4"/>
  <d id="5"/>
  <a id="6"/>

if current() is a[@id="1"], I want my XPath expression to evaluate to
nothing since the element following current()
(following-sibling::element()[1]) is '<b id="2"/>'.

If my input sequence was

  <a id="1"/>
  <a id="3"/>
  <a id="4"/>
  <d id="5"/>
  <a id="6"/>

and current() is a[@id="1"], I'd want the expression to select the
sequence

  <a id="3"/>
  <a id="4"/>

I would not want it to include <a id="6"/>, since the chain was
interrupted by '<d id="5"/>'.  As I mentioned above, I'm currently
handling this using combination of xsl:if and xsl:for-each-group.  Is
there another way?

If you have time to be looking over the long-winded version, I've
tried to outline what I'm working on and where I'm getting confused on
how to Do Things Properly.

I'm working out a stylesheet to perform a rather nasty operation:
turning two sets of information selected from an SQL database into a
hierarchy.  I'm dealing with the aspect of performing a merge of
adjacent nodes, and was hoping folks here would be willing to look my
logic over and tell me if there are flaws, or if there is a much
simpler way to handle the problem.

I've got the following sort of input (simplified), which as you can
see is in a rather haphazard order.

  <input>
    <sections>
      <!-- randomly ordered sections, we do know that all tier_num
           values greater than 1 will have a parent defined.  -->
      <section id="1.2" tier_num="2" parent_id="1">Section CDG</section>
      <section id="1.1" tier_num="2" parent_id="1">Section ABC</section>
      <section id="2" tier_num="1">Section XYZ</section>
      <section id="1" tier_num="1">Section ZXY</section>
    </sections>
    <articles>
      <!-- articles are ordered as they should appear in output -->
      <article id="1" section="1.2"/>
      <article id="2" section="2"/>
      <article id="3" section="1.1"/>
      <article id="4" section="1"/>
      <article id="5" section="1.2"/>
    </articles>
  </input>

The real input can actually have an arbitrary number of tiered
sections going to an arbitrary depth.  Given the sample input, I need
to output the articles in the listed order with the proper section
information:

  <output>
    <section>
      <heading>Section ZXY</heading>
      <section>
        <heading>Section CDG</heading>
        <article id="1"/>
      </section>
    </section>
    <section>
      <heading>Section XYZ</heading>
      <article id="2"/>
    </section>
    <section>
      <heading>Section ZXY</heading>
      <section>
        <heading>Section ABC</heading>
        <article id="3"/>
      </section>
      <article id="4"/>
      <section>
        <heading>Section CDG</heading>
        <article id="5"/>
      </section>
    </section>
  </output>

I've written a stylesheet which does this transformation, but I'm
worried about both efficiency and about whether or not I'm missing a
more obvious way to solve the problem.

My current working solution involves creating a series of node sets as
variables, each variable annotated with more information and/or
nesting than the previous, I've broken it down into the following
variable creation steps:

  1) generate list of sections with article children,
  2) fill in any parent sections which are missing,
  3) nest sections per parent_id,
  4) merge sequential sections which share a section id,

The first 2 steps are dealing with lists of sections.  I'm using
xsl:for-each-group with the group-adjacent selector to merge adjacent
sets of articles into single groups, and wrapping each group in its
appropriate section.  I'm then taking this flat list of sections and
inserting any missing sections (ones which did not get created on the
first pass because they didn't happen to have an associated article in
them).  I'm left with a flat list of sections:

 <section id="1" tier_num="1"/>
 <section id="1.2" tier_num="2" parent_id="1">
   <article id="1"/>
 </section>
 <section id="2" tier_num="1">
   <article id="2"/>
 </section>
 <section id="1" tier_num="1"/>
 <section id="1.1" tier_num="2" parent_id="1">
   <article id="3"/>
 </section>
 <section id="1" tier_num="1">
   <article id="4"/>
 </section>
 <section id="1.2" tier_num="2" parent_id="1">
   <article id="1.2"/>
 </section>

My next step is to nest them:
  
  <xsl:variable name="nested_sections">
    <xsl:apply-templates select="$all_sections/section[@tier_num = 1]" mode="nest_sections" />
  </xsl:variable>
  ...
  <xsl:template match="section" mode="nest_sections" as="element(section)">
    <xsl:copy>
      <xsl:sequence select="@*|*" />
      <xsl:apply-templates
        select="following-sibling::section[ 
                  (@tier_num = (current()/@tier_num + 1))
                   and (current() is preceding-sibling::section[@tier_num=current()/@tier_num][1])]"
        mode="#current" />
    </xsl:copy>
  </xsl:template>

which leaves me with

 <section id="1" tier_num="1">
   <section id="1.1" tier_num="2" parent_id="1">
     <article id="1"/>
   </section>
 </section>
 <section id="2" tier_num="1">
   <article id="2"/>
 </section>
 <section id="1" tier_num="1">
   <section id="1.1" tier_num="2" parent_id="1">
     <article id="3"/>
   </section>
 </section>
 <section id="1" tier_num="1">
   <article id="4"/>
   <section id="1.2" tier_num="2" parent_id="1">
     <article id="1.2"/>
   </section>
 </section>

I then merge adjacent sections which share the same section id.  I
came up with two solutions.  I believe the first one is flawed, and
I'm unsure about the efficiency of the second.  I'm calling both
solutions with a named template which recurses if it detects that
adjacent sections with identical ids remain:

  <xsl:variable name="merged_sections">
    <xsl:call-template name="merge_sections">
      <xsl:with-param name="sections" select="$nested_sections"/>
    </xsl:call-template>
  </xsl:variable>
  ...
 <xsl:template name="merge_sections">
    <xsl:param name="sections" as="document-node()"/>
    <xsl:choose>
      <xsl:when test="exists($sections//section[@id = preceding-sibling::element()[1][self::section]/@id])">
        <xsl:call-template name="merge_sections">
          <xsl:with-param name="sections">
            <xsl:apply-templates select="$sections" mode="merge_sections"/>
          </xsl:with-param>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="$sections"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

As you can see, I basically tried to construct a "while" loop which
tested for adjacent sections with identical ids.  I used the XPath
expression

  //section[@id = preceding-sibling::element()[1][self::secton]/@id]

instead of

  //section[@id = preceding-sibling::section[1]/@id]

to handle a case where two sections with identical ids are separated
by one or more articles.  For example:

 <section id="1" tier_num="1">
   <article id="1"/>
   <section id="1.2" tier_num="2" parent_id="1">
     <article id="2"/>
   </section>
   <article id="3"/>
   <section id="1.2" tier_num="2" parent_id="1">
     <article id="4"/>
   </section>
 </section>

This is a case where I do not want to perform a merge, as it would
result in the articles being listed out of order (article 3 would end
up coming after article 4).

To handle the actual merging, I came up with two solutions, the first
has same sort of problem I was trying to deal with in my "while" loop
test (using [1][self::section]).  I'm at a loss at how to only select
immediately adjacent following-siblings (note the plural).  My 1st
solution fails to account for that when calling xsl:apply-templates:

  <!-- solution #1, discarded as being incorrect -->
  <xsl:template match="section" mode="merge_sections" as="element(section)?">
    <!--
      If the immediately preceding element is a section which has the
       same id, it should have already been handled. Otherwise we want
       to copy it.
    -->
    <xsl:if test="not(preceding-sibling::element()[1][self::section][@id=current()/@id])">
      <xsl:copy>
        <xsl:sequence select="@*" />
        <xsl:apply-templates select="*" mode="#current"/>
          <!--
            Merge in any adjacent sections which have the same id
          -->
         <xsl:apply-templates select="
           following-sibling::section[
             (@id = current()/@id)
             and (current() is preceding-sibling::section[@id != current()/@id][1]/following-sibling::section[1])]/*"
          mode="#current"/>
    </xsl:copy>
    </xsl:if>
  </xsl:template>

My second solution combines xsl:if and xsl:for-each-group to try and
ensure we only operate on the following-sibling section(s) if they are
immediately adjacent and share the same section id:

  <!-- solution #2, but is it inefficient? -->
  <xsl:template match="section" mode="merge_sections" as="element(section)?">
    <!--
      If the immediately preceding element is a section which has the
      same id, it should have already been handled. Otherwise we want
      to copy it.
    -->
    <xsl:if test="not(preceding-sibling::element()[1][self::section][@id=current()/@id])">
      <xsl:copy>
        <xsl:sequence select="@*" />
        <xsl:apply-templates select="*" mode="#current"/>
        <xsl:variable name="context" select="current()"/>
          <!--
            Merge in any adjacent sections which have the same id
          -->
        <xsl:if test="following-sibling::element()[1][self::section][@id = current()/@id]">
          <xsl:for-each-group select="following-sibling::section" group-adjacent="@id">
            <xsl:if test="$context is current-group()[1]/preceding-sibling::element()[1]
                          and $context/@id = current-group()[1]/@id">
              <xsl:apply-templates select="current-group()/*" mode="#current"/>
            </xsl:if>
          </xsl:for-each-group>
        </xsl:if>
    </xsl:copy>
    </xsl:if>
  </xsl:template>

So, if I understand what I am doing correctly, I am capturing the
groups of following-sibling sections, then testing each grouping to
see if the sequence is immediately adjacent to my $context, and that
it has the same id.

Is there a better way, along the lines of

  ./following-sibling::element()[1][self::section]
  
where we are not limited to just the first element?  Where we can
select all of the consecutive following-sibling section elements?  Is
it wasteful to be running this grouping operation when all I really
want are just the next handful of section elements?

Do you folks see a simpler way to approach the problems?  Am I going
about this the wrong way?

Thank you for your time,

Jim

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson                       jim.robinson@xxxxxxxxxxxx
Stanford University HighWire Press      http://highwire.stanford.edu/
+1 650 7237294 (Work)                   +1 650 7259335 (Fax)

Current Thread
select immediately following siblings with constraints? James A. Robinson - 23 Feb 2006 07:23:40 -0000 <= Wendell Piez - 23 Feb 2006 18:08:05 -0000 Michael Kay - 23 Feb 2006 19:13:47 -0000 James A. Robinson - 23 Feb 2006 18:55:03 -0000 Wendell Piez - 23 Feb 2006 23:37:35 -0000

<- Previous	Index	Next ->
RE: Fwd: Five xdt:* types ren, Michael Kay	Thread	Re: select immediately follow, Wendell Piez
Fwd: Five xdt:* types renamed, Dimitre Novatchev	Date	Grouping and calculating a sc, v . witzel
	Month

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Subscribe in XML format

RSS 2.0
Atom 0.3

XML Editor - Download a 15 Day Free Trial Now >

See What's New in Stylus Studio >

Buy Stylus Studio - XML Editor - Now >