Re: exercise in complex grouping

Subject: Re: exercise in complex grouping
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 12 May 2020 09:45:25 -0000
On 12.05.2020 at 11:34, Syd Bauman s.bauman@xxxxxxxxxxxxxxxx wrote:
I have a moderately sizable TEI file (~31,000 text nodes with ~100,400
"words" or ~688,000 characters; ~20,000 elements, ~15,000 attributes).
Somewhere in all that mess there are a few pairs of elements for which
I need some special processing.

Say each pair is an <A> and a <B>. I can find each <B> by XPath quite
trivially. In addition, for every pair, <B> has a @target that points
to the corresponding <A> via a bare name identifier URL. Furthermore,
every <B> in the document is part of such a pair. (Which is why it is
so trivial to find them via XPath. The same cannot be said for <A>:
there are *lots* of <A> elements that are not part of an <A>-<B>
pair; but none, of course, that bear that particular @xml:id, so they
can be found by XPath. It's just easy, not trivial. :-)

In general, there can be other nodes between <A> and <B>, and there
will be cases in which <B> precedes rather than follows the <A> it
points to. E.g.,

    blah blah blah
    <d><e>blah</e> blah
    <B target="#A1">blort</B>
    <f>monkey</f> shines
    <A xml:id="A1">snort</A>
    blah</d>

I want to be able to handle these cases, too.

For the foreseeable future, there will never be another <B> in between
a <B> and the <A> it points to, and each <B> will be a child of the
same element as the <A> it points to. (I.e., no overlap problems.) But
as soon as I say these complications will never happen, the very next
day the editors will gleefully send e-mail saying they have found such a
case. But for now, if needed, I'm willing to write code that presumes
it won't happen.

What I want for output is to be able to wrap the <B> with the <A> it
points to, *and everything in between* in a <C>.

    blah blah blah
    <d><e>blah</e> blah
    <C xml:id="A1Container">
      <B target="#A1">blort</B>
      <f>monkey</f> shines
      <A xml:id="A1">snort</A>
    </C>
    blah</d>

I am 90% confident I can write some messy XSLT 1.0 Muenchian grouping
code that does this. (Although I suspect it would take two passes,
one for <A> precedes <B>, another for <B> precedes <A>; but I don't
care about two passes at all, and would not even care if it took N
passes.[1]) But I am equally confident there is a much better
<xsl:for-each-group> method that, at the moment, I simply can't wrap
my head around.

In XSLT 2 and later you have
`<xsl:for-each-group select="node()" group-starting-with="B[@target]">`.
Furthermore, using `id(substring(@target, 2))` would give you the A
element, so you can use the `<<` operator, or you can use a nested
group-ending-with to identify the A and the items in between.

I have not understood what you want to do for input where the B follows
the A element.
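
A minimal sketch of the group-starting-with suggestion, combined with
the node-order comparison, for the case where the B precedes its A. It
is not a tested or complete solution. It assumes an XSLT 2.0 (or later)
processor that treats xml:id as an ID for id(), that each B and the A it
points to are siblings, and that no other B sits between them. The C
element and the "Container" suffix are taken from the desired output in
the question. Inside the stylesheet the test is written with `>>`
(which needs no escaping in an attribute) rather than `<<`. Groups whose
A is not found after the B are copied through unchanged, so the
A-before-B case is deliberately left unhandled here.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copy everything not matched more specifically. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element that has at least one pointing B among its children. -->
  <xsl:template match="*[B[@target]]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Start a new group at every B that carries a @target. -->
      <xsl:for-each-group select="node()" group-starting-with="B[@target]">
        <!-- The A that this group's B points to. Empty for a leading
             group that does not begin with a B. -->
        <xsl:variable name="a" select="id(substring(@target, 2))"/>
        <xsl:choose>
          <!-- Wrap only when that A is a sibling inside this group,
               i.e. it follows the B and precedes the next B. -->
          <xsl:when test="current-group()[. is $a]">
            <C>
              <xsl:attribute name="xml:id"
                  select="concat($a/@xml:id, 'Container')"/>
              <!-- The B, the A, and everything in between. -->
              <xsl:apply-templates select="current-group()[not(. >> $a)]"/>
            </C>
            <!-- Whatever follows the A up to the next B stays outside. -->
            <xsl:apply-templates select="current-group()[. >> $a]"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- No A found after this B (or no B at all): copy as is. -->
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample in the question, this is intended to put the B,
the f, the " shines" text and the A inside one C, without the extra
indentation shown in the desired output. The xml:id is built with
xsl:attribute rather than written as an attribute value template on the
C element, so that the stylesheet itself never carries an xml:id value
that is not a valid name.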
