
Re: breaking up XML on page break element

Subject: Re: breaking up XML on page break element
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 7 Jul 2014 17:55:32 -0000
Uhoh I wasn't reading ...

... compare my solution here:

https://github.com/wendellpiez/MITH_XSLT/blob/master/xslt/p-promote.xsl

plus there's an older version here:

http://piez.org/wendell/projects/Interedition2011/lib/p5o-browser-html.xsl

In Luminescent (my "hobby" LMNL processing framework) there's a fair
amount of this stuff (reducing and promoting hierarchies). The fact
that we can generalize methods to do this in XSLT 2.0 is fantastic.
:-)

Cheers, Wendell


On Mon, Jul 7, 2014 at 4:53 AM, Geert Bormans
geert@xxxxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote:
> Hi Gerrit,
>
> First, my congratulations to the German team
> (I admit they should have scored an extra goal...
> given I made a bet at the office for 2-0, that would have brought me some
> cash :-))
>
> Thanks very much for this solution.
> It is exactly what I was looking for.
> It seems robust and elegant, and I love patterns with a name ;-)
>
> Thanks a ton
>
> Geert
>
>
> At 20:20 4/07/2014, you wrote:
>>
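>> <?xml version="1.0" encoding="UTF-8"?>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>   version="2.0">
>>
>>   <xsl:output indent="yes"/>
>>
>>   <!-- Default mode: identity copy of elements and attributes -->
>>   <xsl:template match="* | @*" mode="#default">
>>     <xsl:copy>
>>       <xsl:apply-templates select="@*, node()" mode="#current"/>
>>     </xsl:copy>
>>   </xsl:template>
>>
>>   <!-- Group the leaf nodes (nodes without children) so that each group
>>        starts at a pb, then process the whole book once per group in
>>        split mode, restricted to that group's leaves and their ancestors -->
>>   <xsl:template match="book" mode="#default">
>>     <xsl:variable name="context" select="." as="element(book)" />
>>     <xsl:copy>
>>       <!-- keep the book's own attributes -->
>>       <xsl:copy-of select="@*"/>
>>       <xsl:for-each-group select="descendant::node()[not(node())]"
>>                           group-starting-with="pb">
>>         <!-- Reproduce the pb that opens this group at the top level
>>              (the first group may not start with one) -->
>>         <xsl:copy-of select="self::pb"/>
>>         <xsl:apply-templates select="$context/*" mode="split">
>>           <xsl:with-param name="restricted-to"
>>             select="current-group()/ancestor-or-self::node()" tunnel="yes"/>
>>         </xsl:apply-templates>
>>       </xsl:for-each-group>
>>     </xsl:copy>
>>   </xsl:template>
>>
>>   <!-- Copy a node only if it belongs to the current projection -->
>>   <xsl:template match="node()" mode="split">
>>     <xsl:param name="restricted-to" as="node()+" tunnel="yes" />
>>     <xsl:if test="exists(. intersect $restricted-to)">
>>       <xsl:copy>
>>         <xsl:copy-of select="@*" />
>>         <xsl:apply-templates mode="#current" />
>>       </xsl:copy>
>>     </xsl:if>
>>   </xsl:template>
>>
>>   <!-- Drop the original pb elements inside the projected copies -->
>>   <xsl:template match="pb" mode="split"/>
>>
>> </xsl:stylesheet>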
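Geert's stated goal is to break the document up per page once the page breaks sit at the top level, and the thread mentions an option to discard or reproduce the splitting elements. The following is a minimal second-pass sketch, assuming it runs on the output of the stylesheet above (so every pb is a child of book). The `page` element and the `keep-pb` parameter are hypothetical names, not part of anything posted in the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a second pass over the output of the upward-projection
     stylesheet. Assumes every pb is now a child of book; the page
     element and the keep-pb parameter are hypothetical names. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="2.0">

  <xsl:output indent="yes"/>

  <!-- true(): keep each pb as the first child of its page;
       false(): discard the splitting elements -->
  <xsl:param name="keep-pb" as="xs:boolean" select="true()"/>

  <xsl:template match="book">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- One group per page: a pb plus everything up to the next pb.
           The first group holds whatever precedes the first pb. -->
      <xsl:for-each-group select="node()" group-starting-with="pb">
        <page>
          <xsl:copy-of select="if ($keep-pb)
                               then current-group()
                               else current-group()[not(self::pb)]"/>
        </page>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the projected version of Geert's example, this yields two page elements: the first holds the title and the first section, the second holds the pb and the second section. Grouping over `node()` rather than `*` keeps any text that sits directly inside book.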
>>
>> On 04.07.2014 18:31, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>>>
>>> Thanks Gerrit,
>>> (I admit I need to read this twice to get it, but that might be caused
>>> by the 0-1 and me trying not to miss any of the fun in Rio.)
>>> I will look into it after the match.
>>>
>>>
>>> At 17:18 4/07/2014, you wrote:
>>>>
>>>> I tackle it by what I call “upward projection”:
>>>>
>>>> When processing the top-level element, do a for-each-group of all
>>>> descendants that are terminal nodes (those without children), with a
>>>> group-starting-with at the splitting points.
>>>>
>>>> For each group, process the book (or the HTML body, or whatever common
>>>> ancestor there is) once in another mode, with a tunneled parameter
>>>> 'restricted-to' that contains, for each group, the terminal nodes and
>>>> their ancestors.
>>>>
>>>> When processing each group, for each node that you encounter, test
>>>> whether the node is contained in the tunneled parameter (using
>>>> intersect). If it is, reproduce the node and continue in this mode; if
>>>> it isn’t contained, do nothing.
>>>>
>>>>
>>>> There may be an option to discard or to reproduce the splitting
>>>> elements.
>>>>
>>>> Examples for this technique are in
>>>> https://subversion.le-tex.de/common/evolve-hub/evolve-hub.xsl, modes
>>>> hub:split-at-tab and hub:split-at-br
>>>>
>>>> They are a bit more complex than your case because they split
>>>> paragraphs that may contain tables or footnotes that in turn can
>>>> contain other paragraphs. I introduced the function
>>>> hub:same-scope($splitting-element, $containing-element) to split only
>>>> at splitting elements that are contained within the paragraph that
>>>> should be split, rather than in a paragraph that is contained in a
>>>> footnote or table cell that is somehow contained in the given paragraph.
>>>>
>>>> I might prepare a synthetic standalone example if anyone is
>>>> interested, and furthermore on the condition that interested parties
>>>> root for Germany instead of France today.
>>>>
>>>> Gerrit
>>>>
>>>> On 04.07.2014 16:43, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Here is a fun one I thought I could share
>>>>>
>>>>> I have a nicely nested XML document (a bit TEI-like),
>>>>> and page-break markers (empty elements) can occur anywhere in the
>>>>> document.
>>>>>
>>>>> Now I want to break the document up per page, reconstructing the structure.
>>>>> So in a first step, I want to lift the page breaks to the highest
>>>>> level:
>>>>>
>>>>> <book>
>>>>> <title>...</title>
>>>>> <section>
>>>>> <para>aaa<pb/>bbb</para>
>>>>> </section>
>>>>> </book>
>>>>>
>>>>> to become
>>>>>
>>>>> <book>
>>>>> <title>...</title>
>>>>> <section>
>>>>> <para>aaa</para>
>>>>> </section>
>>>>> <pb/>
>>>>> <section>
>>>>> <para>bbb</para>
>>>>> </section>
>>>>> </book>
>>>>>
>>>>> Bearing in mind that I need a generic solution
>>>>> and page breaks can occur at any level.
>>>>>
>>>>> Any thoughts?
>>>>> I am not looking for code, just curious how people would attack this.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Geert
>>>>
>>>>
>>>> --
>>>> Gerrit Imsieke
>>>> Geschäftsführer / Managing Director
>>>>
>>>> le-tex publishing services GmbH
>>>> Weissenfelser Str. 84, 04229 Leipzig, Germany
>>>> Phone +49 341 355356 110, Fax +49 341 355356 510
>>>> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>>>>
>>>> Registergericht / Commercial Register: Amtsgericht Leipzig
>>>> Registernummer / Registration Number: HRB 24930
>>>>
>>>> Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
>>>> Thomas Schmidt, Dr. Reinhard Vöckler
>>
>>
>>
>
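Gerrit describes a hub:same-scope($splitting-element, $containing-element) function but does not show it. Below is a minimal sketch of such a scope test, not the le-tex implementation. It assumes that footnote, td and th are the elements that open a new scope (hypothetical names), and the `my` prefix and its namespace URI are likewise invented for illustration. A splitting element counts as "in scope" when the container is one of its ancestors and no scope-opening element lies between the two.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="urn:example:my"
  exclude-result-prefixes="xs my"
  version="2.0">

  <xsl:output method="text"/>

  <!-- Hypothetical scope test (NOT the le-tex hub:same-scope code).
       True if $container is an ancestor of $node and no scope-opening
       element (assumed here: footnote, td, th) lies between them. -->
  <xsl:function name="my:same-scope" as="xs:boolean">
    <xsl:param name="node" as="node()"/>
    <xsl:param name="container" as="element()"/>
    <xsl:sequence select="
      exists($node/ancestor::* intersect $container)
      and not($node/ancestor::*[self::footnote or self::td or self::th]
                               [exists(ancestor::* intersect $container)])"/>
  </xsl:function>

  <!-- Demo: for each para, report how many of its descendant pb
       elements would be splitting points in its own scope -->
  <xsl:template match="/">
    <xsl:for-each select="//para">
      <xsl:variable name="para" select="." as="element()"/>
      <xsl:value-of select="concat('para ', position(), ': ',
        count(descendant::pb[my:same-scope(., $para)]), ' of ',
        count(descendant::pb), ' pb in scope&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the input `<para>a<pb/>b<footnote><para>c<pb/>d</para></footnote></para>`, this reports "para 1: 1 of 2 pb in scope" and "para 2: 1 of 1 pb in scope": the pb inside the footnote splits only the footnote's own paragraph. In a splitting stylesheet, the same predicate would restrict the group-starting-with pattern or the set of splitting points to `descendant::pb[my:same-scope(., $para)]`.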



--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
