Re: Moving element up hierarchy unless text nodes

Subject: Re: Moving element up hierarchy unless text nodes
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 6 Apr 2015 19:07:05 -0000
Dear James,

I am relieved it seems to have passed all the tests so far!

One thing that might shed light on the operation of this is the single
edge case for which I think its behavior would be ... interesting,
namely:

<div><lg><l><pb/></l></lg></div>

I hope and trust this never happens in your data.
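
To sketch why, assuming the <div> sits directly under <body> as in the
sample input quoted below: tracing by hand (not run), that lone pb climbs
to the div in both leading-pb and trailing-pb mode, so the match="*"
template would copy it twice, once before the div and once after. One
possible guard, which is an assumption and not part of the stylesheet as
posted, is to emit a trailing pb only when the leading-pb key still binds
it to itself, so that a pb qualifying both ways is kept once, in its
leading position:

```xml
<!-- Hypothetical replacement for the match="*" template in the
     stylesheet quoted below; it relies on the same two keys. -->
<xsl:template match="*">
  <!-- pb elements keyed to this element as leading go before it -->
  <xsl:copy-of select="key('leading-pb', generate-id())"/>
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
  <!-- pb elements keyed to this element as trailing go after it,
       but only if the leading-pb key still binds them to themselves,
       i.e. they have not already been moved out as leading -->
  <xsl:copy-of select="key('trailing-pb', generate-id())
                       [exists(key('leading-pb', generate-id(.)))]"/>
</xsl:template>
```

Preferring the leading position is an arbitrary choice; putting the
mirror-image filter on the leading copy instead would keep the trailing
one.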

Cheers, Wendell


On Mon, Apr 6, 2015 at 9:22 AM, James Cummings james@xxxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> I _finally_ had a chance to test, and to make sure I understand, the
> clever solution Wendell came up with for moving <pb/> elements before or
> after nodes with no text content and/or whitespace-only nodes. I must
> apologise to him for delaying so long in doing so. Mea culpa.
>
> I've added some comments to the XSL to ensure I understood what was going
> on. Although I've never really been good with key()s the bits that confused
> me most were:
> ===
>     <!-- copy pb if it is both leading and trailing, thus stays put -->
>     <xsl:template match="pb">
>         <xsl:if test="(. is key('leading-pb',generate-id())) and
>             (. is key('trailing-pb',generate-id()))">
>             <xsl:copy-of select="."/>
>         </xsl:if>
>     </xsl:template>
> ===
> Where, if I understand it, a <pb/> is only copied if both the leading-pb
> and trailing-pb keys, looked up with its own generate-id(), return that
> <pb/> itself (i.e. it is in the middle of some elements with text, or a
> text node, or similar, so it stays where it is).
>
> The other confusing bit for me was the test in the leading/trailing-pb mode
> matching any element, but on closer inspection I think I understand it.
> (Though I never would have thought of it...) In trailing-pb mode this tests
> that there are no following-sibling elements (other than pb) and no
> following-sibling text that isn't just whitespace; if so, it passes the
> question up to the parent. Otherwise it generates an id.
> ===
>         <xsl:choose>
>             <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>                 following-sibling::text()[matches(.,'\S')])">
>                 <xsl:apply-templates select=".." mode="trailing-pb"/>
>             </xsl:when>
>             <xsl:otherwise>
>                 <xsl:sequence select="generate-id()"/>
>             </xsl:otherwise>
>         </xsl:choose>
> ===
>
> I think I understand all the individual bits to this but still have
> difficulty thinking through the whole thing.
>
> It does seem to work on all the tests I've tried.
>
> Thanks Wendell!
>
> -James
>
> =====full xslt===
>   <!-- comments, processing instructions, text nodes and attributes -->
>     <xsl:template match="comment() | processing-instruction() | text() |
> @*">
>         <xsl:copy-of select="."/>
>     </xsl:template>
>
>     <!-- copy elements separately so can move pb elements -->
>     <xsl:template match="*">
>         <!-- copy, ahead of the element, any pb keyed to it as leading -->
>         <xsl:copy-of select="key('leading-pb',generate-id())"/>
>         <!-- copy the element, attributes, and process nodes -->
>         <xsl:copy>
>             <xsl:apply-templates select="@* | node()"/>
>         </xsl:copy>
>         <xsl:copy-of select="key('trailing-pb',generate-id())"/>
>     </xsl:template>
>
>     <!-- copy pb if it is both leading and trailing, thus stays put -->
>     <xsl:template match="pb">
>         <xsl:if test="(. is key('leading-pb',generate-id())) and
>             (. is key('trailing-pb',generate-id()))">
>             <xsl:copy-of select="."/>
>         </xsl:if>
>     </xsl:template>
>
>     <!-- key for leading pb applying templates in leading-pb mode -->
>     <xsl:key name="leading-pb" match="pb">
>         <xsl:apply-templates select="." mode="leading-pb"/>
>     </xsl:key>
>     <!-- key for trailing pb applying templates in trailing-pb mode -->
>     <xsl:key name="trailing-pb" match="pb">
>         <xsl:apply-templates select="." mode="trailing-pb"/>
>     </xsl:key>
>
>     <!-- everything directly under body generates its own id; the climb stops here -->
>     <xsl:template match="body/*" mode="leading-pb trailing-pb">
>         <xsl:sequence select="generate-id()"/>
>     </xsl:template>
>
>     <!-- no preceding-sibling elements (other than pb) or non-whitespace text:
>          pass up to the parent in leading-pb mode; otherwise stop here -->
>     <xsl:template match="*" mode="leading-pb">
>         <xsl:choose>
>             <xsl:when test="empty(preceding-sibling::*/(. except self::pb) |
>                 preceding-sibling::text()[matches(.,'\S')])">
>                 <xsl:apply-templates select=".." mode="leading-pb"/>
>             </xsl:when>
>             <xsl:otherwise>
>                 <xsl:sequence select="generate-id()"/>
>             </xsl:otherwise>
>         </xsl:choose>
>     </xsl:template>
>
>     <!-- no following-sibling elements (other than pb) or non-whitespace text:
>          pass up to the parent in trailing-pb mode; otherwise stop here -->
>     <xsl:template match="*" mode="trailing-pb">
>         <xsl:choose>
>             <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>                 following-sibling::text()[matches(.,'\S')])">
>                 <xsl:apply-templates select=".." mode="trailing-pb"/>
>             </xsl:when>
>             <xsl:otherwise>
>                 <xsl:sequence select="generate-id()"/>
>             </xsl:otherwise>
>         </xsl:choose>
>     </xsl:template>
>  =====
>
> On Wed, Mar 4, 2015 at 12:36 AM, James Cummings james@xxxxxxxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>
>> Cool Wendell!
>>
>> I've not had a chance to test this out yet, I may have to come back to you
>> with some questions as I'm really not sure I understand that match pattern.
>> I'll have a play with it.
>>
>> Many thanks!
>>
>> -James
>>
>> On Tue, Mar 3, 2015 at 7:48 PM, Wendell Piez wapiez@xxxxxxxxxxxxxxx
>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi again James,
>>>
>>> So in the code I posted yesterday I realized at least one more
>>> interesting improvement is possible.
>>>
>>> Instead of
>>>
>>> <xsl:template match="pb">
>>>   <!-- Only copy the pb if no ancestor considers it 'leading' or
>>> 'trailing'. -->
>>>   <xsl:if test="empty(ancestor::*/
>>>         (key('leading-pb',generate-id()) |
>>>          key('trailing-pb',generate-id())) intersect . )  ">
>>>     <xsl:copy-of select="."/>
>>>   </xsl:if>
>>> </xsl:template>
>>>
>>> We could have more directly and efficiently
>>>
>>>   <xsl:template match="pb">
>>>     <xsl:if test="(. is key('leading-pb',generate-id())) and
>>>             (. is key('trailing-pb',generate-id()))">
>>>       <xsl:copy-of select="."/>
>>>     </xsl:if>
>>>   </xsl:template>
>>>
>>>
>>> Or even (if you are crazy for match patterns, and who isn't)
>>>
>>> <xsl:template match="pb[empty(key('leading-pb',generate-id())) or
>>>       empty(key('trailing-pb',generate-id()))]"/>
>>>
>>> These work because the keys bind pb elements to themselves when they
>>> are not 'leading' or 'trailing' (that is, when they belong where they
>>> are, inside their parent, rather than outside it).
>>>
>>> Cheers, Wendell
>>>
>>> On Mon, Mar 2, 2015 at 2:11 PM, Wendell Piez wapiez@xxxxxxxxxxxxxxx
>>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> > Hi James,
>>> >
>>> > So, try this. It works by assigning 'pb' elements to ancestors that
>>> > consider them 'leading' (start the element off) or 'trailing'. They
>>> > can be retrieved from (for) said ancestor using a key.
>>> >
>>> > Lightly tested.
>>> >
>>> > <xsl:template match="comment() | processing-instruction() | text() |
>>> > @*">
>>> >   <xsl:copy-of select="."/>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="*">
>>> >   <xsl:copy-of select="key('leading-pb',generate-id())"/>
>>> >   <xsl:copy>
>>> >     <xsl:apply-templates select="@* | node()"/>
>>> >   </xsl:copy>
>>> >   <xsl:copy-of select="key('trailing-pb',generate-id())"/>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="pb">
>>> >   <!-- Only copy the pb if no ancestor considers it 'leading' or
>>> > 'trailing'. -->
>>> >   <xsl:if test="empty(
>>> >     ancestor::*/(key('leading-pb',generate-id()) |
>>> > key('trailing-pb',generate-id())) intersect . )  ">
>>> >     <xsl:copy-of select="."/>
>>> >   </xsl:if>
>>> > </xsl:template>
>>> >
>>> > <xsl:key name="leading-pb" match="pb">
>>> >   <xsl:apply-templates select="." mode="leading-pb"/>
>>> > </xsl:key>
>>> >
>>> > <xsl:key name="trailing-pb" match="pb">
>>> >   <xsl:apply-templates select="." mode="trailing-pb"/>
>>> > </xsl:key>
>>> >
>>> > <xsl:template match="body/*" mode="leading-pb trailing-pb">
>>> >   <xsl:sequence select="generate-id()"/>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="*" mode="leading-pb">
>>> >   <xsl:choose>
>>> >     <xsl:when test="empty(preceding-sibling::*/(. except self::pb) |
>>> > preceding-sibling::text()[matches(.,'\S')])">
>>> >       <xsl:apply-templates select=".." mode="leading-pb"/>
>>> >     </xsl:when>
>>> >     <xsl:otherwise>
>>> >       <xsl:sequence select="generate-id()"/>
>>> >     </xsl:otherwise>
>>> >   </xsl:choose>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="*" mode="trailing-pb">
>>> >   <xsl:choose>
>>> >     <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>>> > following-sibling::text()[matches(.,'\S')])">
>>> >       <xsl:apply-templates select=".." mode="trailing-pb"/>
>>> >     </xsl:when>
>>> >     <xsl:otherwise>
>>> >       <xsl:sequence select="generate-id()"/>
>>> >     </xsl:otherwise>
>>> >   </xsl:choose>
>>> > </xsl:template>
>>> >
>>> > Feel free to ask for any explanation needed. It *seems* to work
>>> > (although I often do not trust my lying eyes) ... :-)
>>> >
>>> > Cheers, Wendell
>>> >
>>> > On Fri, Feb 27, 2015 at 6:51 PM, James Cummings
>>> > james@xxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> Hi there.
>>> >>
>>> >> We've been looking at canonicalising use of <pb/> in a large collection
>>> >> of TEI P5 XML texts. What we want to do is move each <pb/> up the
>>> >> hierarchy unless there is text before or after it, only stopping when
>>> >> there is a sibling element with textual content or when it hits the
>>> >> body/back/front elements. For example, someone might have encoded:
>>> >>
>>> >>
>>> >> ====input====
>>> >> <body>
>>> >>     <div>
>>> >>         <lg>
>>> >>             <l><pb n="1"/> some text here</l>
>>> >>             <l>some text here <pb n="2"/></l>
>>> >>         </lg>
>>> >>         <lg>
>>> >>             <l>some text <pb n="3"/> some text</l>
>>> >>             <anchor xml:id="test"/>
>>> >>             <l><pb n="4"/>some text here</l>
>>> >>             <l>some text here <pb n="5"/></l>
>>> >>             <anchor xml:id="test2"/>
>>> >>         </lg>
>>> >>     </div>
>>> >>     <div>
>>> >>         <head>Some Text</head>
>>> >>         <lg>
>>> >>             <!-- A comment here -->
>>> >>             <l><pb n="6"/>Some text</l>
>>> >>             <l>Some text<pb n="7"/></l>
>>> >>         </lg>
>>> >>     </div>
>>> >> </body>
>>> >> =====
>>> >>
>>> >> And what we'd want to end up with is:
>>> >>
>>> >> =====
>>> >> <body>
>>> >>     <pb n="1"/>
>>> >>     <div>
>>> >>         <lg>
>>> >>             <l> some text here</l>
>>> >>             <l>some text here </l>
>>> >>         </lg>
>>> >>         <pb n="2"/>
>>> >>         <lg>
>>> >>             <l>some text <pb n="3"/> some text</l>
>>> >>             <pb n="4"/>
>>> >>             <anchor xml:id="test"/>
>>> >>             <l>some text here</l>
>>> >>             <l>some text here </l>
>>> >>             <anchor xml:id="test2"/>
>>> >>         </lg>
>>> >>     </div>
>>> >>     <pb n="5"/>
>>> >>     <div>
>>> >>         <head>Some Text</head>
>>> >>         <pb n="6"/>
>>> >>         <lg>
>>> >>             <!-- A comment here -->
>>> >>             <l>Some text</l>
>>> >>             <l>Some text</l>
>>> >>         </lg>
>>> >>     </div>
>>> >>     <pb n="7"/>
>>> >> </body>
>>> >> =====
>>> >>
>>> >> So where a <pb/> has text before and after it, it stays where it is.
>>> >> Otherwise it should move to the level in the hierarchy where its
>>> >> preceding-sibling::node()[1] has text, passing over other empty
>>> >> elements or comments. (Of course, as you might expect, the markup could
>>> >> use any element names; I just use div/lg/l here because it is short and
>>> >> nicely hierarchical as an example.) My approach so far has been, on
>>> >> every element, to test whether there is text() between where I
>>> >> currently am and the following::pb[1], by selecting everything between
>>> >> the start and the pb and looking at its normalised string-length. But
>>> >> so far these tests aren't working right, and I haven't even got my head
>>> >> round how to do it in reverse for a <pb/> at the end.
>>> >>
>>> >> Has anyone done something like this before that I could look at? Any
>>> >> suggestions?
>>> >>
>>> >> Thanks for any help!
>>> >>
>>> >> -James Cummings
>>> >
>>> >
>>> >
>>> > --
>>> > Wendell Piez | http://www.wendellpiez.com
>>> > XML | XSLT | electronic publishing
>>> > Eat Your Vegetables
>>> > _____oo_________o_o___ooooo____ooooooo_^
>>> >
>>>
>>>
>>>
>>> --
>>> Wendell Piez | http://www.wendellpiez.com
>>> XML | XSLT | electronic publishing
>>> Eat Your Vegetables
>>> _____oo_________o_o___ooooo____ooooooo_^
>>>
>>
>
>



-- 
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
