I'm trying to post-process the HTML produced via Adobe Acrobat's PDF
export. (Actually, XHTML via Tidy from Acrobat's HTML 4.01.) Acrobat
does something very funky with end-of-line hyphens that it deems "soft",
namely wrapping the preceding and following text nodes inside a styled
<span> and removing the hyphen. To simplify the situation, if the input
text was

    The volumes of the Docu-
    mentary History of the Rati-
    fication of the Constitution are heavy.

the output would be something like

    <p>The volumes of the <i>Docu</i><i>mentary History of the Rati</i><i>fication of the Constitution</i> are heavy.</p>

Now there are various reasons why it would be nice to transform these
constructs so that each run of consecutive <i> elements is merged into
a single <i> element. I've come up with the following XSLT 2.0 templates
that rely on the '>>' operator to group consecutive sibling <i>'s for
processing. They work on some sample data, but the transform is risky:
if the logic is not perfect, <i> elements could be silently dropped.
Can anyone see a potential case where this would fail?

<xsl:template match="i">
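  <xsl:choose>
    <!-- Not the first <i> of a run: emit nothing here -->
    <xsl:when test="preceding-sibling::node()[1][self::i]">
      <!-- omit, the next when-clause handles me -->
    </xsl:when>
    <!-- First <i> of a run of two or more -->
    <xsl:when test="following-sibling::node()[1][self::i]">
      <!-- First following sibling that is not an <i>; empty if none -->
      <xsl:variable name="stopNode"
                    select="following-sibling::node()[not(self::i)][1]"/>
      <xsl:copy>
        <xsl:apply-templates/>
        <!-- Pull in the content of each later <i> that precedes $stopNode -->
        <xsl:apply-templates
            select="following-sibling::i[not(. >> $stopNode)]"
            mode="copy"/>
      </xsl:copy>
    </xsl:when>
    <!-- Stand-alone <i>: copy unchanged -->
    <xsl:otherwise>
      <xsl:copy><xsl:apply-templates/></xsl:copy>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>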

<xsl:template match="i" mode="copy">
  <xsl:apply-templates/>
</xsl:template>

DS

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell@xxxxxxxxxxxx Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
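
P.S. For comparison, the same grouping can probably be expressed with
XSLT 2.0's xsl:for-each-group and group-adjacent, which avoids the '>>'
test altogether. The following is only a sketch, not a tested
replacement. It assumes the same namespace setup as the templates above
(an unprefixed "i" matches the elements in question); the identity
template and the *[i] match pattern are additions made for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template (assumed): copy anything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element with at least one <i> child: regroup its children -->
  <xsl:template match="*[i]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Adjacent children with the same key (is it an <i>?) form one group -->
      <xsl:for-each-group select="node()" group-adjacent="boolean(self::i)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <!-- Context item is the first <i> of the run; copy it once
                 and fill it with the children of every <i> in the run -->
            <xsl:copy>
              <xsl:apply-templates select="current-group()/node()"/>
            </xsl:copy>
          </xsl:when>
          <xsl:otherwise>
            <!-- Run of non-<i> nodes: process as usual -->
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

On the simplified sample above, the three adjacent <i> elements should
come out as a single <i>. Because every child node lands in exactly one
group, nothing is skipped by construction. As with the templates above,
any node between two <i> elements, including a whitespace-only text
node, ends a run, and attributes on the <i> elements are not carried
over.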