Subject: RE: Another tokenize() question
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Tue, 10 Aug 2004 19:08:00 +0100
> Ok. This *basically* works, but with a line like:
>
> <l>Why ha<supplied>l</supplied>dest þu were agaynes me</l>
>
> it turns it into:
>
> <l><w>Why</w> <w>ha</w><supplied>l</supplied><w>dest</w>
> <w>þu</w>
> <w>were</w> <w>agaynes</w> <w>me</w></l>
>
> or if I change it to l//text()
>
> <l><w>Why</w> <w>ha</w><supplied><w>l</w></supplied><w>dest</w>
> <w>þu</w> <w>were</w> <w>agaynes</w> <w>me</w></l>
>
> When really:
>
> <l><w>Why</w> <w>ha<supplied>l</supplied>dest</w> <w>þu</w>
> <w>were</w> <w>agaynes</w> <w>me</w></l>
>
> is what is wanted.
Presumably you have confidence that if an element starts in the middle of a
word, then it ends within the same word? Otherwise you have an interleaving
problem.
You could start by replacing all the spaces with <sp/> elements, and then
process the structure along the lines:
<xsl:template match="*">
  <xsl:for-each-group select="child::node()" group-starting-with="sp">
    <xsl:choose>
      <xsl:when test="self::sp">
        <w><xsl:apply-templates select="current-group() except ."/></w>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="current-group()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each-group>
</xsl:template>
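The first step, replacing the spaces with <sp/> elements, might look
something like the following separate pass. This is only a sketch: it
assumes XSLT 2.0, and the mode name "mark-spaces" is made up for
illustration.

<!-- Hypothetical first pass: turn each run of whitespace into <sp/> -->
<xsl:template match="text()" mode="mark-spaces">
  <xsl:analyze-string select="." regex="\s+">
    <xsl:matching-substring>
      <sp/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

<!-- Copy elements and their attributes unchanged, recursing in the same mode -->
<xsl:template match="*" mode="mark-spaces">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates mode="mark-spaces"/>
  </xsl:copy>
</xsl:template>

The output of this pass would be held in a variable and then fed to the
grouping template. Note that a word at the very start of a line has no
<sp/> in front of it, so it reaches the xsl:otherwise branch; you may want
to handle that first group separately.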
Michael Kay