David Carlisle wrote:
Highly likely.

I'm looking to parse sentences out of paras.

You need to define a sentence.

I tried with the worst examples in the source text.

So perhaps a sentence is terminated by . followed by end of string or whitespace, but this would of course still fail if the sentence were to contain ". " coming from "D. P. Carlisle" or "dr. " or ...

<para>Sentance containing Dr. Michael Kay and D.P. Carlisle</para>

<grin/> I'd expect that to break most regexen :-)

<xsl:template match="para">
  <para>
    <!-- A sentence: any run of non-"." characters, or "." followed by
         non-whitespace, ending in "." plus whitespace or end of string. -->
    <xsl:analyze-string select="." regex="([^.]|\.[^ \n\r\t])*\.(\s+|$)">
      <xsl:matching-substring>
        <s><xsl:value-of select="normalize-space(.)"/></s>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <error><xsl:value-of select="normalize-space(.)"/></error>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </para>
</xsl:template>

Thanks David. That's better than my improvement. No 'error' elements in 12000 lines. Much appreciated.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
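
For anyone who wants to try the template on the awkward test para quoted above, here is a minimal sketch of a complete stylesheet around it. The wrapper (the xsl:stylesheet element, the xsl:output settings, and the idea of feeding it a file holding just the one para) is an assumption for illustration, not something from the thread; only the para template is David's. It needs an XSLT 2.0 processor, since xsl:analyze-string is not available in 1.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a standalone wrapper (assumed, not from the thread)
     around the para template posted above. Requires XSLT 2.0. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Hypothetical input document:
       <para>Sentance containing Dr. Michael Kay and D.P. Carlisle</para>
  -->

  <xsl:template match="para">
    <para>
      <!-- Text ending in "." followed by whitespace or end of string
           becomes <s>; anything left over is reported as <error>. -->
      <xsl:analyze-string select="."
          regex="([^.]|\.[^ \n\r\t])*\.(\s+|$)">
        <xsl:matching-substring>
          <s><xsl:value-of select="normalize-space(.)"/></s>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <error><xsl:value-of select="normalize-space(.)"/></error>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

Working through the regex by hand on that para, "Dr. " and "D.P. " each end a match because the "." is followed by a space, and the trailing "Carlisle" has no terminating "." at all, so it should fall into the non-matching branch. In other words, this input would be expected to split in the wrong places and produce an error element, which is the failure mode predicted for abbreviations and initials. The clean run on the 12000-line source suggests that text simply avoids these cases.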