Re: Splitting a paragraph into sentences and keep markup
Hi David,

Yes, there shouldn't be any cross-paragraph elements.

Rick

-----Original Message-----
From: David Carlisle d.p.carlisle@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Sunday, November 24, 2019 9:33 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: Splitting a paragraph into sentences and keep markup

Can we assume the easy case (as in your example) where all the sentences end at the top level?

A more challenging example is

<root>
  <p>This has one <span class="zzz">sentence? Actually, it has <emphasis>two</emphasis>. No,</span> it has three.</p>
</root>

as then you need to force-close any open elements at the sentence end and re-open them in the new sentence, so something like

<p>This has one <span class="zzz">sentence?</span></p>
<p><span class="zzz">Actually, it has <emphasis>two</emphasis>.</span></p>
<p><span class="zzz">No,</span> it has three.</p>

David

On Sun, 24 Nov 2019 at 13:34, Rick Quatro rick@xxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi All,
>
> I have a situation where I want to split a short paragraph into sentences and use them in different parts of my output. I am using <xsl:analyze-string> because I want to account for a sentence ending with a "." or a "?". This works unless the paragraph has child elements, like the <emphasis> in the second sentence. Can I split a paragraph into sentences and still keep the markup?
>
> Here is my input document:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <root>
>   <p>This has one sentence? Actually, it has <emphasis>two</emphasis>. No, it has three.</p>
> </root>
>
> My stylesheet:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:rq="http://www.frameexpert.com"
>     exclude-result-prefixes="xs rq"
>     version="2.0">
>
>   <xsl:output indent="yes"/>
>   <xsl:strip-space elements="root"/>
>
>   <xsl:template match="/root">
>     <xsl:copy>
>       <xsl:apply-templates/>
>     </xsl:copy>
>   </xsl:template>
>
>   <xsl:template match="p">
>     <xsl:variable name="sentences" select="rq:splitParagraphIntoSentences(.)"/>
>     <p><xsl:value-of select="$sentences[1]"/></p>
>     <note>Something in between.</note>
>     <p><xsl:value-of select="$sentences[position()>1]"/></p>
>   </xsl:template>
>
>   <xsl:function name="rq:splitParagraphIntoSentences">
>     <xsl:param name="paragraph"/>
>     <xsl:analyze-string select="$paragraph" regex=".+?[\.\?](\s+|$)">
>       <xsl:matching-substring>
>         <sentence><xsl:value-of select="replace(.,'\s+$','')"/></sentence>
>       </xsl:matching-substring>
>     </xsl:analyze-string>
>   </xsl:function>
> </xsl:stylesheet>
>
> My output:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <root>
>   <p>This has one sentence?</p>
>   <note>Something in between.</note>
>   <p>Actually, it has two. No, it has three.</p>
> </root>
>
> What I want is this:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <root>
>   <p>This has one sentence? </p>
>   <note>Something in between.</note>
>   <p>Actually, it has <emphasis>two</emphasis>. No, it has three. </p>
> </root>
>
> Any suggestions will be appreciated.
>
> Rick
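
For the easy case discussed in this thread, where every sentence ends in a top-level text node of the paragraph, one possible approach is a two-pass sketch in XSLT 2.0. The first pass copies the children of <p> into a temporary tree and drops a marker element after each "." or "?" that is followed by whitespace or the end of the text node. The second pass groups the nodes with xsl:for-each-group and group-ending-with. The marker name rq:break, the mode name "mark", and the <sentence> wrapper are assumptions made for illustration, not something from the original posts, and the stylesheet has not been run against a processor here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rq="http://www.frameexpert.com"
    exclude-result-prefixes="rq"
    version="2.0">

  <xsl:strip-space elements="root"/>

  <xsl:template match="/root">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <!-- Pass 1: copy the paragraph's children into a temporary tree,
         adding an rq:break marker (hypothetical name) after each sentence end. -->
    <xsl:variable name="marked">
      <xsl:apply-templates select="node()" mode="mark"/>
    </xsl:variable>

    <!-- Pass 2: each run of nodes up to and including a marker is one sentence. -->
    <xsl:variable name="sentences" as="element(sentence)*">
      <xsl:for-each-group select="$marked/node()" group-ending-with="rq:break">
        <xsl:variable name="content" select="current-group()[not(self::rq:break)]"/>
        <xsl:if test="exists($content)">
          <sentence><xsl:copy-of select="$content"/></sentence>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:variable>

    <p><xsl:copy-of select="$sentences[1]/node()"/></p>
    <note>Something in between.</note>
    <p>
      <xsl:for-each select="$sentences[position() gt 1]">
        <!-- Re-insert one space between the remaining sentences. -->
        <xsl:if test="position() gt 1"><xsl:text> </xsl:text></xsl:if>
        <xsl:copy-of select="node()"/>
      </xsl:for-each>
    </p>
  </xsl:template>

  <!-- Only text nodes that are direct children of p reach this template,
       because child elements are copied whole by the template below. -->
  <xsl:template match="text()" mode="mark">
    <xsl:analyze-string select="string(.)" regex="([.?])(\s+|$)">
      <xsl:matching-substring>
        <!-- Keep the punctuation, drop the trailing whitespace, mark the break. -->
        <xsl:value-of select="regex-group(1)"/>
        <rq:break/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Inline elements such as emphasis are copied unchanged. -->
  <xsl:template match="*" mode="mark">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

With the input document from the original post, this should give a first <p> holding "This has one sentence?" and a second <p> that keeps <emphasis>two</emphasis> inside "Actually, it has ... No, it has three." Two limitations are worth stating. A text node that ends in "." or "?" directly before an inline element is treated as a sentence end, which may be wrong for some content. The harder case David describes, where a sentence ends inside a <span>, is not handled, because the sketch never looks inside child elements and so cannot force-close and re-open them.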