Re: segmenting a paragraph

Subject: Re: segmenting a paragraph
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 02 Oct 2007 10:34:59 +0200
At 2007-10-02 17:05 +0900, Christian Wittern wrote:
In trying to solve the following problem I am seeking your help:
I want to segment the paragraphs in a text so that each sentence is enclosed in an <s> element and, within each sentence, the words between punctuation marks are enclosed in <seg> elements.


So far, I have been capturing the content of <p> in a string and then using two nested <xsl:analyze-string> blocks with regexes, which work nicely and do what I want. Now I discovered that there are <note> elements with additional markup in some paragraphs, which get lost in this process. However, I really want to leave these notes alone, as they are. So:

<p>Some text. Some more text, with a comma. <note>This stuff, how boring</note></p>

should look like:

<p><s><seg>Some text.</seg></s><s><seg>Some more text,</seg><seg> with a comma.</seg></s><note>This stuff, how boring</note></p>

I wonder how I tell the processor to leave the note stuff alone?

From your comment "capturing the content in a string and then..." I'm assuming you have something like:


  <xsl:template match="p">
    <xsl:analyze-string select="." .....
  </xsl:template>

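If you break this into pieces, you can work on each text node in turn, while child elements such as <note> are handed back to the default mode and never pass through the string processing: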

  <xsl:template match="p">
    <xsl:apply-templates mode="in-p" select="node()"/>
  </xsl:template>
  <xsl:template mode="in-p" match="*">
    <xsl:apply-templates select="."/> <!--reapply in the default mode-->
  </xsl:template>
  <xsl:template mode="in-p" match="text()">
    <xsl:analyze-string select="." .....
  </xsl:template>
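
For illustration, the mode-based approach above could be filled out into a complete XSLT 2.0 stylesheet along the following lines. This is only a sketch under stated assumptions: the two regexes (sentences end at ".", "!" or "?"; segments end at ",", ";" or ":") are placeholders and not the ones from the original post, the identity template and the xsl:copy of <p> are additions so that <note> and <p> survive in the output, and whitespace between sentences is simply dropped.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed identity template: copies everything not handled below,
       which is what keeps <note> and its content intact. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Keep the <p> wrapper and process its children in the in-p mode. -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates mode="in-p" select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements inside <p>: reapply in the default mode (identity copy). -->
  <xsl:template mode="in-p" match="*">
    <xsl:apply-templates select="."/>
  </xsl:template>

  <!-- Text nodes inside <p>: split into sentences, then into segments.
       Both regexes are illustrative assumptions. -->
  <xsl:template mode="in-p" match="text()">
    <xsl:analyze-string select="." regex="\s*([^.!?]+[.!?]*)">
      <xsl:matching-substring>
        <!-- Skip matches that contain only whitespace. -->
        <xsl:if test="normalize-space(regex-group(1))">
          <s>
            <xsl:analyze-string select="regex-group(1)"
                                regex="[^,;:]+[,;:]*">
              <xsl:matching-substring>
                <seg><xsl:value-of select="."/></seg>
              </xsl:matching-substring>
              <!-- Stray punctuation is passed through unchanged. -->
              <xsl:non-matching-substring>
                <xsl:value-of select="."/>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </s>
        </xsl:if>
      </xsl:matching-substring>
      <!-- Anything the sentence regex does not cover is kept as text. -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because each text node is analyzed on its own, a sentence that continues on the far side of a <note> would end up as two separate <s> elements; handling that case would need a different approach, such as grouping.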


I hope this helps.


. . . . . . . . . . . . Ken

--
Upcoming public training: UBL and code lists Oct 1/5; Madrid Spain
World-wide corporate, govt. & user group XML, XSL and UBL training
RSS feeds:     publicly-available developer resources and training
G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Cancer Awareness Jul'07  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
