
segmenting a paragraph

Subject: segmenting a paragraph
From: Christian Wittern <cwittern@xxxxxxxxx>
Date: Tue, 02 Oct 2007 17:05:07 +0900
Dear XSL-list readers,

I am seeking your help with the following problem:
I want to segment the paragraphs in a text so that each sentence is enclosed in an <s> element and, within each sentence, each run of words between punctuation marks is enclosed in a <seg> element.

So far, I have been capturing the content of <p> as a string and then running two nested <xsl:analyze-string> blocks with regexes over it, which works nicely and does what I want. Now I have discovered that some paragraphs contain <note> elements with additional markup, and that markup gets lost in this process, since taking the string value of <p> discards all child elements. However, I want to leave these notes exactly as they are. So:

<p>Some text. Some more text, with a comma. <note>This stuff, how boring</note></p>

should look like:

<p><s><seg>Some text.</seg></s><s><seg>Some more text,</seg><seg> with a comma.</seg></s><note>This stuff, how boring</note></p>
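For comparison, a minimal XSLT 2.0 sketch of the string-based approach described above might look like the following. The two regular expressions are hypothetical stand-ins, not the ones actually in use: sentences are assumed to end in ".", "!" or "?", and segments in ",", ";", ":" or a sentence terminator. Because the template works on string(.), the <note> tags disappear and the note's text is segmented along with everything else.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform for everything except p -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- String-based segmentation: string(.) flattens all child markup,
       so a note inside the paragraph survives only as plain text -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Outer pass: one s element per sentence (hypothetical regex) -->
      <xsl:analyze-string select="string(.)" regex="[^.!?\s][^.!?]*[.!?]*">
        <xsl:matching-substring>
          <s>
            <!-- Inner pass: one seg per run ending in punctuation (hypothetical regex) -->
            <xsl:analyze-string select="." regex="[^,;:.!?]+[,;:.!?]*">
              <xsl:matching-substring>
                <seg><xsl:value-of select="."/></seg>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                <xsl:value-of select="."/>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </s>
        </xsl:matching-substring>
        <!-- Whitespace between sentences is passed through as text -->
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```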

I wonder how I can tell the processor to leave the <note> elements alone.
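One possible approach, sketched in XSLT 2.0 with the same hypothetical regexes (sentences end in ".", "!" or "?"; segments end in ",", ";", ":" or a sentence terminator), is not to flatten the whole <p>. Instead, group its child nodes with xsl:for-each-group group-adjacent, so that runs of <note> elements alternate with runs of everything else. Notes are copied with xsl:copy-of, and only the other runs are joined into a string and segmented. This assumes notes sit between sentences, as in the example. A note in mid-sentence would split that sentence in two, and any other inline markup outside the notes is still flattened.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Identity transform for everything except p -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Split the children into alternating runs: note elements vs. the rest -->
      <xsl:for-each-group select="node()" group-adjacent="boolean(self::note)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <!-- Notes pass through unchanged, markup and all -->
            <xsl:copy-of select="current-group()"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- Only the material between notes is segmented -->
            <xsl:call-template name="segment">
              <xsl:with-param name="text" select="string-join(current-group(), '')"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Two-level segmentation of one run of text (hypothetical regexes) -->
  <xsl:template name="segment">
    <xsl:param name="text" as="xs:string"/>
    <xsl:analyze-string select="$text" regex="[^.!?\s][^.!?]*[.!?]*">
      <xsl:matching-substring>
        <s>
          <xsl:analyze-string select="." regex="[^,;:.!?]+[,;:.!?]*">
            <xsl:matching-substring>
              <seg><xsl:value-of select="."/></seg>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </s>
      </xsl:matching-substring>
      <!-- Whitespace between sentences is passed through as text -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Whitespace between sentences is output as plain text rather than discarded, so the result differs from the target above only by those spaces. Removing the outer xsl:non-matching-substring branch would drop them, but it would also silently discard any other characters the sentence regex does not match.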

Any help appreciated,

Christian

--
 Christian Wittern
 Institute for Research in Humanities, Kyoto University
 47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN
