
RE: segmenting a paragraph

Subject: RE: segmenting a paragraph
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 02 Oct 2007 10:18:28 -0400
Christian,

At 04:36 AM 10/2/2007, you wrote:
When you need to apply regex matching to text that crosses node boundaries,
two approaches have been proposed in the past:

(a) create a string in which the node boundaries are represented by some
recognizable textual markup (you could use saxon:serialize()), then apply
the regex processing, then reinstate the node structure (e.g. by using
saxon:parse()).

(b) do a deep copy, while processing each of the text nodes to replace the
significant features (such as end of sentence) by nodes (e.g. an
<end-of-sentence/> element). Then apply positional grouping techniques to
transform this into your target structure.

Neither is particularly easy, I'm afraid.
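
To make approach (b) concrete, here is a minimal sketch in Python rather than XSLT, using the standard library's xml.dom.minidom. It is an illustration only: the sentence-end regex, the <s> wrapper element, and the function names are assumptions, not anything proposed in this thread. The first pass replaces sentence-ending whitespace in each text node with an <end-of-sentence/> element; the second pass groups the paragraph's top-level children positionally, roughly what xsl:for-each-group with group-ending-with would do in XSLT 2.0.

```python
import re
from xml.dom import minidom

# Assumed rule: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def mark_sentence_ends(node, doc):
    """Pass 1: split text nodes at sentence ends, inserting <end-of-sentence/>."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            pieces = SENTENCE_END.split(child.data)
            for i, piece in enumerate(pieces):
                if i > 0:
                    marker = doc.createElement("end-of-sentence")
                    node.insertBefore(marker, child)
                if piece:
                    node.insertBefore(doc.createTextNode(piece), child)
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            mark_sentence_ends(child, doc)


def group_sentences(para, doc):
    """Pass 2: group top-level children into <s>, each group ending at a marker."""
    groups, current = [], []
    for child in list(para.childNodes):
        para.removeChild(child)
        is_marker = (child.nodeType == child.ELEMENT_NODE
                     and child.tagName == "end-of-sentence")
        if is_marker:
            groups.append(current)
            current = []
        else:
            current.append(child)
    if current:
        groups.append(current)
    for group in groups:
        sentence = doc.createElement("s")
        for member in group:
            sentence.appendChild(member)
        para.appendChild(sentence)


def segment(xml_text):
    """Segment a single paragraph element into <s> children."""
    doc = minidom.parseString(xml_text)
    para = doc.documentElement
    mark_sentence_ends(para, doc)
    group_sentences(para, doc)
    return para.toxml()


print(segment("<p>One <em>fine</em> day. Then it rained.</p>"))
```

Note that the grouping pass only looks at markers that are direct children of the paragraph. A marker that lands inside an inline element (a sentence ending within an <em>, say) is left nested there, and deciding what to do with it is where the real work lies.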

This is because (yay) this requirement introduces an overlap problem. Indicators (in this case, punctuation) within text content are being taken to be structural features, which may overlap with other structures already in place.
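
The overlap can be seen in a small sketch (Python, standard library only; the sample paragraphs, the <s> wrapper, and the deliberately naive sentence regex are assumptions made for illustration). It works on the serialized form, as in option (a), wraps each sentence in <s> at the string level, and then tries to parse the result back. When a sentence boundary falls inside an inline element, the sentence and the element overlap and the result is no longer well-formed.

```python
import re
import xml.etree.ElementTree as ET


def wrap_sentences(inner_xml):
    """Naively wrap each sentence of serialized paragraph content in <s>."""
    sentences = re.split(r"(?<=[.!?])\s+", inner_xml.strip())
    return "<p>" + "".join(f"<s>{s}</s>" for s in sentences) + "</p>"


def is_well_formed(xml_text):
    """Report whether the string parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Sentence boundary outside the inline element: structures nest cleanly.
nested = wrap_sentences("It was <em>fine</em> today. It rained later.")

# Sentence boundary inside the inline element: <s> and <em> overlap.
overlapping = wrap_sentences("It was <em>fine. It</em> rained later.")

print(is_well_formed(nested), nested)
print(is_well_formed(overlapping), overlapping)
```

In the second case something has to give: either the <em> is split in two so that each half sits inside one sentence, or the sentence markup is represented some other way (empty milestone elements, for example).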


Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
