RE: segmenting a paragraph
Subject: RE: segmenting a paragraph
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 02 Oct 2007 10:18:28 -0400
Christian,
At 04:36 AM 10/2/2007, you wrote:
> When you need to apply regex matching to text that crosses node boundaries,
> in the past two approaches have been proposed:
> (a) create a string in which the node boundaries are represented by some
> recognizable textual markup (you could use saxon:serialize()), then apply
> the regex processing, then reinstate the node structure (e.g. by using
> saxon:parse()).
> (b) do a deep copy, while processing each of the text nodes to replace the
> significant features (such as end of sentence) by nodes (e.g. an
> <end-of-sentence/> element). Then apply positional grouping techniques to
> transform this into your target structure.
> Neither is particularly easy, I'm afraid.
This is because (yay) this requirement introduces an overlap problem.
Indicators (in this case, punctuation) within text content are being
taken to be structural features, which may overlap with other
structures already in place.
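
To make the overlap concrete, here is a minimal sketch of the marker step from approach (b). It is in Python rather than XSLT, purely as an illustration. The function names and the sentence-end pattern are assumptions of mine, not anything from the thread, and the pattern is naive: it will also fire after abbreviations such as "e.g.". The grouping step that would follow is not shown.

```python
import re
from xml.dom import minidom

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace
# or by the end of the text node. Abbreviations are not handled.
END = re.compile(r'[.!?](?=\s|$)')

def mark_sentence_ends(node, doc):
    """Insert an <end-of-sentence/> element after each sentence-final mark."""
    for child in list(node.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            mark_sentence_ends(child, doc)
        elif child.nodeType == child.TEXT_NODE:
            text, pos = child.data, 0
            for m in END.finditer(text):
                # Emit the text up to and including the punctuation,
                # then the marker element, ahead of the original node.
                node.insertBefore(doc.createTextNode(text[pos:m.end()]), child)
                node.insertBefore(doc.createElement('end-of-sentence'), child)
                pos = m.end()
            if pos == len(text):
                node.removeChild(child)   # nothing left after the last marker
            else:
                child.data = text[pos:]   # keep the unmatched remainder

def with_sentence_markers(xml_text):
    """Deep-copy the root element and return the copy with markers added."""
    doc = minidom.parseString(xml_text)
    copy = doc.documentElement.cloneNode(True)  # the input tree is left untouched
    mark_sentence_ends(copy, doc)
    return copy.toxml()

print(with_sentence_markers('<p>One <b>two. Three</b> four.</p>'))
```

In this input the first marker lands inside the <b> element: the first sentence starts in <p> and ends partway through <b>, so any later grouping into sentence elements has to split <b> in two. That is the overlap.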
Cheers,
Wendell
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
