RE: Truncating output of a node
Hi Jim,
At 10:35 AM 4/19/01, Mike wrote:
> > I am trying to output the first n sentences of a node. I have tried using
> > for-each with a conditional to stop output but have had no luck.
> >
> > Given the following XML fragment, what is the best way to output only the
> > first n sentences? Note that the node has both text and child nodes.
>
> Write a recursive template that takes the text and n as parameters; in this
> template, if n>0, output the first sentence (using substring-before), then
> make a recursive call on the same template, passing the remaining text
> (using substring-after) and n-1 as the parameters.

This will work assuming you have identified some dependable way to delimit sentences in your data. You might assume that the presence of a character "." will indicate the end of a sentence. This is fine ... but what about sentences that happen to contain the string "...", or that end with a question mark? (Or what about sentences that appear with other kinds of punctuation?!)

Identifying what is a "sentence" is actually a difficult question in text processing, not easily tractable, which is why applications that require processing based on sentences will be much easier if you have markup embedded that tells you what's a sentence, and what's not.
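
The recursive template Mike describes might be sketched roughly as follows in XSLT 1.0. This is only an illustration under stated assumptions, not tested against your data: the template name "first-sentences", the container element name "summary", the hard-coded count of 2, and the use of a bare "." as the sentence delimiter are all hypothetical choices. Note that it works on the string value of the node, so any child elements are flattened to plain text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumption: the text to truncate lives in a <summary> element. -->
  <xsl:template match="summary">
    <xsl:call-template name="first-sentences">
      <!-- The string value of the node: child elements are flattened,
           and runs of whitespace are collapsed to single spaces. -->
      <xsl:with-param name="text" select="normalize-space(.)"/>
      <!-- Assumption: output the first 2 sentences. -->
      <xsl:with-param name="n" select="2"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical named template: emits up to $n sentences of $text,
       treating every "." as the end of a sentence. -->
  <xsl:template name="first-sentences">
    <xsl:param name="text"/>
    <xsl:param name="n"/>
    <xsl:if test="$n &gt; 0">
      <xsl:choose>
        <xsl:when test="contains($text, '.')">
          <!-- Output the first sentence and restore its delimiter. -->
          <xsl:value-of select="substring-before($text, '.')"/>
          <xsl:text>.</xsl:text>
          <!-- Recurse on the remaining text with n-1. -->
          <xsl:call-template name="first-sentences">
            <xsl:with-param name="text" select="substring-after($text, '.')"/>
            <xsl:with-param name="n" select="$n - 1"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <!-- No delimiter left: output whatever text remains. -->
          <xsl:value-of select="$text"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Because the delimiter test is a plain contains($text, '.'), this sketch shares every weakness described above: an ellipsis counts as three sentence ends, and a sentence ending in "?" or "!" is never recognized as one.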
Your problem would be fairly trivial in XSLT if your input were something like:

<summary>
  <s>It is best to start a new <span class="highlight">message</span> for a
  new thread.</s>
  <s>Do not start a new thread by replying to an unrelated
  <span class="highlight">message</span> and just changing the subject line,
  since the header of your <span class="highlight">message</span> will contain
  references to the previous <span class="highlight">message</span> and your
  new <span class="highlight">message</span> will appear in the archive as one
  of the replies to the original <span class="highlight">message</span>.</s>
</summary>

If you don't have the option of changing the way your input is structured, Mike's solution of processing text content recursively is the only option -- and might be "good enough for government work" (as is sometimes said). But the presence of element nodes in mixed content (such as your embedded <span> elements) makes this much harder, unless you can just throw them away. In theory I suppose it could be done, but the code is going to be pretty ugly, especially if you allow for the possibility that a "sentence" could end *inside* one of the <span> children....

Any intrepid XSLT coders want to tackle that?

Good luck,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list