
Re: detect sentence surrounding a tag

Subject: Re: detect sentence surrounding a tag
From: "Terry Badger terry_badger@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 27 Jul 2016 13:19:42 -0000
Dorothy,
This will do it, and you can clean out the start and end tags of the text.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
    <!-- turn text nodes into elements with and without sentence endings and save in a variable-->
    <xsl:template match="root">
        <xsl:variable name="stage-1">
            <xsl:copy>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:variable>
        <!-- write the intermediate variable out so it can be inspected -->
        <xsl:result-document href="output-01.xml">
            <xsl:copy-of select="$stage-1"/>
        </xsl:result-document>
        <!-- create final output with a grouping by start text - this assumes B is embedded not at start or end -->
        <xsl:result-document href="output-02.xml">
            <root>
                <xsl:for-each-group select="$stage-1/root/node()" group-starting-with="start">
                    <sentence>
                        <xsl:copy-of select="current-group()"/>
                    </sentence>
                </xsl:for-each-group>
            </root>
        </xsl:result-document>
    </xsl:template>
    <!-- pass through B -->
    <xsl:template match="B">
        <xsl:copy-of select="."/>
    </xsl:template>
    <!-- determine what kind of text with regex -->
    <xsl:template match="text()">
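        <!-- assumes a space follows each end-of-sentence marker; note that (.*) is greedy,
             so several sentences on one line of a text node end up in a single end element -->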
        <xsl:analyze-string select="." regex="(.*)(\. |\? )">
            <xsl:matching-substring>
                <end>
                    <xsl:copy-of select="."/>
                </end>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <start>
                    <xsl:copy-of select="."/>
                </start>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>

Terry


On Tuesday, July 26, 2016 4:37 PM, "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:



I don't think there's a "reliable" way to recognize sentences in English text, but let's not go there... Not today. 

Generally I think there are two approaches:

(a) convert the markup (start and end of B) to text delimiters and then use regular expressions.

(b) convert the text delimiters (full stops and other punctuation) to markup (empty milestone tags?) and then use XSLT positional grouping or sibling recursion.

Neither is easy enough for me to attempt without a spare half-an-hour to devote to it.
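
For illustration only, approach (b) might be sketched roughly as below in XSLT 2.0. This is a minimal sketch, not a tested solution: the milestone element name `eos`, the wrapper element `sentence`, and the mode name `mark` are all assumptions made up for the example, and it treats ".", "?" or "!" followed by whitespace (or the end of a text node) as a sentence end, so abbreviations and decimal numbers will still split sentences.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:template match="A">
        <!-- Pass 1: copy the children of A into a temporary tree, adding an
             empty eos milestone (hypothetical name) after each sentence end -->
        <xsl:variable name="marked">
            <xsl:apply-templates mode="mark"/>
        </xsl:variable>
        <!-- Pass 2: positional grouping; each group runs up to and including
             one milestone, and only groups that contain a B are kept -->
        <A>
            <xsl:for-each-group select="$marked/node()" group-ending-with="eos">
                <xsl:if test="current-group()[self::B]">
                    <sentence>
                        <xsl:copy-of select="current-group()[not(self::eos)]"/>
                    </sentence>
                </xsl:if>
            </xsl:for-each-group>
        </A>
    </xsl:template>

    <!-- child elements (including B) are copied whole and never split -->
    <xsl:template match="*" mode="mark">
        <xsl:copy-of select="."/>
    </xsl:template>

    <!-- text is split at sentence-ending punctuation; the punctuation stays
         in the text and a milestone is emitted right after it -->
    <xsl:template match="text()" mode="mark">
        <xsl:analyze-string select="." regex="[.?!](\s+|$)">
            <xsl:matching-substring>
                <xsl:value-of select="."/>
                <eos/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>
```

Given an input such as `<A>First sentence. Second has <B>bold</B> in it. Third sentence.</A>`, the intent is that only the group holding "Second has", the B element and "in it." is wrapped in `sentence`. Keeping the punctuation in the text and using `group-ending-with` avoids having to decide which side of a milestone a fragment belongs to.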

Michael Kay
Saxonica 


On 26 Jul 2016, at 21:21, Dorothy Hoskins dorothy.hoskins@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>Hi, when element A contains multiple sentences (assuming "." as the end-of-sentence punctuation), is there a reliable way to find the sentence that surrounds the child element B wherever it occurs in A?
>
>I think that the solution (regex?) will have to look backwards from the start tag of B, and forwards past the end tag of B, to the nearest "."
>
>I recognize that an abbreviation or decimal number in the sentence will be interpreted as the end of the sentence. That's OK as a limitation.
>
>Thanks for your help.
>- Dorothy
