regex, shortest match

Subject: regex, shortest match
From: Dave Pawson <davep@xxxxxxxxxxxxx>
Date: Fri, 01 Aug 2008 08:14:37 +0100
I'm trying to split paras into sentences.

Input

<para>It is sometimes desired to have a specific heading which should not be numbered. This corresponds to unnumbered list headers in lists (see sections 4.3). To facilitate this, an optional attribute text:is-list-header can be used. If true, the given header will not be numbered, even if an explicit list-style is given. </para>
<para>A text:style-name attribute references a paragraph style, while a text:cond-style-name attribute references a conditional-style, that is, a style that contains conditions and maps to other styles (see section 14.1.1). If a conditional style is applied to a paragraph, the text:style-name attribute contains the name of the style that was the result of the conditional style evaluation, while the conditional style name itself is the value of the text:cond-style-name attribute. This XML structure simplifies [XSLT] transformations because XSLT only has to acknowledge the conditional style if the formatting attributes are relevant. The referenced style can be a common style or an automatic style.</para>
<para>A text:class-names attribute takes a whitespace separated list of paragraph style names. The referenced styles are applied in the order they are contained in the list. If both, text:style-name and text:class-names are present, the style referenced by the text:style-name attribute is as the first style in the list in text:class-names. If a conditional style is specified together with a style:class-names attribute, but without the text:style-name attribute, then the first style in the style list is used as the value of the missing text:style-name attribute. </para>
<para>A page sequence element &lt;text:page-sequence> specifies a sequence of master pages that are instantiated in exactly the same order as they are referenced in the page sequence. If a text document contains a page sequence, it will consist of exactly as many pages as specified. Documents with page sequences do not have a main text flow consisting of headings and paragraphs as is the case for documents that do not contain a page sequence. Text content is included within text boxes for documents with page sequences. The only other content that is permitted are drawing objects. </para>


The stylesheet below 'works', but the regex takes the longest match.
I can't come up with a regex that is broad enough to cover these
sentences, yet stops at the shortest match.

Any suggestions, please?

TIA DaveP


<xsl:template match="para">
  <para>
    <xsl:variable name='contents' select="normalize-space(.)"/>
    <xsl:copy-of select="dp:sentence($contents)"/>
  </para>
</xsl:template>

<!-- Isolate sentences within paras -->
<xsl:function name="dp:sentence">
  <xsl:param name="nd" as='xs:string'/>
  <xsl:analyze-string regex="((.+).) |$ " select="$nd">
    <xsl:matching-substring>
          <s>
            <xsl:value-of select="regex-group(1)"/>
          </s>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
          <p2><xsl:value-of select="."/></p2>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>


regards


--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
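
On the shortest-match question above, one direction that might be worth trying is a reluctant quantifier, which XSLT 2.0 regular expressions support (`+?` rather than `+`). The sketch below is not a tested solution. It assumes a sentence ends at the first '.', '!' or '?' that is followed by a space or by the end of the string, so abbreviations such as "e.g. " would be split wrongly. The function name dp:sentence-reluctant and the dp namespace URI are made up for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dp="http://example.org/dp"
    exclude-result-prefixes="xs dp">
  <!-- The dp namespace URI above is a placeholder, not the original one. -->

  <xsl:template match="para">
    <para>
      <!-- normalize-space() collapses line breaks and trims the trailing
           space, so the last sentence ends at the end of the string. -->
      <xsl:copy-of select="dp:sentence-reluctant(normalize-space(.))"/>
    </para>
  </xsl:template>

  <!-- Hypothetical variant of dp:sentence using a reluctant match. -->
  <xsl:function name="dp:sentence-reluctant" as="element()*">
    <xsl:param name="nd" as="xs:string"/>
    <!-- .+? takes as few characters as possible; [.!?] is the assumed
         sentence terminator; ( |$) requires a following space or the
         end of the string, so the '.' in "4.3" does not end a match. -->
    <xsl:analyze-string select="$nd" regex="(.+?[.!?])( |$)">
      <xsl:matching-substring>
        <!-- Group 1 is the sentence without the separating space. -->
        <s><xsl:value-of select="regex-group(1)"/></s>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Any trailing text with no terminator ends up here. -->
        <p2><xsl:value-of select="."/></p2>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

</xsl:stylesheet>
```

Run against the sample paras (wrapped in some root element), this should give one s element per sentence, with "(see sections 4.3)." kept inside its sentence. The regex cannot match an empty string, which xsl:analyze-string would reject.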
