RE: xsl:analyze-string problem

Subject: RE: xsl:analyze-string problem
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 8 Feb 2007 17:00:55 -0000
I would tackle this as follows:

Step 1: classify the element. Use xsl:choose and matches() to decide which
of the four categories it belongs to, and copy the element adding an
attribute to indicate the category.
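
A minimal sketch of step 1, not a definitive implementation. It assumes an ordinal is one or more digits followed by a period, and it stores the category in an attribute named "class" (a name chosen here for illustration only; the original post does not name it). Content that fits none of the four classes falls through to category 1.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not an e element unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy each e and add a class attribute (hypothetical name) holding 1..4 -->
  <xsl:template match="e">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="class">
        <xsl:choose>
          <!-- 3: nothing but an ordinal, with optional surrounding space -->
          <xsl:when test="matches(., '^\s*[0-9]+\.\s*$')">3</xsl:when>
          <!-- 4: starts with an ordinal, followed by other content -->
          <xsl:when test="matches(., '^\s*[0-9]+\.')">4</xsl:when>
          <!-- 2: other content, ending with an ordinal -->
          <xsl:when test="matches(., '[0-9]+\.\s*$')">2</xsl:when>
          <!-- 1: just words and/or numbers -->
          <xsl:otherwise>1</xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <xsl:copy-of select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the four sample elements from the original message, this should assign the categories 1, 2, 3 and 4 in that order. The order of the xsl:when branches matters: the "ordinal only" test has to come before the two looser tests.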

Step 2: do the grouping (concatenation of adjacent elements according to
your rule C), probably using xsl:for-each-group with group-adjacent, though
I'm not entirely clear on the criteria.
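
One possible reading of rule C, sketched under stated assumptions. It takes the "class" attribute from step 1 to be present already (the attribute name is hypothetical), and it assumes the parent contains only "e" children, so anything else inside that parent is dropped. It uses group-starting-with rather than group-adjacent, because under this reading a group boundary depends on the class of the preceding element: a new group starts at an element that begins with an ordinal (class 3 or 4), or whose preceding "e" ends with one (class 2 or 3).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template for everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element with e children: merge those children into groups.
       Assumption: the parent holds nothing but e elements. -->
  <xsl:template match="*[e]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each-group select="e"
          group-starting-with="e[@class = ('3', '4')
                                 or preceding-sibling::e[1]/@class = ('2', '3')]">
        <!-- One merged e per group; string values are concatenated
             with no separator so the original spaces are preserved -->
        <e>
          <xsl:value-of select="current-group()" separator=""/>
        </e>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For four siblings with classes 1, 2, 3, 4 this should yield three merged elements: the first two concatenated, then the third and the fourth each on their own. If the intended criteria differ, only the group-starting-with pattern needs to change.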

Step 3: use xsl:analyze-string on the contents of the grouped elements to
insert <ordinal> and <text> element children.
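
A sketch of step 3, assuming the "e" elements have already been merged as in step 2 and that an ordinal is one or more digits followed by a period. The nested xsl:analyze-string follows the idea of the helper template in the original message: leading and trailing whitespace is written outside the new element.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template for everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split the string value of e into ordinal and text children -->
  <xsl:template match="e">
    <xsl:copy>
      <xsl:analyze-string select="." regex="[0-9]+\.">
        <!-- A number followed by a period becomes an ordinal -->
        <xsl:matching-substring>
          <ordinal><xsl:value-of select="."/></ordinal>
        </xsl:matching-substring>
        <!-- Everything else becomes text, with edge whitespace kept outside -->
        <xsl:non-matching-substring>
          <xsl:analyze-string select="." regex="^\s+|\s+$">
            <xsl:matching-substring>
              <xsl:value-of select="."/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <text><xsl:value-of select="."/></text>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For the input <e> 3. def 35 78 ghi </e> this should produce <e> <ordinal>3.</ordinal> <text>def 35 78 ghi</text> </e>, apart from serializer formatting. To replace the "e" element instead of giving it children, drop the xsl:copy wrapper.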

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Yves Forkl [mailto:Y.Forkl@xxxxxx] 
> Sent: 08 February 2007 16:48
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  xsl:analyze-string problem
> 
> Hi XSLT 2.0 wizards,
> 
> while the syntax and semantics of xsl:analyze-string have 
> become clear to me, I am now in search of an idiom using it 
> that could help me solve this problem. (Or maybe of an 
> alternative...)
> 
> In the input I find elements like these:
> 
> 1) <e> def ghi</e>
> 2) <e> abc 22 def 3 ghi 1. </e>
> 3) <e> 2. </e>
> 4) <e> 3. def 35 78 ghi </e>
> 
> The possible contents fit into exactly 4 classes:
> 
> 1) just some words and/or numbers
> 2) like 1), but followed by a number and a period
> 3) just a number and a period
> 4) like 3), but followed by some words and/or numbers
> 
> In each case, spaces may or may not appear at beginning and 
> end of the content and must be preserved (no matter to which 
> group they get attached).
> 
> The problem consists of replacing the original "e" element by 
> creating new elements according to these rules:
> 
> A) A number followed by a period goes into an "ordinal" element.
> B) Words and numbers go into a "text" element.
> C) In cases 1) and 4), where words and numbers appear at the 
> end, the content of the current "e" element must be 
> concatenated with all adjacent "e" elements of type 1) and 2) 
> before putting it all into the "text" element. By contrast, 
> in cases 2) and 3) which are ended by a number and a period 
> the contents of the following "e" instance should never be appended.
> 
> My approach is to use the following templates:
> 
> <xsl:template match="e">
> 
>    <xsl:analyze-string select="." regex="^(.*?)( *[0-9]\. *)(.*)$">
> 
>      <xsl:matching-substring>
> 
>        <xsl:for-each select="regex-group(1)">
>          <xsl:call-template name="create_element_and_space">
>            <xsl:with-param name="new_element_name" select="'text'"/>
>          </xsl:call-template>
>        </xsl:for-each>
> 
>        <xsl:for-each select="regex-group(2)">
>          <xsl:call-template name="create_element_and_space">
>            <xsl:with-param name="new_element_name" select="'ordinal'"/>
>          </xsl:call-template>
>        </xsl:for-each>
> 
>        <xsl:for-each select="regex-group(3)">
>          <xsl:call-template name="create_element_and_space">
>            <xsl:with-param name="new_element_name" select="'text'"/>
>          </xsl:call-template>
>        </xsl:for-each>
> 
>      </xsl:matching-substring>
> 
>    </xsl:analyze-string>
> 
>    <xsl:apply-templates select="following-sibling::e[1]"/>
> 
> </xsl:template>
> 
> 
> <!-- helper template for squeezing spaces out into mixed content -->
> <xsl:template name="create_element_and_space">
>    <xsl:param name="new_element_name"/>
> 
>    <xsl:analyze-string select="." regex="^\s+|\s+$">
> 
>      <xsl:matching-substring>
>        <xsl:value-of select="."/>
>      </xsl:matching-substring>
> 
>      <xsl:non-matching-substring>
>        <xsl:element name="{$new_element_name}">
>          <xsl:value-of select="."/>
>        </xsl:element>
>      </xsl:non-matching-substring>
> 
>    </xsl:analyze-string>
> 
> </xsl:template>
> 
> 
> What is not clear to me is:
> 
> - whether the regex actually suffices to match the rules
> 
> - if it is a good idea to use xsl:for-each there
> 
> - how to ensure concatenation of all the "e" instances' 
> contents in cases 1) and 4) without processing them 
> repeatedly - i.e.: how can I restrict the call to 
> xsl:apply-templates to cases 2) and 3)?
> 
> Any comments would be greatly appreciated.
> 
>    Yves
