
RE: Using analyze-string to catch roman numerals?

Subject: RE: Using analyze-string to catch roman numerals?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 9 Oct 2008 23:05:57 +0100
The two things wrong with your solution are:

(a) you're matching any sequence of letters that could be a roman numeral,
without looking at the context, hence matching the IX in APPENDIX.

(b) you're only matching the first thing in each element that looks like a
roman numeral.

The second is easily fixed: don't use an anchored regex in analyze-string
like this:

regex="^(.*?)([IVXL]+)(.*?)$"

Instead, use an unanchored regex:

regex="([IVXL]+)"

and add an xsl:non-matching-substring element that copies unmatched
substrings across unchanged (or case-converted if you want).
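A minimal sketch of that fix as a complete stylesheet, assuming an XSLT 2.0 processor such as Saxon and the root/head input quoted below. The template for root, which prints one heading per line, is illustrative scaffolding and not part of the advice above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output -->
  <xsl:output method="text"/>

  <!-- Scaffolding (illustrative): print each heading on its own line -->
  <xsl:template match="root">
    <xsl:for-each select="head">
      <xsl:apply-templates select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="head">
    <!-- Unanchored regex: analyze-string visits every run of I, V, X, L
         in the element, not just the first one -->
    <xsl:analyze-string select="." regex="([IVXL]+)">
      <xsl:matching-substring>
        <!-- Candidate numeral: copy it through unchanged -->
        <xsl:value-of select="."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text between candidates is lower-cased -->
        <xsl:value-of select="lower-case(.)"/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The xsl:choose in the original template is no longer needed: a heading with no candidate is one single non-matching substring and is simply lower-cased. Problem (a) is untouched, though. APPENDIX VII now comes out as appendIX VII: the VII is caught, but the IX inside APPENDIX is still a false positive.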

Problem (a) is much harder. You can get a fair way by requiring the sequence
of IVXL to have non-letters (or the start or end of the string) before and
after it. But you'll still be matching the word "ILL" as a roman numeral when
it clearly isn't. Like all
up-conversion tasks, though, it's very much up to you how much time you want
to spend fine-tuning the patterns and rules that you define.
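XPath 2.0 regular expressions have no lookahead, lookbehind or \b word boundary, so one way to express the non-letters-before-and-after rule is to let analyze-string match whole words and then test each word on its own. This is a sketch, assuming an XSLT 2.0 processor, the root/head input quoted below, and that only the ASCII letters A-Z and a-z count as letters. The template for root is illustrative scaffolding.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Scaffolding (illustrative): print each heading on its own line -->
  <xsl:template match="root">
    <xsl:for-each select="head">
      <xsl:apply-templates select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="head">
    <!-- Each match is a maximal run of ASCII letters, so it always has a
         non-letter (or the start/end of the string) on both sides -->
    <xsl:analyze-string select="." regex="[A-Za-z]+">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- The whole word is made of I, V, X, L: treat it as a numeral -->
          <xsl:when test="matches(., '^[IVXL]+$')">
            <xsl:value-of select="."/>
          </xsl:when>
          <!-- Any other word is lower-cased -->
          <xsl:otherwise>
            <xsl:value-of select="lower-case(.)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Spaces and punctuation pass through unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

On the sample headings this should give the ten lines Tony asked for (APPENDIX, CALVIN and ILLUSTRATION are no longer split). As noted above, an all-caps word made only of numeral letters, such as ILL or a lone pronoun I, is still kept in upper case.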

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Tony Zanella [mailto:tony.zanella@xxxxxxxxx] 
> Sent: 09 October 2008 20:18
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  Using analyze-string to catch roman numerals?
> 
> Hello all,
> 
> Given the following input:
> 
> <root>
>     <head>CHAPTER II. THE WRECKED FOUNDATIONS OF DOMESTICITY</head>
>     <head>PROBLEMA. HELOISE XXIX.</head>
>     <head>Selected Letters</head>
>     <head>The Second Part of Henry IV.</head>
>     <head>VIII</head>
>     <head>APPENDIX VII</head>
>     <head>Appendix VII</head>
>     <head>APPENDIX</head>
>     <head>CALVIN XVII</head>
>     <head>ILLUSTRATION</head>
> </root>
> 
> and the following template:
> 
> <xsl:template match="head">
>         <xsl:choose>
>             <xsl:when test="not(matches(.,'^(.*?)([IVXL]+)(.*?)$'))">
>                 <xsl:value-of select="lower-case(.)"/>
>             </xsl:when>
>             <xsl:when test="matches(.,'^(.*?)([IVXL]+)(.*?)$')">
>                 <xsl:analyze-string select="." regex="^(.*?)([IVXL]+)(.*?)$">
>                     <xsl:matching-substring>
>                         <xsl:value-of select="lower-case(regex-group(1))"/>
>                         <xsl:value-of select="upper-case(regex-group(2))"/>
>                         <xsl:value-of select="lower-case(regex-group(3))"/>
>                     </xsl:matching-substring>
>                 </xsl:analyze-string>
>             </xsl:when>
>             <xsl:otherwise/>
>         </xsl:choose>
>     </xsl:template>
> 
> I'm trying to use analyze-string to do the following:
> Test for a roman numeral. If there isn't one, lower-case(.). 
> If there is one, break (.) into its roman numeral and 
> non-roman numeral parts, lower-case()ing the latter.
> 
> The output I get is:
> 
>     chapter II. the wrecked foundations of domesticity
>     probLema. heloise xxix.
>     selected Letters
>     the second part of henry IV.
>     VIII
>     appendIX vii
>     appendix VII
>     appendIX
>     caLVIn xvii
>     ILLustration
> 
> When what I want is this:
> 
> 	chapter II. the wrecked foundations of domesticity
> 	problema. heloise XXIX.
> 	selected letters
> 	the second part of henry IV.
> 	VIII
> 	appendix VII
> 	appendix VII
> 	appendix
> 	calvin XVII
> 	illustration
> 
> I'm relatively inexperienced with both regexes and XSLT, so
> thanks for any help!
> Tony
