RE: Using analyze-string to catch roman numerals?

Subject: RE: Using analyze-string to catch roman numerals?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 9 Oct 2008 23:05:57 +0100

The two things wrong with your solution are:

(a) you're matching any sequence of letters that could be a roman numeral,
without looking at the context, hence matching the IX in APPENDIX.

(b) you're only matching the first thing in each element that looks like a
roman numeral.

The second is easily fixed: don't use an anchored regex in analyze-string,
like this:

regex="^(.*?)([IVXL]+)(.*?)$"

Instead, use an unanchored regex:

regex="([IVXL]+)"

and add an xsl:non-matching-substring element that copies unmatched
substrings across unchanged (or case-converted if you want).
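
As a minimal sketch of that fix, assuming an XSLT 2.0 processor and the
<root>/<head> input from the original message, the template could look
something like the following. The text output method, the strip-space
declaration and the newline after each heading are assumptions added for
illustration, not part of the original stylesheet.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain-text output, one heading per line -->
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="head">
    <!-- Unanchored regex: every run of I, V, X, L in the string is matched,
         not just the first one -->
    <xsl:analyze-string select="." regex="([IVXL]+)">
      <xsl:matching-substring>
        <!-- Candidate roman numeral: copy it as it stands -->
        <xsl:value-of select="."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Everything between candidates is case-converted -->
        <xsl:value-of select="lower-case(.)"/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

This only addresses problem (b): the IX in APPENDIX is still treated as a
roman numeral, because the regex takes no account of context.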

Problem (a) is much harder. You can get a fair way by requiring the sequence
of IVXL to have non-letters before and after it. But you'll still be
matching the word "ILL" as a roman numeral when it clearly isn't. Like all
up-conversion tasks, though, it's very much up to you how much time you want
to spend fine-tuning the patterns and rules that you define.
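
One way to sketch the "non-letters before and after" rule, given that XSLT
2.0 regexes have no lookahead or lookbehind, is to match each whole run of
letters and then test whether that run consists only of IVXL. This is an
illustration under stated assumptions rather than a complete solution: the
[A-Za-z] class assumes ASCII-only headings, and the output settings are
added for the example.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain-text output, one heading per line -->
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="head">
    <!-- Each match is a maximal run of letters, so it is bounded by
         non-letters (or the string ends) by construction -->
    <xsl:analyze-string select="." regex="[A-Za-z]+">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- The whole word is made of I, V, X, L: keep it as a numeral -->
          <xsl:when test="matches(., '^[IVXL]+$')">
            <xsl:value-of select="."/>
          </xsl:when>
          <!-- Any other word is lower-cased -->
          <xsl:otherwise>
            <xsl:value-of select="lower-case(.)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Spaces, punctuation and digits are copied unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With this approach APPENDIX, CALVIN and ILLUSTRATION are lower-cased as
whole words, but a standalone upper-case word such as ILL, or the pronoun I,
would still be kept as a numeral, so further rules would be needed to rule
those out.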

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Tony Zanella [mailto:tony.zanella@xxxxxxxxx] 
> Sent: 09 October 2008 20:18
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  Using analyze-string to catch roman numerals?
> 
> Hello all,
> 
> Given the following input:
> 
> <root>
>     <head>CHAPTER II. THE WRECKED FOUNDATIONS OF DOMESTICITY</head>
>     <head>PROBLEMA. HELOISE XXIX.</head>
>     <head>Selected Letters</head>
>     <head>The Second Part of Henry IV.</head>
>     <head>VIII</head>
>     <head>APPENDIX VII</head>
>     <head>Appendix VII</head>
>     <head>APPENDIX</head>
>     <head>CALVIN XVII</head>
>     <head>ILLUSTRATION</head>
> </root>
> 
> and the following template:
> 
> <xsl:template match="head">
>     <xsl:choose>
>         <xsl:when test="not(matches(.,'^(.*?)([IVXL]+)(.*?)$'))">
>             <xsl:value-of select="lower-case(.)"/>
>         </xsl:when>
>         <xsl:when test="matches(.,'^(.*?)([IVXL]+)(.*?)$')">
>             <xsl:analyze-string select="." regex="^(.*?)([IVXL]+)(.*?)$">
>                 <xsl:matching-substring>
>                     <xsl:value-of select="lower-case(regex-group(1))"/>
>                     <xsl:value-of select="upper-case(regex-group(2))"/>
>                     <xsl:value-of select="lower-case(regex-group(3))"/>
>                 </xsl:matching-substring>
>             </xsl:analyze-string>
>         </xsl:when>
>         <xsl:otherwise/>
>     </xsl:choose>
> </xsl:template>
> 
> I'm trying to use analyze-string to do the following:
> Test for a roman numeral. If there isn't one, lower-case(.). 
> If there is one, break (.) into its roman numeral and 
> non-roman numeral parts, lower-case()ing the latter.
> 
> The output I get is:
> 
>     chapter II. the wrecked foundations of domesticity
>     probLema. heloise xxix.
>     selected Letters
>     the second part of henry IV.
>     VIII
>     appendIX vii
>     appendix VII
>     appendIX
>     caLVIn xvii
>     ILLustration
> 
> When what I want is this:
> 
> 	chapter II. the wrecked foundations of domesticity
> 	problema. heloise XXIX.
> 	selected letters
> 	the second part of henry IV.
> 	VIII
> 	appendix VII
> 	appendix VII
> 	appendix
> 	calvin XVII
> 	illustration
> 
>  Between my relative inexperience with both regexes and XSLT, 
> thanks for any help!
> Tony
