RE: Using analyze-string to catch roman numerals?
The two things wrong with your solution are:

(a) you're matching any sequence of letters that could be a roman numeral, without looking at the context, hence matching the IX in APPENDIX.

(b) you're only matching the first thing in each element that looks like a roman numeral.

The second is easily fixed: don't use an anchored regex in analyze-string like this

  regex="^(.*?)([IVXL]+)(.*?)$"

Instead use an unanchored regex

  regex="([IVXL]+)"

and add an xsl:non-matching-substring element that copies unmatched substrings across unchanged (or case-converted if you want).

Problem (a) is much harder. You can get a fair way by requiring the sequence of IVXL to have non-letters before and after it. But you'll still be matching the word "ILL" as a roman numeral when it clearly isn't.

Like all up-conversion tasks, though, it's very much up to you how much time you want to spend fine-tuning the patterns and rules that you define.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Tony Zanella [mailto:tony.zanella@xxxxxxxxx]
> Sent: 09 October 2008 20:18
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Using analyze-string to catch roman numerals?
>
> Hello all,
>
> Given the following input:
>
> <root>
>   <head>CHAPTER II. THE WRECKED FOUNDATIONS OF DOMESTICITY</head>
>   <head>PROBLEMA. HELOISE XXIX.</head>
>   <head>Selected Letters</head>
>   <head>The Second Part of Henry IV.</head>
>   <head>VIII</head>
>   <head>APPENDIX VII</head>
>   <head>Appendix VII</head>
>   <head>APPENDIX</head>
>   <head>CALVIN XVII</head>
>   <head>ILLUSTRATION</head>
> </root>
>
> and the following template:
>
> <xsl:template match="head">
>   <xsl:choose>
>     <xsl:when test="not(matches(.,'^(.*?)([IVXL]+)(.*?)$'))">
>       <xsl:value-of select="lower-case(.)"/>
>     </xsl:when>
>     <xsl:when test="matches(.,'^(.*?)([IVXL]+)(.*?)$')">
>       <xsl:analyze-string select="."
>           regex="^(.*?)([IVXL]+)(.*?)$">
>         <xsl:matching-substring>
>           <xsl:value-of
>               select="lower-case(regex-group(1))"/>
>           <xsl:value-of
>               select="upper-case(regex-group(2))"/>
>           <xsl:value-of
>               select="lower-case(regex-group(3))"/>
>         </xsl:matching-substring>
>       </xsl:analyze-string>
>     </xsl:when>
>     <xsl:otherwise/>
>   </xsl:choose>
> </xsl:template>
>
> I'm trying to use analyze-string to do the following:
> Test for a roman numeral. If there isn't one, lower-case(.).
> If there is one, break (.) into its roman numeral and
> non-roman numeral parts, lower-case()ing the latter.
>
> The output I get is:
>
> chapter II. the wrecked foundations of domesticity
> probLema. heloise xxix.
> selected Letters
> the second part of henry IV.
> VIII
> appendIX vii
> appendix VII
> appendIX
> caLVIn xvii
> ILLustration
>
> When what I want is this:
>
> chapter II. the wrecked foundations of domesticity
> problema. heloise XXIX.
> selected letters
> the second part of henry IV.
> VIII
> appendix VII
> appendix VII
> appendix
> calvin XVII
> illustration
>
> Between my relative inexperience with both regexes and XSLT,
> thanks for any help!
> Tony
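One way to combine the two suggestions in the reply (an unanchored regex with xsl:non-matching-substring, plus a "non-letters before and after" requirement) is sketched below. This is not code from the thread. It assumes an XSLT 2.0 processor and ASCII-only headings, and it swaps in a slightly different technique: since XPath regular expressions have no lookaround or word-boundary syntax, the unanchored regex matches whole words (maximal runs of letters), and each word is then tested to see whether it consists only of the letters I, V, X and L. As the reply warns, a standalone word such as "ILL" would still be kept as a roman numeral.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="head">
    <!-- Unanchored regex: matches every maximal run of ASCII letters,
         so each match is a whole word bounded by non-letters
         (or by the start/end of the string). -->
    <xsl:analyze-string select="." regex="[A-Za-z]+">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- The whole word is made of upper-case I, V, X, L:
               treat it as a roman numeral and keep it as is. -->
          <xsl:when test="matches(., '^[IVXL]+$')">
            <xsl:value-of select="."/>
          </xsl:when>
          <!-- Any other word is lower-cased. -->
          <xsl:otherwise>
            <xsl:value-of select="lower-case(.)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <!-- Spaces, full stops and other non-letters are copied unchanged. -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Because APPENDIX, CALVIN and ILLUSTRATION each contain letters outside I, V, X and L, they fail the whole-word test and are lower-cased, while II, XXIX, IV, VIII, VII and XVII pass it. Checking that a candidate is a well-formed numeral (rejecting "ILL" or "IIII", say) would need a stricter pattern in the inner matches() test.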