Re: Formatting string
Abel Braaksma wrote:
Jesper Tverskov wrote:
> It is impossible to come up with a REGEX that can handle any combination of upper case and lower case. What about PaulMcCartney or JFK? If Pascal notation (XxxxXxxxx) or a similarly strict pattern is not used, a REGEX solution is only possible if we know all input strings from the start.

Perhaps I misunderstood what you are implying (should Mc Cartney be written McCartney? I didn't know). But if you mean that you want a list of exceptions that should not be split into words, then you are right: you'll need that list.

We know little from the OP; we are only guessing here. For instance: is the string in one field, or is it part of a larger string? Should consecutive capitals be ignored or not? Are there exceptions? Can a string contain non-Latin characters or punctuation? For example:

1. O'Reilly >>>> O'Reilly
2. McDonald's >>>> McDonald's
3. Paul McCartney >>>> Paul McCartney
4. J.K.Rowling >>>> J.K. Rowling (?)
5. JKRowling >>>> J K Rowling (?)
6. JFK >>>> JFK
7. BankOfUSA >>>> Bank Of USA

Cases 1, 5 and 6 go well with my last regex, using "\p{Lu}+". For the rest, I think you need an exceptions list, which you can place as alternates at the start of the regex (which may yield funny results when the OP's text is from a larger corpus).

But all I'm doing is guessing at the requirements. Perhaps Babu will enlighten us? ;)

Cheers
-- Abel Braaksma
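
To illustrate the idea of placing an exceptions list as alternates at the start of the regex, here is a minimal sketch in Java. It is not the regex from the earlier message; the class name `WordSplitter`, the contents of `EXCEPTIONS`, and the generic rule (a space at every lowercase-to-uppercase boundary, with runs of capitals left alone) are all assumptions made for illustration, since the real requirements are unknown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class WordSplitter {
    // Hypothetical exceptions list; the real one depends on the OP's data.
    static final List<String> EXCEPTIONS = Arrays.asList("McCartney", "McDonald");

    // Exceptions come first in the alternation, so they win over the generic rule.
    // Generic rule: a zero-width boundary between a lowercase and an uppercase letter.
    static final Pattern SPLIT = Pattern.compile(
        "(" + EXCEPTIONS.stream().map(Pattern::quote).collect(Collectors.joining("|")) + ")"
        + "|(?<=\\p{Ll})(?=\\p{Lu})");

    static String split(String input) {
        Matcher m = SPLIT.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                // An exception is kept intact, but is still separated
                // from a directly preceding lowercase letter.
                boolean afterLower = m.start() > 0
                    && Character.isLowerCase(input.charAt(m.start() - 1));
                replacement = (afterLower ? " " : "") + m.group(1);
            } else {
                // Generic lowercase-to-uppercase boundary: insert a space.
                replacement = " ";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = {
            "O'Reilly", "McDonald's", "Paul McCartney",
            "PaulMcCartney", "JFK", "BankOfUSA"
        };
        for (String s : samples) {
            System.out.println(s + " >>>> " + split(s));
        }
    }
}
```

Under these assumptions, cases 1, 2, 3, 6 and 7 from the list come out as shown there, and PaulMcCartney becomes Paul McCartney. Cases 4 and 5 are left unchanged, because how to treat consecutive capitals and punctuation is exactly the open question. Note that XSLT 2.0 regular expressions have no lookahead or lookbehind, so an XSLT version of the same idea would need a different formulation, for example with xsl:analyze-string.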