Subject: Re: sorting a list of titles after removal of stopwords and special characters
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 11 Dec 2001 17:09:11 +0000
Trevor Nash wrote:
> What you need is an expression that, given the context of a title
> element, will return a string containing the edited title (stop words
> removed).  This cannot be done with standard XSLT, but you have three
> possibilities:

Actually, it's not *impossible* with standard XSLT, although
admittedly it isn't pretty. Assuming that $punctuation is a string
holding the ignorable punctuation characters, and that the list of
stop words is ordered so that 'an' comes before 'a' rather than
after it, you could use:

 concat(
  substring(
   substring(translate(title, $punctuation, ''),
             string-length(
              $stoplist[starts-with(
                         translate(current()/title,
                                   concat($lowercase, $punctuation),
                                   $uppercase),
                         translate(., $lowercase, $uppercase))]) + 2),
   1 div boolean($stoplist[starts-with(
                            translate(current()/title,
                                      concat($lowercase, $punctuation),
                                      $uppercase),
                            translate(., $lowercase, $uppercase))])),
  substring(
   translate(title, $punctuation, ''),
   1 div not($stoplist[starts-with(
                        translate(current()/title,
                                  concat($lowercase, $punctuation),
                                  $uppercase),
                        translate(., $lowercase, $uppercase))])))
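
The expression above relies on an XPath 1.0 conditional idiom: 1 div true()
is 1, so substring() returns the whole string, while 1 div false() is
Infinity, so substring() returns the empty string. A minimal sketch of the
idiom in isolation follows; the $flag parameter and the two literal strings
are made up for illustration and stand in for the stop word test and the two
alternative results.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical condition; any boolean expression could go here -->
  <xsl:param name="flag" select="true()"/>

  <xsl:template match="/">
    <!-- 1 div boolean($flag): 1 when true, Infinity when false.
         substring(s, 1) is all of s; substring(s, Infinity) is ''.
         Exactly one of the two substring() calls is non-empty, so
         concat() behaves like if/then/else. -->
    <xsl:value-of
      select="concat(substring('stop word found', 1 div boolean($flag)),
                     substring('no stop word', 1 div not($flag)))"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should give 'stop word found' with the
default value of $flag, and 'no stop word' if the default is changed to
false().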

If we were using XPath 2.0, assuming an if statement similar to
that in XQuery, it would look something like:

  if ($stoplist[starts-with(
                 translate(current()/title,
                           concat($lowercase, $punctuation),
                           $uppercase),
                 translate(., $lowercase, $uppercase))])
  then substring(translate(title, $punctuation, ''),
                 string-length(
                   $stoplist[starts-with(
                             translate(current()/title,
                                       concat($lowercase, $punctuation),
                                       $uppercase),
                             translate(., $lowercase, $uppercase))]) + 2)
  else translate(title, $punctuation, '')

which isn't that much more pleasant.

If the stop words were stored with a space, as:

  <ignore>the </ignore>
  <ignore>an </ignore>
  <ignore>a </ignore>

(which would probably be a good idea anyway, given that quite a few
titles might begin with the letter 'A') then you could use simply:

 substring(translate(title, $punctuation, ''),
           string-length(
             $stoplist[starts-with(
                        translate(current()/title,
                                  concat($lowercase, $punctuation),
                                  $uppercase),
                        translate(., $lowercase, $uppercase))]) + 1)
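
As a rough sketch of how that expression might sit in an xsl:sort, here is
a small stylesheet. The books/book/title input structure, the my: namespace
URI, the stop list embedded in the stylesheet and the particular punctuation
characters are all assumptions made for the example, not something taken
from the original question.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:my="http://example.com/my"
                exclude-result-prefixes="my">
  <xsl:output method="text"/>

  <!-- Assumed stop list, each word stored with its trailing space -->
  <my:stoplist>
    <my:ignore>the </my:ignore>
    <my:ignore>an </my:ignore>
    <my:ignore>a </my:ignore>
  </my:stoplist>

  <!-- document('') reads the stylesheet itself, so this needs the
       stylesheet to be loaded from a resolvable location -->
  <xsl:variable name="stoplist"
                select="document('')/*/my:stoplist/my:ignore"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- Assumed set of ignorable punctuation -->
  <xsl:variable name="punctuation">.,:;!?'"()</xsl:variable>

  <!-- Assumed input: <books><book><title>...</title></book>...</books> -->
  <xsl:template match="/books">
    <xsl:for-each select="book">
      <!-- Within xsl:sort, current() is the book being sorted -->
      <xsl:sort select="substring(translate(title, $punctuation, ''),
                          string-length(
                            $stoplist[starts-with(
                              translate(current()/title,
                                        concat($lowercase, $punctuation),
                                        $uppercase),
                              translate(., $lowercase, $uppercase))]) + 1)"/>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With titles 'The Hobbit', 'Animal Farm' and 'A Tale of Two Cities', the sort
keys are 'Hobbit', 'Animal Farm' and 'Tale of Two Cities', so the titles
should come out in the order Animal Farm, The Hobbit, A Tale of Two Cities.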
                        
>    1) You are using Saxon, which has an extension saxon:function
>     which lets you write a function in XSLT - more or less the
>     contents of your mode="with-stoplist" template.

Just to mention, you can also use func:function from the EXSLT
namespace http://exslt.org/functions in Saxon, 4XSLT, jd.xslt and
libxslt to achieve this. It's more portable to use func:function than
to use saxon:function (because it's available in those other
processors), but they do basically the same thing. See
http://www.exslt.org/func for details.
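
For illustration, a func:function version might look something like the
sketch below. The function name my:sort-key, the my: namespace URI, the
embedded stop list (stored with trailing spaces) and the books/book/title
input are all hypothetical; only func:function and func:result come from
EXSLT, and the stylesheet will only run in a processor that supports them.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:func="http://exslt.org/functions"
                xmlns:my="http://example.com/my"
                extension-element-prefixes="func"
                exclude-result-prefixes="my">
  <xsl:output method="text"/>

  <!-- Assumed stop list, each word stored with its trailing space -->
  <my:stoplist>
    <my:ignore>the </my:ignore>
    <my:ignore>an </my:ignore>
    <my:ignore>a </my:ignore>
  </my:stoplist>

  <xsl:variable name="stoplist"
                select="document('')/*/my:stoplist/my:ignore"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- Assumed set of ignorable punctuation -->
  <xsl:variable name="punctuation">.,:;!?'"()</xsl:variable>

  <!-- Hypothetical function: returns the title with punctuation and
       any leading stop word removed -->
  <func:function name="my:sort-key">
    <xsl:param name="title"/>
    <xsl:variable name="clean"
                  select="translate($title, $punctuation, '')"/>
    <!-- Stop words that the cleaned title starts with, ignoring case -->
    <xsl:variable name="stop"
      select="$stoplist[starts-with(
                translate($clean, $lowercase, $uppercase),
                translate(., $lowercase, $uppercase))]"/>
    <!-- string-length() of an empty node-set is 0, so a title with no
         leading stop word is returned whole -->
    <func:result select="substring($clean, string-length($stop) + 1)"/>
  </func:function>

  <!-- Assumed input: <books><book><title>...</title></book>...</books> -->
  <xsl:template match="/books">
    <xsl:for-each select="book">
      <xsl:sort select="my:sort-key(title)"/>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because the title is passed in as a parameter, the function body can use
variables for the intermediate steps and doesn't need current() at all.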

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list