Re: how to extract words from a text
I decided to take a whack at it and came up with the following XSL file:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" omit-xml-declaration="yes" indent="no"/>

  <xsl:template match="text">
    <xsl:call-template name="makeList">
      <xsl:with-param name="textIn" select="translate(., ',', '')"/>
      <xsl:with-param name="wordsSoFar"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="makeList">
    <xsl:param name="textIn"/>
    <xsl:param name="wordsSoFar"/>
    <xsl:choose>
      <xsl:when test="contains($textIn, ' ')">
        <xsl:variable name="firstWord" select="substring-before($textIn, ' ')"/>
        <xsl:choose>
          <xsl:when test="string-length($firstWord)>2 and not(contains($wordsSoFar, $firstWord))">
            <xsl:variable name="newString">
              <xsl:choose>
                <xsl:when test="string-length($wordsSoFar)=0">
                  <xsl:value-of select="$firstWord"/>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:value-of select="$firstWord"/><xsl:text>, </xsl:text><xsl:value-of select="$wordsSoFar"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:variable>
            <xsl:call-template name="makeList">
              <xsl:with-param name="textIn" select="substring-after($textIn, ' ')"/>
              <xsl:with-param name="wordsSoFar" select="$newString"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="makeList">
              <xsl:with-param name="textIn" select="substring-after($textIn, ' ')"/>
              <xsl:with-param name="wordsSoFar" select="$wordsSoFar"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <xsl:choose>
          <xsl:when test="string-length($textIn)>2">
            <xsl:choose>
              <xsl:when test="contains($wordsSoFar, $textIn)">
                <xsl:value-of select="$wordsSoFar"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="$textIn"/><xsl:text>, </xsl:text><xsl:value-of select="$wordsSoFar"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="$wordsSoFar"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

When run against the following XML file:

<root>
  <text>This is a text, that is a text</text>
</root>

it produces the following output:

that, text, This

Note that it does not handle case, so 'Text' and 'text' are different words. I only have so much time to fiddle, so I didn't get that far. Also, I expect that other, more experienced folks around here can produce a better implementation. Still, this one works.

Jay Bryant
Bryant Communication Services

JBryant@xxxxxxxxx
12/10/2004 01:32 PM
Please respond to xsl-list@xxxxxxxxxxxxxxxxxxxxxx
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
cc:
Subject: Re: how to extract words from a text

> And look at substring-after() or substring-before() and a recursive template...

Bingo. If I were going to try this, I would write a recursive template that nibbled the first word off the string, checked its length, kept it if it had 3+ characters or tossed it if it was too short, and then passed the remaining string to the next instance of the template. Once no spaces remain in the string, it's done.

Jay Bryant
Bryant Communication Services

António Mota <xptm@xxxxxxx>
12/10/2004 01:05 PM
Please respond to xsl-list@xxxxxxxxxxxxxxxxxxxxxx
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
cc:
Subject: Re: how to extract words from a text

I have no idea either, especially on a Friday at this hour... But maybe this gives _you_ something to think about. It's a "word count" method.

<xsl:variable name="txt"><xsl:value-of select="text" /></xsl:variable>
<xsl:variable name="x" select="normalize-space($txt)" />
<xsl:variable name="y" select="translate($txt, ' ', '')" />
<xsl:variable name="wc" select="string-length($x) - string-length($y) +1" />

so wc (word count) in your example will be 8...

And look at substring-after() or substring-before() and a recursive template...

Quoting Jan Limpens <jan.limpens@xxxxxxxxx>:

> hello again,
>
> I hope you can help me with this one just as well, as with my other
> question today! :)
>
> i have a xml document
> <root>
> <text>This is a text, that is a text</text>
> </root>
>
> and I need to extract every word from it - once, ignoring case, and
> ordered by occurrence, stripping 1-2 letter words - to make a meta
> keywords tag from it...
>
> <meta name="keywords" content="text, that, this"/>
>
> the horror! the horror! I have no idea how to do this! :)
>
> thanks again!
> --
> Jan
> http://www.limpens.com
>
> Otakoo Saloon Cartoon - newest episode at http://limpens.com/oscredirect
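
Two gaps remain in the approach discussed in this thread: case is not ignored, and a plain contains() test against the accumulated list matches substrings, so a word like "his" would be dropped once "This" has been kept. The following is only a sketch of one way to close both gaps in XSLT 1.0, not a tested replacement for anyone's stylesheet. The template name uniqueWords, the parameter seen, and the '|' delimiter are made up for illustration. It assumes ASCII letters only, that commas are the only punctuation to strip, that no word contains '|', and that "ordered by occurrence" means order of first appearance.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <!-- Drop whitespace-only text nodes so the indentation inside <root> is not copied out -->
  <xsl:strip-space elements="*"/>

  <!-- XSLT 1.0 has no lower-case() function; translate() with these two
       alphabets folds ASCII letters only (an assumption of this sketch) -->
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <xsl:template match="text">
    <xsl:call-template name="uniqueWords">
      <!-- The comma has no counterpart in $lower, so translate() removes it -->
      <xsl:with-param name="textIn"
        select="normalize-space(translate(., concat($upper, ','), $lower))"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: emits each new word of 3+ characters as it is found -->
  <xsl:template name="uniqueWords">
    <xsl:param name="textIn"/>
    <!-- Words kept so far, each closed by '|', for example '|this|text|' -->
    <xsl:param name="seen" select="'|'"/>

    <!-- First word of the remaining text, or all of it if no space is left -->
    <xsl:variable name="word">
      <xsl:choose>
        <xsl:when test="contains($textIn, ' ')">
          <xsl:value-of select="substring-before($textIn, ' ')"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$textIn"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <!-- Wrapping the word in delimiters makes contains() a whole-word test -->
    <xsl:variable name="isNew"
      select="string-length($word) &gt; 2
              and not(contains($seen, concat('|', $word, '|')))"/>

    <xsl:if test="$isNew">
      <!-- A separator is needed before every word except the first one kept -->
      <xsl:if test="$seen != '|'"><xsl:text>, </xsl:text></xsl:if>
      <xsl:value-of select="$word"/>
    </xsl:if>

    <!-- Recurse on the rest of the string, remembering the word if it was kept -->
    <xsl:if test="contains($textIn, ' ')">
      <xsl:call-template name="uniqueWords">
        <xsl:with-param name="textIn" select="substring-after($textIn, ' ')"/>
        <xsl:with-param name="seen">
          <xsl:value-of select="$seen"/>
          <xsl:if test="$isNew"><xsl:value-of select="concat($word, '|')"/></xsl:if>
        </xsl:with-param>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample document from the original question, this should give: this, text, that. Because each word is written out as soon as it is accepted, the list comes out in first-appearance order, with no trailing separator to clean up. The recursion goes one level deeper per word, so very long texts may run into processor-specific recursion limits.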