Count a specific word in a document

Michael Strasser M.Strasser at gpo.com
Thu Jun 14 08:22:31 PDT 2007


Michael


Thanks for your thorough response and the warning about text(). I spared 
everyone the source document because it is not very good (and obviously 
it is long). I started with 
http://www.simonandkevin.com/ElijahLibretto.htm and converted its source 
from Microsoft Word-generated HTML to XHTML using a text editor. Its markup 
is visual, not structural. (My next XQuery project might be to convert its 
markup to a structural one.)

An excerpt is:

  <td>
    <p>
    <i>Elijah</i>
    <br/>
    Draw near, all ye people, come to me . . .
    </p>
    <p>
    Lord God of Abraham, Isaac and Israel, this day let
    it be known that Thou art God, and that I am Thy
    servant! Lord God of Abraham! Oh show to all this
    people that I have done these things according to
    Thy word.
    Oh hear me, Lord, and answer me!
    Lord God of Abraham, Isaac and Israel, oh hear me
    and answer me, and show this people that Thou art
    Lord God. And let their hearts again be turned!
    </p>
  </td>

So you see that $elijah//td/p[i = 'Elijah'] will not capture everything 
Elijah sings. In fact, I ended up using this to capture all paragraphs of 
his sung text:

  let $td := doc("/db/mjs/ElijahLibretto.xhtml")/html//td[p/i = 'Elijah']
  let $elijah-para := $td/p[i = 'Elijah' or i = 'Both' or count(i) = 0]

(<i>Both</i> marks his lines in the duet with the Widow.)
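For anyone without an XQuery processor at hand, the same two-step selection can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration, not the code I ran. The <html>/<table> wrapper around the excerpt and the helper names string_value and elijah_paragraphs are hypothetical. Because ElementTree's path support is limited, the two predicates are written out as loops.

```python
import xml.etree.ElementTree as ET

# The excerpt above, wrapped in a minimal (hypothetical) html/table shell so it parses.
XHTML = """<html><body><table><tr>
  <td>
    <p>
    <i>Elijah</i>
    <br/>
    Draw near, all ye people, come to me . . .
    </p>
    <p>
    Lord God of Abraham, Isaac and Israel, this day let
    it be known that Thou art God, and that I am Thy
    servant! Lord God of Abraham! Oh show to all this
    people that I have done these things according to
    Thy word.
    Oh hear me, Lord, and answer me!
    Lord God of Abraham, Isaac and Israel, oh hear me
    and answer me, and show this people that Thou art
    Lord God. And let their hearts again be turned!
    </p>
  </td>
</tr></table></body></html>"""


def string_value(elem):
    """XPath-style string value: all descendant text of elem concatenated."""
    return "".join(elem.itertext())


def elijah_paragraphs(root):
    """Rough equivalent of the two XQuery 'let' clauses above."""
    result = []
    for td in root.iter("td"):
        # //td[p/i = 'Elijah']: keep only cells where some p has an i reading 'Elijah'
        if not any(string_value(i) == "Elijah"
                   for p in td.findall("p") for i in p.findall("i")):
            continue
        # $td/p[i = 'Elijah' or i = 'Both' or count(i) = 0]
        for p in td.findall("p"):
            labels = [string_value(i) for i in p.findall("i")]
            if "Elijah" in labels or "Both" in labels or not labels:
                result.append(p)
    return result


paras = elijah_paragraphs(ET.fromstring(XHTML))
print(len(paras))  # 2: the labelled paragraph plus the unlabelled one after it
```

The count(i) = 0 branch is what picks up the second paragraph, which carries no speaker label even though Elijah is still singing.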

Thanks also for fixing up my use of tokenize(). I don't like using 
something I don't understand (especially in a public forum). Using "\W+" 
as the separator gave a different result: 37 occurrences of 'Lord' 
instead of 36 (Jonathan Robie's regexp didn't tokenise 'Lord?' correctly).
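The effect of the separator can be sketched the same way in Python, where re.split behaves much like tokenize() for a simple pattern such as \W+ on ASCII text. Jonathan Robie's regexp is not quoted here, so the whitespace-only separator r"\s+" below is a hypothetical stand-in for any pattern that leaves punctuation attached to a word; count_word is likewise a hypothetical helper.

```python
import re

# The unlabelled paragraph from the excerpt above.
TEXT = """Lord God of Abraham, Isaac and Israel, this day let
it be known that Thou art God, and that I am Thy
servant! Lord God of Abraham! Oh show to all this
people that I have done these things according to
Thy word.
Oh hear me, Lord, and answer me!
Lord God of Abraham, Isaac and Israel, oh hear me
and answer me, and show this people that Thou art
Lord God. And let their hearts again be turned!"""


def count_word(text, word, separator):
    """Split text on a regex separator (as tokenize() does) and count exact matches."""
    return sum(1 for token in re.split(separator, text) if token == word)


# Runs of non-word characters are separators, so "Lord," yields the token "Lord".
print(count_word(TEXT, "Lord", r"\W+"))  # 5
# Splitting on whitespace only leaves "Lord," as a token, so it is not counted.
print(count_word(TEXT, "Lord", r"\s+"))  # 4
```

The comparison is exact and case-sensitive, so "Lord" and "lord" would be counted separately.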

Is there a web repository of XQuery questions and answers like Dave 
Pawson's very useful Q&A for XSLT?


Michael Strasser

(P.S. Why did I choose this strange exercise? Last year I sang the part 
of Elijah and wondered how often he uttered the word 'Lord'. Merely 
going through the score and counting is not geeky enough!)
