
Re: Linenumbering & word index

Subject: Re: Linenumbering & word index
From: James Cummings <James.Cummings@xxxxxxxxxxxxxx>
Date: Fri, 6 Aug 2004 17:39:34 +0100 (BST)
On Fri, 6 Aug 2004, David Carlisle wrote:

> You can't do 
> tokenize(l/text(), '\s+')
> because it wants a single string as its first argument and that's
> probably more than one. 

Yup.  And that's one of the places I was getting confuddled. :-(

> You can do
>  select="for $l in l return tokenize($l,'\s+')"
> or same with for-each and tokenize them one at a time.

OK, I think I understand that, and it might work for smaller things.
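
A minimal sketch of the for-expression form quoted above, written as a complete XSLT 2.0 stylesheet. The input shape (an lg element containing l children) and the use of normalize-space() to avoid empty tokens from leading or trailing whitespace are assumptions added for illustration, not part of the original suggestion.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed input: <lg><l>first line</l><l>second line</l></lg> -->
  <xsl:template match="lg">
    <!-- tokenize() accepts a single string, so it is called once per l
         and the results are concatenated into one sequence of words. -->
    <xsl:variable name="words"
        select="for $l in l return tokenize(normalize-space($l), '\s+')"/>
    <!-- Write one word per line. -->
    <xsl:value-of select="$words" separator="&#10;"/>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input this should write the four words first, line, second, line on separate lines. The flat sequence loses track of which line each word came from, which is why the temporary-tree approach below suits an index better.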

> however you really want to make yourself a tree first something like:
> 
Let's see if I understand the way this works. (I do like getting
solutions, but I also want to learn ;-) )

> <xsl:template match="/">
> <xsl:variable name="x">
> <xsl:apply-templates mode="a" select="div[@type='poem']"/>
> </xsl:variable>

Creates variable $x by applying the mode a templates below to
only the poem divs.  (See, now *that* is how to avoid the
stuff I don't want to include.. *doh*)

> [
> <xsl:copy-of  select="$x"/>
> ]

Copies the temporary tree, which lists each poem and each word
in each line of that poem.

> <xsl:for-each-group select="$x/div/l/word" group-by=".">

Groups the words in the temporary tree by value, sorts the
groups, and outputs each word:
>  <xsl:sort />
>   <xsl:text>&#10;</xsl:text>
>   <xsl:value-of select="."/>

Then, for each instance of that word (keys always confuse me), it
outputs the @poem number and the @n line number.

>   <xsl:for-each select="key('w',.)">
>   <xsl:text> </xsl:text>
>   <xsl:value-of select="../../@poem"/>:<xsl:value-of select="../@n"/>
>   </xsl:for-each>
> </xsl:for-each-group>
> </xsl:template>
> 
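
Since the key lookup is the confusing part, here is a hedged sketch of the same loop written with current-group() instead of key('w', .). This is an alternative, not the approach in the quoted code. The literal content of $x is a hypothetical stand-in for the temporary tree built by the mode a templates, and the words in it are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical stand-in for the temporary tree built in mode "a". -->
  <xsl:variable name="x">
    <div poem="1">
      <l n="1"><word>the</word><word>cat</word></l>
      <l n="2"><word>the</word><word>hat</word></l>
    </div>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:for-each-group select="$x/div/l/word" group-by=".">
      <!-- xsl:sort has to come first inside xsl:for-each-group. -->
      <xsl:sort select="."/>
      <xsl:text>&#10;</xsl:text>
      <xsl:value-of select="current-grouping-key()"/>
      <!-- current-group() holds every word element in this group; here
           that is the same set of nodes key('w', .) would return. -->
      <xsl:for-each select="current-group()">
        <xsl:text> </xsl:text>
        <xsl:value-of select="../../@poem"/>:<xsl:value-of select="../@n"/>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Run against any XML input, this should list cat 1:1, hat 1:2 and the 1:1 1:2, each on its own line. The key version gives the same result because the key is looked up in the document containing the context node, which inside this loop is the temporary tree.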

The mode a template for div applies templates only to
head and lg/l. (Modes... yes, I must use modes more.)
> <xsl:template mode="a" match="div">
> <div poem="{position()}">
> <xsl:apply-templates mode="a" select="head"/>
> <xsl:apply-templates mode="a" select="lg/l"/>
> </div>
> </xsl:template>
> 

When you find a head, tokenize it into a temporary 
tree of <word> elements
> <xsl:template mode="a" match="head">
> <l n="head">
> <xsl:for-each select="tokenize(.,'(\s|[,\.!])+')">
> <word><xsl:value-of select="lower-case(.)"/></word>
> </xsl:for-each>
> </l>
> </xsl:template>
> 

When you find an l, tokenize it into a temporary tree
of <word> elements, recording the line's position

> <xsl:template mode="a" match="l">
> <l n="{position()}">
> <xsl:for-each select="tokenize(.,'\s+')">
> <word><xsl:value-of select="."/></word>
> </xsl:for-each>
> </l>
> </xsl:template>
> 

Declares a key named w that indexes each <word> element
we've just created by its value.
> <xsl:key name="w" match="word" use="."/>

Seems to work absolutely perfectly.  (well, I'll customise 
the tokenize string...)
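
One possible way to customise the tokenizing, sketched under stated assumptions: the function f:words, its namespace urn:example:functions, the punctuation set, and the lines wrapper element are all hypothetical choices made for illustration. It lower-cases the text, splits on whitespace and common punctuation, and drops the empty tokens that tokenize() returns when the string starts or ends with a separator.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: lower-case, split on whitespace and a small
       punctuation set, then discard empty tokens. -->
  <xsl:function name="f:words" as="xs:string*">
    <xsl:param name="text" as="xs:string"/>
    <xsl:sequence
        select="tokenize(lower-case($text), '[\s,.;:!?]+')[. ne '']"/>
  </xsl:function>

  <xsl:template match="/">
    <lines>
      <xsl:apply-templates mode="a" select="//l"/>
    </lines>
  </xsl:template>

  <!-- Same shape as the quoted l template, with the helper swapped in. -->
  <xsl:template mode="a" match="l">
    <l n="{position()}">
      <xsl:for-each select="f:words(.)">
        <word><xsl:value-of select="."/></word>
      </xsl:for-each>
    </l>
  </xsl:template>
</xsl:stylesheet>
```

The punctuation set deliberately leaves out the apostrophe so that contractions stay as one word; whether that suits a given corpus is a judgment call. Using the same helper for head and l would also make the index treat headings and lines consistently.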

Many many thanks.

-James

---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk 
CALL FOR PAPERS: Digital Medievalism (Kalamazoo) and 
Early Drama (Leeds) see http://users.ox.ac.uk/~jamesc/cfp.html
