RE: How to parse text into words, phrases, clauses, s
> This is my first problem. How to apply a template match using
> the tokenize() function. And which order to apply (from
> paragraph -> word or word -> paragraph).
It's generally easiest to do it top-down, I think.
Something like this:
<xsl:for-each select="tokenize(., $sentence-delimiter)">
  <sentence id="{position()}">
    <xsl:for-each select="tokenize(., $phrase-delimiter)">
      <phrase id="{position()}">
        <xsl:for-each select="tokenize(., $word-delimiter)">
          <word id="{position()}">
            <xsl:value-of select="."/>
          </word>
        </xsl:for-each>
      </phrase>
    </xsl:for-each>
  </sentence>
</xsl:for-each>
>
> > (d) doing the output numbering.
>
I think you just need position() as shown above.
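The three delimiter variables in the snippet above are not defined anywhere in this message. As a rough sketch only, they could be declared as regular expressions along the following lines; the patterns are illustrative assumptions, not taken from the original question, and would need tuning for real text.

```xml
<!-- Hypothetical delimiter patterns; the original post does not define them. -->
<!-- Sentences end at one or more of . ! ? followed by optional whitespace. -->
<xsl:variable name="sentence-delimiter" select="'[.!?]+\s*'"/>
<!-- Phrases are separated by a comma, semicolon or colon. -->
<xsl:variable name="phrase-delimiter" select="'[,;:]\s*'"/>
<!-- Words are separated by runs of whitespace. -->
<xsl:variable name="word-delimiter" select="'\s+'"/>
```

Note that tokenize() returns a zero-length string when the input ends with a delimiter, so a predicate such as tokenize(., $sentence-delimiter)[normalize-space()] may be needed to avoid empty elements. position() then counts only the tokens that remain.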
Sometimes you need to work bottom-up if the "sentences" can't be recognized
until you've identified the "words", for example if you want to avoid
treating "." as ending a sentence if it appears in a number. You're then
sometimes in the domain of positional grouping: create a long flat list of
words, and then group it into sentences using some kind of test applied to
the individual words.
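A minimal sketch of that bottom-up approach, assuming XSLT 2.0 and an input made of <para> elements holding plain text (the element names para and w, and the end-of-sentence test, are assumptions chosen for illustration). It first builds a flat list of words, then uses xsl:for-each-group with group-ending-with to close a sentence at any word that ends in ".", "!" or "?". Because the words are split on whitespace first, a number such as 3.14 stays a single word and does not end a sentence.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Assumption: each <para> contains only text. -->
  <xsl:template match="para">
    <!-- Step 1: a long flat list of words, held as <w> elements in a
         temporary tree so that a pattern can be applied to them. -->
    <xsl:variable name="words">
      <xsl:for-each select="tokenize(normalize-space(.), ' ')">
        <w><xsl:value-of select="."/></w>
      </xsl:for-each>
    </xsl:variable>

    <!-- Step 2: group the words into sentences. A group ends at any word
         whose last character is . ! or ? (an illustrative test only). -->
    <xsl:for-each-group select="$words/w"
        group-ending-with="w[matches(., '[.!?]$')]">
      <!-- Here position() is the number of the current group. -->
      <sentence id="{position()}">
        <xsl:for-each select="current-group()">
          <!-- Here position() is the word's place within its sentence.
               Punctuation stays attached to the word in this sketch. -->
          <word id="{position()}"><xsl:value-of select="."/></word>
        </xsl:for-each>
      </sentence>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Given <para>Pi is 3.14 roughly. Is that right?</para>, this should produce two sentences, the first ending at "roughly." and the second at "right?". A more realistic test would also have to deal with abbreviations and with numbers that fall at the end of a sentence.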
Michael Kay
http://www.saxonica.com/