
Using XSLT to build an index

Subject: Using XSLT to build an index
From: "Mark" <mark@xxxxxxxxxxxx>
Date: Sun, 30 Oct 2011 14:47:34 -0700
I could not find an XSLT stylesheet in the list archives that indexes an XML file, though I may have missed it. Is it practical to write my own XSLT 2.0 indexing stylesheet? I have a bilingual XML file that I want to index. I assume that I must first strip the punctuation properly, then isolate the words, sort them, remove stop words, and so on. To get started, I need a bit of help. All of the phrases are found in two attributes: @czech and @eng.

Three questions:
(1) I am aware from Michael's book that regular expressions may be used in the replace() function, but I do not know how to write the one I need. I would like to remove all the punctuation from a phrase as follows: every punctuation character except a hyphen [-] should be replaced with an empty string; the hyphen should be replaced with a single space.


(2) I assume that to get rid of extra spaces (if any), I can use a construct like: normalize-space(replace(@czech, 'some regex expression')).

(3) I assume that tokenize(normalize-space(replace(@czech, 'some regex expression'))) will permit me to write out a list of the words found in those attributes to an XML document. I am not completely clear as to what tokenize() returns, or how to access that return.

I would appreciate any comments, and especially the construction of the regex expression needed.
Thanks,
Mark
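
The three steps asked about above (punctuation removal, whitespace normalization, tokenizing) might be combined roughly as follows. This is only a minimal sketch under stated assumptions, not a tested indexing stylesheet: the idx namespace, the idx:words function, the stop-word list, and the index and word output elements are all hypothetical names chosen for illustration, and lower-casing the words is an extra assumption not mentioned in the question. Note that replace() takes three arguments (input, pattern, replacement) and that in XSLT 2.0 tokenize() needs an explicit separator pattern as its second argument. tokenize() returns a sequence of xs:string values, which can be iterated with xsl:for-each, filtered with a predicate, or used as grouping keys.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:idx="urn:example:index"
    exclude-result-prefixes="xs idx">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical stop-word list; substitute real English/Czech lists. -->
  <xsl:variable name="stop-words" as="xs:string*"
                select="('a', 'an', 'the', 'of')"/>

  <!-- Hypothetical helper: turns one phrase into a sequence of words. -->
  <xsl:function name="idx:words" as="xs:string*">
    <xsl:param name="phrase" as="xs:string?"/>
    <!-- (1) Hyphens become spaces first, so the \p{P} pass below cannot
         delete them. \p{P} matches Unicode punctuation only; symbols
         such as $ or + are in \p{S} and are left alone. -->
    <xsl:variable name="no-hyphens" select="replace($phrase, '-', ' ')"/>
    <xsl:variable name="no-punct" select="replace($no-hyphens, '\p{P}', '')"/>
    <!-- (2) normalize-space() trims and collapses runs of whitespace.
         (3) tokenize() splits on the remaining single spaces.
         lower-case() is an assumption, added so "The" and "the" match. -->
    <xsl:sequence
        select="tokenize(lower-case(normalize-space($no-punct)), ' ')"/>
  </xsl:function>

  <xsl:template match="/">
    <index>
      <!-- Each @eng attribute joins one group per distinct word it holds;
           use //@czech in the same way for the Czech index. -->
      <xsl:for-each-group select="//@eng"
          group-by="idx:words(.)[not(. = $stop-words)]">
        <!-- Default collation; Czech would likely need a proper collation. -->
        <xsl:sort select="current-grouping-key()"/>
        <!-- count = number of attributes containing the word,
             not the total number of occurrences. -->
        <word text="{current-grouping-key()}"
              count="{count(current-group())}"/>
      </xsl:for-each-group>
    </index>
  </xsl:template>

</xsl:stylesheet>
```

Replacing hyphens in a separate first pass keeps each regular expression trivial; a single pattern using character-class subtraction would also be possible, but is harder to read.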

