Re: Extraction of data using key() and matches()

Subject: Re: Extraction of data using key() and matches()
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Sat, 05 Jun 2010 22:42:34 +0100
On 05/06/2010 20:02, Jakob Fix wrote:

> I have a large number of XML data files, each of which contains a
> table with rows and data cells (previously Excel files).
>
> I'm interested in finding out whether or not a given country name
> appears in the table's data cells. If so, I want to record in another
> file all country names that appear in the data file. The country name
> may be the only content of the data cell (<col>United Kingdom</col>),
> or it may be surrounded by other text (<col>Data has been provided for
> United Kingdom only.</col>). It can also be that more than one country
> name appears in a table cell. There won't be other elements in the
> cell, just character data.
>
> My current approach is to have an exhaustive lookup file with *all*
> country names that are potentially used. For each XML data file, I
> loop over all country names and check whether the contents of the
> data file match the current country name.

You could create an index on all the "words" in the text using

<xsl:key name="words" match="col" use="tokenize(., '\P{L}+')"/>

where a word is defined as a maximal sequence of "letter" characters.
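To make the definition concrete, here is what the tokenize() call in the key yields for the sample cell quoted above. This is only an illustrative fragment, to be placed inside any XSLT 2.0 template; the input string is taken from the question.

```xml
<!-- Evaluates to the sequence:
     'Data', 'has', 'been', 'provided', 'for', 'United', 'Kingdom', 'only', ''
     The trailing '.' is a separator at the end of the input, so the last
     token is a zero-length string. -->
<xsl:sequence select="tokenize('Data has been provided for United Kingdom only.', '\P{L}+')"/>
```

Note that "United" and "Kingdom" are indexed as two separate words, so a multi-word country name is never a key value in its own right.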

Then to see whether a given country is present you could start by testing whether the first word of the country name is present:

key('words', tokenize($country, '\P{L}+')[1])

and then apply a stricter test (for example, that the full country name occurs in the cell) to the result of this first filter.
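Put together, the two steps might look like the following XSLT 2.0 stylesheet. This is a minimal sketch under several assumptions that are not in the original thread: the lookup file is called countries.xml and has the shape <countries><country>...</country></countries>, the output element names (found, country) are invented for illustration, every country name begins with a letter, and the stricter second test is a plain contains() check.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every cell under each of the words it contains. -->
  <xsl:key name="words" match="col" use="tokenize(., '\P{L}+')"/>

  <!-- Hypothetical lookup file, resolved relative to the stylesheet. -->
  <xsl:variable name="countries"
      select="document('countries.xml')/countries/country"/>

  <xsl:template match="/">
    <!-- Remember the data document: inside for-each the context node
         belongs to the lookup file, so key() needs an explicit root. -->
    <xsl:variable name="data" select="."/>
    <found>
      <xsl:for-each select="$countries">
        <xsl:variable name="country" select="normalize-space(.)"/>
        <!-- Step 1: cheap filter, cells that contain the first word. -->
        <xsl:variable name="candidates"
            select="key('words', tokenize($country, '\P{L}+')[1], $data)"/>
        <!-- Step 2: stricter test, the full name occurs in a candidate. -->
        <xsl:if test="$candidates[contains(normalize-space(.), $country)]">
          <country><xsl:value-of select="$country"/></country>
        </xsl:if>
      </xsl:for-each>
    </found>
  </xsl:template>

</xsl:stylesheet>
```

The three-argument form of key() is used because the loop iterates over nodes of the lookup document. The contains() test is deliberately simple and can still report false positives where one name is a prefix of another (a cell containing "Guinea-Bissau" would also be reported for "Guinea"); a matches() test with explicit non-letter boundaries would be tighter, provided any regex metacharacters in the country name are escaped first. Key lookup and contains() are both case-sensitive under the default collation.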

Michael Kay
