Re: Extraction of data using key() and matches()

Subject: Re: Extraction of data using key() and matches()
From: Jakob Fix <jakob.fix@xxxxxxxxx>
Date: Sun, 6 Jun 2010 00:01:54 +0200
On Sat, Jun 5, 2010 at 23:42, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> On 05/06/2010 20:02, Jakob Fix wrote:
>>
>> Hello,
>>
>> I have a large number of XML data files (previously Excel files), each
>> containing a table with rows and data cells.
>>
>> I want to find out whether a given country name appears in the table's
>> data cells. If so, I want to record in another file all country names
>> that appear in the data file. The country name
>> may be the only content of the data cell (<col>United Kingdom</col>),
>> or it may be surrounded by other text (<col>Data has been provided for
>> United Kingdom only.</col>). It can also be that more than one country
>> name appears in a table cell. There won't be other elements in the
>> cell, just character data.
>>
>> My current approach is to have an exhaustive lookup file with *all*
>> country names that are potentially used. For each XML data file, I
>> loop over all country names and check whether the contents of the
>> data file match the current country name.
>>
>>
>
> You could create an index on all the "words" in the text using
>
> <xsl:key name="words" match="col" use="tokenize(., '\P{L}+')"/>
>
> where a word is defined as a maximal sequence of "letter" characters.
>
> Then to see whether a given country is present you could start by testing
> whether the first word of the country name is present:
>
> key('words', tokenize($country, '\P{L}+')[1])
>
> and then apply a more sensitive test to the result of this first filter.
>
> Michael Kay
> Saxonica


Thanks Michael, I'll give this a try.

Jakob.
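
For readers of the archive, the two-stage filter Michael describes above can be sketched roughly as follows in XSLT 2.0. This is only an illustration under stated assumptions, not code from the thread: the lookup file name (countries.xml), its countries/country element names, and the found/country output elements are all hypothetical, and the "more sensitive test" is assumed here to be a plain contains() on the whitespace-normalized cell text.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: a lookup file shaped like
       <countries><country>United Kingdom</country>...</countries> -->
  <xsl:param name="countries-uri" as="xs:string" select="'countries.xml'"/>
  <xsl:variable name="countries"
                select="document($countries-uri)/countries/country"/>

  <!-- Index each col element under every "word" (maximal run of letters)
       that occurs in its text, as suggested in the thread. -->
  <xsl:key name="words" match="col" use="tokenize(., '\P{L}+')"/>

  <xsl:template match="/">
    <!-- Remember the data document: inside the for-each below the
         context node belongs to the lookup file instead. -->
    <xsl:variable name="data" select="."/>
    <found>
      <xsl:for-each select="$countries">
        <xsl:variable name="name" select="normalize-space(.)"/>
        <!-- First word of the country name; the predicate skips an empty
             token if the name starts with a non-letter. -->
        <xsl:variable name="first"
                      select="tokenize($name, '\P{L}+')[. ne ''][1]"/>
        <!-- Stage 1: cheap key lookup on the first word only. The third
             argument of key() restricts the search to the data document. -->
        <xsl:variable name="candidates" select="key('words', $first, $data)"/>
        <!-- Stage 2 (assumed): check the full name on the few candidates. -->
        <xsl:if test="$candidates[contains(normalize-space(.), $name)]">
          <country><xsl:value-of select="$name"/></country>
        </xsl:if>
      </xsl:for-each>
    </found>
  </xsl:template>

</xsl:stylesheet>
```

The key lookup is case-sensitive under the default collation, so "united kingdom" would not be found by a lookup entry "United Kingdom" unless both sides are lower-cased first. The contains() check in stage 2 is deliberately simple; it can still report a name that occurs only as part of a longer word in a cell that also happens to contain the first word on its own, so a stricter comparison may be needed for real data.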
