
Re: Similarity metric in XSLT 2?

Subject: Re: Similarity metric in XSLT 2?
From: Martin Holmes <mholmes@xxxxxxx>
Date: Fri, 30 Mar 2012 13:48:54 -0700
On 12-03-30 01:24 PM, Imsieke, Gerrit, le-tex wrote:
I can only affirm that I'd be interested in such a library, too.

The last time that I needed string similarity metrics (4 yrs ago), I
used Perl with XML::LibXML and String::Similarity.

If there were such a module / extension function for XPath / XSLT, I'd
probably have used it more often. If you find a Java library that is easy
to interface with from Java-based XSLT processors, please let me know. I
think that Levenshtein or more advanced algorithms will be too slow when
implemented in XSLT, but they may be readily available as extension functions.
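To illustrate the extension-function route, here is a minimal Java sketch of the standard Levenshtein edit distance. Each insertion, deletion, or substitution costs 1. It is written as a static method so that a Java-based XSLT processor could, in principle, call it. The class name `StringMetrics` is hypothetical and does not refer to an existing library.

```java
/**
 * Hypothetical helper class (not an existing library): classic Levenshtein
 * edit distance computed with two rolling rows, O(m*n) time, O(n) memory.
 */
final class StringMetrics {

    private StringMetrics() {}

    /**
     * Minimum number of single-character insertions, deletions and
     * substitutions needed to turn a into b. Works on UTF-16 code units,
     * so a character outside the BMP counts as two units.
     */
    static int levenshtein(String a, String b) {
        int n = b.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        // Row 0: turning the empty prefix of a into each prefix of b takes j insertions.
        for (int j = 0; j <= n; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= n; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,  // insertion
                                 prev[j] + 1),     // deletion
                        prev[j - 1] + cost);       // substitution or match
            }
            // Reuse the arrays: the row just computed becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[n];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```

The two-row layout keeps memory proportional to the shorter side of the comparison, which matters if many long strings are compared.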

I once implemented the Universal Similarity Metric (Normalized Compression Distance) in Pascal and Java:


<http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-693.html>

and found that it was surprisingly effective for short strings, as well as being very fast. Implementing the metric was trivial. I might look into calling the Java library from Saxon.
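The Pascal and Java code referred to above is not shown in the thread. What follows is only a minimal Java sketch of Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives a compressed length. It rests on three assumptions: raw DEFLATE via `java.util.zip.Deflater` serves as the compressor C, strings are encoded as UTF-8, and xy is plain string concatenation. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Hypothetical NCD helper; DEFLATE is an assumed choice of compressor. */
final class Ncd {

    private Ncd() {}

    /** Compressed size in bytes, using raw DEFLATE (no zlib header or checksum). */
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[data.length + 64];
            int total = 0;
            // Only the output length matters, so the buffer is simply reused.
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /**
     * NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
     * Smaller means more similar; real compressors can push values slightly
     * outside [0, 1].
     */
    static double ncd(String x, String y) {
        int cx = compressedSize(x.getBytes(StandardCharsets.UTF_8));
        int cy = compressedSize(y.getBytes(StandardCharsets.UTF_8));
        int cxy = compressedSize((x + y).getBytes(StandardCharsets.UTF_8));
        return (cxy - Math.min(cx, cy)) / (double) Math.max(cx, cy);
    }

    public static void main(String[] args) {
        String s = "Similarity metric in XSLT 2 for matching candidate strings";
        System.out.println(ncd(s, s));
        System.out.println(ncd(s, "0123456789 zyxwvutsrq !@#$%^&*() PLMOKNIJB"));
    }
}
```

With very short strings, the compressor's fixed per-stream overhead is a large share of C(x). Scores are therefore more useful for ranking candidates against each other than as absolute values. How a static method like this would be exposed to Saxon depends on which Java extension-function mechanism the Saxon edition in use supports; that wiring is not shown here.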

Cheers,
Martin

Gerrit

On 2012-03-30 20:18, Martin Holmes wrote:
Hi all,

I'm faced with a situation in which I have to match an input string
against a set of possible candidates, and I need to find the match which
is most similar to it (I'm trying to identify correspondences between
two sets of files which have similar, but not identical, content).

Has anyone done anything like measuring string similarity in XSLT 2.0?
If so, how did you approach it?

All help appreciated,
Martin
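The task described above (pick the candidate most similar to an input string) can be sketched as choosing the candidate with the highest similarity score. This sketch assumes normalised Levenshtein similarity, 1 - distance / length of the longer string. That is one common choice, not something proposed in the thread, and any metric that yields a comparable score could be substituted. The class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical best-match helper using normalised Levenshtein similarity. */
final class BestMatch {

    private BestMatch() {}

    /** Standard Levenshtein distance (two rolling rows). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1.0 for identical strings, lower as edits grow. */
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longer;
    }

    /** Most similar candidate, or null if there are none; ties keep the earliest. */
    static String bestMatch(String input, List<String> candidates) {
        String best = null;
        double bestScore = -1.0;
        for (String candidate : candidates) {
            double score = similarity(input, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestMatch("colour", List.of("color", "collar", "flavour"))); // prints color
    }
}
```

This is a brute-force scan: every input is compared with every candidate. That is simple and probably adequate for modest file sets, but the cost grows with the number of pairs times the string lengths.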
