XML-DEV Mailing List Archive
Re: [OT] Looking for a text algorithm
David Megginson wrote,
> I'm looking for references to a specific kind of text algorithm --
> the algorithm should generate a number (say, 32 or 64 bits) for any
> text string of any length, similar to a hash. However, it should be
> possible to compare the numbers for different strings to tell how
> close they are to each other. For example, the numbers for
>
> 1. To be or not to be.
>
> 2. Two bees or not two bees.
>
> 3. I don't know whether to be or not to be.
>
> should indicate that three strings are relatively close to each other
> (while a hash number would give no indication at all).

Umm ... define "close".

Judging from your examples it looks like you're after a closeness
criterion derived from longest common subsequences. But I don't see
how you could use that to usefully construct a single characteristic
number for _any_ string of _any_ length: with only 32 or 64 bits to
play with, many many completely unrelated (on any criterion) strings
will collide on the same code.

Cheers,

Miles
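For illustration only, here is a rough sketch of what a closeness criterion based on longest common subsequences might look like. Nothing in it comes from the thread: the function names `lcs_length` and `similarity`, the case folding, and the normalisation `2 * LCS / (len(a) + len(b))` are all assumptions made for the example. Note that it scores a pair of strings and needs both of them in hand, so it does not produce the single per-string number asked for above.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programme, keeping only one previous row.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical closeness score in [0, 1]; 1.0 means identical after
    case folding. The normalisation is an assumption, not a standard."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


if __name__ == "__main__":
    texts = [
        "To be or not to be.",
        "Two bees or not two bees.",
        "I don't know whether to be or not to be.",
    ]
    # Print the pairwise scores for the three example strings.
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            print(i + 1, j + 1, round(similarity(texts[i], texts[j]), 3))
```

The cost is proportional to the product of the two string lengths per pair, which is the price of comparing the strings themselves rather than fixed-size codes.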