Hi all,
I seem to recall someone posting a goodie to this ng (quite a while back, 2 years or so?) that did string comparison and returned a 'score' as to how close the strings were alike. For example 'ABC' and 'DEF' have a zero score, but 'ABC and 'ACB' have a fairly high score. I believe the goodie was based on some study or something. I've looked through the archives, but couldn't find it. I'm looking for something like that as I have to do name comparison, and with the many ways of spelling names, you can imagine that the chance of getting it wrong is quite high. Anybody any ideas? Thanks in advance, Ted |
"Ted Bracht" <[hidden email]> wrote in message
news:[hidden email]... > Hi all, > > I seem to recall someone posting a goodie to this ng (quite a while back, 2 > years or so?) that did string comparison and returned a 'score' as to how > close the strings were alike. For example 'ABC' and 'DEF' have a zero score, > but 'ABC and 'ACB' have a fairly high score. I believe the goodie was based > on some study or something. I've looked through the archives, but couldn't > find it. > > I'm looking for something like that as I have to do name comparison, and > with the many ways of spelling names, you can imagine that the chance of > getting it wrong is quite high. > > Anybody any ideas? > I suspect it was "Soundex". It is commonly used for that sort of thing. Its a pretty easy algorithm to implement, and easily found on the web. I keep an old magazine page that describes the algorithm in my desk for a "rainy day" project of implementing a #soundsLike: method on String, but I've never quite got around to it. I see from Ian's archive that Andrew Brault (of Pocket Smalltalk fame) posted a Dolphin implementation to the group in Jan 98, but I don't know whether anyone still has it. Regards Blair |
In reply to this post by Ted Bracht-2
Hi Ted,
> I seem to recall someone posting a goodie to this ng (quite a while > back, 2 years or so?) that did string comparison and returned a > 'score' as to how close the strings were alike. For example 'ABC' and > 'DEF' have a zero score, but 'ABC and 'ACB' have a fairly high score. > I believe the goodie was based on some study or something. I've > looked through the archives, but couldn't find it. ----------------------8<-------------------- String>>distance: comparand "It calculates the following: Given two strings, self and comparand, and three operations, adding, subtracting and exchanging single characters, what is the minimal number of steps needed to translate self into comparand?" | matrix size compSize | size := self size. compSize := comparand size. matrix := Array new: size + 1. 1 to: size+1 do: [ :i | matrix at: i put: (Array new: compSize +1) ]. 0 to: size do: [:i | (matrix at: i+1) at: 1 put: i]. 0 to: compSize do: [:i | (matrix at: 1) at: i+1 put: i]. 1 to: size do: [ :i | 1 to: compSize do: [ :j | | x y z | x := (( matrix at: i) at: j+1 ) + 1. y := (( matrix at: i+1) at: j ) + 1. (self at: i) = (comparand at: j) ifTrue: [ z := (matrix at: i) at: j ] ifFalse: [ z := ((matrix at: i) at: j) + 1 ]. (matrix at: i+1) at: j+1 put: ((x min: y) min: z). ] ]. ^ (matrix at: size+1 ) at: compSize+1 ----------------------8<-------------------- If you need a mathematical model, I'll try to find it. -- Dmitry Zamotkin |
In reply to this post by Blair McGlashan
> I see from Ian's archive that Andrew Brault (of Pocket Smalltalk fame)
> posted a Dolphin implementation to the group in Jan 98, but I don't know > whether anyone still has it. I've got copies of all the original messages and attachments posted to the various newsgroups. If anyone wants a copy of this attachment (or any other) then let me know and I will mail it to them. Ian |
Hi Blair and Ian,
"Ian Bartholomew" <[hidden email]> wrote in message news:Ih1M9.2396$4k6.278584@wards... > > I see from Ian's archive that Andrew Brault (of Pocket Smalltalk fame) > > posted a Dolphin implementation to the group in Jan 98, but I don't know > > whether anyone still has it. > > I've got copies of all the original messages and attachments posted to the > various newsgroups. If anyone wants a copy of this attachment (or any > other) then let me know and I will mail it to them. > Thanks alot, just what I was looking for. Ted |
In reply to this post by Ted Bracht-2
Ted Bracht wrote:
> I seem to recall someone posting a goodie to this ng (quite a while > back, 2 years or so?) that did string comparison and returned a > 'score' as to how close the strings were alike. You may be remembering my post of 2001/5/16 in the "Fuzzy string comparison" thread, which had an implementation of a Minimum Edit Distance algorithm. Let me know if you can't find it (assuming that's what you're looking for) and I'll dig it out. -- chris |
In reply to this post by Dmitry Zamotkin-4
Dmitry Zamotkin wrote:
> matrix := Array new: size + 1. > 1 to: size+1 do: [ :i | matrix at: i put: (Array new: compSize +1) ]. BTW, if you arrange your order of scanning cleverly, you can get away with O(N+M) rather than O(N*M) working space. You can also avoid (except as a worst-case) doing O(N*M) work, as the algorithm I just posted a reference to, shows. -- chris |
Hi Chris and Dmitry
"Chris Uppal" <[hidden email]> wrote in message news:3e00bc3e$0$12109$[hidden email]... > Dmitry Zamotkin wrote: > > > matrix := Array new: size + 1. > > 1 to: size+1 do: [ :i | matrix at: i put: (Array new: compSize +1) ]. > This was the actual thread I remembered. Thanks for pointing me in the right direction. In this particular case (comparing names) Soundex seems to work easier though. Thanks, Ted |
Free forum by Nabble | Edit this page |