String comparison

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

String comparison

Ted Bracht-2
Hi all,

I seem to recall someone posting a goodie to this ng (quite a while back, 2
years or so?) that did string comparison and returned a 'score' as to how
close the strings were alike. For example 'ABC' and 'DEF' have a zero score,
but 'ABC and 'ACB' have a fairly high score. I believe the goodie was based
on some study or something. I've looked through the archives, but couldn't
find it.

I'm looking for something like that as I have to do name comparison, and
with the many ways of spelling names, you can imagine that the chance of
getting it wrong is quite high.

Anybody any ideas?

Thanks in advance,

Ted


Reply | Threaded
Open this post in threaded view
|

Re: String comparison

Blair McGlashan
"Ted Bracht" <[hidden email]> wrote in message
news:[hidden email]...
> Hi all,
>
> I seem to recall someone posting a goodie to this ng (quite a while back,
2
> years or so?) that did string comparison and returned a 'score' as to how
> close the strings were alike. For example 'ABC' and 'DEF' have a zero
score,
> but 'ABC and 'ACB' have a fairly high score. I believe the goodie was
based
> on some study or something. I've looked through the archives, but couldn't
> find it.
>
> I'm looking for something like that as I have to do name comparison, and
> with the many ways of spelling names, you can imagine that the chance of
> getting it wrong is quite high.
>
> Anybody any ideas?
>

I suspect it was "Soundex". It is commonly used for that sort of thing. Its
a pretty easy algorithm to implement, and easily found on the web. I keep an
old magazine page that describes the algorithm in my desk for a "rainy day"
project of implementing a #soundsLike: method on String, but I've never
quite got around to it.

I see from Ian's archive that Andrew Brault (of Pocket Smalltalk fame)
posted a Dolphin implementation to the group in Jan 98, but I don't know
whether anyone still has it.

Regards

Blair


Reply | Threaded
Open this post in threaded view
|

Re: String comparison

Dmitry Zamotkin-4
In reply to this post by Ted Bracht-2
Hi Ted,

> I seem to recall someone posting a goodie to this ng (quite a while
> back, 2 years or so?) that did string comparison and returned a
> 'score' as to how close the strings were alike. For example 'ABC' and
> 'DEF' have a zero score, but 'ABC and 'ACB' have a fairly high score.
> I believe the goodie was based on some study or something. I've
> looked through the archives, but couldn't find it.

----------------------8<--------------------

String>>distance: comparand
 "It calculates the following: Given two strings, self and comparand, and
three operations, adding, subtracting and exchanging single characters, what
is the minimal number of steps needed to translate self into comparand?"
 | matrix size compSize |

 size := self size.
 compSize := comparand size.

 matrix := Array new: size + 1.
 1 to: size+1 do: [ :i | matrix at: i put: (Array new: compSize +1) ].

 0 to: size do: [:i | (matrix at: i+1) at: 1 put: i].
 0 to: compSize do: [:i | (matrix at: 1) at: i+1 put: i].

 1 to: size do: [ :i |
  1 to: compSize do: [ :j | | x y z |
   x := (( matrix at: i) at: j+1 ) + 1.
   y := (( matrix at: i+1) at: j ) + 1.

   (self at: i) = (comparand at: j)
    ifTrue: [ z := (matrix at: i) at: j ]
    ifFalse: [ z := ((matrix at: i) at: j) + 1 ].
   (matrix at: i+1) at: j+1 put: ((x min: y) min: z).
  ]
 ].

 ^ (matrix at: size+1 ) at: compSize+1

----------------------8<--------------------

If you need a mathematical model, I'll try to find it.

--
Dmitry Zamotkin


Reply | Threaded
Open this post in threaded view
|

Re: String comparison

Ian Bartholomew-18
In reply to this post by Blair McGlashan
> I see from Ian's archive that Andrew Brault (of Pocket Smalltalk fame)
> posted a Dolphin implementation to the group in Jan 98, but I don't know
> whether anyone still has it.

I've got copies of all the original messages and attachments posted to the
various newsgroups.  If anyone wants a copy of this attachment (or any
other) then let me know and I will mail it to them.

Ian


Reply | Threaded
Open this post in threaded view
|

Re: String comparison

Ted Bracht-2
Hi Blair and Ian,

"Ian Bartholomew" <[hidden email]> wrote in message
news:Ih1M9.2396$4k6.278584@wards...
> > I see from Ian's archive that Andrew Brault (of Pocket Smalltalk fame)
> > posted a Dolphin implementation to the group in Jan 98, but I don't know
> > whether anyone still has it.
>
> I've got copies of all the original messages and attachments posted to the
> various newsgroups.  If anyone wants a copy of this attachment (or any
> other) then let me know and I will mail it to them.
>

Thanks alot, just what I was looking for.

Ted


Reply | Threaded
Open this post in threaded view
|

Re: String comparison

Chris Uppal-3
In reply to this post by Ted Bracht-2
Ted Bracht wrote:

> I seem to recall someone posting a goodie to this ng (quite a while
> back, 2 years or so?) that did string comparison and returned a
> 'score' as to how close the strings were alike.

You may be remembering my post of 2001/5/16 in the "Fuzzy string comparison"
thread, which had an implementation of a Minimum Edit Distance algorithm.

Let me know if you can't find it (assuming that's what you're looking for) and
I'll dig it out.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: String comparison

Chris Uppal-3
In reply to this post by Dmitry Zamotkin-4
Dmitry Zamotkin wrote:

>  matrix := Array new: size + 1.
>  1 to: size+1 do: [ :i | matrix at: i put: (Array new: compSize +1) ].

BTW, if you arrange your order of scanning cleverly, you can get away with
O(N+M) rather than O(N*M) working space.  You can also avoid (except as a
worst-case) doing O(N*M) work, as the algorithm I just posted a reference to,
shows.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: String comparison

Ted Bracht-2
Hi Chris and Dmitry

"Chris Uppal" <[hidden email]> wrote in message
news:3e00bc3e$0$12109$[hidden email]...
> Dmitry Zamotkin wrote:
>
> >  matrix := Array new: size + 1.
> >  1 to: size+1 do: [ :i | matrix at: i put: (Array new: compSize +1) ].
>

This was the actual thread I remembered. Thanks for pointing me in the right
direction. In this particular case (comparing names) Soundex seems to work
easier though.

Thanks,

Ted