Smalltalk - Re: Small benchmark

Smalltalk › Pharo › Pharo Smalltalk Users

Re: Small benchmark

Posted by Andrea Ferretti on Feb 18, 2015; 7:16pm
URL: https://forum.world.st/Small-benchmark-tp4806092p4806394.html

For me there is no problem in including kmeans, but I should warn you
that this implementation is far from optimal, for various reasons.

First, I have reused the class Point, while it would make more sense
to do something that works in any dimension.

Second, these examples are (intentionally) written in a particularly
naive way. The main reason why is that I wanted to benchmark the speed
of Pharo for interactive work. I assumed that during an interactive
session, one would not bother to write an optimized implementation of
an algorithm, but just use the collection's `groupBy:` method; also I
did not want to make anything specific to kmeans, because I wanted the
comparison to be meaningful for other algorithms where kmeans-specific
optmizations would not
apply.

In the specific case of kmeans, there are a few easy optimizations to do:

- first, since sqrt is increasing, all comparisons can be made with
the squared distance, saving the sqrt operation
- moreover, the main step of the algorithm requires to group points by
their nearest centroid. In general a group-by operation requires a
hashmap, but in this case there are a finite, known number of choices
for the nearest centroid. It is more performant to just keep the index
of the nearest centroid for each point, thereby avoiding to build a
hashmap. In fact, it is more performant to accumulate the sum and the
count of the points near the centroid while traversing, and then just
perform a ratio at the end of the step.

2015-02-18 20:09 GMT+01:00 stepharo <[hidden email]>:

> I thought that didier provided k-means clustering already.
>
> Le 18/2/15 11:24, SergeStinckwich a écrit :
>
>> Dear Andrea,
>> thank you for your message. This is something really interesting for the
>> sci-smalltalk project:
>> https://github.com/SergeStinckwich/SciSmalltalk
>>
>> Do you mind to add your k-means code in Sci-Smalltalk distribution?
>>
>> Please feel free to join the sci-smalltalk for future discussions :
>> https://groups.google.com/forum/#!forum/scismalltalk
>>
>> Regards,
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://forum.world.st/Small-benchmark-tp4806092p4806309.html
>> Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.
>>
>>
>>
>
>