Hello Andrea,
Your algorithm is nicely written, but it makes extensive use
of closures and iterates a lot over collections, two areas
where Pharo's performance has issues. Eliot Miranda and I are
working specifically on these two cases to improve Pharo's
performance. If you don't mind, I will add your algorithm to
the benchmarks we use, because it exercises exactly the cases
we are trying to optimize, and its results on the
bleeding-edge VM are very encouraging.
About your implementation: someone familiar with Pharo would
replace #timesRepeat: with #to:do: in the two places where
you use it.
For example:
run: points times: times
    1 to: times do: [ :i | self run: points ]
I don't believe this makes the code much harder to read, and
depending on the number of iterations it can yield real
improvements, because #to:do: is optimized at compile time. I
tried it and got a 15% reduction in the overall running time,
though only on the bleeding-edge VM.
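If you want to measure the difference on your own machine,
here is a minimal timing sketch; the iteration count and the
dummy block body are placeholders, not your actual workload:

    | t1 t2 |
    "Compare #timesRepeat: and #to:do: on a dummy workload."
    t1 := Time millisecondsToRun:
        [ 1000000 timesRepeat: [ 1 + 2 ] ].
    t2 := Time millisecondsToRun:
        [ 1 to: 1000000 do: [ :i | 1 + 2 ] ].
    Transcript show: 'timesRepeat: ', t1 printString; cr.
    Transcript show: 'to:do: ', t2 printString; cr.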
Another thing: #groupedBy: is almost never used in the
system, and it is really *not* optimized at all. Maybe
another collection protocol would be just as readable and
faster; I don't know.
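For instance, a rewrite could use a plain Dictionary with
#at:ifAbsentPut: instead of #groupedBy:. The sketch below is
only an illustration; the selector #groupAll:by: and the
aBlock parameter are hypothetical names, not from your code:

    groupAll: aCollection by: aBlock
        "Group elements by the key computed by aBlock,
        without going through the #groupedBy: protocol."
        | groups |
        groups := Dictionary new.
        aCollection do: [ :each |
            (groups
                at: (aBlock value: each)
                ifAbsentPut: [ OrderedCollection new ])
                    add: each ].
        ^ groups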
Now about solutions:
Firstly, the VM is getting faster.
The Pharo 4 VM, to be released in July 2015, should be at
least 2x faster than the current one. I tried it on your
benchmark and got 5352.7 instead of 22629.1 on my machine,
which is over a 4x performance boost and puts Pharo between
Factor and Clojure in performance. An alpha release is
available here:
https://ci.inria.fr/pharo/view/4.0-VM-Spur/
You need to use PharoVM-spur32 as the VM and Pharo-spur32 as
the image (yes, the image changed too). You should be able to
load your code, run your benchmark and get a similar result.
In addition, for Pharo 5 we're working on making the VM much
faster again on benchmarks like yours. We hope to have an
alpha release this summer, but we can't say for sure whether
it will be ready. For this second step, I'm at a point where
I can barely run a benchmark without a crash, so I can't tell
you right now exactly what performance to expect, but barring
a miracle it should land somewhere between PyPy and Scala
performance (it will only reach full speed once it matures,
not in the first release anyway). I don't think we'll reach
the performance of languages such as Nim or Rust any time
soon. They're very different from Pharo: direct compilation
to machine code, many low-level types, and so on. I'm not
even sure a Java implementation could compete with them.
Secondly, you can use bindings to native code instead. I
showed here how to write the code in C and bind it with a
simple callout, which may be what you need for your
benchmark:
https://clementbera.wordpress.com/2013/06/19/optimizing-pharo-to-c-speed-with-nativeboost-ffi/
Note that this way of calling C does not work on the latest
VM. There are three existing frameworks for calling C from
Pharo, each with pros and cons; we're trying to unify them,
but it's taking time. I believe that by the July release of
Pharo 4 there will be an official recommended way of calling
C, and that is the one you should use.
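For reference, a NativeBoost callout from that post's era
looked roughly like the sketch below, binding libc's cos; it
is a hedged reconstruction from memory, and, as said above,
it no longer works on the latest VM:

    cos: aFloat
        "Call the C library's cos through NativeBoost
        (sketch; do not take the exact syntax as gospel)."
        <primitive: #primitiveNativeCall module: #NativeBoostPlugin>
        ^ self nbCall: #( double cos ( double aFloat ) )
              module: NativeBoost CLibrary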
I hope this is a satisfying answer :-). I'm glad some people
are deeply interested in Pharo performance.
Best,
Clement