Hi, a while ago I was evaluating Pharo as a platform for interactive data exploration, mining and visualization.

I was fairly impressed by the tools offered by the Pharo distribution, but I had a general feeling that the platform was a little slow, so I decided to set up a small benchmark based on an implementation of k-means. The original intention was to compare Pharo to Python (a language that is often used in this niche) and Scala (the language that we use in production), but since then I have implemented it in a few other languages as well. You can find the benchmark here:

https://github.com/andreaferretti/kmeans

Unfortunately, it turns out that Pharo is indeed the slowest among the implementations that I have tried. Since I am not an expert on Pharo or Smalltalk in general, I am asking for advice here to find out if maybe I am doing something stupid.

To be clear: the aim is *not* to have an optimized version of k-means. There are various ways to improve the algorithm that I am using, but I am trying to get a feeling for the performance of an algorithm that a casual user could implement without much thought while exploring some data. So I am not looking for:

- better algorithms
- clever optimizations, such as, say, invoking native code

I am asking here because there is the real possibility that I am just messing something up, and the same naive algorithm, written by someone more competent, would show real improvements.

Please let me know if you find anything.

Best,
Andrea
|
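For readers who do not want to open the repository, the kind of naive k-means loop being benchmarked can be sketched roughly as follows, in Python (one of the compared languages). This is an illustrative sketch, not the actual code from the repository; the seeding strategy (first k points) and the iteration count are assumptions.

```python
import math

def kmeans(points, k, iterations=15):
    """Naive k-means: no squared-distance trick, no incremental sums."""
    centroids = list(points[:k])  # assumed seeding: the first k points
    for _ in range(iterations):
        # Group every point by the index of its nearest centroid
        # (a dict-based group-by, analogous to Pharo's #groupedBy:).
        clusters = {}
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters.setdefault(nearest, []).append(p)
        # Move each non-empty cluster's centroid to the cluster mean.
        for i, pts in clusters.items():
            centroids[i] = tuple(sum(coord) / len(pts) for coord in zip(*pts))
    return centroids
```

The point of the benchmark is precisely that this version does a full group-by and a full (square-rooted) distance computation on every iteration, as a casual user would write it.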
Hello Andrea,

The way you wrote your algorithm is nice, but it makes extensive use of closures and iterates a lot over collections. Those are two areas where Pharo's performance has issues. Eliot Miranda and myself are working especially on those two cases to improve Pharo's performance. If you don't mind, I will add your algorithm to the benchmarks we use, because it makes extensive use of the cases we are trying to optimize, and its results on the bleeding-edge VM are very encouraging.

About your implementation: someone familiar with Pharo might replace #timesRepeat: with #to:do: in the two places you use it. For example:

run: points times: times
	1 to: times do: [ :i | self run: points ].

I don't believe it makes the code really harder to read, but depending on the number of iterations it may show some real improvement, because #to:do: is optimized at compile time. I tried it and got a 15% reduction in the overall run time, though only on the bleeding-edge VM.

Another thing is that #groupedBy: is almost never used in the system and it's really *not* optimized at all. Maybe another collection protocol is faster and not less readable; I don't know.

Now about solutions:

Firstly, the VM is getting faster. The Pharo 4 VM, to be released in July 2015, should be at least 2x faster than the current one. I tried it on your benchmark, and I got 5352.7 instead of 22629.1 on my machine, which is over a 4x performance boost, and which puts Pharo between the Factor and Clojure results. An alpha release is available here: https://ci.inria.fr/pharo/view/4.0-VM-Spur/ . You need to use PharoVM-spur32 as the VM and Pharo-spur32 as the image (yes, the image changed too). You should be able to load your code, try your benchmark and get a similar result.

In addition, we're working on making the VM much faster again on benchmarks like yours in Pharo 5. We hope to have an alpha release this summer, but we don't know for sure if it will be ready.
For this second step, I'm at a point where I can barely run a benchmark without a crash, so I can't tell you right now the exact performance you can expect, but unless there's a miracle it should land somewhere between the PyPy and Scala results (it will reach full performance once it gets more mature, not at the first release anyway). I don't think we'll reach the performance of languages such as Nim or Rust any time soon. They're very different from Pharo: direct compilation to machine code, many low-level types, and so on. I'm not even sure a Java implementation could compete with them.

Secondly, you can use bindings to native code instead. I showed here how to write the code in C and bind it with a simple callout, which may be what you need for your benchmark: https://clementbera.wordpress.com/2013/06/19/optimizing-pharo-to-c-speed-with-nativeboost-ffi/ . This way of calling C does not work on the latest VM, though. There are three existing frameworks for calling C from Pharo, all with pros and cons; we're trying to unify them, but it's taking time. I believe that for the July release of Pharo 4 there will be an officially recommended way of calling C, and that's the one you should use.

I hope I wrote you a satisfying answer :-). I'm glad some people are deeply interested in Pharo performance.

Best,

Clement
|
> On 17 Feb 2015, at 11:06, Clément Bera <[hidden email]> wrote:
>
> The way you wrote your algorithm is nice but makes extensive use of closures and iterates a lot over collections.

I was about to say the same thing.

> I tried it and got a 15% reduction in the overall run time, though only on the bleeding-edge VM.

That is a lot of difference for such a small change.

> The Pharo 4 VM, to be released in July 2015, should be at least 2x faster than now. I tried it on your benchmark, and I got 5352.7 instead of 22629.1 on my machine.

Super. Thank you, Esteban and of course Eliot, for such great work; eventually we'll all be better off thanks to these improvements.

> An alpha release is available here: https://ci.inria.fr/pharo/view/4.0-VM-Spur/ . You should be able to load your code, try your benchmark and have a similar result.

I did a quick test (first time I tried Spur) and code loading was spectacularly fast. But the ride is still rough ;-)
|
Thank you for the quick response! I will try what I get from the 4.0 VM, and of course I will publish the updated results once Pharo 4 is out.

Of course, you can add the benchmark and tweak it for your needs.

Thank you for all the good work you are doing! Reaching a speed near PyPy would be a real game changer!
|
Unfortunately, I was not able to run the benchmark at all on the latest Spur image+VM. As soon as I try writing anything in the Workspace, I get:

PrimitiveFailed: primitive #basicNew: in Array class failed

Since the stack trace mentions font rendering, I thought it had something to do with the fonts not being loaded, and tried

FreeTypeFontProvider current updateFromSystem

(at least I was able to paste it), but it seems it has nothing to do with that. For the record, I am on Ubuntu 14.04 running GNOME Shell.

Looking forward to the next stable release! Thank you again,
Andrea
|
2015-02-17 12:29 GMT+01:00 Andrea Ferretti <[hidden email]>:
> Unfortunately, I was not able to run the benchmark at all on the latest Spur image+VM.

I am on Mac OS X 10.8 and it works for me. I hit this bug once, but when I reopened the Workspace it did not happen again and I could not reproduce it anymore. Well, this VM is an alpha version; Esteban is changing it very often, so a bug may have been introduced. The stable release should work better, for sure...
|
Hi Clement, Eliot,

this update prompted me to try Spur + Pharo as well, so I set up some sort of environment for that. Now I need to finish a Makefile for downloading all the latest successful builds from Jenkins. I have the image; now I need to set up the VM as well.

Thierry
|
In reply to this post by Clément Béra
This is a fantastic answer. Thanks Clement!

Alexandre
|
|
In reply to this post by Andrea Ferretti
Dear Andrea,
thank you for your message. This is something really interesting for the SciSmalltalk project:

https://github.com/SergeStinckwich/SciSmalltalk

Would you mind adding your k-means code to the SciSmalltalk distribution? Please feel free to join the SciSmalltalk group for future discussions:

https://groups.google.com/forum/#!forum/scismalltalk

Regards,
|
I thought that Didier already provided k-means clustering.
|
For me there is no problem in including k-means, but I should warn you that this implementation is far from optimal, for various reasons.

First, I have reused the class Point, while it would make more sense to do something that works in any dimension.

Second, these examples are (intentionally) written in a particularly naive way. The main reason is that I wanted to benchmark the speed of Pharo for interactive work. I assumed that during an interactive session one would not bother to write an optimized implementation of an algorithm, but would just use the collection's `groupedBy:` method; also, I did not want to make anything specific to k-means, because I wanted the comparison to be meaningful for other algorithms where k-means-specific optimizations would not apply.

In the specific case of k-means, there are a few easy optimizations:

- first, since sqrt is increasing, all comparisons can be made with the squared distance, saving the sqrt operation;
- moreover, the main step of the algorithm requires grouping points by their nearest centroid. In general a group-by operation requires a hashmap, but in this case there is a finite, known number of choices for the nearest centroid. It is faster to just keep the index of the nearest centroid for each point, thereby avoiding building a hashmap. In fact, it is faster still to accumulate the sum and the count of the points near each centroid while traversing, and then just perform a division at the end of the step.
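The two optimizations above can be sketched in Python as a single update step (squared distances for the nearest-centroid test, and per-centroid running sums and counts instead of a hashmap-based group-by). This is illustrative code under those assumptions, not taken from the benchmark repository:

```python
def kmeans_step(points, centroids):
    """One k-means update using squared distances and accumulated sums/counts.

    Avoids both the sqrt call and building a dict of clusters: for each
    centroid index we only keep running coordinate sums and a point count.
    """
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Nearest centroid by *squared* distance: sqrt is monotonic,
        # so the argmin is unchanged and the sqrt call is saved.
        best, best_d2 = 0, float("inf")
        for i, c in enumerate(centroids):
            d2 = sum((a - b) ** 2 for a, b in zip(p, c))
            if d2 < best_d2:
                best, best_d2 = i, d2
        counts[best] += 1
        for j, a in enumerate(p):
            sums[best][j] += a
    # One division per coordinate at the end of the step; a cluster that
    # received no points keeps its previous centroid.
    return [
        tuple(s / counts[i] for s in sums[i]) if counts[i] else tuple(centroids[i])
        for i in range(k)
    ]
```

Repeating this step until the centroids stop moving gives the optimized variant; the single traversal per step is exactly what the hashmap-based group-by version pays extra for.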
|
In reply to this post by Clément Béra
Thanks Clement for this great post.

I'm really happy that we could put money aside (thanks Synectique) to pay for your PhD.

Stef
|
In reply to this post by stepharo
Yes, apparently there is a K-cluster algorithm in DHB.

Too much stuff :-)

--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/
|