Some Performance Numbers: Java vs. CogVM vs. SOM

Stefan Marr-3

Hi:

Not sure whether I’ll get to write a little more detailed report, but I wanted to briefly share a few pieces of data on the performance of the CogVM and StackVM. (Spur benchmarks are still running).

I set up a collection of benchmarks to be able to compare the performance of Java, my SOM implementations, and Cog/StackVM [1].

The set contains the following benchmarks:
 - DeltaBlue
 - Richards
 - GraphSearch (search in a graph data structure)
 - Json (a minimal JSON parser benchmark)
 - PageRank (a page rank algorithm implementation)
 - NBody, Mandelbrot, Bounce, BubbleSort, QuickSort, Fannkuch
 - Permute, Queens, Sieve, Storage, Towers

The Java implementations are here [2] and the SOM implementations here [3].

Naturally, a comparison between languages is never ideal. Java isn’t Smalltalk, and Pharo/Squeak isn’t exactly the same as SOM either. However, the benchmarks are ported to resemble the implementations in the other languages as closely as possible, with an emphasis on modern/Smalltalk-ish style where possible. For instance, the DeltaBlue implementation in Java is updated to use Java 8 lambdas and other modern APIs.

The Results
———————————

The most interesting result is peak performance, after warmup, with 100 iterations of each benchmark. The results are normalized to Java, so the numbers are slowdown factors (lower is better). I also report the minimum and maximum values to show the range over all benchmarks.

              geomean   min  max
Java 8          1.0      1.0    1.0
latest PharoVM 12.9      2.5  182.4 (not sure which exact version of the CogVM that is)
TruffleSOM      2.3      1.0    4.9
RTruffleSOM     3.0      1.5   11.5

TruffleSOM is SOM implemented as a self-optimizing interpreter on top of Truffle, a Java framework.
RTruffleSOM is SOM as a self-optimizing interpreter on top of RPython’s meta-tracing framework (think PyPy).

So, what we see here is that the CogVM is on average 13x slower than Java 8. I think that’s not bad at all, considering that it is not doing any adaptive compilation yet. The slowest benchmark is PageRank; the fastest is DeltaBlue.
Compared to the CogVM, my SOM implementations are doing a little better :)
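As a side note on methodology: the geomean rows in the tables are geometric means of the per-benchmark slowdown factors. A minimal sketch of how such a summary number could be computed (the class name and the sample values are illustrative only, not the actual SMark harness):

```java
// Geometric mean of normalized slowdown factors.
// Illustrative sketch only -- this is not the benchmark harness code.
public class GeomeanSketch {
    static double geomean(double[] factors) {
        double logSum = 0.0;
        for (double f : factors) {
            logSum += Math.log(f); // averaging in log space avoids overflow
        }
        return Math.exp(logSum / factors.length);
    }

    public static void main(String[] args) {
        // e.g. per-benchmark runtimes divided by Java's runtime:
        double[] slowdowns = { 2.5, 4.0, 16.0 };
        System.out.println(geomean(slowdowns)); // ~5.43
    }
}
```

The geometric mean is the usual choice for normalized results, because it is symmetric under the choice of baseline, unlike the arithmetic mean.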


Another interesting data point is the pure interpreter performance:

              geomean    min   max
Java 8 interp   1.0      1.0    1.0
PharoVM Stack   1.6      0.5   15.3 (not sure which exact version of the StackVM that is)
TruffleSOM      6.3      1.9   15.7
RTruffleSOM     5.6      1.6   15.7

What we see here is that the StackVM is actually sometimes faster than the Java interpreter.
While the PageRank benchmark is still the slowest, for the following benchmarks, the StackVM is faster than Java’s bytecode interpreter: DeltaBlue, Json, NBody, Permute, Richards, Storage, Towers.


Well, that’s it for the moment.
I hope that Clement and Eliot find those benchmarks useful, especially for the work on Sista.


And, I wonder whether that makes the SOMs the fastest open source Smalltalk implementations? ;)

Best regards
Stefan


[1] http://smalltalkhub.com/#!/~StefanMarr/SMark/versions/SOM-Benchmarks-StefanMarr.4
[2] https://github.com/smarr/Classic-Benchmarks/tree/master/benchmarks/som
[3] https://github.com/SOM-st/SOM/tree/master/Examples/Benchmarks

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/



Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Stefan Marr-3

Hi:

Now with the numbers for the Spur VM (3306):

             geomean   min  max
Java 8          1.0      1.0    1.0
latest PharoVM 12.9      2.5  182.4 (not sure which exact version of the CogVM that is)
Spur 3306       9.4      2.3  139.2
TruffleSOM      2.3      1.0    4.9
RTruffleSOM     3.0      1.5   11.5

You’re probably interested in the difference between the Spur and non-Spur VMs.
Here are the speedups of Spur for the individual benchmarks:

Bounce 9%
BubbleSort 34%
DeltaBlue 10%
Fannkuch 25%
GraphSearch 10%
Json 9%
Mandelbrot 54%
NBody 50%
PageRank 24%
Permute 38%
Queens 20%
QuickSort 28%
Richards -7%
Sieve 40%
Storage 36%
Towers 26%

Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/



Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Eliot Miranda-2
In reply to this post by Stefan Marr-3
 
Hi Stefan,

On Mon, Apr 6, 2015 at 3:12 AM, Stefan Marr <[hidden email]> wrote:

> [snip]
> I hope that Clement and Eliot find those benchmarks useful, especially for the work on Sista.

No response to specifics yet because Clément and I are busy with Sista internals but YES!  Thank you *very much*!  As soon as possible we'll be looking at this in detail.


--
best,
Eliot
Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Levente Uzonyi-2
In reply to this post by Stefan Marr-3
 
I've checked the code and ran some benchmarks in Squeak. When I tried to
load the code, I got complaints from the system because the names of class
variables begin with lowercase letters.
ScriptCollector is also missing from Squeak, though it's easy to work
around.
There are still plenty of #% sends in the code, which I had to rewrite to
#\\.
The PageRank benchmark is so slow that I stopped running it after about 30
minutes. The profiler shows that it spends over 95% of the time in
SomJenkinsRandom class >> #random. I've got a faster (~6x) version of that
PRNG, but it's still way too slow. One can consider this a weakness of
the system, but it's also a weak point of the benchmark that it relies so
heavily on a fast PRNG implementation. The code is also pretty bad,
because it uses only a few bits out of the generated 32, and it has to
fight with the signed result. Whoever came up with using that "PRNG"
hasn't really put much thought into it...
I tried it with another PRNG which is another 6x faster (so the overall
speed is 36x the original version), but that's still way too slow.
Squeak is rather slow here: an optimized PRNG written in C generates about
three orders of magnitude more random bits than an optimized PRNG in Squeak
in the same time.
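The "fight with the signed result" can be illustrated with a small, self-contained Java sketch. This is not the benchmark's code; it only demonstrates the pitfall of mapping a signed 32-bit value into [0, n):

```java
// Illustration of the signed-result pitfall (not the benchmark's actual code).
public class SignedResultPitfall {
    public static void main(String[] args) {
        int n = 10;
        int r = Integer.MIN_VALUE; // a possible signed 32-bit PRNG output

        // Math.abs(Integer.MIN_VALUE) is still negative in two's complement,
        // so `abs(r) % n` can yield a negative "index":
        System.out.println(Math.abs(r) % n);     // prints -8

        // Math.floorMod handles the sign correctly (modulo bias aside):
        System.out.println(Math.floorMod(r, n)); // prints 2
    }
}
```

Even away from the MIN_VALUE corner case, taking the absolute value and then a remainder folds two output values onto one, which is part of why only a few of the 32 generated bits are used well.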


About porting:
I don't know what your original goal was, but I don't see why you would
keep 0-based indexing in the code. Smalltalk uses 1-based indexing, and
this definitely has a negative impact on the Smalltalk results. If you
were to port code from Smalltalk to Java, would you keep the 1-based
indexes?

Another thing is about types:
The [1 - SomPageRank DFactor / n] expression is calculated in an n*n
loop during the PageRank benchmark, where n is a constant of the
benchmark, and SomPageRank DFactor is also a constant, 0.85. Let's see
how this adds to the runtime:

n := 100.
[ 1 - SomPageRank DFactor / n ] bench. '5,110,000 per second. 196 nanoseconds per run.'.
[ 1.0 - SomPageRank DFactor / n ] bench. '23,500,000 per second. 42.5 nanoseconds per run.'.
nFloat := n asFloat.
[ 1.0 - 0.85 / nFloat ] bench. '26,000,000 per second. 38.5 nanoseconds per run.'.
[ 0.15 / nFloat ] bench. '41,400,000 per second. 24.2 nanoseconds per run.'.
[ 0.0015 ] bench. '118,000,000 per second. 8.46 nanoseconds per run.'.
[] bench. '125,000,000 per second. 8.01 nanoseconds per run.'

So the code is calculating the same constant over and over again. Due to
type conversions this is about 25x slower than using a precalculated
constant (and ~5x slower than the code with proper types).
Of course an adaptive optimizer could optimize this, but so could any
programmer who cares about performance.

The corresponding java code is:

private static double D_FACTOR = 0.85; // damping factor

and

((1 - D_FACTOR)/n)

If I were to port this and wanted to stick to the implementation, then
D_FACTOR would be a class variable, and the code would read as: [1.0
- DFactor / n]. But knowing that the constant is not used anywhere else, I
see no problem with precalculating the value.
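To make the cost comparison concrete, here is a small Java sketch of the two variants. The class and method names are illustrative, not from the linked PageRank sources, and `final` is added to the constant:

```java
// Sketch: recomputing the damping term in the hot loop vs. hoisting it.
// Names are illustrative; not the actual benchmark code.
public class DampingTermSketch {
    static final double D_FACTOR = 0.85; // damping factor

    // Variant 1: the term is recomputed on every iteration,
    // including the int-to-double conversions.
    static double recomputed(int n) {
        double sum = 0.0;
        for (int i = 0; i < n * n; i++) {
            sum += (1 - D_FACTOR) / n;
        }
        return sum;
    }

    // Variant 2: the term is hoisted out of the loop, with float literals.
    static double hoisted(int n) {
        final double term = (1.0 - D_FACTOR) / n;
        double sum = 0.0;
        for (int i = 0; i < n * n; i++) {
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both variants compute the same value; only the cost differs.
        System.out.println(recomputed(100));
        System.out.println(hoisted(100));
    }
}
```

A JIT with loop-invariant code motion and constant folding collapses variant 1 into variant 2; an interpreter pays the full per-iteration cost.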

Levente

Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Stefan Marr-3

Hi Levente:

My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code.
It is not a goal to write the most efficient version of the code, taking the known implementation details into account. I want those benchmarks to be reasonably high-level, reasonably idiomatic code.

> On 06 Apr 2015, at 22:24, Levente Uzonyi <[hidden email]> wrote:
>
> I've checked the code and ran some benchmarks in Squeak. When I tried to load the code, I got complaints from the system because the names of class variables begin with lowercase letters.
> ScriptCollector is also missing from Squeak, though it's easy to work this
> around.
> There are still plenty of #% sends in the code, which I had to rewrite to #\\.

Yes, I did not commit the Squeak compatible code. It’s only in my image.
If someone wants it, I could probably upload it somewhere.


> The PageRank benchmark is so slow that I stopped running it after about 30 minutes. The profiler shows that it spends over 95% of the time in SomJenkinsRandom class >> #random. I've got a faster (~6x) version of that PRNG, but it's still way too slow. One can consider this a weakness of the system, but it's also a weak point of the benchmark that it relies so heavily on a fast PRNG implementation. The code is also pretty bad, because it uses only a few bits out of the generated 32, and it has to fight with the signed result. Whoever came up with using that "PRNG" hasn't really put much thought into it...

Yes, the RNG is slow. If you’ve got a faster version, I could surely integrate it, though, as long as it doesn’t fundamentally change things. As mentioned above, for me, the benchmarks are intended to measure how well the optimizers work.

> About porting:
> I don't know what your original goal was, but I don't see why you would keep 0 based indexing in the code. Smalltalk uses 1-based indexing, and this definitely has a negative impact on the Smalltalk results. If you were to port code from Smalltalk to Java, would you keep the 1-based indexes?

The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.


> Another thing is about types:
> [1 - SomPageRank DFactor / n]

> private static double D_FACTOR = 0.85; // damping factor
>
> If I were to port this, and I would want to stick to the implementation, then D_FACTOR would be a class variable, and the code would read as: [1.0 - DFactor / n]. But knowing that the constant is not used anywhere else, I see no problem with precalculating the value.

That’s the job of the optimizer/JIT compiler. And besides, SOM doesn’t have class variables.
So, I’ll stick with the current code. That’s a puzzle for Sista to solve, not something I would want to change in the code.

You might be tempted to do such optimizations because you know the current limits of the execution mechanisms. But for me, these benchmarks are about measuring how well the VMs optimize, not how well you know your VMs.

Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/



Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Michael Haupt-3

Hi Stefan,

really interesting; thanks.

> Am 06.04.2015 um 22:50 schrieb Stefan Marr <[hidden email]>:
> My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code.

it might seem the pure interpreter comparison adds little value then, but it's still interesting. Can you share some more insight about the characteristics of the benchmarks with the "surprising" results in interpreter-only mode?

> The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.

The indexing base really shouldn't matter to the compiler.


Best,

Michael
Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Stefan Marr-3

Hi Michael:

> On 06 Apr 2015, at 23:06, Michael Haupt <[hidden email]> wrote:
>
>> Am 06.04.2015 um 22:50 schrieb Stefan Marr <[hidden email]>:
>> My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code.
>
> it might seem the pure interpreter comparison adds little value then, but it’s still interesting.

It depends. For the self-optimizing interpreters it is also interesting to see the effect of the optimizations. And, for Sista, and the new bytecode set with quickening, this will also be interesting.
In general, I think interpreter performance is relevant for warmup and average application performance.
But I am still looking for a better way of measuring that more directly.

> Can you share some more insight about the characteristics of the benchmarks with the "surprising" results in interpreter-only mode?

So, the faster-than-Java benchmarks were DeltaBlue, Json, NBody, Permute, Richards, Storage, Towers.
Honestly, I don’t know whether there is one common characteristic between them. I wouldn’t normally think so.
NBody is mostly object field access and floating-point operations. Storage builds a tree of arrays and is rather a GC benchmark. Permute is very array-heavy. It could well be that the Smalltalk bytecode set is just slightly more optimized for the performance-critical operations in those benchmarks. However, the average is still about 60% slower, so on the whole set of benchmarks, the Java interpreter is still better. I would say, overall, it only shows that the interpreter is pretty well optimized.


>> The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.
>
> The indexing base really shouldn’t matter to the compiler.

Right, Graal as well as RPython take care of that nicely. Perhaps Sista will some day, too.

Best regards
Stefan



--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/



Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Levente Uzonyi-2
In reply to this post by Stefan Marr-3
 
On Mon, 6 Apr 2015, Stefan Marr wrote:

>
> Hi Levente:
>
> My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code.
> It is not a goal to write the most efficient version of the code, taking the known implementation details into account. I want those benchmarks to be reasonably high-level, reasonably idiomatic code.

In that case it is probably best to leave everything as it is, but then
the results of these benchmarks are hardly comparable between the ports.

>
>> On 06 Apr 2015, at 22:24, Levente Uzonyi <[hidden email]> wrote:
>>
>> I've checked the code and ran some benchmarks in Squeak. When I tried to load the code, I got complaints from the system because the names of class variables begin with lowercase letters.
>> ScriptCollector is also missing from Squeak, though it's easy to work this
>> around.
>> There are still plenty of #% sends in the code, which I had to rewrite to #\\.
>
> Yes, I did not commit the Squeak compatible code. It’s only in my image.
> If someone wants it, I could probably upload it somewhere.
AFAIK Eliot is using Squeak for the development of Cog, so I guess he'll
need it.
Does Pharo have a #% method?

>
>
>> The PageRank benchmark is so slow that I stopped running it after about 30 minutes. The profiler shows that it spends over 95% of the time in SomJenkinsRandom class >> #random. I've got a faster (~6x) version of that PRNG, but it's still way too slow. One can consider this a weakness of the system, but it's also a weak point of the benchmark that it relies so heavily on a fast PRNG implementation. The code is also pretty bad, because it uses only a few bits out of the generated 32, and it has to fight with the signed result. Whoever came up with using that "PRNG" hasn't really put much thought into it...
>
> Yes, the RNG is slow. If you got a faster version, I could surely integrate it thought. As long as it isn’t fundamentally changing things. As mentioned about, for me, the benchmarks are intended to measure how well the optimizers work.

It's just 6x faster and uses ThirtyTwoBitRegister. If you're still
interested in it, I'll upload it somewhere, along with the missing method
of ThirtyTwoBitRegister.

>
>> About porting:
>> I don't know what your original goal was, but I don't see why you would keep 0 based indexing in the code. Smalltalk uses 1-based indexing, and this definitely has a negative impact on the Smalltalk results. If you were to port code from Smalltalk to Java, would you keep the 1-based indexes?
>
> The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.

The methods I found in the PageRank benchmark are easy to port to 1-based
index. I did that in my image.

>
>
>> Another thing is about types:
>> [1 - SomPageRank DFactor / n]
>
>> private static double D_FACTOR = 0.85; // damping factor
>>
>> If I were to port this, and I would want to stick to the implementation, then D_FACTOR would be a class variable, and the code would read as: [1.0 - DFactor / n]. But knowing that the constant is not used anywhere else, I see no problem with precalculating the value.
>
> That’s the job of the optimizer/JIT compiler. And, also SOM doesn’t have class variables.
> So, I’ll stick with the current code. That’s a puzzle for Sista to solve. Not something I would want to change in the code.
>
> You might be tempted to do such optimizations, because you know the currently limits of the execution mechanisms. But for me, those benchmarks are about measuring how good the VMs optimize and not, how good you know your VMs.
Well, then you shouldn't tell Java that the variable is "private static
constant", because the VM should be able to find that out. :)

Levente

Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Levente Uzonyi-2
 
Answering some of my own questions:

Yes, Pharo has a #% method.

Yes, it's possible to reproduce the Jenkins hash based PRNG with NativeBoost [1].
It's about 10x faster than the one with ThirtyTwoBitRegister, so the
overall speedup is ~60x compared to the pure Smalltalk one.

No, the code doesn't work out of the box in Pharo. I had to change
SomAll's superclass from Object to SomBenchmarkHarness.

No, SomPageRank doesn't run on Pharo either. It gets into an infinite
loop, because #generateRandomPagesN:outLinks:divisor: is evaluated with n
= 1, and that makes the following loop infinite:

  k := SomJenkinsRandom random abs % n.
  [i = k] whileTrue: [
    k := SomJenkinsRandom random abs % n ].
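For what it's worth, guarding the degenerate case would avoid the hang. A hedged Java sketch of the rejection loop with such a guard (the original is Smalltalk, and the method name here is hypothetical):

```java
import java.util.Random;

// Hypothetical sketch: pick a random index k in [0, n) with k != i,
// guarding against n = 1, where the rejection loop can never terminate.
public class PickOtherIndex {
    static int pickOtherIndex(Random rng, int n, int i) {
        if (n <= 1) {
            // With n = 1 every candidate k equals i = 0, so the loop
            // below would spin forever -- exactly the reported hang.
            throw new IllegalArgumentException("need n > 1 to pick k != i");
        }
        int k;
        do {
            k = Math.floorMod(rng.nextInt(), n); // candidate in [0, n)
        } while (k == i); // reject until k differs from i
        return k;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int k = pickOtherIndex(rng, 5, 3);
        System.out.println(k >= 0 && k < 5 && k != 3); // prints true
    }
}
```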

Levente

[1] http://leves.web.elte.hu/squeak/NBJenkinsRandom.st
Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Stefan Marr-3

Hi Levente:

> On 07 Apr 2015, at 22:57, Levente Uzonyi <[hidden email]> wrote:
>
> Answering some of my own questions:

Sorry, didn’t find time earlier.

> No, the code doesn't work out of the box in Pharo. I had to change SomAll’s superclass from Object to SomBenchmarkHarness.

Ehm, I think I did not even port SomAll. That’s an empty class?

> No, SomPageRank doesn't run on Pharo either. It gets into an infinite loop, because #generateRandomPagesN:outLinks:divisor: is evaluated with n = 1, and that makes the following loop infinite:

That’s a bug in the original code. Yes. I didn’t fix it.

Not sure how you even execute the code, since I didn’t document anything… Sorry.
But, the general idea is to execute it from the command line.

From within the image, you can execute it with:
   SomBenchmarkHarness new run: { ''. 'PageRank'. '10'. '0'. '5' }.

On the command line it would look something like:
   $yourVM $yourImage SomBenchmarkHarness PageRank 500 0 1

(Assuming that the Scripting package is loaded).
Sorry, am currently traveling. Will try to create a proper configuration when I am back.

Best regards
Stefan


--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/



Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Stefan Marr-3
In reply to this post by Levente Uzonyi-2

Hi Levente:

> On 07 Apr 2015, at 16:58, Levente Uzonyi <[hidden email]> wrote:
>
> On Mon, 6 Apr 2015, Stefan Marr wrote:
>
>> My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code.
>> It is not a goal to write the most efficient version of the code, taking the known implementation details into account. I want those benchmarks to be reasonably high-level, reasonably idiomatic code.
>
> In that case maybe the best is to leave everything as it is, but then the results of these benchmarks are hardly comparable between the ports.

Cross-language comparisons are always fishy… However, on platforms with ‘sufficiently smart compilers’, the results are a useful indication for where to look for further optimization potential. On platforms with less extensive optimizations, yes, I agree, one might have the feeling that it is a little unfair.


>>> The PageRank benchmark is so slow that I stopped running it after about 30 minutes. The profiler shows that it spends over 95% of the time in SomJenkinsRandom class >> #random. I've got a faster (~6x) version of that PRNG, but it's still way too slow. One can consider this a weakness of the system, but it's also a weak point of the benchmark that it relies so heavily on a fast PRNG implementation. The code is also pretty bad, because it uses only a few bits out of the generated 32, and it has to fight with the signed result. Whoever came up with using that "PRNG" hasn't really put much thought into it...
>>
>> Yes, the RNG is slow. If you’ve got a faster version, I could surely integrate it, though, as long as it doesn’t fundamentally change things. As mentioned above, for me, the benchmarks are intended to measure how well the optimizers work.
>
> It's just 6x faster and uses ThirtyTwoBitRegister. If you’re still interested in it, I'll upload it somewhere, along with the missing method of ThirtyTwoBitRegister.

Sure. ThirtyTwoBitRegister is part of Squeak/Pharo, so that seems like a valid approach to me. And, since it is used for performance-critical stuff, a compiler writer could decide to recognize it and map it directly onto 32-bit operations…


>>> About porting:
>>> I don't know what your original goal was, but I don't see why you would keep 0 based indexing in the code. Smalltalk uses 1-based indexing, and this definitely has a negative impact on the Smalltalk results. If you were to port code from Smalltalk to Java, would you keep the 1-based indexes?
>>
>> The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.
>
> The methods I found in the PageRank benchmark are easy to port to 1-based index. I did that in my image.

Ok, I think I tried and failed. But perhaps that was another benchmark.

> Well, then you shouldn’t tell Java that the variable is “private static constant", because the VM should be able to find that out. :)

Ehm, well, actually, I even forgot the `final` keyword in the Java version :)
But there is a difference between representing something as a constant with the available language features and doing manual constant propagation. The latter is arguably rather bad for code readability and maintainability, because the intention gets lost.

Since we don’t have final fields in Smalltalk, a method returning a literal value is as close as we get, and it is sufficient for good compilers. Putting a precomputed result into some modifiable field isn’t something I like. Especially with Smalltalk images and reflection, it’s always a little unclear whether the value is actually what you expect it to be.

Anyway, yes, cross-language and cross-dialect comparisons are inherently problematic.

Thanks for the comments
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/



Re: Some Performance Numbers: Java vs. CogVM vs. SOM

timfelgentreff
Thanks Stefan for those benchmarks! Just for fun, we took them and ran them in Squeak 4.6 on Cog, the Stack VM, and RSqueak/VM.

This omits PageRank, because I couldn't be bothered to wait that long, and GraphSearch, because a bug in RSqueak/VM makes it fail with an out-of-bounds error.

The numbers are normalized against the interpreter VM. Here are the winners per benchmark:
    mean.norm   benchmark      vm
1  0.02566508  Mandelbrot RSqueak
2  0.05702497 IntegerLoop RSqueak
3  0.06013495   FieldLoop RSqueak
4  0.06748896   WhileLoop RSqueak
5  0.08052232    Richards RSqueak
6  0.09240031      Towers RSqueak
7  0.15941078      Bounce     Cog
8  0.19163855       Sieve RSqueak
9  0.22643333      Queens RSqueak
10 0.25765379       NBody     Cog
11 0.26811460     Storage RSqueak
12 0.27148559    Fannkuch RSqueak
13 0.29138479   DeltaBlue     Cog
14 0.39468583        Json     Cog
15 0.44837529     Permute RSqueak

And here are the numbers for Cog and RSqueak/VM (Interpreter is 1):
            RSqueak           Cog
Bounce             0.44569231     0.1594108
DeltaBlue          0.50761783     0.2913848
Fannkuch           0.27148559     0.4696399
FieldLoop          0.06013495     0.2631032
IntegerLoop        0.05702497     0.1894448
Json               1.48456633     0.3946858
Mandelbrot         0.02566508     0.3334912
NBody              0.73934825     0.2576538
Permute            0.44837529     0.4645503
Queens             0.22643333     0.3669021
Richards           0.08052232     0.1307220
Sieve              0.19163855     0.6768656
Storage            0.26811460     0.4415439
Towers             0.09240031     0.2979555
WhileLoop          0.06748896     0.2240831

Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Stefan Marr-3

Hi Tim:

By the way, I put the Spur image I used here: http://stefan-marr.de/downloads/Benchmarks-Spur-2015-04-06.zip

> On 10 Apr 2015, at 13:53, timfelgentreff <[hidden email]> wrote:
>
> The numbers are normalized against the interpreter VM. Here are the winners
> per benchmark:

Interesting, but, we should really have a talk about how to present data :-P
I can’t make sense out of your raw R output…

If I didn’t do anything wrong, these should be the speedup factors over the interpreter.
IMHO, this is actually understandable.

             RSqueak   Cog
Bounce           2.2   6.3
DeltaBlue        2.0   3.4
Fannkuch         3.7   2.1
FieldLoop       16.6   3.8
IntegerLoop     17.5   5.3
Json             0.7   2.5
Mandelbrot      39.0   3.0
NBody            1.4   3.9
Permute          2.2   2.2
Queens           4.4   2.7
Richards        12.4   7.6
Sieve            5.2   1.5
Storage          3.7   2.3
Towers          10.8   3.4
WhileLoop       14.8   4.5
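The conversion from Tim's normalized means (runtime relative to the interpreter, with interpreter = 1) to these speedup factors is just the reciprocal; a minimal sketch:

```java
public class Speedup {
    // A normalized mean of 0.025 means the VM needs 2.5% of the
    // interpreter's time, i.e. roughly a 39x speedup.
    static double over(double normalizedMean) {
        return 1.0 / normalizedMean;
    }
}
```

For example, RSqueak's 0.02566508 on Mandelbrot comes out at about 39x, and Cog's 0.1594108 on Bounce at about 6.3x.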

I am a little surprised that Json is so bad.
In RTruffleSOM, which also uses meta-tracing, it’s not slower than the interpreter.

Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/




Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Levente Uzonyi-2
In reply to this post by Stefan Marr-3
 


On Wed, 8 Apr 2015, Stefan Marr wrote:

>
> Hi Levente:
>
>> On 07 Apr 2015, at 22:57, Levente Uzonyi <[hidden email]> wrote:
>>
>> Answering some of my own questions:
>
> Sorry, didn’t find time earlier.
>
>> No, the code doesn't work out of the box in Pharo. I had to change SomAll’s superclass from Object to SomBenchmarkHarness.
>
> Ehm, I think I did not even port SomAll. That’s an empty class?
>
>> No, SomPageRank doesn't run on Pharo either. It gets into an infinite loop, because #generateRandomPagesN:outLinks:divisor: is evaluated with n = 1, and that makes the following loop infinite:
>
> That’s a bug in the original code. Yes. I didn’t fix it.
>
> Not sure how you even execute the code, since I didn’t document anything… Sorry.
I simply tried #run. E.g.: SomPageRank new run. This gave an error,
because SomAll doesn't implement this and that, but it seemed like
SomBenchmarkHarness does, so I simply changed SomAll to be a subclass of
SomBenchmarkHarness.

> But, the general idea is to execute it from the command line.
>
> From within the image, you can execute it with:
>   SomBenchmarkHarness new run: { ''. 'PageRank'. '10'. '0'. '5' }.
>
> On the command line it would look something like:
>   $yourVM $yourImage SomBenchmarkHarness  PageRank 500 0 1
>
> (Assuming that the Scripting package is loaded).
> Sorry, am currently traveling. Will try to create a proper configuration when I am back.
Evaluating "Smalltalk at: #ScriptConsole put: String new writeStream." did it for me.

Levente

>
> Best regards
> Stefan
>
>
> --
> Stefan Marr
> INRIA Lille - Nord Europe
> http://stefan-marr.de/research/
>
>
>
>

Re: Some Performance Numbers: Java vs. CogVM vs. SOM

timfelgentreff
In reply to this post by Stefan Marr-3
We guessed JSON may be bad because we never looked at string performance.

I haven't checked RTruffleSOM in a while, but one issue that affects RSqueak (as well as Cog to some extent, I imagine) is the interrupt timer. For us, it adds a couple of guards and bridges to pretty much every loop. SOM doesn't have an interactive image, does it?
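A hypothetical sketch (in Java, just to illustrate the shape of the overhead) of why an interrupt timer hurts loop performance: the VM has to poll a counter on every loop back-edge, and in a tracing JIT each poll turns into extra guards in the compiled trace.

```java
public class InterruptPollSketch {
    // Hypothetical: poll every POLL_INTERVAL iterations; the decrement
    // and branch still run on every iteration regardless.
    static final int POLL_INTERVAL = 1000;

    static long sumWithPolling(int n) {
        long sum = 0;
        int counter = POLL_INTERVAL;
        for (int i = 0; i < n; i++) {
            sum += i;
            if (--counter == 0) {        // per-iteration guard
                counter = POLL_INTERVAL; // a real VM would check for
            }                            // pending interrupts/events here
        }
        return sum;
    }
}
```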

Re: Some Performance Numbers: Java vs. CogVM vs. SOM

Stefan Marr-3

Hi Tim:

> On 10 Apr 2015, at 15:13, timfelgentreff <[hidden email]> wrote:
>
> We guessed JSON may be bad because we never looked at string performance.

Ok, sounds plausible. Indeed, the Json benchmark uses Strings for everything.

> I haven't checked RTruffleSOM in a while, but one issue that affects RSqueak
> (as well as Cog to some extent, I imagine) is that the interrupt timer. For
> us, it adds a couple of guards and bridges to pretty much every loop.

Ok, right. Yes, that’s going to cost performance.
Perhaps, you should have a proper ‘headless/event-less’ execution mode for benchmarking ;)

> SOM doesn’t have an interactive image, does it?

No, no images, no UI, no event handling. It’s really just a straightforward sequential language.

Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/




Re: Some Performance Numbers: Java vs. CogVM vs. SOM

timfelgentreff
Stefan Marr-3 wrote
>> I haven't checked RTruffleSOM in a while, but one issue that affects RSqueak
>> (as well as Cog to some extent, I imagine) is the interrupt timer. For
>> us, it adds a couple of guards and bridges to pretty much every loop.
>
> Ok, right. Yes, that’s going to cost performance.
> Perhaps, you should have a proper ‘headless/event-less’ execution mode for benchmarking ;)
We have that, but I wanted a fair comparison in the environment that an actual user of Squeak would run :)