Hi,
I think that most of you have seen the tinyBenchmarks results:
http://fbanados.wordpress.com/2011/02/10/a-tinybenchmark/

In order to understand why we come last, I've made some benchmarks.

(***WARNING*** my VM was compiled without the generational GC in order to use Valgrind)

1) Simple optimized bytecodes:
========================

x timesRepeat: [ 1 + 1 ]

[1] source code line number 1
[3] push 100
[5] dup stack top
[7] dup stack top
[9] push 1
    send 1 args message #>=
[11] pop and if false jump to 21
[13] push 1
    send 1 args message #-
[15] push 1
[17] push 1
    send 1 args message #+
[19] pop stack top
    jump to 7
[21] pop stack top
[23] return stack top

This only executes optimized bytecodes and never sends a real message; I chose it to stress the bytecode decoder, and it never triggers a GC.

I've also done it with other optimized messages like at: and at:put:.

We are up to 3 times faster than Cog.

2) Some optimized message sends:
==========================

SmallInteger [
    selfReturn [
        ^ self
    ]

    literalReturn [
        ^ Object
    ]
]

x timesRepeat: [ 1 selfReturn ]  or  x timesRepeat: [ 1 literalReturn ]

Here we stress another part of the VM, _gst_message_send:
1) for "selfReturn" a full lookup is done the first time
2) after that, the method cache does its work

In _gst_message_send_internal the message is optimized too: it directly returns self or the literal. We never create a context and never trigger a GC either.

Here again we are faster than Cog.

3) Simple context activation:
=====================

SmallInteger [
    foo [
        ^ 1+1
    ]
]

Again we stress _gst_message_send_internal, but this time the message is really sent. What's the difference:
- a context is allocated
- and recycled
- GC is never called

Here again we are faster than Cog.

4) Now here comes the problem:
=======================

SmallInteger [
    foo: anInteger time: aTimeInteger [
        anInteger > 0 ifTrue: [
            ^ self foo: anInteger - 1 time: aTimeInteger
        ].

        ObjectMemory quit.
    ]
]

Here another part of the VM is stressed: context activation. Contexts are not recycled here, and this is the problem.

1) a GC is called => 76% of the total execution time, which seems to be the problem

2) when gst runs out of free chunks with long recursions, it crashes: empty_context_stack

3) an OOP table entry is allocated every time, so gst can also run low on OOPs and trigger a GC

I hope these tiny simple benchmarks will help the gst community ;-)

Cheers,
Gwen
_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
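[Editorial note: the cases above can be timed from gst in the style Paolo uses later in this thread. This is a minimal sketch, not from the original message; the iteration count (1000000) is an arbitrary choice, and selfReturn is the SmallInteger extension defined above.]

```smalltalk
"Sketch of a timing harness for cases 1 and 2 above.
 The iteration count is arbitrary; selfReturn is the
 extension method defined in this message."
SmallInteger [
    selfReturn [ ^ self ]
]

"case 1: optimized bytecodes only, no real message send"
(Time millisecondsToRun: [ 1000000 timesRepeat: [ 1 + 1 ] ]) printNl.

"case 2: optimized send, answered from the method cache"
(Time millisecondsToRun: [ 1000000 timesRepeat: [ 1 selfReturn ] ]) printNl.
```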
Sorry, I forgot the bench file ;-)
Cheers,
Gwen

On Mon, Feb 14, 2011 at 11:50 AM, Gwenaël Casaccio <[hidden email]> wrote:
> [...]

test.st (2K) Download Attachment
On 02/14/2011 11:51 AM, Gwenaël Casaccio wrote:
>> 4) Now here comes the problem:
>> =======================
>>
>> SmallInteger [
>>     foo: anInteger time: aTimeInteger [
>>         anInteger > 0 ifTrue: [
>>             ^ self foo: anInteger - 1 time: aTimeInteger
>>         ].
>>
>>         ObjectMemory quit.
>>     ]
>> ]

You're calling this with anInteger = 90000, and in this case I do expect GC to be responsible for the bad performance. However, the numbers should be very different for, say, a depth of 50 as in your microbenchmark:

(Time millisecondsToRun: [ 1000000 timesRepeat: [ 5 recursionWithReturn: 50 ] ]) printNl.

How do gst/Cog/Squeak compare in this case?

Also, your benchmarks are missing one very important case, namely array access. I believe this is the cause of the slowdown in the bytecode benchmark, especially since you proved that everything else is faster. :)  This cannot really be helped, because it's due to the object table.

Paolo
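[Editorial note: the array-access case Paolo suggests could be sketched as below. The array size (100), indices, and iteration count are arbitrary choices, not from the thread; on gst every at:/at:put: goes through the object table, which is the cost he refers to.]

```smalltalk
"Sketch of the missing array-access microbenchmark.
 Size and iteration count are arbitrary choices."
| a |
a := Array new: 100.
1 to: 100 do: [ :i | a at: i put: i ].
(Time millisecondsToRun: [
    1000000 timesRepeat: [ a at: 50 put: (a at: 49) + 1 ] ]) printNl.
```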