Benchmarks


Benchmarks

MrGwen
Hi,

I think that most of you have seen the tinyBenchmarks results:
http://fbanados.wordpress.com/2011/02/10/a-tinybenchmark/
In order to understand why we are last, I've made some benchmarks:

(***WARNING*** my VM was compiled without the generational GC so that I could use valgrind)

1) Simple optimized byte codes:
========================

x timesRepeat: [ 1 + 1 ]

    [1] source code line number 1
    [3] push 100
    [5] dup stack top
    [7] dup stack top
    [9] push 1
        send 1 args message #>=
   [11] pop and if false jump to 21
   [13] push 1
        send 1 args message #-
   [15] push 1
   [17] push 1
        send 1 args message #+
   [19] pop stack top
        jump to 7
   [21] pop stack top
   [23] return stack top

It only executes optimized byte codes and never sends a message; I chose it
to stress the byte code decoder, and it never triggers a GC.

I've also done it with other optimized messages like at: and at:put:.

We are up to 3 times faster than Cog.
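For reference, a minimal driver for this benchmark could look like the following sketch (the iteration count is an assumption, not taken from the original bench file):

```smalltalk
"Hypothetical driver for benchmark 1.  The block body compiles to
optimized byte codes only, so this mainly exercises the byte code
decoder; the iteration count is an assumption."
| x |
x := 10000000.
(Time millisecondsToRun: [
    x timesRepeat: [ 1 + 1 ] ]) printNl.
```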

2) Some optimized message send:
==========================

SmallInteger [
  selfReturn [
    ^ self
  ]

  literalReturn [
    ^ Object
  ]
]

x timesRepeat: [ 1 selfReturn ] or x timesRepeat: [ 1 literalReturn ]

Here we stress another part of the VM, _gst_message_send:
  1) on the first "selfReturn" call a full lookup is done
  2) afterwards the cache does its work

In _gst_message_send_internal the message is optimized too: it
directly returns self or the literal, so we never create a context
or trigger a GC.

Here again we are faster than Cog.
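A driver for this benchmark might be sketched as follows (the `extend` form and the iteration count are assumptions; the method bodies are the ones shown above):

```smalltalk
"Hypothetical driver for benchmark 2: message sends that the VM can
answer without creating a context.  Iteration count is an assumption."
SmallInteger extend [
    selfReturn [ ^ self ]
    literalReturn [ ^ Object ]
]

| x |
x := 10000000.
(Time millisecondsToRun: [ x timesRepeat: [ 1 selfReturn ] ]) printNl.
(Time millisecondsToRun: [ x timesRepeat: [ 1 literalReturn ] ]) printNl.
```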

3) Simple context activation:
=====================

SmallInteger [
  foo [
    ^ 1+1
  ]
]

Again we stress _gst_message_send_internal, but this time the message is really sent, so
what's the difference:
  - a context is allocated
  - and recycled
  - GC is never called

Here again we are faster than Cog.
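A corresponding driver could be sketched like this (again, `extend` and the iteration count are assumptions; #foo is the method defined above):

```smalltalk
"Hypothetical driver for benchmark 3: #foo is really sent, so a context
is allocated and then recycled on every iteration."
SmallInteger extend [
    foo [ ^ 1 + 1 ]
]

(Time millisecondsToRun: [
    10000000 timesRepeat: [ 1 foo ] ]) printNl.
```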

4) Now here comes the problem:
=======================

SmallInteger [
  foo: anInteger time: aTimeInteger [
    anInteger > 0 ifTrue: [
      ^ self  foo: anInteger - 1 time: aTimeInteger
    ].

    ObjectMemory quit.
  ]
]

Here another part of the VM is stressed:

context activation (contexts are not recycled here), and this is the problem
(***WARNING*** my VM was compiled without the generational GC so that I could use valgrind):

1) a GC is called => 76% of the total execution time; this seems to
be the problem

2) when gst runs out of free chunks during long recursions, it crashes in
empty_context_stack

3) an OOP entry is allocated every time, so gst can also run low on OOPs
and trigger a GC
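A driver for this case might be sketched as below. Since the method ends with ObjectMemory quit, the run would be timed externally (e.g. with the shell's time command); the recursion depth and the arguments are assumptions:

```smalltalk
"Hypothetical driver for benchmark 4: deep, non-tail recursion, so
contexts cannot be recycled and the GC dominates.  Depth is an
assumption; the image quits when the recursion bottoms out."
SmallInteger extend [
    foo: anInteger time: aTimeInteger [
        anInteger > 0 ifTrue: [
            ^ self foo: anInteger - 1 time: aTimeInteger ].
        ObjectMemory quit
    ]
]

5 foo: 90000 time: 0.
```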

I hope those tiny simple benchmarks will help the gst community ;-)

Cheers,
Gwen

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Re: Benchmarks

MrGwen
Sorry, I forgot the bench file ;-)

Cheers,
Gwen

On Mon, Feb 14, 2011 at 11:50 AM, Gwenaël Casaccio <[hidden email]> wrote:

> Hi,
>
> I think that most of you have seen the tinyBechmarks results :
> http://fbanados.wordpress.com/2011/02/10/a-tinybenchmark/
> In order to understand why we are I last, I've made some benchmarks:
>
> (***WARNING*** my vm was compiled without the generation gc to use vallgrind)
>
> 1) Simple optimized byte codes:
> ========================
>
> x timesRepeat: [ 1 + 1 ]
>
>    [1] source code line number 1
>    [3] push 100
>    [5] dup stack top
>    [7] dup stack top
>    [9] push 1
>        send 1 args message #>=
>   [11] pop and if false jump to 21
>   [13] push 1
>        send 1 args message #-
>   [15] push 1
>   [17] push 1
>        send 1 args message #+
>   [19] pop stack top
>        jump to 7
>   [21] pop stack top
>   [23] return stack top
>
> It only sends optimized byte codes and never send a message; I've choosen it
> to stress the byte code decoder and also never calls a GC.
>
> I've done it also with other optimized messages like at:, at:put:
>
> We are up to 3 times faster than cog.
>
> 2) Some optimized message send:
> ==========================
>
> SmallInteger [
>  selfReturn [
>    ^ self
>  ]
>
>  literalReturn [
>    ^ Object
>  ]
> ]
>
> x timesRepeat: [ 1 selfReturn ] or x timesRepeat: [ 1 literalReturn ]
>
> Here we stress another part of the VM _gst_message_send :
>  1) "selfReturn" call  a full lookup is done
>  2) after the cache does it work
>
> in _gst_message_send_internal the message is optimized too it
> will directly return the self or the literals,... we never create a context
> and trigger a GC too
>
> Here again we are faster than cog
>
> 3) Simple context activation:
> =====================
>
> SmallInteger [
>  foo [
>    ^ 1+1
>  ]
> ]
>
> Again we stress _gst_message_send_internal but the messsage is really sent, so
> what's the difference:
>  - a context is allocated
>  - and recycled
>  - GC is never call
>
> Here again we are faster than cog
>
> 4) Now here comes the problem:
> =======================
>
> SmallInteger [
>  foo: anInteger time: aTimeInteger [
>    anInteger > 0 ifTrue: [
>      ^ self  foo: anInteger - 1 time: aTimeInteger
>    ].
>
>    ObjectMemory quit.
>  ]
> ]
>
> Here another part of the vm is stressed :
>
> context activation (they are not recycled here) and this is the problem
> (***WARNING*** my vm was compiled without the generation gc to use vallgrind)
>
> 1) a GC is called => 76% of the global times of execution, it seems to
> be the problem
>
> 2) when gst is out of free chunk with long recursions it crashes :
> empty_context_stack
>
> 3) all the time an oop entry is allocated also gst could be low on oop
> and trigger a gc
>
> I hope those tiny simple benchmarks will help the gst community ;-)
>
> Cheers,
> Gwen
>


Attachment: test.st (2K)

Re: Benchmarks

Paolo Bonzini-2
On 02/14/2011 11:51 AM, Gwenaël Casaccio wrote:

>>  4) Now here comes the problem:
>>  =======================
>>
>>  SmallInteger [
>>    foo: anInteger time: aTimeInteger [
>>      anInteger > 0 ifTrue: [
>>        ^ self  foo: anInteger - 1 time: aTimeInteger
>>      ].
>>
>>      ObjectMemory quit.
>>    ]
>>  ]

You're calling this with anInteger = 90000, and in this case I do expect
GC to be responsible for bad performance.

However, the numbers should be very different for, say, a depth of 50,
as in your microbenchmark:

   (Time millisecondsToRun: [ 1000000 timesRepeat: [
      5 recursionWithReturn: 50 ] ]) printNl.

How do gst/cog/squeak compare in this case?

Also, your benchmarks are missing one very important case, namely array
access.  I believe this is the cause of the slowdown in the bytecode
benchmark, especially since you proved that everything else is faster.
:)  This cannot be helped really, because it's due to the object table.
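Such an array-access microbenchmark could be sketched like this (the array size and iteration counts are assumptions; the point is that every at:/at:put: has to go through the object table):

```smalltalk
"Hypothetical array-access microbenchmark (sizes are assumptions).
Each at:/at:put: dereferences the object table, which is the
suspected cost."
| a |
a := Array new: 1000 withAll: 0.
(Time millisecondsToRun: [
    1000 timesRepeat: [
        1 to: 1000 do: [ :i | a at: i put: (a at: i) ] ] ]) printNl.
```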

Paolo
