Smalltalk - Re: RoarVM: The Manycore SqueakVM

Posted by Levente Uzonyi-2 on Nov 06, 2010; 5:19pm
URL: https://forum.world.st/RoarVM-The-Manycore-SqueakVM-tp3025321p3030183.html

On Sat, 6 Nov 2010, Igor Stasenko wrote:

>
> On 6 November 2010 17:26, Levente Uzonyi <[hidden email]> wrote:
>>
>> On Thu, 4 Nov 2010, Stefan Marr wrote:
>>
>>>
>>> Hi Bert:
>>>
>>> On 04 Nov 2010, at 20:20, Bert Freudenberg wrote:
>>>
>>>>>> So RoarVM is about 4 times slower in sends, even more so for bytecodes.
>>>>>> It needs 8 cores to be faster than the regular interpreter on a single core. So
>>>>>> the good news is that it can beat the old interpreter :) But why is it so
>>>>>> much slower than the normal interpreter?
>>>>>
>>>>> Well, on the one hand, we don't use stuff like the GCC label-as-value
>>>>> extension to have threaded-interpretation, which should help quite a bit.
>>>>> Then, the current implementation based on pthreads is quite a bit slower
>>>>> than our version which uses plain Unix processes.
>>>>> The GC is really not state of the art.
>>>>> And all that adds up rather quickly I suppose...
>>>>
>>>> Hmm, that doesn't sound like it should make it 4x slower ...
>>>
>>> Do you know some numbers for the switch/case-based vs. the threaded
>>> version on the standard VM?
>>> How much do you typically gain by it?
>>
>> If threaded means gnuified (jump table instead of the linear search), then
>> it gives ~2x speedup for the standard SqueakVM.
>>
> In my own experience it gives 30%.

Right, it depends on what we take into account. According to this mail:
http://lists.squeakfoundation.org/pipermail/vm-dev/2010-January/003761.html
tinyBenchmarks gives
'248543689 bytecodes/sec; 8117987 sends/sec' without gnuification and
'411244979 bytecodes/sec; 10560900 sends/sec' with gnuification.

These aren't fully optimized VMs, so the difference may be smaller or
larger with better optimizations. Anyway, in this case the difference is
about 65% in terms of bytecodes and about 30% for sends. So the general
speedup is not 2x, but it's not 30% either.
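
For anyone who wants to check the percentages, here is a minimal sketch of the arithmetic in C. The helper name speedup_percent is made up for illustration; the inputs are the tinyBenchmarks figures quoted above.

```c
/* Relative speedup of `after` over `before`, in percent.
 * Hypothetical helper, only meant to reproduce the arithmetic above. */
static double speedup_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}

/* tinyBenchmarks figures from the quoted vm-dev mail. */
static const double BYTECODES_PLAIN    = 248543689.0;
static const double BYTECODES_GNUIFIED = 411244979.0;
static const double SENDS_PLAIN        = 8117987.0;
static const double SENDS_GNUIFIED     = 10560900.0;
```

Calling speedup_percent(BYTECODES_PLAIN, BYTECODES_GNUIFIED) gives a bit over 65, and speedup_percent(SENDS_PLAIN, SENDS_GNUIFIED) gives a bit over 30.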

The actual performance difference may be greater depending on which
bytecodes are used (tinyBenchmarks uses only a few) and on the compiler's
capabilities. Btw, I wonder why gcc can't compile switch statements like
this to jump tables by itself without gnuification.
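
To make the two dispatch styles concrete, here is a rough sketch in C. It is not the SqueakVM interpreter and not the output of the gnuify step; the three opcodes are invented for illustration, and run_threaded relies on the GCC labels-as-values extension (also accepted by Clang), so it is not portable C.

```c
/* Toy bytecode set; hypothetical, not Squeak's. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Portable dispatch: one central switch inside a loop. */
static long run_switch(const unsigned char *pc)
{
    long acc = 0;
    for (;;) {
        switch (*pc++) {
        case OP_INC:    acc += 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        return -1;  /* unknown opcode */
        }
    }
}

/* Threaded dispatch: each handler jumps straight to the next one
 * through a table of label addresses (GCC extension).
 * No range check here, so the bytecodes are assumed to be valid. */
static long run_threaded(const unsigned char *pc)
{
    static void *table[] = { &&do_inc, &&do_double, &&do_halt };
    long acc = 0;

    goto *table[*pc++];
do_inc:
    acc += 1;
    goto *table[*pc++];
do_double:
    acc *= 2;
    goto *table[*pc++];
do_halt:
    return acc;
}
```

Both functions compute the same result for a valid program. The difference is that the threaded version has an indirect jump at the end of every handler instead of a single shared one at the top of the loop.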

Levente
