https://forum.world.st/RoarVM-The-Manycore-SqueakVM-tp3025321p3030209.html
>
> On Sat, 6 Nov 2010, Igor Stasenko wrote:
>
>>
>> On 6 November 2010 17:26, Levente Uzonyi <[hidden email]> wrote:
>>>
>>> On Thu, 4 Nov 2010, Stefan Marr wrote:
>>>
>>>>
>>>> Hi Bert:
>>>>
>>>> On 04 Nov 2010, at 20:20, Bert Freudenberg wrote:
>>>>
>>>>>>> So RoarVM is about 4 times slower in sends, even more so for bytecodes.
>>>>>>> It needs 8 cores to be faster than the regular interpreter on a single core. So
>>>>>>> the good news is that it can beat the old interpreter :) But why is it so
>>>>>>> much slower than the normal interpreter?
>>>>>>
>>>>>> Well, on the one hand, we don't use stuff like the GCC label-as-value
>>>>>> extension to have threaded-interpretation, which should help quite a bit.
>>>>>> Then, the current implementation based on pthreads is quite a bit slower
>>>>>> than our version which uses plain Unix processes.
>>>>>> The GC is really not state of the art.
>>>>>> And all that adds up rather quickly I suppose...
>>>>>
>>>>> Hmm, that doesn't sound like it should make it 4x slower ...
>>>>
>>>> Do you know some numbers for the switch/case-based vs. the threaded
>>>> version on the standard VM?
>>>> How much do you typically gain by it?
>>>
>>> If threaded means gnuified (jump table instead of the linear search), then
>>> it gives ~2x speedup for the standard SqueakVM.
>>>
>> In my own experience it gives 30%.
>
> Right, it depends on what we take into account. According to this mail:
> http://lists.squeakfoundation.org/pipermail/vm-dev/2010-January/003761.html
> tinyBenchmarks gives
> '248543689 bytecodes/sec; 8117987 sends/sec' without gnuification and
> '411244979 bytecodes/sec; 10560900 sends/sec' with gnuification.
>
> These aren't fully optimized VMs, so the difference may be smaller or larger with better optimizations. Anyway in this case in terms of bytecodes the difference is 65%, for sends it's 30%. So the general speedup is not 2x, but it's not 30% either.
>
> The actual performance difference may be greater depending on the used bytecodes (tinyBenchmarks uses only a few) and the compiler's capabilities. Btw I wonder why gcc can't compile switch statements like this to jump tables by itself without gnuification.
>
and from them, only a few will have a power-of-two number of cases.
So, it's not worth the effort.
non-trivial from the compiler's perspective, since it's indirect.
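
For anyone following along, here is a minimal sketch of the two dispatch styles being compared in this thread: a plain switch/case loop and a "threaded" loop using the GCC label-as-value extension. The opcode set (OP_INC, OP_DEC, OP_DOUBLE, OP_HALT) and the function names are hypothetical and made up for illustration; this is not the SqueakVM or RoarVM interpreter, and it says nothing about the actual speedup. The threaded version assumes a compiler that supports `&&label` and `goto *` (GCC, Clang) and assumes every opcode in the program is valid.

```c
#include <stddef.h>

/* Hypothetical toy bytecodes, not the Squeak bytecode set. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Switch-based dispatch: every bytecode goes back through one shared switch. */
static int run_switch(const unsigned char *code)
{
    int acc = 0;
    for (size_t pc = 0;; pc++) {
        switch (code[pc]) {
        case OP_INC:    acc += 1; break;
        case OP_DEC:    acc -= 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        return -1;   /* unknown opcode */
        }
    }
}

/* Threaded dispatch with the GCC label-as-value extension: each handler
 * ends with its own indirect jump straight to the next handler.
 * Assumes all opcodes are valid (there is no range check here). */
static int run_threaded(const unsigned char *code)
{
    /* Table order must match the enum above. */
    static void *const targets[] = { &&op_inc, &&op_dec, &&op_double, &&op_halt };
    int acc = 0;
    size_t pc = 0;

#define DISPATCH() goto *targets[code[pc++]]
    DISPATCH();
op_inc:    acc += 1; DISPATCH();
op_dec:    acc -= 1; DISPATCH();
op_double: acc *= 2; DISPATCH();
op_halt:   return acc;
#undef DISPATCH
}
```

Both functions should give the same result for the same program, e.g. { OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT }. The difference is only in how control reaches the next handler: the switch version funnels through one dispatch point (commonly a range check plus one shared indirect jump), while the threaded version has a separate indirect jump at the end of every handler.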