Re: [squeak-dev] RoarVM: The Manycore SqueakVM
Posted by Bert Freudenberg on Nov 04, 2010; 6:07pm
URL: https://forum.world.st/RoarVM-The-Manycore-SqueakVM-tp3025321p3027493.html
On 03.11.2010, at 14:13, Stefan Marr wrote:
> A small teaser:
> 1 core 66286897 bytecodes/sec; 2910474 sends/sec
> 8 cores 470588235 bytecodes/sec; 19825677 sends/sec
I tried your precompiled OS X VM and the Sly3 image.
1 core: 93,910,491 bytecodes/sec; 4,056,440 sends/sec
2 cores: 91,559,370 bytecodes/sec; 4,007,927 sends/sec
3 cores: can't start
4 cores: 90,844,570 bytecodes/sec; 3,935,516 sends/sec
5 cores: can't start
6 cores: can't start
7 cores: can't start
8 cores: 89,698,668 bytecodes/sec; 3,910,787 sends/sec
So it looks like you have to use a power-of-two number of cores?
And the benchmark invocation should be different if you want to actually use multiple cores. What's the magic incantation?
I tried something myself:
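n := 16.
q := SharedQueue new.
time := Time millisecondsToRun:
    ["Fork n processes; each one times a single benchFib run and queues the result."
    n timesRepeat: [[q nextPut: [30 benchFib] timeToRun] fork].
    "Collect and print the per-process times (ms) as they arrive."
    n timesRepeat: [Transcript space; show: q next]].
"Print the total elapsed time (ms)."
Transcript space; show: time; cr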
Results (per-process run times in ms, followed by the total elapsed time in ms):
1 core: 664 664 665 666 667 662 664 664 668 665 667 665 666 669 666 10700
2 cores: 675 674 672 669 677 669 669 672 678 670 668 669 674 668 668 5425
4 cores: 721 726 729 740 713 728 740 734 731 737 721 737 734 756 788 749 3030
8 cores: 786 807 837 847 865 872 916 840 800 873 792 880 846 865 829 1820
Now that scales pretty nicely :) The overhead is about 25% at 8 cores, 12% for 4 cores.
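The post does not say how the overhead percentages were computed. As a rough sketch for a workspace, assuming "overhead" means the increase in mean per-process run time relative to the 1-core run, and "speedup" means the 1-core total time divided by the N-core total time, the figures can be re-derived from the numbers above. The variable names (perProcess, totals, mean, base) are made up for this illustration.

```smalltalk
"Per-process times (ms), copied from the results in this post."
perProcess := {
    1 -> #(664 664 665 666 667 662 664 664 668 665 667 665 666 669 666).
    2 -> #(675 674 672 669 677 669 669 672 678 670 668 669 674 668 668).
    4 -> #(721 726 729 740 713 728 740 734 731 737 721 737 734 756 788 749).
    8 -> #(786 807 837 847 865 872 916 840 800 873 792 880 846 865 829)}.
"Total elapsed times (ms), copied from the same results."
totals := {1 -> 10700. 2 -> 5425. 4 -> 3030. 8 -> 1820}.

"Mean of a collection, kept as an exact fraction."
mean := [:times | (times inject: 0 into: [:sum :each | sum + each]) / times size].
base := mean value: perProcess first value.

perProcess with: totals do: [:run :total |
    | overhead speedup |
    "Assumed definition: relative increase of the mean per-process time, in percent."
    overhead := ((mean value: run value) / base - 1) * 100.
    "Assumed definition: 1-core total time divided by this run's total time."
    speedup := totals first value / total value.
    Transcript
        show: run key printString; show: ' core(s): overhead ';
        show: overhead rounded printString; show: '%, speedup ';
        show: ((speedup * 10) rounded / 10.0) printString; cr]
```

Under these assumptions the per-process overhead comes out at roughly 11% for 4 cores and 27% for 8 cores, in the same range as the figures quoted above, and the total time improves by a factor of about 2.0, 3.5 and 5.9 on 2, 4 and 8 cores.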
For our regular interpreter (*) I get:
1 core: 162 159 157 158 158 160 159 159 159 159 159 158 160 158 159 2585
So RoarVM is about 4 times slower in sends, even more so for bytecodes. It needs 8 cores to be faster than the regular interpreter on a single core. So the good news is that it can beat the old interpreter :) But why is it so much slower than the normal interpreter?
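To put numbers on "4 times slower in sends, even more so for bytecodes", here is a small workspace sketch that divides the regular interpreter's figures (from the footnote at the end of this post) by the RoarVM 1-core figures reported earlier. The variable names are made up for this illustration.

```smalltalk
"Benchmark figures copied from this post."
roarBytecodes := 93910491.      "RoarVM, 1 core, bytecodes/sec"
roarSends := 4056440.           "RoarVM, 1 core, sends/sec"
interpBytecodes := 789514263.   "regular interpreter, bytecodes/sec"
interpSends := 17199374.        "regular interpreter, sends/sec"

"Slowdown factor = regular interpreter rate / RoarVM rate, rounded to one decimal."
slowdown := [:interp :roar | ((interp / roar) * 10) rounded / 10.0].

Transcript
    show: 'bytecodes: '; show: (slowdown value: interpBytecodes value: roarBytecodes) printString; cr;
    show: 'sends: '; show: (slowdown value: interpSends value: roarSends) printString; cr
```

With these inputs the ratio is roughly 8.4 for bytecodes and 4.2 for sends, which matches the roughly 4x gap between the per-process benchFib times (about 665 ms versus about 159 ms).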
Btw, user interrupt didn't work on the Mac.
And in the Squeak-4.1 image, when running on 2 or more cores, Morphic gets incredibly sluggish, pretty much unusably so.
- Bert -
(*) For comparison, a regular interpreter (not Cog) on this machine gets
789,514,263 bytecodes/sec; 17,199,374 sends/sec
and Cog does
880,481,513 bytecodes/sec; 70,113,306 sends/sec