Re: [squeak-dev] RoarVM: The Manycore SqueakVM

Posted by Stefan Marr on Nov 04, 2010; 6:36pm
Hi Bert:

On 04 Nov 2010, at 19:07, Bert Freudenberg wrote:

> So it looks like you have to use a power-of-two cores?
Yes, that is right. At the moment, the system isn't able to handle other numbers of cores.
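
If you launch the VM from a script, a small check like the following can guard against picking an unsupported core count. This is only a sketch: both helper names are hypothetical and nothing like this ships with RoarVM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of RoarVM: true for 1, 2, 4, 8, ... */
bool is_power_of_two(unsigned n) {
    /* A power of two has exactly one bit set, so clearing the lowest
       set bit with n & (n - 1) must leave zero. */
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: largest power of two <= n, or 0 when n is 0. */
unsigned round_down_to_power_of_two(unsigned n) {
    unsigned p = 1;
    if (n == 0)
        return 0;
    while (p <= n / 2)   /* comparing against n / 2 avoids overflow */
        p *= 2;
    assert(is_power_of_two(p));
    return p;
}
```

For example, on a 6-core machine round_down_to_power_of_two(6) gives 4.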

> And the benchmark invocation should be different if you want to actually use multiple cores. What's the magic incantation?
The code I used to generate the numbers isn't actually in any image yet.
I pasted it below for reference; it's just a quick hack to get a parallel version of tinyBenchmarks.

> So RoarVM is about 4 times slower in sends, even more so for bytecodes. It needs 8 cores to be faster than the regular interpreter on a single core. So the good news is that it can beat the old interpreter :)  But why is it so much slower than the normal interpreter?
Well, on the one hand, we don't use techniques like the GCC labels-as-values extension to get threaded interpretation, which should help quite a bit.
Then, the current implementation based on pthreads is quite a bit slower than our version which uses plain Unix processes.
The GC is really not state of the art.
And all that adds up rather quickly I suppose...
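
To illustrate what threaded interpretation with labels-as-values means, here is a minimal sketch for a toy stack machine. The opcodes and function names are made up for the example and have nothing to do with RoarVM's actual interpreter loop. It needs GCC or Clang, since `&&label` and `goto *` are not standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine, not RoarVM's bytecode set. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Conventional dispatch: a switch inside a loop. Every bytecode goes
   back through the same shared dispatch branch. */
long run_switch(const int *code) {
    long stack[16];
    size_t sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH: assert(sp < 16); stack[sp++] = *code++; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: assert(sp >= 1); return stack[sp - 1];
        default:      assert(0 && "unknown opcode"); return 0;
        }
    }
}

/* Threaded dispatch: a table of label addresses (GCC extension), and
   each handler ends with its own indirect jump to the next handler. */
long run_threaded(const int *code) {
    static void *const handlers[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[16];
    size_t sp = 0;

#define DISPATCH() goto *handlers[*code++]
    DISPATCH();
do_push: assert(sp < 16); stack[sp++] = *code++; DISPATCH();
do_add:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; DISPATCH();
do_mul:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; DISPATCH();
do_halt: assert(sp >= 1); return stack[sp - 1];
#undef DISPATCH
}
```

Both functions compute the same result; for the program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT } each returns 20. The threaded variant drops the loop and the shared switch branch, which is usually where the speedup comes from, but how much it gains depends on the compiler and CPU.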

> Btw, user interrupt didn't work on the Mac.
Cmd+. ? Works for me ;) Well, can you be a bit more specific? In which situation did it not work?

> And in the Squeak-4.1 image, when running on 2 or more cores Morphic gets incredibly sluggish, pretty much unusably so.
Yes, same here. Sorry. Any hints on where to start looking to fix such issues are appreciated.

Best regards

My tiny Benchmarks:

> !Integer methodsFor: 'benchmarks' stamp: 'sm 10/11/2010 22:30'!
> tinyBenchmarksParallel16Processes
> "Report the results of running the two tiny Squeak benchmarks.
> ar 9/10/1999: Adjusted to run at least 1 sec to get more stable results"
> "0 tinyBenchmarks"
> "On a 292 MHz G3 Mac: 22727272 bytecodes/sec; 984169 sends/sec"
> "On a 400 MHz PII/Win98:  18028169 bytecodes/sec; 1081272 sends/sec"
> | t1 t2 r n1 n2 |
> n1 := 1.
> [t1 := Time millisecondsToRun: [n1 benchmark].
> t1 < 1000] whileTrue:[n1 := n1 * 2]. "Note: #benchmark's runtime is about O(n)"
> "now n1 is the value for which we do the measurement"
> t1 := Time millisecondsToRun: [self run: #benchmark on: n1 times: 16].
> n2 := 28.
> [t2 := Time millisecondsToRun: [r := n2 benchFib].
> t2 < 1000] whileTrue:[n2 := n2 + 1].
> "Note: #benchFib's runtime is about O(k^n),
> where k is the golden number = (1 + 5 sqrt) / 2 = 1.618...."
> "now we have our target n2 and r value.
> let's take the time for it"
> t2 := Time millisecondsToRun: [self run: #benchFib on: n2 times: 16].
> ^ { ((n1 * 16 * 500000 * 1000) // t1). " printString, ' bytecodes/sec; ',"
>   ((r * 16 * 1000) // t2) " printString, ' sends/sec'"
>   }! !
> !Integer methodsFor: 'benchmarks' stamp: 'sm 10/11/2010 22:29'!
> run: aSymbol on: aReceiver times: nTimes
> | mtx sig n |
> mtx := Semaphore forMutualExclusion.
> sig := Semaphore new.
> n := nTimes.
> nTimes timesRepeat: [
> [ aReceiver perform: aSymbol.
> mtx critical: [
> n := n - 1.
> (n == 0) ifTrue: [sig signal]]
> ] fork
> ].
> sig wait.! !

