RoarVM: The Manycore SqueakVM


RoarVM: The Manycore SqueakVM

Stefan Marr
Dear Smalltalk community:


We are happy to announce, now officially, RoarVM: the first single-image
manycore virtual machine for Smalltalk.


The RoarVM supports the parallel execution of Smalltalk programs on x86-compatible
multicore systems and Tilera TILE64-based manycore systems. It is tested with
standard Squeak 4.1 closure-enabled images, and with a stripped-down version of an
MVC-based Squeak 3.7 image. Support for Pharo 1.2 is currently limited to 1 core,
but we are working on it.

A small teaser:
  1 core   66286897 bytecodes/sec;  2910474 sends/sec
  8 cores 470588235 bytecodes/sec; 19825677 sends/sec


RoarVM is based on the work of David Ungar and Sam S. Adams at IBM Research.
The port to x86 multicore systems was done by me. They open-sourced their VM,
formerly known as Renaissance VM (RVM), under the Eclipse Public License [1].
The official announcement of the IBM source code release can be found at [2].

The source code of the RoarVM has been released as open source to enable the
Smalltalk community to evaluate the ideas and possibly integrate them into
existing systems. Thus, the RoarVM is meant for experimenting with Smalltalk
systems on multi- and manycore machines.

The open-source project and downloads can also be found on GitHub:
    http://github.com/smarr/RoarVM
    http://github.com/smarr/RoarVM/downloads

For more detailed information, please refer to the README file [3].
Instructions to compile the RoarVM on Linux and OS X can be found at [4].
Windows is currently not supported; however, there is a good chance that it
will work with Cygwin or pthreads for Win32, but that has not been verified in
any way. If you feel brave, please give it a shot and report back.

If the community does not object, we would like to occupy the
[hidden email] mailing list for related discussions. So, if
you run into any trouble while experimenting with the RoarVM, do not hesitate
to report any problems and ask any questions.

You can also follow us on Twitter @roarvm [5].

Best regards
Stefan Marr

[1] http://www.eclipse.org/legal/epl-v10.html
[2] http://soft.vub.ac.be/~smarr/rvm-open-source-release/
[3] http://github.com/smarr/RoarVM/blob/master/README.rst
[4] http://github.com/smarr/RoarVM/blob/master/INSTALL.rst
[5] http://twitter.com/roarvm

--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525



Re: [Vm-dev] RoarVM: The Manycore SqueakVM

Igor Stasenko
On 3 November 2010 15:13, Stefan Marr <[hidden email]> wrote:

[snip]

> If the community does not object, we would like to occupy the
> [hidden email] mailing list for related discussions. So, if
> you run into any trouble while experimenting with the RoarVM, do not hesitate
> to report any problems and ask any questions.
>

No objections from me! You are welcome there!



--
Best regards,
Igor Stasenko AKA sig.


Re: RoarVM: The Manycore SqueakVM

Yanni Chiu
In reply to this post by Stefan Marr
On 03/11/10 9:13 AM, Stefan Marr wrote:
>
> A small teaser:
>    1 core   66286897 bytecodes/sec;  2910474 sends/sec
>    8 cores 470588235 bytecodes/sec; 19825677 sends/sec

I'm trying to understand what is meant by this benchmark. How does
tinyBenchmarks get run on 8 cores? How does the work get distributed
among the cores? I cannot find the answer quickly - see below.

> For more detailed information, please refer to the README file[3].

The links take you to the ACM portal. One promising link on the portal:
     DLS '09 Proceedings of the 5th symposium on Dynamic languages
is a dead link to HPI (www.uni-potsdam.de). I doubt I want to subscribe
to the ACM portal at this point just to find out more about RoarVM. And,
given that the ACM portal is the only place to find out more, I doubt I'm
going to look at RoarVM for any more than the brief glance through the
GitHub source that I've already done.


Re: RoarVM: The Manycore SqueakVM

Tobias Pape
On 2010-11-03, at 17:45, Yanni Chiu wrote:

> On 03/11/10 9:13 AM, Stefan Marr wrote:
>>
>> A small teaser:
>>   1 core   66286897 bytecodes/sec;  2910474 sends/sec
>>   8 cores 470588235 bytecodes/sec; 19825677 sends/sec
>
> I'm trying to understand what is meant by this benchmark. How does tinyBenchmarks get run on 8 cores. How does the work get distributed among the cores. But I cannot find the answer quickly - see below.
>
>> For more detailed information, please refer to the README file[3].
>
> The links take you to the ACM portal. One promising link on the portal:
>    DLS '09 Proceedings of the 5th symposium on Dynamic languages
> is a dead link to HPI (www.uni-potsdam.de).

Note that the home page of the DLS is http://dynamic-languages-symposium.org/
and that of DLS '09 is at http://dynamic-languages-symposium.org/dls-09/index.html
The proceedings are on the ACM portal.
And if you want to contact the SWA group at HPI, its homepage is
http://www.hpi.uni-potsdam.de/swa/


HTH
So Long,
        -Tobias

Re: RoarVM: The Manycore SqueakVM

Stefan Marr
In reply to this post by Yanni Chiu
Hi Yanni:

On 03 Nov 2010, at 17:45, Yanni Chiu wrote:

> On 03/11/10 9:13 AM, Stefan Marr wrote:
>>
>> A small teaser:
>>   1 core   66286897 bytecodes/sec;  2910474 sends/sec
>>   8 cores 470588235 bytecodes/sec; 19825677 sends/sec
>
> I'm trying to understand what is meant by this benchmark. How does tinyBenchmarks get run on 8 cores. How does the work get distributed among the cores. But I cannot find the answer quickly - see below.
It is an adapted version of the tinyBenchmarks.
I will make the code available soonish...

Anyway, the idea is that the benchmark is started n times in n Smalltalk processes.
Then the overall time for execution is measured. If more cores are actually scheduling the started processes for execution, you will see the increase shown above. Otherwise, you just see your normal sequential performance.

>> For more detailed information, please refer to the README file[3].
>
> The links take you to the ACM portal. One promising link on the portal:
>    DLS '09 Proceedings of the 5th symposium on Dynamic languages
> is a dead link to HPI (www.uni-potsdam.de). I doubt I want to subscribe to ACM portal at this point, just to find out more about RoarVM. And, given that ACM portal is the only place to find out more, I doubt I'm going to look at RoarVM for any more than the brief glance through the github source that I've already done.
I am sorry that there is a paywall; however, I can't really do anything about it...
But you can always ask the authors of a particular paper to send you a copy, I suppose.

So, the question is how the work gets distributed?
Well, like in Multiprocessor Smalltalk:

You have one scheduler, as in a standard image; it is only slightly adapted to accommodate the fact that there can be more than one activeProcess (see http://github.com/smarr/RoarVM/raw/master/image.st/RVM-multicore-support.mvc.st).

Once a core gets to a point where it needs to re-evaluate its scheduling decision, it goes to the central scheduler and picks a process to execute.
Nothing fancy here, no sophisticated work stealing or distribution process.
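
To make that concrete, a rough sketch of the idea in plain Squeak code (illustrative only; this is not the actual RoarVM scheduler, which lives in the VM and in the image changes linked above):

    | runQueue lock pickNextProcess |
    runQueue := OrderedCollection new.        "runnable Smalltalk processes"
    lock     := Semaphore forMutualExclusion. "guards the shared queue"

    "what a core conceptually does when it re-evaluates its scheduling decision:
     enter the central scheduler under the lock and take the next runnable process"
    pickNextProcess := [
        lock critical: [
            runQueue isEmpty
                ifTrue: [nil]
                ifFalse: [runQueue removeFirst]]].

    "a core that makes a process runnable simply appends it to the same queue"
    lock critical: [runQueue add: ([Transcript show: 'hello'] newProcess)].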

Does that answer your question?

Best regards
Stefan


--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525



Re: RoarVM: The Manycore SqueakVM

Bert Freudenberg
In reply to this post by Stefan Marr
On 03.11.2010, at 14:13, Stefan Marr wrote:

> A small teaser:
>  1 core   66286897 bytecodes/sec;  2910474 sends/sec
>  8 cores 470588235 bytecodes/sec; 19825677 sends/sec

I tried your precompiled OS X VM and the Sly3 image.

1 core:  93,910,491 bytecodes/sec; 4,056,440 sends/sec
2 cores: 91,559,370 bytecodes/sec; 4,007,927 sends/sec
3 cores: can't start
4 cores: 90,844,570 bytecodes/sec; 3,935,516 sends/sec
5 cores: can't start
6 cores: can't start
7 cores: can't start
8 cores: 89,698,668 bytecodes/sec; 3,910,787 sends/sec

So it looks like you have to use a power-of-two number of cores?

And the benchmark invocation should be different if you want to actually use multiple cores. What's the magic incantation?

I tried something myself:

n := 16.
q := SharedQueue new.
time := Time millisecondsToRun:
        [n timesRepeat: [[q nextPut: [30 benchFib] timeToRun] fork].
        n timesRepeat: [Transcript space; show: q next]].
Transcript space; show: time; cr

1 core:  664 664 665 666 667 662 664 664 668 665 667 665 666 669 666 10700
2 cores: 675 674 672 669 677 669 669 672 678 670 668 669 674 668 668 5425
4 cores: 721 726 729 740 713 728 740 734 731 737 721 737 734 756 788 749 3030
8 cores: 786 807 837 847 865 872 916 840 800 873 792 880 846 865 829 1820

Now that scales pretty nicely :) The overhead is about 25% at 8 cores, 12% for 4 cores.

For our regular interpreter (*) I get:
1 core: 162 159 157 158 158 160 159 159 159 159 159 158 160 158 159 2585

So RoarVM is about 4 times slower in sends, even more so for bytecodes. It needs 8 cores to be faster than the regular interpreter on a single core. The good news is that it can beat the old interpreter :)  But why is it so much slower than the normal interpreter?

Btw, user interrupt didn't work on the Mac.

And in the Squeak-4.1 image, when running on 2 or more cores Morphic gets incredibly sluggish, pretty much unusably so.

- Bert -

(*) For comparison, a regular interpreter (not Cog) on this machine gets
    789,514,263 bytecodes/sec; 17,199,374 sends/sec
and Cog does
    880,481,513 bytecodes/sec; 70,113,306 sends/sec



Re: [Vm-dev] Re: [squeak-dev] RoarVM: The Manycore SqueakVM

Stefan Marr
Hi Bert:


On 04 Nov 2010, at 19:07, Bert Freudenberg wrote:

> So it looks like you have to use a power-of-two cores?
Yes, that is right. At the moment, the system isn't able to handle other numbers of cores.


> And the benchmark invocation should be different if you want to actually use multiple cores. What's the magic incantation?
The code I used to generate the numbers isn't actually in any image yet.
I pasted it below for reference; it's just a quick hack to have a parallel tinyBenchmarks version.


> So RoarVM is about 4 times slower in sends, even more so for bytecodes. It needs 8 cores to be faster the regular interpreter on a single core. To the good news is that it can beat the old interpreter :)  But why is it so much slower than the normal interpreter?
Well, on the one hand, we don't use things like the GCC labels-as-values extension for threaded interpretation, which should help quite a bit.
Then, the current implementation based on pthreads is quite a bit slower than our version which uses plain Unix processes.
The GC is really not state of the art.
And all that adds up rather quickly, I suppose...


> Btw, user interrupt didn't work on the Mac.
Cmd+. ? Works for me ;) Well, can you be a bit more specific? In which situation did it not work?


> And in the Squeak-4.1 image, when running on 2 or more cores Morphic gets incredibly sluggish, pretty much unusably so.
Yes, same here. Sorry. Any hints where to start looking to fix such issues are appreciated.



Best regards
Stefan


My tiny Benchmarks:

> !Integer methodsFor: 'benchmarks' stamp: 'sm 10/11/2010 22:30'!
> tinyBenchmarksParallel16Processes
> "Report the results of running the two tiny Squeak benchmarks.
> ar 9/10/1999: Adjusted to run at least 1 sec to get more stable results"
> "0 tinyBenchmarks"
> "On a 292 MHz G3 Mac: 22727272 bytecodes/sec; 984169 sends/sec"
> "On a 400 MHz PII/Win98:  18028169 bytecodes/sec; 1081272 sends/sec"
> 	| t1 t2 r n1 n2 |
> 	n1 := 1.
> 	[t1 := Time millisecondsToRun: [n1 benchmark].
> 	 t1 < 1000] whileTrue: [n1 := n1 * 2]. "Note: #benchmark's runtime is about O(n)"
>
> 	"now n1 is the value for which we do the measurement"
> 	t1 := Time millisecondsToRun: [self run: #benchmark on: n1 times: 16].
>
> 	n2 := 28.
> 	[t2 := Time millisecondsToRun: [r := n2 benchFib].
> 	 t2 < 1000] whileTrue: [n2 := n2 + 1].
> 	"Note: #benchFib's runtime is about O(k^n),
> 	 where k is the golden number = (1 + 5 sqrt) / 2 = 1.618...."
>
> 	"now we have our target n2 and r value.
> 	 lets take the time for it"
> 	t2 := Time millisecondsToRun: [self run: #benchFib on: n2 times: 16].
>
> 	^ { ((n1 * 16 * 500000 * 1000) // t1). " printString, ' bytecodes/sec; ',"
> 	    ((r * 16 * 1000) // t2) " printString, ' sends/sec'"
> 	  }! !
> !Integer methodsFor: 'benchmarks' stamp: 'sm 10/11/2010 22:29'!
> run: aSymbol on: aReceiver times: nTimes
> 	| mtx sig n |
>
> 	mtx := Semaphore forMutualExclusion.
> 	sig := Semaphore new.
> 	n := nTimes.
>
> 	nTimes timesRepeat: [
> 		[ aReceiver perform: aSymbol.
> 		  mtx critical: [
> 			n := n - 1.
> 			(n == 0) ifTrue: [sig signal]]
> 		] fork
> 	].
> 	sig wait.! !

--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525



Re: [Vm-dev] Re: [squeak-dev] RoarVM: The Manycore SqueakVM

ungar
Hold on... don't we run on 56 cores on Tilera?


On Nov 4, 2010, at 11:36 AM, Stefan Marr wrote:

>
> Hi Bert:
>
>
> On 04 Nov 2010, at 19:07, Bert Freudenberg wrote:
>
>> So it looks like you have to use a power-of-two cores?
> Yes, that is right. At the moment, the system isn't able to handle other numbers of cores.
>
> [snip]



Re: [Vm-dev] RoarVM: The Manycore SqueakVM

Stephen Pair
In reply to this post by Stefan Marr
On Wednesday, November 3, 2010, Stefan Marr <[hidden email]> wrote:

>
> Dear Smalltalk community:
>
>
> We are happy to announce, now officially, RoarVM: the first single-image
> manycore virtual machine for Smalltalk.
>
>
> The RoarVM supports the parallel execution of Smalltalk programs on x86
> compatible multicore systems and Tilera TILE64-based manycore systems. It is
> tested with standard Squeak 4.1 closure-enabled images, and with a stripped
> down version of a MVC-based Squeak 3.7 image. Support for Pharo 1.2 is
> currently limited to 1 core, but we are working on it.

What great news!

The main question that comes to mind for me is concurrency.  Does this
VM do anything special to preserve the concurrency semantics of
Smalltalk processes scheduled on a single core?  As I'm sure most
people are aware, the existing Squeak library isn't written with
thread safety in mind...and as such, even on a single core a naive
implementation can have issues when executing with concurrent
processes.  The solution is generally to try to isolate the objects
running in different, concurrent processes and use message passing and
replication as needed.  If this VM doesn't do anything in this regard,
I would expect that it would present an even greater risk of issues
stemming from concurrency, and it would make it all the more important
to keep objects accessible from different processes cleanly separated.
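
As a minimal sketch of that style in plain Squeak (hypothetical code, just to make
the idea concrete): two processes that share no objects directly and communicate
only over queues; in a real design you would pass copies rather than references.

    | requests replies |
    requests := SharedQueue new.
    replies  := SharedQueue new.

    "worker process: owns its own state and sees only what arrives on its queue"
    [ | msg |
      [(msg := requests next) = #stop] whileFalse: [
          replies nextPut: msg * 2]] fork.

    requests nextPut: 21.
    Transcript show: replies next printString; cr.  "prints 42"
    requests nextPut: #stop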

Another question that comes to mind is how much of a performance hit
you might see from the architecture trying to maintain cache
consistency in the face of multiple processes simultaneously updating
shared memory.  Is that something you've found to be an issue?  Is
this something you would have to be careful about when crafting code?
And, if it is a problem, is it something where you'd need to be
concerned not just with shared objects, but also with shared pages
(ie. would you need some measure of control over pages being updated
from multiple concurrent processes to effectively deal with this
issue)?

Lastly, could you summarize how the design of this VM differs from the
designs of other multithreaded VMs that have been implemented in the
past?

It's exciting to see this kind of innovation happening in the Squeak community!

-Stephen


Re: [Vm-dev] RoarVM: The Manycore SqueakVM

Igor Stasenko
On 6 November 2010 19:49, Stephen Pair <[hidden email]> wrote:

> On Wednesday, November 3, 2010, Stefan Marr <[hidden email]> wrote:
>> [snip]
>
> What great news!
>
> The main question that comes to mind for me is concurrency.  Does this
> VM do anything special to preserve the concurrency semantics of
> smalltalk processes scheduled on a single core?  As I'm sure most
> people are aware, the existing squeak library isn't written with
> thread safety in mind...and as such, even on a single core a naive
> implementation can have issues when executing with concurrent
> processes.  The solution is generally to try and isolate the objects
> running in different, concurrent processes and use message passing and
> replication as needed.  If this VM doesn't do anything in this regard,
> I would expect that it would present an even greater risk of issues
> stemming from concurrency and it would make it all the more important
> to keep objects accessible in different processes cleanly separated.
>
No. It's a double-edged sword. If you turn the VM into an all-safe sandbox,
then you'll have worse performance and much less flexibility for developers.
I like David's message: instead of running away from concurrency and
hiding it in secret niches, we should embrace it.

> Another question that comes to mind is how much of a performance hit
> you might see from the architecture trying to maintain cache
> consistency in the face of multiple processes simultaneously updating
> shared memory.  Is that something you've found to be an issue?  Is
> this something you would have to be careful about when crafting code?
> And, if it is a problem, is it something where you'd need to be
> concerned not just with shared objects, but also with shared pages
> (ie. would you need some measure of control over pages being updated
> from multiple concurrent processes to effectively deal with this
> issue)?
>
I think this is too much to take care of. Different architectures may
have different cache organization (even different processors could),
and so your code could run great on one architecture and crawl on another.

Also, if your parallel algorithm relies on too much shared state, then
it is not a good parallel algorithm. You'll never get it working fast
or scaling well, no matter how hard you try, because that is the same
as trying to fit a square into a circle.


> Lastly, could you summarize how the design of this VM differs from the
> designs of other multithreaded VMs that have been implemented in the
> past?
>
> Its exciting to see this kind of innovation happening in the squeak community!
>
> -Stephen
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: [Vm-dev] RoarVM: The Manycore SqueakVM

Stephen Pair
On Nov 6, 2010, at 2:05 PM, Igor Stasenko <[hidden email]> wrote:
> No. Its a double-edged sword. If you turn VM into all-safe sandbox,
> then you'll have worst performance,
> and much less flexibility for developers.
> I like David's message: instead of running away from concurrency and
> hiding it in secret niches, we should embrace it instead.

Not wanting to hide anything in the VM...just want to understand RoarVM.  In an ideal world it would all be implemented in Smalltalk (or something similar) and dynamically translated directly into machine code.

>
>> Another question that comes to mind is how much of a performance hit
>> you might see from the architecture trying to maintain cache
>> consistency in the face of multiple processes simultaneously updating
>> shared memory.  Is that something you've found to be an issue?  Is
>> this something you would have to be careful about when crafting code?
>> And, if it is a problem, is it something where you'd need to be
>> concerned not just with shared objects, but also with shared pages
>> (ie. would you need some measure of control over pages being updated
>> from multiple concurrent processes to effectively deal with this
>> issue)?
>>
> I think this is too much to take care of. Different architectures may
> have different cache organization
> (even different processors could), and so, on one arch your code could
> run great, on another arch, it will crawl.

What I'm getting at is whether there is a need for finer control over object allocation such that you could avoid a case where an object used in one process lives on the same page as another object used by a different process and you end up with performance issues due to the overhead of maintaining cache consistency (and, yes, I agree, such things would be very architecture dependent).

> Also, if your parallel algorithm relies on too much shared state, then
> it is not good parallel algorithm. You'll never get it working fast
> or scale well, no matter how hard you try, because this is same as
> trying to fit square into circle.

Yep.

- Stephen

Re: [Vm-dev] RoarVM: The Manycore SqueakVM

Stefan Marr
In reply to this post by Stephen Pair
Hello Stephen:


On 06 Nov 2010, at 18:49, Stephen Pair wrote:
> The main question that comes to mind for me is concurrency.  Does this
> VM do anything special to preserve the concurrency semantics of
> smalltalk processes scheduled on a single core?  As I'm sure most
> people are aware, the existing squeak library isn't written with
> thread safety in mind...and as such, even on a single core a naive
> implementation can have issues when executing with concurrent
> processes.
The VM doesn't do anything; thus, you have to take care of ensuring the semantics
you want yourself. The idea is to preserve the standard Smalltalk programming model
without introducing new facilities.
Thus, what you get with the RoarVM is what you had before, i.e., processes + mutexes/semaphores,
and in addition to that your processes can be executed in parallel.
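
A minimal example of that model in plain Squeak (nothing RoarVM-specific; the point
is just that on the RoarVM the forked processes can actually run on different cores
at the same time):

    | shared mutex done |
    shared := OrderedCollection new.
    mutex  := Semaphore forMutualExclusion.  "protects the shared collection"
    done   := Semaphore new.

    2 timesRepeat: [
        [1 to: 1000 do: [:i |
             mutex critical: [shared add: i]].  "every access to shared state is guarded"
         done signal] fork].

    2 timesRepeat: [done wait].
    Transcript show: shared size printString; cr  "2000 once both processes have finished"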

I agree that this is a very low-level programming model, but it is the right foundation
for us to experiment with ideas like Ly and Sly.
Ly aims to be a language in which you program by embracing non-determinism, and the goal is to
find ways to benefit from the parallelism and the natural non-determinism of the system.


> The solution is generally to try and isolate the objects
> running in different, concurrent processes and use message passing and
> replication as needed. If this VM doesn't do anything in this regard,
> I would expect that it would present an even greater risk of issues
> stemming from concurrency and it would make it all the more important
> to keep objects accessible in different processes cleanly separated.
There are certain actor implementations for Smalltalk; I guess you could get them working on the RoarVM.
At the moment, there isn't any special VM support for these programming models. However, there are ideas and plans to enable language designers to build such languages in an efficient manner by providing VM support for a flexible notion of encapsulation.

> Another question that comes to mind is how much of a performance hit
> you might see from the architecture trying to maintain cache
> consistency in the face of multiple processes simultaneously updating
> shared memory. Is that something you've found to be an issue?
Sure, that is a big issue. It is already problematic for performance on x86 systems with a few cores, and even worse on Tilera.

Just imagine a simple counter that needs to be updated atomically by 63 other cores...
There is no way to make such a counter scale on any system.
The only thing you can do is to not use such a counter. In 99% of the cases you don't need it anyway.

As Igor pointed out, if you want performance, you will avoid shared mutable state. The solution for such a counter would be to have local counters, and you synchronize only to get a global sum, and only when it is really, really necessary. But the optimal solution is very application-specific.
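
A sketch of that pattern in plain Smalltalk (illustrative only; a real VM-level counter
would also want the per-core slots on separate cache lines, which is beyond what
image-level code controls):

    | nProcesses locals done total |
    nProcesses := 8.
    locals := Array new: nProcesses withAll: 0.  "one slot per process, written by that process only"
    done   := Semaphore new.

    1 to: nProcesses do: [:i |
        [ | myCount |
          myCount := 0.
          100000 timesRepeat: [myCount := myCount + 1].  "purely local, no shared writes, no lock"
          locals at: i put: myCount.                     "publish the local total exactly once"
          done signal] fork].

    nProcesses timesRepeat: [done wait].
    total := locals inject: 0 into: [:sum :each | sum + each].  "synchronize only for the global sum"
    Transcript show: total printString; cr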

> Is this something you would have to be careful about when crafting code?
> And, if it is a problem, is it something where you'd need to be
> concerned not just with shared objects, but also with shared pages
> (ie. would you need some measure of control over pages being updated
> from multiple concurrent processes to effectively deal with this
> issue)?
Locality is how I would name the problem, and together with the notion of encapsulation, it is something I am currently looking into.

A brief description of what I am up to can be found here: http://soft.vub.ac.be/~smarr/2010/07/doctoral-symposium-at-splash-2010/
Or even more fluffy here: http://soft.vub.ac.be/~smarr/2010/08/poster-at-splash10/



> Lastly, could you summarize how the design of this VM differs from the
> designs of other multithreaded VMs that have been implemented in the
> past?
Compared to your standard JVM or .NET CLR, the RoarVM provides a similar programming model, but the implementation is designed to experiment on the TILE architecture. And to reach that goal, the VM is kept simple. The heap is divided up into number_of_cores parts, each owned by a single core, and then split into a read-mostly and a read-write heap.
That is an optimization for the TILE64 chip with its restricted caching scheme.
A few more details have been discussed on this thread already.

Best regards
Stefan


--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525



Re: [Vm-dev] RoarVM: The Manycore SqueakVM

Göran Krampe
Hi all!

Indeed interesting times.

On 11/06/2010 08:30 PM, Stefan Marr wrote:
> I agree, that is a very low-level programming model, but that is the right foundation
> for us to experiment with ideas like Ly and Sly.
> Ly aims to be a language in which you program by embracing non-determenism and the goal is to
> find ways to benefit from the parallelism and the natural non-determinism of the system.
[SNIP]

> As Igor pointed out, if you want performance, you will avoid shared mutable state. The solution for such a counter would be to have local counters, and you synchronize only to get a global sum and only when it is really really necessary. But the optimal solution is very application specific.

I am quite interested in the current crop of NoSQL databases, and Riak,
for example - currently my "favorite" in that arena - has a
map/reduce mechanism for doing lots of its parallel magic.

So it would be interesting to craft a "map/reduce" system in Squeak
utilizing regular Smalltalk processes. Sure, you would need to know what
the heck you are doing - but hey, I think that will be the general case
for a few years to come until "magical languages" take care of a lot of
this for us.
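
A rough sketch of what that could look like with plain Smalltalk processes
(hypothetical code, not an existing package):

    | data nChunks chunkSize partials mapBlock total |
    data      := (1 to: 100000) asArray.
    nChunks   := 8.
    chunkSize := data size // nChunks.
    partials  := SharedQueue new.
    mapBlock  := [:chunk | chunk inject: 0 into: [:sum :each | sum + each squared]].

    "map phase: each chunk of the data is processed in its own Smalltalk process"
    1 to: nChunks do: [:i | | first last |
        first := (i - 1) * chunkSize + 1.
        last  := i = nChunks ifTrue: [data size] ifFalse: [i * chunkSize].
        [partials nextPut: (mapBlock value: (data copyFrom: first to: last))] fork].

    "reduce phase: combine the partial results as they arrive"
    total := partials next.
    (nChunks - 1) timesRepeat: [total := total + partials next].
    Transcript show: total printString; cr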

And yeah, Igor's atomic stuff couldn't be more timely, right? :)

> A brief description of what I am up to can be found here: http://soft.vub.ac.be/~smarr/2010/07/doctoral-symposium-at-splash-2010/
> Or even more fluffy here: http://soft.vub.ac.be/~smarr/2010/08/poster-at-splash10/

Will try to read between diaper changes. :)

regards, Göran