I'm going to be working on making Squeak multi-core as part of my grad work. I have a bit of work to do before I get to the stage where I actually have to implement a parallel code model, but I am thinking about it in the abstract as a mental exercise for now. I will most likely use the vats/islands system used by SqueakELib and Croquet, as it has been validated.

But I have a pressing concern: How could this be used to do what seems (to me and my mentors) to be the cheapest, simplest, most obvious parallelism extraction ever: get Collection>>do:, collect:, etc. to run one element per native thread (= vat = island).

However, I may be overlooking copying costs (code, objects, or both) that make this more a naive parallelization than I expect. Assuming the elements are read-only for the duration of the enumeration (the common case), they can be safely copied willy-nilly to whatever vat they need to be in to run (or use shared memory if it is available and fast).

Can this be done using vats? Does copying kill it? I get asked about this problem every time I talk about vats. Everyone seems to think this (loop-level parallelism) is the simplest, most fool-proof way to make things faster on a multi-core system, and gives me funny looks when I don't have a quick answer on how this works in the vats model.

--
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808
I hope I am not just saying things you already know.

On Jan 24, 2008 12:34 AM, Matthew Fulmer <[hidden email]> wrote:
> But I have a pressing concern: How could this be used to do what
> seems (to me and my mentors) to be the cheapest, simplest, most
> obvious parallelism extraction ever: get Collection>>do:,
> collect:, etc. to run one element per native thread (= vat =
> island).

I'm not sure exactly what you are planning, because vat/island means different things to different people. Croquet Islands, for example, are not perfectly isolated from each other, as they would be if they were running in different address spaces.

I'm going to assume that what you mean by "vat" is essentially a Squeak VM with its own garbage collector, running on a single core/processor/thread. Each object is in a vat. An object in one vat refers to an object in another vat by means of a proxy. Proxies are invisible most of the time. When you send a message to an object in another vat, references to objects in your own vat are converted automatically to proxies.

Suppose you have a collection of objects, each in its own vat. Just saying

    objects do: [:each | each doYourStuff]

won't make them run in parallel. You'll need to say

    objects do: [:each | [each doYourStuff] fork]

You probably need a #parDo: method that does this automatically. You will need a way of synchronizing. A #parCollect: might be better, or a #parCollectFutures:, which is equivalent to

    parCollectFutures: aBlock
        ^objects collect: [:each | Future with: [aBlock value: each] fork]

Naturally, there would be more efficient implementations of parDo: or parCollectFutures: because you probably don't want to fork Squeak processes, but instead you want to send asynchronous messages to the other vats and build up a structure of futures. But those are implementation details.

The real point is that parallel #do: only makes sense if each element of the collection is in a different vat. It doesn't make sense to parallelize #do: on collections of numbers. The cost of moving the numbers around will outweigh the savings from parallelism.

Making #do: parallel might be an easy way for application programmers to add a little parallelism to their program, but for most kinds of applications, you won't get much parallelism this way. And if making a parallel do: means that you have to write a parallel garbage collector, the price is probably too high. I think that the direction you are going is better.

-Ralph Johnson
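The fork-and-join shape of a #parCollect: can be sketched in a workspace. This is only an illustration under stated assumptions: it uses ordinary Squeak processes and a Semaphore rather than vats, so it shows the structure but gives no multi-core speedup, because all processes in a standard image share one native thread. The variable names are made up for the example, and the code assumes an image with real block closures (Squeak 4.x or later, or Pharo); older images may share block arguments between iterations, in which case the forked blocks would see the wrong element.

```smalltalk
"Workspace sketch: fork one process per element, then join.
 Assumption: closure-capable image; plain processes stand in for vats."
| items results done |
items := #(3 1 4 1 5).
results := Array new: items size.
done := Semaphore new.
items withIndexDo: [:each :index |
    "Each forked block stores its own result slot, then reports completion."
    [results at: index put: each * each.
     done signal] fork].
"Join: wait once for every forked process."
items size timesRepeat: [done wait].
results
```

Evaluated with print-it, this should answer #(9 1 16 1 25). A vat-based version would presumably replace the fork with an asynchronous send to the element's vat and the Semaphore with futures, as the message above suggests.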
In reply to this post by Tapple Gao
Hi Matthew,
That's very cool that you are working on multicore Squeak. Dave Ungar and I are just starting an effort in IBM Research on new VMs and programming models for large-scale multicore systems (100s to 1000s of cores). We are putting together a 1024-core system based on 16 Tilera TILE64 chips as a test bed. We are not anticipating ending up with a Squeak or Self VM, but something likely quite different.

It would be interesting though to exchange ideas and approaches as you proceed in your research.

Regards,
Sam

Sam S. Adams, IBM Distinguished Engineer, IBM Research
In reply to this post by Tapple Gao
In SqueakElib, each object in the collection would need to be placed in a different Vat. Then you just run your Collection>>#do: or Collection>>#collect: over the elements, and any msgs sent to each element in the block will evaluate in the various vats. No special syntax is needed. In the case of #collect:, the evaluation will initially return promises, which will later resolve to the result of the msg send, which may be the result of the block. The blocks themselves are not remotely evaluated.

Another way to go is to convert the collection to an eventual reference and send msgs to it, while still allocating the elements to different vats:

    aColl eventual collect: aBlock.

These msgs would be evaluated in the current vat, except for those sent to the elements, but you would always get back an eventual ref to a collection holding promises for the results of the block evaluations. It would be more consistent to use this approach.

Rob
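The "collect: answers promises that resolve later" behaviour can be imitated in a plain workspace. This is not SqueakElib's API: the promise here is a hypothetical two-slot Array (a Semaphore plus a value slot), and resolution happens in a forked Squeak process inside the same image rather than in another vat. It also assumes an image with real block closures.

```smalltalk
"Workspace sketch: collect: answers unresolved holders first, values later.
 Assumption: the two-slot Array is a stand-in promise, not SqueakElib's."
| promiseFor promises results |
promiseFor := [:aBlock | | holder |
    "Slot 1: Semaphore signalled on resolution. Slot 2: the resolved value."
    holder := Array with: Semaphore new with: nil.
    [holder at: 2 put: aBlock value.
     (holder at: 1) signal] fork.
    holder].
"collect: returns at once; none of the forked blocks has run yet."
promises := #(1 2 3) collect: [:each | promiseFor value: [each * 10]].
"Waiting on each holder lets the forked processes run and resolve it."
results := promises collect: [:promise |
    (promise at: 1) wait.
    promise at: 2].
results
```

Evaluated with print-it, this should answer #(10 20 30). In the scheme described above, the caller would normally keep sending eventual messages to the promises instead of blocking on them; the explicit wait is only there to make the resolved values visible.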
In reply to this post by Sam Adams-2
Sam Adams wrote on Thu, 24 Jan 2008 10:00:15 -0500:
> It would be interesting though to exchange ideas and approaches as you
> proceed in your research.

I am interested in this as well. My current work only uses 12 cores but is based on the stuff I did for 64 node machines (and larger) back in the early 1990s. Too bad I have to write all my stuff in Portuguese...

About Matthew's question, my suggestion is to implement Concurrent Aggregates. These would have a local representative in each Vat that would know about all of its "brothers". It would also store a subset of the collection locally. So if you have a 12 thousand element ConcurrentArray distributed among 128 Vats, then each part would hold around 94 elements. A message sent to any of the local representatives is repeated to all of the parts, with the proper care to serialize messages arriving at different parts at the same time.

On top of this you will want to build a programming model very similar to APL (see FScript).

The advantage of building the system in layers like this is that neither hiding all the details nor exposing everything works well for all applications. If you can select from different layers as needed for your specific program, then you will be able to tune things for the best performance with the least code.

My rule of thumb is to roughly match the number of Vats and available cores. A single Vat per core is not efficient because the whole core will be idle whenever the software blocks, but too many Vats on a core will cause a lot of switching overhead. My particular hardware design supports eight user Vats and eight system ones on each core.

-- Jecel
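A rough workspace sketch of that partitioning, under stated assumptions: the message does not say how elements are assigned to parts, so the contiguous near-equal split, the variable names, and the use of forked Squeak processes as stand-ins for vats are all illustrative. Splitting 12,000 elements over 128 parts gives 96 parts of 94 elements and 32 parts of 93. A sum is then computed per part and the partial results combined, imitating a message that is repeated to every part.

```smalltalk
"Workspace sketch: split 12000 elements into 128 near-equal parts,
 reduce each part in its own process, then combine the partial results.
 Assumption: contiguous split; processes stand in for vats."
| data vatCount base extra start parts partialSums done total |
data := (1 to: 12000) asArray.
vatCount := 128.
base := data size // vatCount.       "93 elements in every part"
extra := data size \\ vatCount.      "96 parts receive one more"
start := 1.
parts := (1 to: vatCount) collect: [:i | | partSize part |
    partSize := i <= extra ifTrue: [base + 1] ifFalse: [base].
    part := data copyFrom: start to: start + partSize - 1.
    start := start + partSize.
    part].
partialSums := Array new: vatCount.
done := Semaphore new.
parts withIndexDo: [:part :i |
    "Each part answers its local sum, as a local representative would."
    [partialSums at: i put: (part inject: 0 into: [:sum :each | sum + each]).
     done signal] fork].
vatCount timesRepeat: [done wait].
total := partialSums inject: 0 into: [:sum :each | sum + each].
Array with: parts first size with: parts last size with: total
```

The three values answered are the size of the first part (94), the size of the last part (93), and the combined sum (72006000, the sum of 1 through 12000).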
In reply to this post by Sam Adams-2
Sam and Jecel,
Thank you both for forwarding this to me.

And I wanted to second Sam's message to anyone out there working on multicore models or models for massively parallel computation from a Smalltalkish point of view. Let's exchange ideas and approaches as we explore this new territory!

- David
In which case I can't help but point out the Hydra VM, which is a Squeak VM that will run on multiple cores ;-)

http://jabberwocky.croquetproject.org:8889/HydraVM.html
http://squeakvm.org/svn/squeak/branches/qwaq/

The basic idea is to take a few steps forward toward having both: some immediate benefits from being able to run multiple images in one process, as well as a breeding ground for new ideas.

Cheers,
  - Andreas
In reply to this post by Tapple Gao
> I'm going to be working on making squeak multi-core as part of
> my grad work. I have a bit of work to do before I get to the
> stage where I actually have to implement a parallel code model,
> but I am thinking about it in the abstract as a mental exercise
> for now. I will most likely use the vats/islands system used by
> SqueakELib and Croquet, as it has been validated.
>
> But I have a pressing concern: How could this be used to do what
> seems (to me and my mentors) to be the cheapest, simplest, most
> obvious parallelism extraction ever: get Collection>>do:,
> collect:, etc. to run one element per native thread (= vat =
> island).

One thing these questions point out is, what do you want all these cores to be used *for*?

If you want to compute heat transfer and stress analysis of some metal thingy then you may be led toward one kind of a multi-core Smalltalk implementation.

If you want to compute a world-wide factory and warehouse distribution optimizer then you may be led toward a different multi-core Smalltalk implementation.

If you want to compute "ebay" then you may be led toward a third kind of multi-core Smalltalk implementation.

So start with some interesting problems, then come up with some interesting solutions for one or more of those. Otherwise anything "might" work and nothing will constrain your decisions.
On Thu, Jan 24, 2008 at 05:02:02PM -0800, Patrick Logan wrote:
> One thing these questions point out is, what do you want all these
> cores to be used *for*?

My mentor's preferred demo app is a ray-tracer. Computation-intensive floating-point stuff. Scientific applications, image processing, and all that.

And maybe a tricksy renderer for Morphic 3 :)

--
Matthew Fulmer
In reply to this post by ungar
(I'm working on this as well.)

-C

--
Craig Latta
improvisational musical informaticist
www.netjam.org
Smalltalkers do: [:it | All with: Class, (And love: it)]
In reply to this post by Tapple Gao
> My mentor's preferred demo app is a ray-tracer.
> Computation-intensive floating-point stuff. Scientific applications,
> image processing, and all that.
>
> And maybe a tricksy renderer for Morphic 3 :)

Yes, ray tracing is a good example where we can easily see the benefits of parallelism.

Interestingly, with the growing number of cores in desktop PCs, it is now possible to build real-time ray-tracing renderers. That makes the whole GPU industry look like the wrong way to go, obsolete and lacking perspective, because there is no need to create specialized hardware for rendering graphics when the CPU can do the same (and even more).

--
Best regards,
Igor Stasenko AKA sig.
On Jan 25, 2008 9:17 AM, Igor Stasenko <[hidden email]> wrote:
> Yes, ray tracing is a good example where we can easily see the
> benefits of parallelism.
>
> Interestingly, with the growing number of cores in desktop PCs, it is
> now possible to build real-time ray-tracing renderers. That makes the
> whole GPU industry look like the wrong way to go, obsolete and lacking
> perspective, because there is no need to create specialized hardware
> for rendering graphics when the CPU can do the same (and even more).

Oh, but I like having different CPUs for different functions. I would like to have CPU(s) that handle only the file-related things (including searching and stuff like that), CPUs for graphics, etc., rather than a central set of CPUs that everything has to pass through. Kind of the way repetitive tasks are pushed down into muscle memory so your brain doesn't need to be burdened with them anymore.
In reply to this post by ungar
[hidden email] wrote:
> Thank you both for forwarding this to me.
> And I wanted to second Sam's message to anyone out there working on
> multicore models or models for massively parallel computation from a
> Smalltalkish point of view.
> Let's exchange ideas and approaches as we explore this new territory!

And don't forget that a multi-core Smalltalk has already been out there for many years: GemStone Smalltalk, production ready, 64-bit, thousands of CPUs/cores, etc. I think many of GemStone's approaches to handling parallelism can be used more broadly too. And now that GemStone Smalltalk is becoming more and more like Squeak, even more so.

Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si
In reply to this post by Andreas.Raab
On Thu, Jan 24, 2008 at 11:38:26AM -0800, Andreas Raab wrote:
> In which case I can't help but point out the Hydra VM which is a Squeak VM
> that will run on multiple cores ;-)
>
> http://jabberwocky.croquetproject.org:8889/HydraVM.html
> http://squeakvm.org/svn/squeak/branches/qwaq/
>
> The base idea is to take a few steps forward to having both, some immediate
> benefits from being able to run multiple images in one process as well as
> having a breeding ground for new ideas.

What is the best image to use with HydraVM, or can any be used? Would a Croquet image be best? I'm trying it now.

--
Matthew Fulmer
In reply to this post by ungar
David,
> Let's exchange ideas and approaches as we explore this new territory!

We should find a good place for this discussion as it will tend to get more and more off-topic for the squeak-dev list. There is now a squeak-hardware list, but for parallel implementations on existing hardware that probably isn't the best place.

There is one significant difference between my current work and what I did in the 1990s. Back then I started out with the model of one thread per object and then first relaxed that by allowing read-only objects to execute in their sender's thread (and to be replicated freely among all nodes). The plan was that besides the PICs the optimizing compiler would also have access to information gathered by the "future objects". By knowing which message sends are effectively causing parallel execution, it can decide which ones to inline (converting from parallel to sequential). So objects would be gathered into fewer and fewer threads over time until the best match between the number of nodes/cores and the number of "vats" was reached.

In my current project the programmer manually distributes the objects among the vats/islands and that distribution does not change at run time. This was done to make it simpler for children to understand the system (the same object grouping is used for several unrelated purposes: virtual memory, concurrency, protection, reflection and so on), but it is more limiting than the previous design.

-- Jecel
In reply to this post by Tapple Gao
On 25/01/2008, Matthew Fulmer <[hidden email]> wrote:
> What is the best image to use with HydraVM, or can any be used?
> Would a croquet image be best? I'm trying it now.

I tried several images with it, including Croquet and Squeak 3.8 to 3.10 images. Everything seems to work.

Things are different for images that you run in a non-main interpreter. There are a number of limitations: it will almost certainly crash if a non-main interpreter tries to use primitives that belong to the UI or to various I/O plugins. The exceptions are Sockets, Files, and FFI, which are already prepared to work in a multi-threaded environment.

Currently I'm doing what is needed to be able to run a stock image in a non-main thread without crashes (primitives should fail and the interpreter should keep running stably).

--
Best regards,
Igor Stasenko AKA sig.