On 2/22/08, Stephen Pair <[hidden email]> wrote:
> I must say, this is a really impressive development. I really think this
> is the right way to approach multi-core systems.

I disagree about it being the right approach in the long term.

In the short term, the Hydra VM allows the use of multiple cores without large changes to the core of Squeak, which is good and IMHO the right decision for a quick and reliable solution (for whoever Igor is doing his work for... Qwaq?). The disadvantage of the Hydra VM is that all inter-process communication needs to go through a pipe; this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.

In the long term, the goal should be a VM that can run its green threads (aka Processes) on multiple OS threads (aka pthreads).

I can't imagine that hacking the VM with multiple processes, per-process state and a global VM lock for garbage collection and new object creation would be too difficult. The global VM lock would kill scalability and could make object creation slow, but it should still get some speedup on multi-core CPUs. More advanced VMs with a per-thread eden space would take a bit longer to write.

Gulik.

--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/
On 23/02/2008, Michael van der Gulik <[hidden email]> wrote:
> On 2/22/08, Stephen Pair <[hidden email]> wrote:
> > I must say, this is a really impressive development. I really think this
> > is the right way to approach multi-core systems.
>
> I disagree about it being the right approach in the long term.
>
> In the short term, the Hydra VM allows the use of multiple cores without
> large changes to the core of Squeak, which is good and IMHO the right
> decision for a quick and reliable solution (for whoever Igor is doing his
> work for... Qwaq?). The disadvantage with the Hydra VM is that all
> inter-process communication needs to go through a pipe; this makes sharing
> objects and synchronising access while still getting good performance more
> difficult. I can't back up my claims yet; we'll see how Hydra VM works out.
>
> In the long term, a VM that can run its green threads (aka Process) on
> multiple OS threads (aka pthreads) should be the long-term goal.
>
> I can't imagine that hacking the VM with multiple processes, per-process
> state and a global VM lock for garbage collection and new object creation
> would be too difficult. The global VM lock would kill scalability and
> could make object creation slow, but it should still get some speedup on
> multi-cored CPUs. More advanced VMs with per-thread eden space would take
> a bit longer to write.

The major challenge with multi-core over a single shared object memory is writing the GC, because the GC is the most complex part of the Squeak VM. Now imagine adding concurrency-aware features to it... When you have such a GC, the rest will look like a piece of cake :)

P.S. A global lock sucks; you need to pick something less disastrous :) I have read some papers describing run-time GCs and background GCs running in a separate thread. The question is whether adapting them to the current object model is possible without changing the model itself.

> Gulik.
>
> --
> http://people.squeakfoundation.org/person/mikevdg
> http://gulik.pbwiki.com/

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Michael van der Gulik-2
A compromise approach would be to allow something like Erlang's processes to run on each CPU within the same image. You would still be required to copy any object that passes the process boundary, but the advantages are separate GCs for each process, and the only kernel-level synchronizations would be via asynchronous queues. It would end up very much like the Hydra model (insofar as I understand it), but without the full IPC context switch.

To avoid confusion: these processes would not map 1-to-1 to Squeak processes, which would continue as normal. These would be special, uber-CPU processes.

Michael van der Gulik wrote:
> In the long term, a VM that can run its green threads (aka Process) on
> multiple OS threads (aka pthreads) should be the long-term goal.
>
> I can't imagine that hacking the VM with multiple processes, per-process
> state and a global VM lock for garbage collection and new object creation
> would be too difficult. The global VM lock would kill scalability and
> could make object creation slow, but it should still get some speedup on
> multi-cored CPUs. More advanced VMs with per-thread eden space would take
> a bit longer to write.

--
Jeffrey Straszheim
http://straszheim.50megs.com
In reply to this post by Igor Stasenko
On 2/23/08, Igor Stasenko <[hidden email]> wrote:
I know. It's a simple and implementable solution and would be a good first attempt at making a multi-threaded VM.

Gulik.

--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/
In reply to this post by Michael van der Gulik-2
On Feb 22, 2008, at 7:01 PM, Michael van der Gulik wrote:
> this makes sharing objects and synchronising access while still getting
> good performance more difficult. I can't back up my claims yet; we'll see
> how Hydra VM works out.

Josh
On 2/23/08, Joshua Gargus <[hidden email]> wrote:
Equally so, why then would any other concurrent implementation, such as HydraVM, not also have exactly the same problem? Or why would any other concurrent application not have this problem?

Real operating systems implement some form of processor affinity[1] to keep a process's cached data on a single processor. The same could be done for the Squeak scheduler. I'm sure that the scheduling algorithm could be tuned to minimize cache invalidations.

[1] http://en.wikipedia.org/wiki/Processor_affinity

Gulik.

--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/
In reply to this post by Michael van der Gulik-2
On Sat, Feb 23, 2008 at 4:01 AM, Michael van der Gulik
<[hidden email]> wrote:
> I disagree about it being the right approach in the long term.

The correct mid-term approach is to do what Erlang did: have one image, and one OS thread per *scheduler*. Then when new processes run they get a particular scheduler. All IO is non-blocking, etc.

The long term will be to remove the OS threads, as when we have hundreds of cores memory sharing simply won't be possible.

> In the short term, the Hydra VM allows the use of multiple cores without
> large changes to the core of Squeak, which is good and IMHO the right
> decision for a quick and reliable solution (for whoever Igor is doing his
> work for... Qwaq?). The disadvantage with the Hydra VM is that all
> inter-process communication needs to go through a pipe; this makes sharing
> objects and synchronising access while still getting good performance more
> difficult. I can't back up my claims yet; we'll see how Hydra VM works out.

Fine-grained locking should be considered as obsolete as manual memory management (at least at the language level; the VM can do it internally so long as it's hidden, like memory management).
In reply to this post by Michael van der Gulik-2
On Feb 22, 2008, at 11:51 PM, Michael van der Gulik wrote:
Because within HydraVM, each VM has its own ObjectMemory in a single, contiguous chunk of memory.

Below, you mention processor affinity. This is certainly necessary, but is orthogonal to the issue. Let's simplify the discussion by assuming that the number of VMs is <= the number of cores, and that each VM is pinned to a different core.

32-bit CPU caches typically work on 4KB pages of memory. You can fit quite a few objects in 4KB. The problem is that if processor A and processor B are operating in the same ObjectMemory, they don't even have to touch the same object to cause cache contention... they merely have to touch objects on the same memory page. Can you provide a formal characterization of worst-case and average-case performance under a variety of application profiles? I wouldn't know where to start. Happily, HydraVM doesn't have to worry about this, because each thread operates on a separate ObjectMemory.

> Or why would any other concurrent application not have this problem?

As I described above, the problem is not simply ensuring that each thread tends to run on the same processor. I believe that you're overlooking a crucial aspect of real-world processor-affinity schemes: when a Real Operating System pins a process to a particular processor, the memory for that process is only touched by that processor.

I haven't had a chance to take more than a glance at it, but Ulrich Drepper from Red Hat has written a paper named "What Every Programmer Should Know About Memory". It's dauntingly comprehensive.

It might help to think of a multi-core chip as a set of separate computers connected by a network (I don't have the reference off-hand, but I've seen an Intel whitepaper that explicitly takes this viewpoint). It's expensive and slow to send messages over the network to ensure that my cached version of an object isn't stale. In general, it's better to structure our computation so that we know exactly when memory needs to be touched by multiple processors.

Cheers,
Josh
In reply to this post by Jason Johnson-5
On 2/23/08, Jason Johnson <[hidden email]> wrote:
> On Sat, Feb 23, 2008 at 4:01 AM, Michael van der Gulik

I'd agree on that one.

Gulik.

--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/
In reply to this post by Jason Johnson-5
Jason Johnson wrote:
> On Sat, Feb 23, 2008 at 4:01 AM, Michael van der Gulik
> <[hidden email]> wrote:
>> I disagree about it being the right approach in the long term.
>
> The correct mid-term approach is to do what Erlang did: Have one
> image, and one OS-thread per *scheduler*. Then when new processes run
> they get a particular scheduler.

What is the advantage of doing this compared to Hydra?

Cheers,
  - Andreas
In reply to this post by Joshua Gargus-2
On 2/23/08, Joshua Gargus <[hidden email]> wrote:
Well... we'll revisit this when we actually have a VM capable of running a single image on multiple threads.
Thanks for the link; I'll read it tomorrow.

Gulik.

--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/
In reply to this post by Andreas.Raab
On 2/23/08, Andreas Raab <[hidden email]> wrote:
> Jason Johnson wrote:

Access to shared objects is much easier. In the above scenario, they're just there - normal objects - that can be used by multiple Processes concurrently. With Hydra, you need some form of inter-image communication, which is a lot more work.

Gulik.

--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/
Michael van der Gulik wrote:
> On 2/23/08, Andreas Raab <[hidden email]> wrote:
>> Jason Johnson wrote:
>>> The correct mid-term approach is to do what Erlang did: Have one
>>> image, and one OS-thread per *scheduler*. Then when new processes run
>>> they get a particular scheduler.
>>
>> What is the advantage of doing this compared to Hydra?
>
> Access to shared objects is much easier. In the above scenario, they're
> just there - normal objects - that can be used by multiple Processes
> concurrently. With Hydra, you need some form of inter-image
> communication, which is a lot more work.

You forgot that Erlang doesn't even allow mutable shared objects. It only has processes communicating with each other, and variables defined once cannot be changed later on.

Furthermore, SMP machines don't scale well for the same reasons global locks don't scale well. Thus some sophisticated techniques are needed. NUMA is one of them: it starts to completely separate CPUs and their memory while providing a fast message bus between them. So while almost any multiprocess architecture not sharing any memory, like Erlang and Hydra (?), will be able to cope with this because it relies only on message passing, shared-memory architectures will get stuck on SMP machines. However, there they will outperform non-shared ones, I think. Nevertheless, IMHO shared-memory architectures will always stay more complex to develop and program with.

Regards,
Martin
In short: less sharing, less contention. More sharing, more contention.

If you put 2 points on a line and call them 'no sharing' and 'share everything', then any system which allows you to run on multiple cores and operate over a single domain (be it a single memory or multiple standalone memories) lies somewhere in the middle.

You can pick a starting point from which you move towards that golden point - from 'share everything' or from 'share nothing'. But no doubt, no matter where you started, you will always move towards the 'golden' middle point.

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Michael van der Gulik-2
Hi Gulik, all,

"Correct" is a strong word, heavily coupled to the paradigm that gave birth to it. "Correct" things for procedural processors are procedural languages. We don't have object-oriented processors yet. We are not using decent hardware to make this technology run. We are forced to make trade-offs due to a lack of better resources.

I'm glad to see that simplicity in the object paradigm is prioritized. I'm skeptical of extremely complex machines, especially where scaling matters.

What Igor made is a network of Squeaks working in one machine as one. A network scales well. That's a powerful idea. Its simplicity is its strength. I think the Hydra concept is a pragmatically brilliant choice,

cheers,

Sebastian Sastre

________________________________

From: [hidden email] [mailto:[hidden email]] On behalf of Michael van der Gulik
Sent: Saturday, February 23, 2008 00:02
To: The general-purpose Squeak developers list
Subject: [squeak-dev] The "correct" approach to multi-core systems.

On 2/22/08, Stephen Pair <[hidden email]> wrote:
> I must say, this is a really impressive development. I really think this
> is the right way to approach multi-core systems.

I disagree about it being the right approach in the long term.

In the short term, the Hydra VM allows the use of multiple cores without large changes to the core of Squeak, which is good and IMHO the right decision for a quick and reliable solution (for whoever Igor is doing his work for... Qwaq?). The disadvantage with the Hydra VM is that all inter-process communication needs to go through a pipe; this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.

In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the long-term goal.

I can't imagine that hacking the VM with multiple processes, per-process state and a global VM lock for garbage collection and new object creation would be too difficult. The global VM lock would kill scalability and could make object creation slow, but it should still get some speedup on multi-cored CPUs. More advanced VMs with per-thread eden space would take a bit longer to write.

Gulik.

--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/
In reply to this post by Michael van der Gulik-2
On Sat, Feb 23, 2008 at 4:43 AM, Michael van der Gulik <[hidden email]> wrote:
Michael, people here are just trying to help you save a whole lot of work. There is educational value in the work, but you really do need to think about both process affinity and concurrent access to shared memory. Both are equally important (at least for today's architectures). Intel's manuals are all online; all you have to do is read them to get an idea of the cost of concurrent access to shared memory.

- Stephen
In reply to this post by Andreas.Raab
On Sat, Feb 23, 2008 at 10:39 AM, Andreas Raab <[hidden email]> wrote:
> > > The correct mid-term approach is to do what Erlang did: Have one
> > > image, and one OS-thread per *scheduler*. Then when new processes run
> > > they get a particular scheduler.
>
> What is the advantage of doing this compared to Hydra?
>
> Cheers,
>   - Andreas

Sorry for the delayed response. I'm not familiar with what Hydra is doing, and I didn't mean my comment as a comparison. I was simply responding to the comment about what is the best mid/long-term approach.

As far as what advantage this approach provides in general: it allows the VM to take full advantage of multiple threads on a system without exposing "real" threading to the language.

I said this is the best *mid-term* approach because even this won't be tenable once we reach a certain number of cores. Everyone keeps finding a way to use more cores under the old model, but it's getting more and more complex; at some point it just won't push any further, and then we will have to switch completely away from shared memory. At that point having n threads per CPU probably won't buy anything anymore.
In reply to this post by Igor Stasenko
On Sat, Feb 23, 2008 at 1:51 PM, Igor Stasenko <[hidden email]> wrote:
> In short: Less sharing - less contention. More sharing - more contention.
>
> If you put 2 points on a line and call them 'no sharing' and 'share
> everything', then any system which allows you to run on multiple cores
> and operate over a single domain (be it single memory or multiple
> standalone memories) lies somewhere in the middle.
>
> You can pick a starting point from which you move to that golden
> point - from 'share everything' or from 'share nothing'. But no
> doubt, no matter where you started, you will always move towards the
> 'golden' middle point.

But the question is, where do you make your trade-offs. If you take the way that's simple *for you*, then you just give access to threading to everyone and let them suffer the pain of a paradigm too complex to be done correctly.

If you take the way that's simple for *everyone else*, then you put this sharing inside the VM in the places it makes sense and hide it from the language level (e.g. how at least Erlang does it).
On 02/03/2008, Jason Johnson <[hidden email]> wrote:
> But the question is, where do you make your trade-offs. If you take
> the way that's simple *for you*, then you just give access to threading
> to everyone and let them suffer the pain of a paradigm too complex
> to be done correctly.
>
> If you take the way that's simple for *everyone else*, then you put
> this sharing inside the VM in the places it makes sense and hide it
> from the language level (e.g. how at least Erlang does it)

I'd vote for *everyone* - put threading control at the language side, as everything else in Smalltalk. Any 'magic' should be code which I can read and change, placed in the image, not in the VM. No-magic is the spirit of Smalltalk, after all.

--
Best regards,
Igor Stasenko AKA sig.
Igor Stasenko wrote:
> On 02/03/2008, Jason Johnson <[hidden email]> wrote:
>> But the question is, where do you make your trade-offs. If you take
>> the way that's simple *for you*, then you just give access to threading
>> to everyone and let them suffer the pain of a paradigm too complex
>> to be done correctly.
>>
>> If you take the way that's simple for *everyone else*, then you put
>> this sharing inside the VM in the places it makes sense and hide it
>> from the language level (e.g. how at least Erlang does it)
>
> I'd vote for *everyone* - put threading control at the language side, as
> everything else in Smalltalk. Any 'magic' should be code which I can
> read and change, placed in the image, not in the VM.
> No-magic is the spirit of Smalltalk, after all.

Yes, but the spirit is also to build a VM able to hide some low-level details like memory allocation... Smalltalk programmers are relieved of these problems... free to concentrate on higher-level problems. Wouldn't this apply to threads too?

Nicolas