How is Squeak going to handle multi-core CPUs, if at all? If we see 100-plus cores in the future and Squeak stays as it is, I would imagine other languages, such as Erlang, will look more attractive.
|
This is not my area, but I imagine that somehow Squeak processes should map
to OS native threads, parallelizable across the cores. Any chance Exupery could be of some help with that? I ask because if it could, then it is a must for that future.

Regards,

Sebastian Sastre
|
I don't know if mapping Smalltalk processes to native threads is the way to go, given the pain I've seen in the Java and C# space.
What might be interesting is to develop low-level primitives (along the lines of the famed map/reduce operations) that provide parallel-processing versions of commonly used collection functions. No idea how easy this would be to do, but on the surface it seems more promising than trying to do process/thread jiggery-pokery.

Steve
|
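Purely as illustration, a naive image-side sketch of what such a parallel collection operation could look like, written against today's green threads. The selector parallelCollect: is made up, and on the current single-native-thread VM this gains no real parallelism; the point is only the shape of the protocol:

    SequenceableCollection >> parallelCollect: aBlock
        "Evaluate aBlock for each element in its own forked process and
         collect the answers. Sketch only; real speedup would need VM support."
        | results done |
        results := Array new: self size.
        done := Semaphore new.
        self doWithIndex: [:each :index |
            [results at: index put: (aBlock value: each).
             done signal] fork].
        self size timesRepeat: [done wait].
        ^ results

Usage would then be the obvious (1 to: 1000) parallelCollect: [:i | i factorial], with the VM free, in principle, to spread the forked processes over cores.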
In reply to this post by gruntfuttuck
Hi all! |
> Have you knowledge of tools to retrieve metrics on Smalltalk code like
> McCabe, coupling, NCSS...?

http://moose.unibe.ch/

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
|
In reply to this post by Steve Wart
Mmmm, original; yeah, that's a very different approach. Hard to say which one is best. But going by your comments, maybe the primitives way is a path that is *better* for the human beings who program the system, in terms of easing that pain.

Question: would that be a path that prioritizes usability at the Smalltalk-developer level? If so, for me it is more interesting even if it is less efficient than the other, by a few [whatever measure unit] per second that in one year will be doubled by a CPU with two more cores for a few dollars.

Not prioritizing usability and intellectual ergonomy is equal to not getting the point of this whole Smalltalk thing, and perhaps even more so of the whole IT thing. Just a thought.

I'm quite sure that multicore is the beginning of a new crisis for the industry. But it's a good one!

Cheers,
|
In reply to this post by Steve Wart
Steve Wart wrote:
> What might be interesting is to develop low-level primitives (along
> the lines of the famed map/reduce operations) that provide parallel-
> processing versions of commonly used collection functions.

This would only make things more complicated, since then the primitives would have to start parallel native threads working on the same object memory. The problem with native threads is that the current object memory is not designed to work with multiple independent mutator threads. There are GC algorithms which work with parallel threads, but AFAIK they all have quite some overhead relative to the single-thread situation.

IMO, a combination of native threads and green threads would be best (although it still has the problem of parallel GC): the VM runs a small fixed number of native threads (default: the number of available cores, but it could be a little more to efficiently handle blocking calls to external functions) which compete for the runnable Smalltalk processes. That way, a number of processes could be active at any one time instead of just one. The synchronization overhead in the process-switching primitives should be negligible compared to the overhead needed for GC synchronization.

The simple yet efficient ObjectMemory of current Squeak cannot be used with parallel threads (at least not without significant synchronization overhead). AFAIK, efficient algorithms require every thread to have its own object allocation area to avoid contention on object allocations. Tenuring (making young objects old) and storing new objects into old objects (remembered table) require synchronization. In other words, grafting a thread-safe object memory onto Squeak would be a major project.

In contrast, for a significant subset of applications (servers) it is orders of magnitude simpler to run several images in parallel. Those images don't stomp on each other's object memory, so there is absolutely no synchronization overhead. For stateful sessions, a front end can handle routing requests to the image which currently holds a session's state; stateless requests can be handled by any image.

Cheers,
Hans-Martin
|
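To make the last point concrete, one rough sketch of such a stateful-session front end at the image level. SessionRouter, the request shape, and the port numbers are invented for illustration, and the actual transport to the worker images is left out:

    Object subclass: #SessionRouter
        instanceVariableNames: 'imagePorts sessionToPort nextIndex'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'MultiImage-Sketch'

    SessionRouter >> initialize
        imagePorts := #(9001 9002 9003 9004).
        sessionToPort := Dictionary new.
        nextIndex := 0

    SessionRouter >> portForRequest: aRequest
        "Stateful requests go to the image already holding their session;
         stateless ones are spread round-robin over all images."
        | sessionId |
        sessionId := aRequest at: #sessionId ifAbsent: [nil].
        sessionId ifNil: [^ self nextPort].
        ^ sessionToPort at: sessionId ifAbsentPut: [self nextPort]

    SessionRouter >> nextPort
        "Simple round-robin over the known worker image ports."
        nextIndex := nextIndex \\ imagePorts size + 1.
        ^ imagePorts at: nextIndex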
In reply to this post by Lukas Renggli
Thanks Lukas.
Davide

Lukas Renggli wrote:
>> Have you knowledge of tools to retrieve metrics on Smalltalk code like
>> McCabe, coupling, NCSS...?
>
> http://moose.unibe.ch/
>
> Lukas
|
In reply to this post by Steve Wart
On 10/17/07, Steve Wart <[hidden email]> wrote:
> I don't know if mapping Smalltalk processes to native threads is the way to
> go, given the pain I've seen in the Java and C# space.

Shared-memory parallelism has always been difficult. People claimed it was the language, the environment, or that they needed better training. They always thought that with one more thing, they could "fix" shared-memory parallelism and make it usable. But Java has done a good job with providing reasonable language primitives. There has been a lot of work on making threads efficient, and plenty of people have learned to write multi-threaded Java. But it is still way too hard.

I think that shared-memory parallelism, with explicit synchronization, is a bad idea. Transactional memory might be a solution, since it eliminates explicit synchronization. I think the most likely solution is to avoid shared memory altogether and go with message passing. Erlang is a perfect example of this. We could take this approach in Smalltalk by making minimal images like Spoon, making images that are designed to be used by other images (again, like Spoon), and then implementing our systems as hundreds or thousands of separate images. Image startup would have to be very fast. I think that this is more likely to be useful than rewriting garbage collectors to support parallelism.

-Ralph Johnson
|
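To sketch what the "hundreds of small images" shape might look like from the launching image's side: this assumes the OSProcess package's command: for starting processes, while RemoteImage, send:with:, the worker image name, and the startup script are all invented placeholders for whatever transport and packaging would actually be used:

    | workers results |
    "Launch eight minimal worker images in their own OS processes."
    workers := (1 to: 8) collect: [:i |
        OSProcess command: 'squeak -headless worker.image startWorker.st'.
        RemoteImage connectTo: 'localhost' port: 9000 + i].

    "Hand each worker a chunk of work and gather the replies by message passing."
    results := workers withIndexCollect: [:worker :i |
        worker send: #computeChunk: with: i].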
Hey, this sounds like an interesting path to me. If we think of nature and its design, those images could be analogous to cells of a larger body. Fragmentation keeps things simple without compromising scalability. Nature concluded that it is more efficient not to develop a few super-complex brain cells but to develop zillions of far simpler brain cells, that is, cells that are just complex enough, and make them able to organize into an unimaginably complex network: a brain.

Another angle that makes me conclude this is interesting: we know that one object that is too smart smells bad. I mean, it easily starts to become less flexible, so less scalable in complexity, less intuitive (you have to learn more about how to use it), more to memorize, maintain, document, etc. So it is smarter, but it can start to become a bad deal because it is too costly. That said, if we think of those flexible mini-images as objects, each one using a core, we can scale enormously and almost trivially in this whole multicore thing, and in a way we know works.

Another interesting point is fault tolerance. If one of those images happens to suffer downtime (because of a power failure on the host where it was running, or whatever reason), the system might feel it somehow but not be in complete failure, because there are other images to handle demand. A small (and therefore efficient), well-protected critical system can coordinate measures of containment for the "crisis", and hopefully the system never really makes its own crisis felt by the users.

Again, I find this is a tradeoff about when to scale horizontally or vertically. For hardware, Intel and friends have scaled vertically (more bits and Hz, for instance) for years, as much as they were physically able to. Now they have reached a kind of barrier and started to scale horizontally (adding cores). Please don't fall into endless discussions, like the ones I saw out there, comparing apples with bananas because they are both fruit but are not comparable. I mean, it's all about scaling, but they are two different axes of a multidimensional scaling (complexity, load, performance, etc.).

I'm thinking here of vertical as making one Squeak smarter, capable of being thread-safe, and horizontal as making one smart network of N Squeaks.

Sometimes one choice will be good business and sometimes it will be the other. I feel the horizontal time has come. If that's true, then investing (time, money, effort) now in vertical scaling may turn out to have a lower cost/benefit ratio compared to the results of investing in horizontal scaling.

The truth is that this is all speculative and I don't know. But I do trust in nature.

Cheers,

Sebastian Sastre
|
On 18/10/2007, Sebastian Sastre <[hidden email]> wrote:
> That said, if we think of those flexible mini-images as objects, each one
> using a core, we can scale enormously and almost trivially in this whole
> multicore thing, and in a way we know works.

I have often thought myself about making a Smalltalk 'vertical' (by making it multithreaded with a single shared memory). Now, after reading this post, I think your approach is much better.

Then I think it would be good to make some steps towards supporting multiple images per single executable:
- make a single executable capable of running a number of images in separate native threads. This will save memory resources and could also help in making inter-image messaging less costly.

--
Best regards,
Igor Stasenko AKA sig.
|
In reply to this post by Sebastian Sastre-2
Hi. What do you recommend for communication among running images?
RemoteMessagingToolkit (RMT)?
Remote Smalltalk (rST)?
SOAP (ehm)?
Something else (not via the TCP/IP stack, for multiple images running locally)?

Thanks, p.
|
Hi Peter. Look, I've implemented RemotedObjects, which is a remake of rST, available on SqueakSource, but even with the performance improvements of using the sockets in full duplex I was hoping for better results, so I've frozen development there. I can't say anything about RMT or SOAP because I have no experience with them. Honestly, I think we should consult someone with more experience in Squeak and networking than me. Perhaps people involved with Croquet or Spoon can bring some experience/ideas/frameworks?

Cheers,

Sebastian Sastre
|
In reply to this post by Hans-Martin Mosner
On Oct 17, 2007, at 11:12 PM, Hans-Martin Mosner wrote:

> IMO, a combination of native threads and green threads would be best
> (although it still has the problem of parallel GC): the VM runs a small
> fixed number of native threads (default: the number of available cores ...)
> which compete for the runnable Smalltalk processes.

This is exactly what I have started work on. I want to use the foundations of SqueakElib as a message-passing mechanism between objects assigned to different native threads. There would be one native thread per core. I am currently trying to understand what to do with all of the global variables used in the interp loop, so I can have multiple threads running that code. I have given very little thought to what would need to be protected in the object memory or in the primitives.

I take this very much as a learning project. Just think, I'll be able to see how the interpreter works: the object memory, bytecode dispatch, primitives... all of it, in fact. If I can come out with a working system that does message passing, even at the cost of a poorly performing object memory, et al., then it will be a major success for me.

It is going to be slower, anyway, because I have to intercept each message send as a possible non-local send. To this end, the Macro Transforms had to be disabled so I could intercept them. The system slowed considerably. I hope to speed them up with runtime info: is the receiver in the same thread that's running?

I do appreciate your comments and know that I may be wasting my time. :)
|
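For readers who have not looked at the E/SqueakElib style of far references, a much-simplified image-level sketch of the idea. The class, its instance variables, and the transport protocol here are invented, and the interception described above happens inside the interpreter rather than through doesNotUnderstand:, but the shape is the same: a proxy forwards every message to the object's owning thread or image.

    ProtoObject subclass: #FarRef
        instanceVariableNames: 'ownerThread remoteOid transport'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'FarRefs-Sketch'

    FarRef >> doesNotUnderstand: aMessage
        "Any message sent to a FarRef is shipped to the thread (or image)
         that owns the real object, instead of being executed locally."
        ^ transport
            send: aMessage selector
            arguments: aMessage arguments
            to: remoteOid
            inThread: ownerThread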
In reply to this post by Sebastian Sastre-2
Here is a breakdown from February of the different options for dealing with threading (and therefore multi-core):
http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-February/114181.html

I see that since then (or it could have been before, I don't see a date) Lukas and co. have written a paper about adding STM to Squeak:
http://www.lukas-renggli.ch/files/95/wwpettvsbj457o5i530ou2lrptx0is/transmem-presentation.pdf
|
In reply to this post by Petr Fischer-3
Check out MaClientServer (developed for Magma, but useful on its own):
http://liststest.squeakfoundation.org/pipermail/squeak-dev/2004-June/078767.html
|
In reply to this post by Rob Withers
> I am currently trying to understand what to do with all of the
> global variables used in the interp loop, so I can have multiple
> threads running that code.

Ah, well, my intent was to ensure there were no globals; however, there are a few:

sqInt extraVMMemory;            /* historical reasons for the Mac OS 9 setup, not needed as a global now */
sqInt (*compilerHooks[16])();   /* earlier versions of the CodeWarrior compiler had issues when you stuck this in a structure; should be fixed now */
usqInt memory;                  /* there were some usages of memory in ccode: constructs in the interp; I think these might be gone now */
void* showSurfaceFn;            /* not sure about this one */
struct VirtualMachine* interpreterProxy;  /* this points to the interpreterProxy; it's there historically to allow direct linking from support code, but really you should use an accessor */

The rest are set to values which you can't set in a struct; however, somewhere in or before readImageFromFileHeapSizeStartingAt you could allocate the foo structure and initialize these values. There is of course some messy setup code in the VM that might refer to procedures in interp.c before an image is loaded; that is poor practice, and you would need to root that out.

char* obsoleteIndexedPrimitiveTable[][3] = {
const char* obsoleteNamedPrimitiveTable[][3] = {
void *primitiveTable[577] = {
const char *interpreterVersion =

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
|
In reply to this post by Rob Withers
On Oct 18, 2007, at 9:06 AM, Robert Withers wrote: > > On Oct 17, 2007, at 11:12 PM, Hans-Martin Mosner wrote: > >> This would only make things more complicated since then the >> primitives >> would have to start parallel native threads working on the same >> object >> memory. >> The problem with native threads is that the current object memory >> is not >> designed to work with multiple independent mutator threads. There >> are GC >> algorithms which work with parallel threads, but AFAIK they all have >> quite some overhead relative to the single-thread situation. >> >> IMO, a combination of native threads and green threads would be >> the best >> (although it still has the problem of parallel GC): >> The VM runs a small fixed number of native threads (default: >> number of >> available cores, but could be a little more to efficiently handle >> blocking calls to external functions) which compete for the runnable >> Smalltalk processes. That way, a number of processes could be >> active at >> any one time instead of just one. The synchronization overhead in the >> process-switching primitives should be negligible compared to the >> overhead needed for GC synchronization. > > This is exactly what I have started work on. I want to use the > foundations of SqueakElib as a msg passing mechanism between > objects assigned to different native threads. There would be one > native thread per core. I am currently trying to understand what > to do with all of the global variables used in the interp loop, so > I can have multiple threads running that code. I have given very > little thought to what would need to be protected in the object > memory or in the primitives. I take this very much as a learning > project. Just think, I'll be able to see how the interpreter > works, the object memory, bytecode dispatch, primitives....all of > it in fact. If I can come out with a working system that does msg > passing, even at the cost of poorly performing object memory, et > al., then it will be a major success for me. > > It is going to be slower, anyway, because I have to intercept each > msg send as a possible non-local send. Isn't this a show-stopper for a practical system? Or is this a stepping-stone? If so, how do you envision resolving this in the future? FWIW, Croquet was at one time envisioned to work in the way that you describe. The architects weren't able to produce a satisfactory design/implementation within the necessary time frame, and instead developed the current "Islands" mechanism. This has worked out very well in practice, and there is no pressing need to try to implement the original idea. In my understanding, Croquet islands and E vats are quite similar in that regard (and the latter informed the design of the former)... both use an explicit far-ref proxy to an object in another island/ vat. What is the motivation for the approach you have chosen, other than it being a fun learning process (which may certainly be a good enough reason on its own)? Cheers, Josh > To this end, the Macro Transforms had to be disabled so I could > intercept them. The system slowed considerably. I hope to speed > them up with runtime info: is the receiver in the same thread > that's running? > > I do appreciate your comments and know that I may be wasting my > time. :) > >> >> The simple yet efficient ObjectMemory of current Squeak can not be >> used >> with parallel threads (at least not without significant >> synchronization >> overhead). 
AFAIK, efficient algorithms require every thread to >> have its >> own object allocation area to avoid contention on object allocations. >> Tenuring (making young objects old) and storing new objects into old >> objects (remembered table) require synchronization. In other words, >> grafting a threadsafe object memory onto Squeak would be a major >> project. >> >> In contrast, for a significant subset of applications (servers) it is >> orders of magnitudes simpler to run several images in parallel. Those >> images don't stomp on each other's object memory, so there is >> absolutely >> no synchronization overhead. For stateful sessions, a front end can >> handle routing requests to the image which currently holds a >> session's >> state, stateless requests can be handled by any image. >> >> Cheers, >> Hans-Martin >> > > |
In reply to this post by Petr Fischer-3
Ugh, not SOAP. Unless you plan to talk to non-Smalltalk entities.

Smalltalk already deals with objects serialized in binary, and handles the case where the save was done on a machine with a different byte-ordering scheme. I would think that system could be exploited to dump Smalltalk data raw across a link (including running methods frozen before transfer). Maybe Spoon is doing something like this? As old as Smalltalk is, someone must be. :)
|
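For what it's worth, a minimal sketch of that kind of raw binary transfer using Squeak's ReferenceStream serializer. The socket plumbing is omitted, anObjectGraph stands for whatever you want to ship, and whether this round-trips everything you would want (running contexts, for example) is a separate question:

    | buffer bytes copy |
    "Serialize an arbitrary object graph to a ByteArray."
    buffer := RWBinaryOrTextStream on: (ByteArray new: 100).
    (ReferenceStream on: buffer) nextPut: anObjectGraph.
    bytes := buffer contents.

    "Send 'bytes' over whatever link connects the two images, then
     rebuild an equivalent object graph on the far side."
    copy := (ReferenceStream on: (RWBinaryOrTextStream with: bytes) reset) next.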
In reply to this post by Igor Stasenko
Hi,
What Ralph and the others have said is on target in many respects. The Erlang and Smalltalk models of messaging each leave something to be desired, but when combined they may provide a powerful and compelling computing platform.

However, having just worked on a very large multi-threaded commercial application in live production, it is very clear that even single-native-threaded Smalltalk applications have very nasty concurrency problems. Our team of seven was able to solve many of the worst of these concurrency problems in a year and a half and improved the reliability of this important production application. It's important that concurrency be taken into account at all levels of an application's design, from the lowest levels of the virtual machine through to the end-user experience (which is where concurrency on multiple cores can really make a significant, paradigm-adjusting difference if done well).

One of the lessons learned from this complex real-world application was that almost ALL of the concurrency problems you have with multiple cores running N native threads you also have with a single core running one native thread. The implication of this is that the proposed solution of running multiple images with one native thread each won't really save you from concurrency problems, as each image on its own can have serious concurrency issues. When you have a single native thread running, say, ten to twenty Smalltalk green threads (aka Smalltalk processes), the concurrency problems can be a real nightmare to contemplate. Comprehension of what is happening is exacerbated by the limited debugging information captured in runtime crash dumps. Diagnosing the real-world concurrency problems in a live production application revealed that it's not an easy problem even with one native thread running! Additional native threads really wouldn't have changed much (assuming that the VM can properly handle GC and other issues, as is done in Smalltalk MT) with the concurrency problems we were dealing with. This includes all the nasty problems with the standard class library collection classes.

It is for the above reasons that I support many approaches being implemented, so that we can find out the best one(s) for various application domains. It's unlikely that there is a one-solution-fits-all-needs type of paradigm.

1) With existing Smalltalks (and other languages) it's relatively easy to support one image per native "process" (aka task), each with its own separate memory space. This seems to be trivial for Squeak. The main thing that is needed is an effective and appropriate distributed object-messaging system via TCP/IP. This also has the advantage of easily distributing the image-nodes across multiple server nodes on a network. I propose that any distributed object-messaging system that is developed for inter-image communication meet a wide range of criteria and application needs before being considered as a part of the upcoming next Smalltalk standard. These criteria would need to be elucidated from the literature and the needs of members of the Smalltalk community and their clients.

2) It's been mentioned that it would be straightforward to have Squeak start up multiple copies of the image (or even multiple different images) in one process (task) memory space, with each image having its own native thread and keeping its object table and memory separate within the larger memory space. This sounds like a very nice approach. This is very likely practical for multi-core CPUs such as the N-core (where N is 2, 4, 8, 64) CPUs from AMD, Intel, and Tilera.

3) A single image running on N cores with M native threads (M may be larger than N) is the full generalization, of course. This may be the best way to take advantage of paradigm-shaking chips such as the Tile64 processor from Tilera. However, we may need to rethink the entire architecture of the Smalltalk virtual machine, since the Tile64 chip has capabilities that radically alter the paradigm. Messages between processor nodes take less time to pass between nodes than the same amount of data takes to be written into memory. Think about that. It offers a new paradigm unavailable on other N-core processors (at this current time).

I believe that we, the Smalltalk community, need to have Smalltalk capable of being deployed into the fully generalized scenario: running on N cores with M native threads, with O images in one memory space, able to communicate with P other nodes. It is we who need to do the hard work of providing systems that work correctly in the face of the multi-core, multi-threaded reality that is now upon us. If we run away from the hard work, the competitors who tackle it and provide workable solutions will prevail.

Food for thought.

All the best,

Peter William Lount
Smalltalk.org Editor
[hidden email]
|
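As a tiny illustration of the kind of single-native-thread hazard being described (a made-up example, not taken from the application in question): two Squeak processes mutating one ordinary collection can interleave inside #add:, while serializing access through a mutex avoids it:

    | unsafe safe mutex |
    unsafe := OrderedCollection new.
    "Two green threads appending without protection; a process switch in the
     middle of #add: can interleave the updates and corrupt or lose elements."
    [1 to: 10000 do: [:i | unsafe add: i]] fork.
    [1 to: 10000 do: [:i | unsafe add: i]] fork.

    safe := OrderedCollection new.
    mutex := Semaphore forMutualExclusion.
    "The same work, with every mutation serialized through the mutex."
    [1 to: 10000 do: [:i | mutex critical: [safe add: i]]] fork.
    [1 to: 10000 do: [:i | mutex critical: [safe add: i]]] fork.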