How is Squeak going to handle multi-core CPUs, if at all? If we see 100-plus cores in the future and Squeak stays as it is, I would imagine other languages, such as Erlang, will look more attractive.
|
This is not my area, but I imagine that somehow Squeak processes should map
to OS native threads, parallelizable across the cores. Any chance Exupery could be of some help with that? I ask because if it could, then it is a must for that future.

Regards,

Sebastian Sastre
|
I don't know if mapping Smalltalk processes to native threads is the way to go, given the pain I've seen in the Java and C# space.
What might be interesting is to develop low-level primitives (along the lines of the famed map/reduce operations) that provide parallel-processing versions of commonly used collection functions. No idea how easy this would be to do, but on the surface it seems more promising than trying to do process/thread jiggery-pokery.

Steve
|
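Purely as illustration, a naive image-side sketch of what such a parallel collection operation could look like, written against today's green threads. The selector parallelCollect: is made up, and on the current single-native-thread VM this gains no real parallelism; the point is only the shape of the protocol:

    SequenceableCollection >> parallelCollect: aBlock
        "Evaluate aBlock for each element in its own forked process and
         collect the answers. Sketch only; real speedup would need VM support."
        | results done |
        results := Array new: self size.
        done := Semaphore new.
        self doWithIndex: [:each :index |
            [results at: index put: (aBlock value: each).
             done signal] fork].
        self size timesRepeat: [done wait].
        ^ results

Usage would then be the obvious (1 to: 1000) parallelCollect: [:i | i factorial], with the VM free, in principle, to spread the forked processes over cores.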
In reply to this post by gruntfuttuck
Hi all! |
> Have you knowledge of tools to retrieve metrics on Smalltalk code like
> McCabe, coupling, NCSS...?

http://moose.unibe.ch/

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
|
In reply to this post by Steve Wart
Mmmm, original; yeah, that's a very different approach. Hard to say which one is best. But going by your comments, maybe the primitives way is a path that is *better* for the human beings who program the system, in terms of easing that pain.

Question: would that be a path that prioritizes usability at the Smalltalk-developer level? If so, for me it is more interesting even if it is less efficient than the other, by a few [whatever measure unit] per second that in one year will be doubled by a CPU with two more cores for a few dollars.

Not prioritizing usability and intellectual ergonomy is equal to not getting the point of this whole Smalltalk thing, and perhaps even more so of the whole IT thing. Just a thought.

I'm quite sure that multicore is the beginning of a new crisis for the industry. But it's a good one!

Cheers,
|
In reply to this post by Steve Wart
Steve Wart wrote:
> What might be interesting is to develop low-level primitives (along
> the lines of the famed map/reduce operations) that provide parallel-
> processing versions of commonly used collection functions.

This would only make things more complicated, since then the primitives would have to start parallel native threads working on the same object memory. The problem with native threads is that the current object memory is not designed to work with multiple independent mutator threads. There are GC algorithms which work with parallel threads, but AFAIK they all have quite some overhead relative to the single-thread situation.

IMO, a combination of native threads and green threads would be best (although it still has the problem of parallel GC): the VM runs a small fixed number of native threads (default: the number of available cores, but it could be a little more to efficiently handle blocking calls to external functions) which compete for the runnable Smalltalk processes. That way, a number of processes could be active at any one time instead of just one. The synchronization overhead in the process-switching primitives should be negligible compared to the overhead needed for GC synchronization.

The simple yet efficient ObjectMemory of current Squeak cannot be used with parallel threads (at least not without significant synchronization overhead). AFAIK, efficient algorithms require every thread to have its own object allocation area to avoid contention on object allocations. Tenuring (making young objects old) and storing new objects into old objects (remembered table) require synchronization. In other words, grafting a thread-safe object memory onto Squeak would be a major project.

In contrast, for a significant subset of applications (servers) it is orders of magnitude simpler to run several images in parallel. Those images don't stomp on each other's object memory, so there is absolutely no synchronization overhead. For stateful sessions, a front end can handle routing requests to the image which currently holds a session's state; stateless requests can be handled by any image.

Cheers,
Hans-Martin
|
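To make the last point concrete, one rough sketch of such a stateful-session front end at the image level. SessionRouter, the request shape, and the port numbers are invented for illustration, and the actual transport to the worker images is left out:

    Object subclass: #SessionRouter
        instanceVariableNames: 'imagePorts sessionToPort nextIndex'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'MultiImage-Sketch'

    SessionRouter >> initialize
        imagePorts := #(9001 9002 9003 9004).
        sessionToPort := Dictionary new.
        nextIndex := 0

    SessionRouter >> portForRequest: aRequest
        "Stateful requests go to the image already holding their session;
         stateless ones are spread round-robin over all images."
        | sessionId |
        sessionId := aRequest at: #sessionId ifAbsent: [nil].
        sessionId ifNil: [^ self nextPort].
        ^ sessionToPort at: sessionId ifAbsentPut: [self nextPort]

    SessionRouter >> nextPort
        "Simple round-robin over the known worker image ports."
        nextIndex := nextIndex \\ imagePorts size + 1.
        ^ imagePorts at: nextIndex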
In reply to this post by Lukas Renggli
Thanks Lukas.
Davide

Lukas Renggli wrote:
>> Have you knowledge of tools to retrieve metrics on Smalltalk code like
>> McCabe, coupling, NCSS...?
>
> http://moose.unibe.ch/
>
> Lukas
|
In reply to this post by Steve Wart
On 10/17/07, Steve Wart <[hidden email]> wrote:
> I don't know if mapping Smalltalk processes to native threads is the way to
> go, given the pain I've seen in the Java and C# space.

Shared-memory parallelism has always been difficult. People claimed it was the language, the environment, or that they needed better training. They always thought that with one more thing, they could "fix" shared-memory parallelism and make it usable. But Java has done a good job with providing reasonable language primitives. There has been a lot of work on making threads efficient, and plenty of people have learned to write multi-threaded Java. But it is still way too hard.

I think that shared-memory parallelism, with explicit synchronization, is a bad idea. Transactional memory might be a solution, since it eliminates explicit synchronization. I think the most likely solution is to avoid shared memory altogether and go with message passing. Erlang is a perfect example of this. We could take this approach in Smalltalk by making minimal images like Spoon, making images that are designed to be used by other images (again, like Spoon), and then implementing our systems as hundreds or thousands of separate images. Image startup would have to be very fast. I think that this is more likely to be useful than rewriting garbage collectors to support parallelism.

-Ralph Johnson
|
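To sketch what the "hundreds of small images" shape might look like from the launching image's side: this assumes the OSProcess package's command: for starting processes, while RemoteImage, send:with:, the worker image name, and the startup script are all invented placeholders for whatever transport and packaging would actually be used:

    | workers results |
    "Launch eight minimal worker images in their own OS processes."
    workers := (1 to: 8) collect: [:i |
        OSProcess command: 'squeak -headless worker.image startWorker.st'.
        RemoteImage connectTo: 'localhost' port: 9000 + i].

    "Hand each worker a chunk of work and gather the replies by message passing."
    results := workers withIndexCollect: [:worker :i |
        worker send: #computeChunk: with: i].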
Hey, this sounds like an interesting path to me. If we think of nature and its design, those images could be analogous to cells of a larger body. Fragmentation keeps things simple without compromising scalability. Nature concluded that it is more efficient not to develop a few super-complex brain cells but to develop zillions of far simpler brain cells, that is, cells that are just complex enough, and make them able to organize into an unimaginably complex network: a brain.

Another angle that makes me conclude this is interesting: we know that one object that is too smart smells bad. I mean, it easily starts to become less flexible, so less scalable in complexity, less intuitive (you have to learn more about how to use it), more to memorize, maintain, document, etc. So it is smarter, but it can start to become a bad deal because it is too costly. That said, if we think of those flexible mini-images as objects, each one using a core, we can scale enormously and almost trivially in this whole multicore thing, and in a way we know works.

Another interesting point is fault tolerance. If one of those images happens to suffer downtime (because of a power failure on the host where it was running, or whatever reason), the system might feel it somehow but not be in complete failure, because there are other images to handle demand. A small (and therefore efficient), well-protected critical system can coordinate measures of containment for the "crisis", and hopefully the system never really makes its own crisis felt by the users.

Again, I find this is a tradeoff about when to scale horizontally or vertically. For hardware, Intel and friends have scaled vertically (more bits and Hz, for instance) for years, as much as they were physically able to. Now they have reached a kind of barrier and started to scale horizontally (adding cores). Please don't fall into endless discussions, like the ones I saw out there, comparing apples with bananas because they are both fruit but are not comparable. I mean, it's all about scaling, but they are two different axes of a multidimensional scaling (complexity, load, performance, etc.).

I'm thinking here of vertical as making one Squeak smarter, capable of being thread-safe, and horizontal as making one smart network of N Squeaks.

Sometimes one choice will be good business and sometimes it will be the other. I feel the horizontal time has come. If that's true, then investing (time, money, effort) now in vertical scaling may turn out to have a lower cost/benefit ratio compared to the results of investing in horizontal scaling.

The truth is that this is all speculative and I don't know. But I do trust in nature.

Cheers,

Sebastian Sastre
|
On 18/10/2007, Sebastian Sastre <[hidden email]> wrote:
> That said, if we think of those flexible mini-images as objects, each one
> using a core, we can scale enormously and almost trivially in this whole
> multicore thing, and in a way we know works.

I have often thought myself about making a Smalltalk 'vertical' (by making it multithreaded with a single shared memory). Now, after reading this post, I think your approach is much better.

Then I think it would be good to make some steps towards supporting multiple images per single executable:
- make a single executable capable of running a number of images in separate native threads. This will save memory resources and could also help in making inter-image messaging less costly.

--
Best regards,
Igor Stasenko AKA sig.
|
In reply to this post by Sebastian Sastre-2
Hi. What do you recommend for communication among running images?
RemoteMessagingToolkit (RMT)?
Remote Smalltalk (rST)?
SOAP (ehm)?
Something else (not via the TCP/IP stack, for multiple images running locally)?

Thanks, p.
|
Hi Peter. Look, I've implemented RemotedObjects, which is a remake of rST, available on SqueakSource, but even with the performance improvements of using the sockets in full duplex I was hoping for better results, so I've frozen development there. I can't say anything about RMT or SOAP because I have no experience with them. Honestly, I think we should consult someone with more experience in Squeak and networking than me. Perhaps people involved with Croquet or Spoon can bring some experience/ideas/frameworks?

Cheers,

Sebastian Sastre
|
In reply to this post by Hans-Martin Mosner
On Oct 17, 2007, at 11:12 PM, Hans-Martin Mosner wrote:

> IMO, a combination of native threads and green threads would be best
> (although it still has the problem of parallel GC): the VM runs a small
> fixed number of native threads (default: the number of available cores ...)
> which compete for the runnable Smalltalk processes.

This is exactly what I have started work on. I want to use the foundations of SqueakElib as a message-passing mechanism between objects assigned to different native threads. There would be one native thread per core. I am currently trying to understand what to do with all of the global variables used in the interp loop, so I can have multiple threads running that code. I have given very little thought to what would need to be protected in the object memory or in the primitives.

I take this very much as a learning project. Just think, I'll be able to see how the interpreter works: the object memory, bytecode dispatch, primitives... all of it, in fact. If I can come out with a working system that does message passing, even at the cost of a poorly performing object memory, et al., then it will be a major success for me.

It is going to be slower, anyway, because I have to intercept each message send as a possible non-local send. To this end, the Macro Transforms had to be disabled so I could intercept them. The system slowed considerably. I hope to speed them up with runtime info: is the receiver in the same thread that's running?

I do appreciate your comments and know that I may be wasting my time. :)
|
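For readers who have not looked at the E/SqueakElib style of far references, a much-simplified image-level sketch of the idea. The class, its instance variables, and the transport protocol here are invented, and the interception described above happens inside the interpreter rather than through doesNotUnderstand:, but the shape is the same: a proxy forwards every message to the object's owning thread or image.

    ProtoObject subclass: #FarRef
        instanceVariableNames: 'ownerThread remoteOid transport'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'FarRefs-Sketch'

    FarRef >> doesNotUnderstand: aMessage
        "Any message sent to a FarRef is shipped to the thread (or image)
         that owns the real object, instead of being executed locally."
        ^ transport
            send: aMessage selector
            arguments: aMessage arguments
            to: remoteOid
            inThread: ownerThread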
In reply to this post by Sebastian Sastre-2
Here is a breakdown from February of the different options for dealing with threading (and therefore multi-core):
http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-February/114181.html

I see that since then (or it could have been before, I don't see a date) Lukas and co. have written a paper about adding STM to Squeak:
http://www.lukas-renggli.ch/files/95/wwpettvsbj457o5i530ou2lrptx0is/transmem-presentation.pdf
|
In reply to this post by Petr Fischer-3
Check out MaClientServer (developed for Magma, but useful on its own):
http://liststest.squeakfoundation.org/pipermail/squeak-dev/2004-June/078767.html
|
In reply to this post by Rob Withers
> I am currently trying to understand what to do with all of the
> global variables used in the interp loop, so I can have multiple
> threads running that code.

Ah, well, my intent was to ensure there were no globals; however, there are a few:

sqInt extraVMMemory;            /* historical reasons for the Mac OS 9 setup, not needed as a global now */
sqInt (*compilerHooks[16])();   /* earlier versions of the CodeWarrior compiler had issues when you stuck this in a structure; should be fixed now */
usqInt memory;                  /* there were some usages of memory in ccode: constructs in the interp; I think these might be gone now */
void* showSurfaceFn;            /* not sure about this one */
struct VirtualMachine* interpreterProxy;  /* this points to the interpreterProxy; it's there historically to allow direct linking from support code, but really you should use an accessor */

The rest are set to values which you can't set in a struct; however, somewhere in or before readImageFromFileHeapSizeStartingAt you could allocate the foo structure and initialize these values. There is of course some messy setup code in the VM that might refer to procedures in interp.c before an image is loaded; that is poor practice, and you would need to root that out.

char* obsoleteIndexedPrimitiveTable[][3] = {
const char* obsoleteNamedPrimitiveTable[][3] = {
void *primitiveTable[577] = {
const char *interpreterVersion =

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
|
In reply to this post by Rob Withers
On Oct 18, 2007, at 9:06 AM, Robert Withers wrote: > > On Oct 17, 2007, at 11:12 PM, Hans-Martin Mosner wrote: > >> This would only make things more complicated since then the >> primitives >> would have to start parallel native threads working on the same >> object >> memory. >> The problem with native threads is that the current object memory >> is not >> designed to work with multiple independent mutator threads. There >> are GC >> algorithms which work with parallel threads, but AFAIK they all have >> quite some overhead relative to the single-thread situation. >> >> IMO, a combination of native threads and green threads would be >> the best >> (although it still has the problem of parallel GC): >> The VM runs a small fixed number of native threads (default: >> number of >> available cores, but could be a little more to efficiently handle >> blocking calls to external functions) which compete for the runnable >> Smalltalk processes. That way, a number of processes could be >> active at >> any one time instead of just one. The synchronization overhead in the >> process-switching primitives should be negligible compared to the >> overhead needed for GC synchronization. > > This is exactly what I have started work on. I want to use the > foundations of SqueakElib as a msg passing mechanism between > objects assigned to different native threads. There would be one > native thread per core. I am currently trying to understand what > to do with all of the global variables used in the interp loop, so > I can have multiple threads running that code. I have given very > little thought to what would need to be protected in the object > memory or in the primitives. I take this very much as a learning > project. Just think, I'll be able to see how the interpreter > works, the object memory, bytecode dispatch, primitives....all of > it in fact. If I can come out with a working system that does msg > passing, even at the cost of poorly performing object memory, et > al., then it will be a major success for me. > > It is going to be slower, anyway, because I have to intercept each > msg send as a possible non-local send. Isn't this a show-stopper for a practical system? Or is this a stepping-stone? If so, how do you envision resolving this in the future? FWIW, Croquet was at one time envisioned to work in the way that you describe. The architects weren't able to produce a satisfactory design/implementation within the necessary time frame, and instead developed the current "Islands" mechanism. This has worked out very well in practice, and there is no pressing need to try to implement the original idea. In my understanding, Croquet islands and E vats are quite similar in that regard (and the latter informed the design of the former)... both use an explicit far-ref proxy to an object in another island/ vat. What is the motivation for the approach you have chosen, other than it being a fun learning process (which may certainly be a good enough reason on its own)? Cheers, Josh > To this end, the Macro Transforms had to be disabled so I could > intercept them. The system slowed considerably. I hope to speed > them up with runtime info: is the receiver in the same thread > that's running? > > I do appreciate your comments and know that I may be wasting my > time. :) > >> >> The simple yet efficient ObjectMemory of current Squeak can not be >> used >> with parallel threads (at least not without significant >> synchronization >> overhead). 
AFAIK, efficient algorithms require every thread to >> have its >> own object allocation area to avoid contention on object allocations. >> Tenuring (making young objects old) and storing new objects into old >> objects (remembered table) require synchronization. In other words, >> grafting a threadsafe object memory onto Squeak would be a major >> project. >> >> In contrast, for a significant subset of applications (servers) it is >> orders of magnitudes simpler to run several images in parallel. Those >> images don't stomp on each other's object memory, so there is >> absolutely >> no synchronization overhead. For stateful sessions, a front end can >> handle routing requests to the image which currently holds a >> session's >> state, stateless requests can be handled by any image. >> >> Cheers, >> Hans-Martin >> > > |
In reply to this post by Petr Fischer-3
Ugh, not SOAP. Unless you plan to talk to non-Smalltalk entities.

Smalltalk already deals with objects serialized in binary, and handles the case where the save was done on a machine with a different byte-ordering scheme. I would think that system could be exploited to dump Smalltalk data raw across a link (including running methods frozen before transfer). Maybe Spoon is doing something like this? As old as Smalltalk is, someone must be. :)
|
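For what it's worth, a minimal sketch of that kind of raw binary transfer using Squeak's ReferenceStream serializer. The socket plumbing is omitted, anObjectGraph stands for whatever you want to ship, and whether this round-trips everything you would want (running contexts, for example) is a separate question:

    | buffer bytes copy |
    "Serialize an arbitrary object graph to a ByteArray."
    buffer := RWBinaryOrTextStream on: (ByteArray new: 100).
    (ReferenceStream on: buffer) nextPut: anObjectGraph.
    bytes := buffer contents.

    "Send 'bytes' over whatever link connects the two images, then
     rebuild an equivalent object graph on the far side."
    copy := (ReferenceStream on: (RWBinaryOrTextStream with: bytes) reset) next.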
In reply to this post by Igor Stasenko
Hi,
What Ralph and the others have said is on target in many respects. The Erlang and Smalltalk models of messaging each leave something to be desired, but when combined they may provide a powerful and compelling computing platform.

However, having just worked on a very large multi-threaded commercial application in live production, it is very clear that even single-native-threaded Smalltalk applications have very nasty concurrency problems. Our team of seven was able to solve many of the worst of these concurrency problems in a year and a half and improved the reliability of this important production application. It's important that concurrency be taken into account at all levels of an application's design, from the lowest levels of the virtual machine through to the end-user experience (which is where concurrency on multiple cores can really make a significant, paradigm-adjusting difference if done well).

One of the lessons learned from this complex real-world application was that almost ALL of the concurrency problems you have with multiple cores running N native threads you also have with a single core running one native thread. The implication of this is that the proposed solution of running multiple images with one native thread each won't really save you from concurrency problems, as each image on its own can have serious concurrency issues. When you have a single native thread running, say, ten to twenty Smalltalk green threads (aka Smalltalk processes), the concurrency problems can be a real nightmare to contemplate. Comprehension of what is happening is exacerbated by the limited debugging information captured in runtime crash dumps. Diagnosing the real-world concurrency problems in a live production application revealed that it's not an easy problem even with one native thread running! Additional native threads really wouldn't have changed much (assuming that the VM can properly handle GC and other issues, as is done in Smalltalk MT) with the concurrency problems we were dealing with. This includes all the nasty problems with the standard class library collection classes.

It is for the above reasons that I support many approaches being implemented, so that we can find out the best one(s) for various application domains. It's unlikely that there is a one-solution-fits-all-needs type of paradigm.

1) With existing Smalltalks (and other languages) it's relatively easy to support one image per native "process" (aka task), each with its own separate memory space. This seems to be trivial for Squeak. The main thing that is needed is an effective and appropriate distributed object-messaging system via TCP/IP. This also has the advantage of easily distributing the image-nodes across multiple server nodes on a network. I propose that any distributed object-messaging system that is developed for inter-image communication meet a wide range of criteria and application needs before being considered as a part of the upcoming next Smalltalk standard. These criteria would need to be elucidated from the literature and the needs of members of the Smalltalk community and their clients.

2) It's been mentioned that it would be straightforward to have Squeak start up multiple copies of the image (or even multiple different images) in one process (task) memory space, with each image having its own native thread and keeping its object table and memory separate within the larger memory space. This sounds like a very nice approach. This is very likely practical for multi-core CPUs such as the N-core (where N is 2, 4, 8, 64) CPUs from AMD, Intel, and Tilera.

3) A single image running on N cores with M native threads (M may be larger than N) is the full generalization, of course. This may be the best way to take advantage of paradigm-shaking chips such as the Tile64 processor from Tilera. However, we may need to rethink the entire architecture of the Smalltalk virtual machine, since the Tile64 chip has capabilities that radically alter the paradigm. Messages between processor nodes take less time to pass between nodes than the same amount of data takes to be written into memory. Think about that. It offers a new paradigm unavailable on other N-core processors (at this current time).

I believe that we, the Smalltalk community, need to have Smalltalk capable of being deployed into the fully generalized scenario: running on N cores with M native threads, with O images in one memory space, able to communicate with P other nodes. It is we who need to do the hard work of providing systems that work correctly in the face of the multi-core, multi-threaded reality that is now upon us. If we run away from the hard work, the competitors who tackle it and provide workable solutions will prevail.

Food for thought.

All the best,

Peter William Lount
Smalltalk.org Editor
[hidden email]
|
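As a tiny illustration of the kind of single-native-thread hazard being described (a made-up example, not taken from the application in question): two Squeak processes mutating one ordinary collection can interleave inside #add:, while serializing access through a mutex avoids it:

    | unsafe safe mutex |
    unsafe := OrderedCollection new.
    "Two green threads appending without protection; a process switch in the
     middle of #add: can interleave the updates and corrupt or lose elements."
    [1 to: 10000 do: [:i | unsafe add: i]] fork.
    [1 to: 10000 do: [:i | unsafe add: i]] fork.

    safe := OrderedCollection new.
    mutex := Semaphore forMutualExclusion.
    "The same work, with every mutation serialized through the mutex."
    [1 to: 10000 do: [:i | mutex critical: [safe add: i]]] fork.
    [1 to: 10000 do: [:i | mutex critical: [safe add: i]]] fork.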