On Oct 18, 2007, at 10:01 AM, Joshua Gargus wrote:

> On Oct 18, 2007, at 9:06 AM, Robert Withers wrote:
>
>> On Oct 17, 2007, at 11:12 PM, Hans-Martin Mosner wrote:
>>
>>> This would only make things more complicated, since then the primitives
>>> would have to start parallel native threads working on the same object
>>> memory. The problem with native threads is that the current object
>>> memory is not designed to work with multiple independent mutator
>>> threads. There are GC algorithms which work with parallel threads, but
>>> AFAIK they all have quite some overhead relative to the single-thread
>>> situation.
>>>
>>> IMO, a combination of native threads and green threads would be the best
>>> (although it still has the problem of parallel GC): the VM runs a small
>>> fixed number of native threads (default: the number of available cores,
>>> but it could be a little more, to efficiently handle blocking calls to
>>> external functions) which compete for the runnable Smalltalk processes.
>>> That way, a number of processes could be active at any one time instead
>>> of just one. The synchronization overhead in the process-switching
>>> primitives should be negligible compared to the overhead needed for GC
>>> synchronization.
>>
>> This is exactly what I have started work on. I want to use the
>> foundations of SqueakElib as a message-passing mechanism between objects
>> assigned to different native threads. There would be one native thread
>> per core. I am currently trying to understand what to do with all of the
>> global variables used in the interp loop, so I can have multiple threads
>> running that code. I have given very little thought to what would need
>> to be protected in the object memory or in the primitives. I take this
>> very much as a learning project. Just think, I'll be able to see how the
>> interpreter works, the object memory, bytecode dispatch,
>> primitives... all of it, in fact.
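Hans-Martin's scheme - a small fixed pool of native threads competing for the runnable Smalltalk processes - can be sketched in a few lines. This is only an illustrative toy (the names `run_queue`, `worker`, and `make_process` are invented, and plain callables stand in for green threads), not Squeak's scheduler:

```python
import queue
import threading

run_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    # Each native thread competes for the next runnable "green process";
    # the only synchronization in the scheduling path is the queue itself.
    while True:
        task = run_queue.get()
        if task is None:            # shutdown sentinel
            break
        task()

def make_process(i):
    def green_process():
        with results_lock:
            results.append(i * i)
    return green_process

NUM_NATIVE_THREADS = 4              # "default: number of available cores"
threads = [threading.Thread(target=worker) for _ in range(NUM_NATIVE_THREADS)]
for t in threads:
    t.start()
for i in range(10):                 # more runnable processes than threads
    run_queue.put(make_process(i))
for _ in threads:
    run_queue.put(None)
for t in threads:
    t.join()
print(sorted(results))
```

Note that several processes really are active at once here, which is exactly why the object memory (here just `results`) needs its own lock - Hans-Martin's GC-synchronization point in miniature.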
>> If I can come out with a working system that does message passing, even
>> at the cost of a poorly performing object memory et al., then it will be
>> a major success for me.
>>
>> It is going to be slower anyway, because I have to intercept each
>> message send as a possible non-local send.
>
> Isn't this a show-stopper for a practical system?

Probably. Although, if a single thread executes code slower than current
Squeak, yet all threads together generate higher throughput, then it's to
be considered faster.

> Or is this a stepping-stone?

It is a stepping-stone to see what inter-thread messaging looks like and
how it behaves.

> If so, how do you envision resolving this in the future?

My thinking is that getting the messaging working is the first step,
followed by looking at synchronization problems, and then looking at what
things like Exupery may offer to speed things up. The example I gave of
MacroTransforms is telling. Currently an #ifTrue: message is
macro-transformed into bytecodes that do the #ifTrue: inline. I have had to
back that out so the #ifTrue: can be intercepted if the receiver is
non-local. At runtime, it would be nice to see that if the receiver is in
fact local, then some form of inlining could be used; otherwise, intercept.
Since this is runtime-selected bytecodes, I thought of Exupery. I think
there could be lots of interesting optimization work once the basic system
is functional.

> FWIW, Croquet was at one time envisioned to work in the way that you
> describe. The architects weren't able to produce a satisfactory
> design/implementation within the necessary time frame, and instead
> developed the current "Islands" mechanism. This has worked out very well
> in practice, and there is no pressing need to try to implement the
> original idea.

I didn't know that; that's cool. Islands is neat.

> In my understanding, Croquet islands and E vats are quite similar in that
> regard (and the latter informed the design of the former)...
> both use an explicit far-ref proxy to an object in another island/vat.
> What is the motivation for the approach you have chosen, other than it
> being a fun learning process (which may certainly be a good enough reason
> on its own)?

As I described above, maybe it's a stepping-stone. Having a thread-based
vat means there are resolved refs like NearRef (same thread), ThreadRef
(same process/memory, different thread), possibly ProcessRef (different
process, uses pipes), and FarRef (on the net).

I'm not very experienced with the VM/object memory, so this is also a fun
learning experience!

Cheers,
Rob

> Cheers,
> Josh
>
>> To this end, the Macro Transforms had to be disabled so I could
>> intercept them. The system slowed considerably. I hope to speed them up
>> with runtime info: is the receiver in the same thread that's running?
>>
>> I do appreciate your comments and know that I may be wasting my time. :)
>>
>>> The simple yet efficient ObjectMemory of current Squeak can not be used
>>> with parallel threads (at least not without significant synchronization
>>> overhead). AFAIK, efficient algorithms require every thread to have its
>>> own object allocation area to avoid contention on object allocations.
>>> Tenuring (making young objects old) and storing new objects into old
>>> objects (remembered table) require synchronization. In other words,
>>> grafting a threadsafe object memory onto Squeak would be a major
>>> project.
>>>
>>> In contrast, for a significant subset of applications (servers) it is
>>> orders of magnitude simpler to run several images in parallel. Those
>>> images don't stomp on each other's object memory, so there is
>>> absolutely no synchronization overhead. For stateful sessions, a front
>>> end can handle routing requests to the image which currently holds a
>>> session's state; stateless requests can be handled by any image.
>>>
>>> Cheers,
>>> Hans-Martin
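The far-ref idea the thread keeps returning to - a proxy that intercepts every message send and forwards it to the object's owning vat - can be sketched like this. The class names (`Vat`, `ThreadRef`, `Counter`) are illustrative only, not SqueakElib's or E's actual API:

```python
import queue
import threading
from concurrent.futures import Future

class Vat:
    """A thread that owns some objects. Sends arriving from other threads
    are queued and executed one at a time by the owning thread."""
    def __init__(self):
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            receiver, selector, args, reply = self.inbox.get()
            try:
                reply.set_result(getattr(receiver, selector)(*args))
            except Exception as exc:
                reply.set_exception(exc)

class ThreadRef:
    """A resolved ref to an object living in another vat in the same
    address space: every message send is intercepted and forwarded."""
    def __init__(self, obj, vat):
        self._obj, self._vat = obj, vat

    def send(self, selector, *args):
        reply = Future()
        self._vat.inbox.put((self._obj, selector, args, reply))
        return reply            # an eventual result, not a direct one

class Counter:
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1
        return self.count

vat = Vat()
ref = ThreadRef(Counter(), vat)
replies = [ref.send('increment') for _ in range(5)]
print([r.result() for r in replies])   # sends execute in FIFO order
```

The interception cost Rob mentions is visible here: every send pays for a queue hand-off and a wakeup, which is why a fast local path for NearRefs (inlining when the receiver turns out to be in the same thread) matters so much.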
In reply to this post by johnmci
Thanks, John! I'll save this for when I actually start looking at it, even
though I said I already was. I am reading Tim's chapter to get familiar
with it all. I'm going to need to add a word to the object header, I think.

cheers,
Rob

On Oct 18, 2007, at 9:49 AM, John M McIntosh wrote:

>> I am currently trying to understand what to do with all of the global
>> variables used in the interp loop, so I can have multiple threads
>> running that code.
>
> Ah, well, my intent was to ensure there were no globals; however, there
> are a few:
>
> sqInt extraVMMemory; /* historical reasons for Mac OS 9 setup, not needed
> as a global now */
> sqInt (*compilerHooks[16])(); /* earlier versions of the CodeWarrior
> compiler had issues when you stuck this in a structure; should be fixed
> now */
> usqInt memory; /* there were some usages of memory in ccode: constructs
> in the interp; I think these might be gone now */
> void* showSurfaceFn; /* not sure about this one */
>
> struct VirtualMachine* interpreterProxy; /* This points to the
> interpreterProxy. It's there historically to allow direct linking from
> support code, but really you should use an accessor. */
>
> The rest are set to values which you can't do in a struct; however,
> somewhere in or before readImageFromFileHeapSizeStartingAt you could
> allocate the foo structure and initialize these values. There is of
> course some messy setup code in the VM that might refer to procedures in
> interp.c before an image is loaded; that is poor practice, and you would
> need to root that out.
>
> char* obsoleteIndexedPrimitiveTable[][3] = {
> const char* obsoleteNamedPrimitiveTable[][3] = {
> void *primitiveTable[577] = {
> const char *interpreterVersion =
>
> --
> ===========================================================================
> John M. McIntosh <[hidden email]>
> Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
> ===========================================================================
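John's suggestion - gather the interpreter's remaining globals into a per-interpreter state structure ("the foo structure") that each native thread carries - translates to any language. A minimal sketch, with invented names (`InterpreterState`, `interpret`) rather than interp.c's real declarations:

```python
class InterpreterState:
    """One instance per image / native thread: what used to be globals."""
    def __init__(self, memory_base):
        self.memory = memory_base          # base of this image's object memory
        self.extra_vm_memory = 0
        self.primitive_table = [None] * 577

def interpret(state):
    # Every former global access becomes a field access through the state
    # object the thread carries, so two threads never touch shared mutable
    # globals while running the same interpreter code.
    return f"interpreting image at {state.memory:#x}"

# Two images, two state structs, zero shared globals:
image_a = InterpreterState(0x10000000)
image_b = InterpreterState(0x20000000)
print(interpret(image_a))
print(interpret(image_b))
```

In the C VM the equivalent move is a `struct` allocated in or before `readImageFromFileHeapSizeStartingAt`, with every former global rewritten as a field access through a pointer parameter - which is exactly why the setup code that calls into interp.c before any image is loaded has to be rooted out first.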
In reply to this post by pwl
On 10/18/07, Peter William Lount <[hidden email]> wrote:
> Hi,
>
> However, having just worked on a very large multi-threaded commercial
> application in live production, it is very clear that even single
> native-threaded Smalltalk applications have very nasty concurrency
> problems.

Yes, and these problems are exaggerated by (what I would call) the old way
of doing threaded programming, i.e. shared state and fine-grained locking.

> It's important that concurrency be taken into account at all levels of an
> application's design, from the lowest levels of the virtual machine
> through the end user experience (which is where concurrency on multiple
> cores can really make a significant paradigm-adjusting difference if done
> well).

But if we truly move to the n-core (n being 100 and above) world to improve
computation speed, and it looks as though we must, then this simply isn't
realistic for most programmers. No more than manual memory management was
realistic for large applications.

> Of the lessons learned from this complex real-world application was that
> almost ALL of the concurrency problems you have with multiple cores
> running N native threads you have with a single core running one native
> thread.

Depending on when execution can be interrupted, you have exactly the same
issues.

> When you have a single native thread running, say, ten to twenty
> Smalltalk green threads (aka Smalltalk processes), the concurrency
> problems can be a real nightmare to contemplate. Comprehension of what is
> happening is exacerbated by the limited debugging information captured at
> runtime crash dumps.

But this depends largely on the model. If you go away from the old, tried
and untrue method of fine-grained locking, then debugging gets much easier.
It's no problem at all, for example, in Erlang. Sometimes when something is
really really hard to do, it is a sign that we are going about it the wrong
way.
> It is for the above reasons that I support many approaches being
> implemented, so that we can find out the best one(s) for various
> application domains.

We know from long experience what fine-grained is like. At least one STM
implementation is out there to try, and I believe the actor model Erlang
uses is either out there, or easy to set up.

> It's unlikely that there is a one-solution-fits-all-needs type of
> paradigm.

No, but we can get the 99%, like garbage collection has.

> 2) It's been mentioned that it would be straightforward to have Squeak
> start up multiple copies of the image (or even multiple different images)
> in one process (task) memory space, with each image having its own native
> thread and keeping its object table and memory separate within the larger
> memory space. This sounds like a very nice approach. This is very likely
> practical for multi-core CPUs such as the N-core (where N is 2, 4, 8, 64)
> CPUs from AMD, Intel, and Tilera.
>
> 3) A single image running on N cores with M native threads (M may be
> larger than N) is the full generalization of course.
>
> This may be the best way to take advantage of paradigm-shaking chips such
> as the Tile64 processor from Tilera.

If you mean by this a form of shared-state fine-grained programming, then I
disagree wholeheartedly. We have long experience with fine-grained in C++,
Java, now C#, Smalltalk, on and on. It just can't be the path to the
future. Smalltalk needs to keep inventing the future. Chasing this
primitive form of threading would put us firmly behind languages like C++
that have been doing it this way for decades.

> However, we may need to rethink the entire architecture of the Smalltalk
> virtual machine notions, since the Tile64 chip has capabilities that
> radically alter the paradigm. Messages between processor nodes take less
> time to pass between nodes than the same amount of data takes to be
> written into memory. Think about that.
> It offers a new paradigm unavailable to other N-core processors (at this
> current time).

I wonder how Erlang will run on these machines.
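The STM implementation Jason says is "out there to try" is not named in the thread, but the core idea - optimistic reads, validate versions at commit, retry on conflict - fits in a short sketch. This is an illustrative toy (all names invented), not any particular Smalltalk STM:

```python
import threading

class TVar:
    """A transactional variable: a value plus a commit version."""
    def __init__(self, value):
        self.value, self.version = value, 0
        self.lock = threading.Lock()

def atomically(transaction):
    """Run transaction(read, write); retry whenever a TVar we read was
    committed by another thread before our own commit."""
    while True:
        reads, writes = {}, {}
        def read(tv):
            if tv in writes:
                return writes[tv]
            if tv not in reads:
                with tv.lock:                   # consistent value/version pair
                    reads[tv] = (tv.value, tv.version)
            return reads[tv][0]
        def write(tv, value):
            writes[tv] = value
        result = transaction(read, write)
        # Commit: lock every touched TVar in a fixed order (no deadlock),
        # validate the versions we read, then publish the writes.
        touched = sorted(set(reads) | set(writes), key=id)
        for tv in touched:
            tv.lock.acquire()
        try:
            if all(tv.version == version for tv, (_, version) in reads.items()):
                for tv, value in writes.items():
                    tv.value, tv.version = value, tv.version + 1
                return result
        finally:
            for tv in touched:
                tv.lock.release()
        # validation failed: fall through and retry the whole transaction

account = TVar(100)

def deposit(read, write):
    write(account, read(account) + 10)

threads = [threading.Thread(target=atomically, args=(deposit,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(account.value)    # 100 + 8 * 10
```

The programmer writes `deposit` with no locks at all; conflicting commits are detected and transparently retried, which is the "eliminates explicit synchronization" property mentioned upthread.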
In reply to this post by Igor Stasenko
On Thu, Oct 18, 2007 at 06:36:00PM +0300, Igor Stasenko wrote:
> Then i think, it would be good to make some steps towards supporting
> multiple images by a single executable:
> - make a single executable capable of running a number of images in
> separate native threads.
> This will save memory resources and also could help in making
> inter-image messaging not so costly.

What memory and resources do you think that you will save? Squeak already
does almost exactly what you describe when you run it on a unix/linux/OSX
platform. Putting the interpreters into separate threads (as opposed to
unix processes) would at best save a trivial amount of memory, and would
add a lot of complexity.

I don't see any savings for inter-image messaging either, but maybe I'm
missing something there.

Dave
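The arrangement Dave describes - one interpreter per OS process, each with a completely private heap, talking over pipes - looks roughly like this generic sketch (the `image_main` protocol here is invented, not Squeak's actual launcher or any real inter-image protocol):

```python
from multiprocessing import Pipe, Process

def image_main(conn):
    # Each "image" runs in its own OS process with a private heap, so no
    # GC or mutator synchronization is needed between images at all.
    local_state = {}                      # this image's "object memory"
    while True:
        msg = conn.recv()
        if msg == 'quit':
            break
        selector, arg = msg
        if selector == 'put':
            local_state[arg[0]] = arg[1]
        elif selector == 'get':
            conn.send(local_state.get(arg))

if __name__ == '__main__':
    parent, child = Pipe()
    p = Process(target=image_main, args=(child,))
    p.start()
    parent.send(('put', ('answer', 42)))
    parent.send(('get', 'answer'))
    answer = parent.recv()
    print(answer)
    parent.send('quit')
    p.join()
```

Note that every message crossing the Pipe is serialized and copied, which is Dave's implicit trade-off: total isolation between images, at the price of paying a marshalling cost on each inter-image send.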
On 19/10/2007, David T. Lewis <[hidden email]> wrote:
> On Thu, Oct 18, 2007 at 06:36:00PM +0300, Igor Stasenko wrote:
>>
>> Then i think, it would be good to make some steps towards supporting
>> multiple images by a single executable:
>> - make a single executable capable of running a number of images in
>> separate native threads.
>> This will save memory resources and also could help in making
>> inter-image messaging not so costly.
>
> What memory and resources do you think that you will save? Squeak already
> does almost exactly what you describe when you run it on a
> unix/linux/OSX platform. Putting the interpreters into separate threads
> (as opposed to unix processes) would at best save a trivial amount of
> memory, and would add a lot of complexity.

I can't see how OS process handling can be less complex than threading.
Also, unix is the best example among OSes, which tries to do things nicely.
But on Windows, for instance, DLL instances are not shared between
processes, resulting in a copy of the same DLL for different processes.

> I don't see any savings for inter-image messaging either, but maybe I'm
> missing something there.

Well, Spoon 'imprinting' requires that you copy object behaviors between
images. By keeping the images in the same process, you can just refer to
them as external references. For the same reasons, inter-image message
sends can be done without serializing objects, because all objects of all
images are accessible at any time.

Also, the above doesn't restrict inter-image processing to images kept in
the same process. But for me it's obvious that inter-image processing
between images which share the same process address space can be greatly
simplified.

> Dave

--
Best regards,
Igor Stasenko AKA sig.
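Igor's point - that sends between images in one address space can skip serialization entirely - is easy to demonstrate. A minimal sketch (the `payload` object graph is an invented stand-in for the object behaviors Spoon would copy):

```python
import pickle
import queue
import threading

# A biggish object graph standing in for object behaviors to be shared.
payload = {i: list(range(50)) for i in range(1000)}

# Between OS processes, the graph must be serialized and rebuilt on the
# other side, producing a full deep copy:
wire = pickle.dumps(payload)
copy = pickle.loads(wire)
assert copy == payload and copy is not payload

# Between threads in one address space, only a reference changes hands;
# no bytes are marshalled and no copy is made:
q = queue.Queue()
received = []
t = threading.Thread(target=lambda: received.append(q.get()))
t.start()
q.put(payload)
t.join()
print(received[0] is payload)
```

The flip side, as Dave notes, is that once images can hand each other raw references, they are no longer isolated, and a coding error in one image can corrupt another's object memory.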
In reply to this post by Jason Johnson-5
Hi Jason,
>> However, having just worked on a very large multi-threaded commercial
>> application in live production it is very clear that even single
>> native-threaded Smalltalk applications have very nasty concurrency
>> problems.
>
> Yes, and these problems are exaggerated by (what I would call) the old
> way of doing threaded programming, i.e. shared state, fine-grained
> locking.

That may be the case from your - and others' - perspective - and I have
empathy for it - however they are still valid techniques, and others, such
as myself, don't share your perspective. Smalltalk should let people - the
(educated) users - choose the mechanism of concurrency, not dictate it. In
my humble opinion.

>> It's important that concurrency be taken into account at all levels of
>> an application's design, from the lowest levels of the virtual machine
>> through the end user experience (which is where concurrency on multiple
>> cores can really make a significant paradigm-adjusting difference if
>> done well).
>
> But if we truly move to the n-core (n being 100 and above) world to
> improve computation speed, and it looks as though we must, then this
> simply isn't realistic for most programmers. No more than manual memory
> management was realistic for large applications.

The reality of processor designs like the Tile64 requires us to have all
available techniques at our disposal.

>> Of the lessons learned from this complex real-world application was that
>> almost ALL of the concurrency problems you have with multiple cores
>> running N native threads you have with a single core running one native
>> thread.
>
> Depending on when execution can be interrupted, you have exactly the same
> issues.

Exactly my point. Thus the solutions proposed as being "simpler" are just
an illusion. They might be simpler in some cases, but when you really need
complex concurrency controls, sometimes you need the other "dirtier"
techniques at your disposal.
Smalltalk is supposed to be a computer language with general power to
control the computer and access its true power and potential. Limiting the
solution space by only implementing a limited set of concurrency primitives
makes no sense. You'll just give the market to other lesser systems like
Erlang and Java type systems.

>> When you have a single native thread running, say, ten to twenty
>> Smalltalk green threads (aka Smalltalk processes) the concurrency
>> problems can be a real nightmare to contemplate. Comprehension of what
>> is happening is exacerbated by the limited debugging information
>> captured at runtime crash dumps.
>
> But this depends largely on the model. If you go away from the old, tried
> and untrue method of fine-grained locking then debugging gets much
> easier. It's no problem at all, for example, in Erlang. Sometimes when
> something is really really hard to do, it is a sign that we are going
> about it the wrong way.

Yes, the model is always important. Yes, sometimes that is a sign for
improvement. Maybe I simply need detailed specifics of what you are talking
about; however, if you really want Smalltalk to be general purpose - as I
do - then it needs to cover the full domain of techniques for concurrency!

>> It is for the above reasons that I support many approaches being
>> implemented so that we can find out the best one(s) for various
>> application domains.
>
> We know from long experience what fine-grained is like. At least one STM
> implementation is out there to try, and I believe the actor model Erlang
> uses is either out there, or easy to set up.

Yes, we do. Sometimes it is what is needed. For example, when building
hard-core operating systems.

>> It's unlikely that there is a one-solution-fits-all-needs type of
>> paradigm.
>
> No, but we can get the 99% like garbage collection has.

Not really.
>> 2) It's been mentioned that it would be straightforward to have Squeak
>> start up multiple copies of the image (or even multiple different
>> images) in one process (task) memory space with each image having its
>> own native thread and keeping its object table and memory separate
>> within the larger memory space. This sounds like a very nice approach.
>> This is very likely practical for multi-core CPUs such as the N-core
>> (where N is 2, 4, 8, 64) CPUs from AMD, Intel, and Tilera.
>>
>> 3) A single image running on N cores with M native threads (M may be
>> larger than N) is the full generalization of course.
>>
>> This may be the best way to take advantage of paradigm-shaking chips
>> such as the Tile64 processor from Tilera.
>
> If you mean by this a form of shared-state fine-grained programming then
> I disagree wholeheartedly. We have long experience with fine-grained in
> C++, Java, now C#, Smalltalk, on and on. It just can't be the path to the
> future.

There are many paths. I'm excited about the path that you are forging. All
I ask is that you don't make that the only path to travel for people using
Smalltalk.

> Smalltalk needs to keep inventing the future. Chasing this primitive form
> of threading would put us firmly behind languages like C++ that have been
> doing it this way for decades.

While I support Smalltalk inventing the future, keeping it from supporting
valid concurrency techniques is ignoring the future (and the past) of what
works!

>> However, we may need to rethink the entire architecture of the Smalltalk
>> virtual machine notions since the Tile64 chip has capabilities that
>> radically alter the paradigm. Messages between processor nodes take less
>> time to pass between nodes than the same amount of data takes to be
>> written into memory. Think about that. It offers a new paradigm
>> unavailable to other N-core processors (at this current time).
>
> I wonder how Erlang will run on these machines.

I do as well.

All the best,

Peter
In reply to this post by David Mitchell-10
Just reading http://wiki.squeak.org/squeak/2978
Looks great, thanks for the tip. pf

On 18.10.2007, at 18:40, David Mitchell wrote:

> Check out MaClientServer (developed for Magma, but useful on its own):
>
> http://liststest.squeakfoundation.org/pipermail/squeak-dev/2004-June/078767.html
>
> On 10/18/07, Petr Fischer <[hidden email]> wrote:
>> Hi. What do you recommend for communication among running images?
>> RemoteMessagingToolkit (RMT)?
>> Remote Smalltalk (rST)?
>> Soap (ehm)?
>> other (not via the TCP/IP stack - for multiple images running locally)?
>>
>> Thanks, p.
>>
>> On 18.10.2007, at 16:18, Sebastian Sastre wrote:
>>
>>> Hey, this sounds like an interesting path to me. If we think of nature
>>> and its design, those images could be analogous to cells of a larger
>>> body. Fragmentation keeps things simple without compromising
>>> scalability. Nature concluded that it is more efficient not to develop
>>> a few supercomplex brain cells but to develop zillions of far simpler
>>> brain cells - cells that are just complex enough - and make them able
>>> to form an unimaginably complex network: a brain.
>>>
>>> Another observation that makes me conclude this is interesting is that
>>> we know that one object that is too smart smells bad. I mean, it easily
>>> starts to become less flexible, so less scalable in complexity, less
>>> intuitive (you have to learn more about how to use it), more to
>>> memorize, maintain, document, etc. So it is smarter, but it could begin
>>> to become a bad deal because of being too costly. That said, if we
>>> think of those flexible mini-images as objects, each one using a core,
>>> we can scale enormously and almost trivially in this whole multicore
>>> thing, and in a way we know works.
>>>
>>> Another interesting point is fault tolerance. If one of those images
>>> happens to have downtime (because of a power failure on the host where
>>> it was running, or whatever reason), the system could feel it somehow
>>> but not be in complete failure, because there are other images to
>>> handle demand. A small (so efficient), well-protected critical system
>>> can coordinate measures of contention for the "crisis", and hopefully
>>> the system never really makes its own crisis felt by the users.
>>>
>>> Again, I find this is a tradeoff about when to scale horizontally or
>>> vertically. For hardware, Intel and friends have scaled vertically
>>> (more bits and Hz, for instance) for years, as much as they were
>>> physically able to. Now they have reached a kind of barrier and started
>>> to scale horizontally (adding cores). Please don't fall into endless
>>> discussions, like the ones I saw out there, about comparing apples with
>>> bananas, because they are both fruits but are not comparable. I mean,
>>> it is all about scaling, but they are two different axes of a
>>> multidimensional scaling (complexity, load, performance, etc.).
>>>
>>> I'm thinking here of vertical as making one Squeak smart enough to be
>>> thread-safe, and horizontal as making one smart network of N Squeaks.
>>>
>>> Sometimes one choice will be a good business and sometimes it will be
>>> the other. I feel the horizontal time has come. If that's true, to
>>> invest (time, $, effort) now in vertical scaling could turn out to have
>>> a lower cost/benefit rate compared to the results of the investment in
>>> horizontal scaling.
>>>
>>> The truth is that this is all speculative and I don't know. But I do
>>> trust in nature.
>>>
>>> Cheers,
>>>
>>> Sebastian Sastre
>>>
>>>> -----Original Message-----
>>>> From: [hidden email] [mailto:[hidden email]] On behalf of Ralph Johnson
>>>> Sent: Thursday, October 18, 2007 08:09
>>>> To: The general-purpose Squeak developers list
>>>> Subject: Re: Multy-core CPUs
>>>>
>>>> On 10/17/07, Steve Wart <[hidden email]> wrote:
>>>>> I don't know if mapping Smalltalk processes to native threads is the
>>>>> way to go, given the pain I've seen in the Java and C# space.
>>>>
>>>> Shared-memory parallelism has always been difficult. People claimed it
>>>> was the language, the environment, or that they needed better
>>>> training. They always thought that with one more thing, they could
>>>> "fix" shared-memory parallelism and make it usable. But Java has done
>>>> a good job with providing reasonable language primitives. There has
>>>> been a lot of work on making threads efficient, and plenty of people
>>>> have learned to write multi-threaded Java. But it is still way too
>>>> hard.
>>>>
>>>> I think that shared-memory parallelism, with explicit synchronization,
>>>> is a bad idea. Transactional memory might be a solution, but it
>>>> eliminates explicit synchronization. I think the most likely solution
>>>> is to avoid shared memory altogether, and go with message passing.
>>>> Erlang is a perfect example of this. We could take this approach in
>>>> Smalltalk by making minimal images like Spoon, making images that are
>>>> designed to be used by other images (again, like Spoon), and then
>>>> implementing our systems as hundreds or thousands of separate images.
>>>> Image startup would have to be very fast. I think that this is more
>>>> likely to be useful than rewriting garbage collectors to support
>>>> parallelism.
>>>>
>>>> -Ralph Johnson
In reply to this post by pwl
On Thursday 18 October 2007 10:58 pm, Peter William Lount wrote:
> I propose that any distributed object messaging system that is developed
> for inter-image communication meet a wide range of criteria and
> application needs before being considered as a part of the upcoming next
> Smalltalk Standard. These criteria would need to be elucidated from the
> literature and the needs of members of the Smalltalk community and their
> clients.
>
> 2) It's been mentioned that it would be straightforward to have Squeak
> start up multiple copies of the image (or even multiple different images)
> in one process (task) memory space with each image having its own native
> thread and keeping its object table and memory separate within the larger
> memory space. This sounds like a very nice approach.

I am not so sure. Squeak VM is a processor hog; it needs the processor for
bytecode interpretation. So a VM process can only scale to a few threads
before it starves for processor. On the downside, coding errors could trash
object memory across threads, making testing and debugging difficult. Will
the juice be worth the squeeze?

> 3) A single image running on N cores with M native threads (M may be
> larger than N) is the full generalization of course.
>
> This may be the best way to take advantage of paradigm-shaking chips such
> as the Tile64 processor from Tilera.

With single or few processors, we tend to "serialize" logic ourselves and
create huge linear programs. When processors are aplenty, we are free to
exploit inherent parallelism and create many small coordinating programs.
So the N cores are a problem only for small N (around 8).

> However, we may need to rethink the entire architecture of the Smalltalk
> virtual machine notions since the Tile64 chip has capabilities that
> radically alter the paradigm. Messages between processor nodes take less
> time to pass between nodes than the same amount of data takes to be
> written into memory. Think about that. It offers a new paradigm
> unavailable to other N-core processors (at this current time).

True.
Squeak's VM could virtualize display/sensors and spawn each project in its
own background process bound to a specific processor. The high-speed,
low-latency paths are well suited for UI events. Imagine running different
projects on each face of a rotating hexecontahedron :-).

Subbu
On 19-Oct-07, at 1:06 PM, subbukk wrote:

> I am not so sure. Squeak VM is a processor hog.

No it isn't. It uses cpu when there is a process to run. If there is no
process to run, it sleeps.
It's the code in the image that gets to decide when processes run or sleep.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Fractured Idiom:- MAZEL TON - Lots of luck
In reply to this post by pwl
On 10/19/07, Peter William Lount <[hidden email]> wrote:
> That may be the case from your - and others' - perspective - and I have
> empathy for it - however they are still valid techniques and others, such
> as myself, don't share your perspective.

Sure, just as manual memory management is still valid and needed at the
lowest levels of programming. It's just not valid in most applications.

> Smalltalk should let people - the (educated) users - choose the mechanism
> of concurrency, not dictate it. In my humble opinion.

But herein lies the problem. As discussed in the previous thread I linked
to, adding actor-style message passing should be relatively easy. Adding
software transactional memory is doable. Making the Squeak VM fully
multi-threaded (natively) is going to be a lot of pain and hard to get
right. Just ask the Java VM team. The payback of adding this obsolete
(except in the lowest-level cases) method of dealing with threading just
isn't going to be worth the pain to implement it.

> The reality of processor designs like the Tile64 requires us to have all
> available techniques at our disposal.

Why?

> Exactly my point. Thus the solutions proposed as being "simpler" are just
> an illusion.

? They are unquestionably simpler to the programmer who is using them
(which is what I meant).

> They might be simpler in some cases but when you really need complex
> concurrency controls sometimes you need the other "dirtier" techniques at
> your disposal.

This is like saying that Smalltalk is wrong to not expose manual memory
management to you for when you need to get "down and dirty". It's simply
not the case. You move to a higher level, just as we do with all
abstractions.

> Smalltalk is supposed to be a computer language with general power to
> control the computer and access its true power and potential. Limiting
> the solution space by only implementing a limited set of concurrency
> primitives makes no sense. You'll just give the market to other lesser
> systems like Erlang and Java type systems.
This last sentence is quite odd and, to be frank, not well reasoned at all.
First of all, Erlang is not lesser; it is in fact currently the leader in
this area. It's funny, though, that you suggest we would "give the market
over" to Erlang, since Erlang supports precisely *one* form of concurrency:
share-nothing message passing. Erlang can run in multiple threads, but only
the interpreter does that, and it's transparent to the processes running in
the VM.

Second of all, do you seriously think adding fine-grained threading to
Smalltalk will automatically cause it to take over the market? Of course
not; everyone would just say "what? You *just now* got that?", except
Erlang, who would simply laugh at a language that put in so much effort to
add a feature that gets less relevant all the time.

Ironically, the fact of the matter is: the languages that make threading
*simpler for implementers* are going to be the ones who win in the
apparently coming multi-core world. Just ask Tim Sweeney.

> For example, when building hard-core operating systems.

If you want to build a hard-core operating system in Smalltalk, you have
other more pressing issues to deal with than how threading is accomplished.
And actually there has been some work done on operating systems that do not
support this silly pthreads model we have today, but use something closer
to the Erlang model. It's interesting work, but sadly one still can't get
much traction with an OS other than Windows, Mac or a Unix variant.

> Not really.

Aren't you the one who always requests well-thought-out arguments? :) I
really don't see what it is you think you lose by not having this old,
outdated fine-grained threading model.

> There are many paths. I'm excited about the path that you are forging.
> All I ask is that you don't make that the only path to travel for people
> using Smalltalk.

Well, at the moment I'm forging nothing, only stating what I know of the
situation.
At some later point I do intend to look at what's required to make this happen in Squeak, but I have some other more pressing issues for the present.

> While I support Smalltalk inventing the future, keeping it from supporting
> valid concurrency techniques is ignoring the future (and the past) of what
> works!

We have very different definitions of "works". Here you are using it the same way someone would for <insert crappy programming language>. It works in the same way you can paint a house with a toothbrush.
In reply to this post by timrowledge
On Saturday 20 October 2007 1:44 am, tim Rowledge wrote:
> On 19-Oct-07, at 1:06 PM, subbukk wrote:
>> I am not so sure. Squeak VM is a processor hog.
>
> No it isn't. It uses cpu when there is a process to run. If there is
> no process to run, it sleeps.
> It's the code in the image that gets to decide when processes run or
> sleep.

I was referring to the VM process executing bytecodes in images. Bytecode interpretation is a cpu-intensive process. For instance, the Linux VM running the latest etoy-dev consumes a steady 7-12% of cpu if I just drag a polygon object and make it do a forward/turn loop about once a second. Still, Squeak is a lot smaller and more efficient compared to other interpreters.

Subbu
On Oct 19, 2007, at 23:53, subbukk wrote:
> On Saturday 20 October 2007 1:44 am, tim Rowledge wrote:
>> On 19-Oct-07, at 1:06 PM, subbukk wrote:
>>> I am not so sure. Squeak VM is a processor hog.
>>
>> No it isn't. It uses cpu when there is a process to run. If there is
>> no process to run, it sleeps.
>> It's the code in the image that gets to decide when processes run or
>> sleep.
>
> I was referring to VM process executing bytecodes in images. Bytecode
> interpretation is a cpu intensive process.
>
> For instance, the Linux VM running latest etoy-dev consumes a steady
> 7-12% of cpu if I just drag a polygon object and make it do a
> forward/turn loop about once a second.

This is probably much more the fault of Morphic and Etoys than the VM's. Would that we had time to start optimizing for OLPC ... but even then it's not certain how far you can get with the current Morphic design.

- Bert -
On Saturday 20 October 2007 3:41 am, Bert Freudenberg wrote:
> > For instance, the Linux VM running latest etoy-dev consumes a steady
> > 7-12% of cpu if I just drag a polygon object and make it do a
> > forward/turn loop about once a second.
>
> This is probably much more the fault of Morphic and Etoys than the
> VM's. Would that we had time to start optimizing for OLPC ... but
> even then it's not certain how far you can get with the current
> Morphic design.

Dragging does involve more work in Morphic than in some of the other graphical apps. Morphic 3 deserves a separate discussion thread. My point was that a Squeak VM process with cpu-bound threads will max out a core with just a few threads. On large multi-core processors, we could scale better if VMs can spawn into different communicating lightly-threaded processes rather than a single heavily-threaded process.

Subbu
As for Multi-Core, see this: http://rt07.raytracing.nl/
My dream is to see this running interactively and implemented completely in smalltalk.

-- Best regards, Igor Stasenko AKA sig.
In reply to this post by Hans-Martin Mosner
> This would only make things more complicated since then the primitives
> would have to start parallel native threads working on the same object
> memory.
> The problem with native threads is that the current object memory is not
> designed to work with multiple independent mutator threads. There are GC
> algorithms which work with parallel threads, but AFAIK they all have
> quite some overhead relative to the single-thread situation.
[...]

Dear Hans-Martin,

Thanks for your clear explanation. It is really instructive.

Regards,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
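The hybrid scheme Hans-Martin describes - a small, fixed pool of native threads competing for the runnable (green) Smalltalk processes - can be sketched outside the VM. The following is an illustrative Python sketch, not Squeak VM code; the names `run_scheduler` and `worker` and the queue-based run list are invented for the example. It shows how the synchronization cost stays confined to the "process-switching" step (the queue operations):

```python
import threading
import queue

def run_scheduler(green_processes, native_threads=4):
    """Run green processes on a small fixed pool of native threads.

    Each native thread competes for the next runnable process, as in
    the proposal above; the only synchronization in the scheduler
    itself is the thread-safe run queue."""
    run_queue = queue.Queue()
    for proc in green_processes:
        run_queue.put(proc)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                proc = run_queue.get_nowait()
            except queue.Empty:
                return  # no runnable processes left
            result = proc()  # run the green process to completion
            with results_lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(native_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Ten tiny "green processes"; four native threads compete for them.
out = run_scheduler([(lambda i=i: i * i) for i in range(10)])
print(sorted(out))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The hard part Hans-Martin points to - GC and object-memory synchronization - is deliberately absent here: the sketch only illustrates the scheduling side.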
In reply to this post by Igor Stasenko
Igor Stasenko wrote:
> As for Multi-Core, see this: http://rt07.raytracing.nl/
>
> My dream is to see this running interactively and
> implemented completely in smalltalk.

Hi Igor,

I for one like your vision! Bring it on!

All the best,

Peter William Lount
[hidden email]
In reply to this post by Jason Johnson-5
Hi Jason,
>> That may be the case from your - and others' - perspective - and I have
>> empathy for it - however they are still valid techniques, and others,
>> such as myself, don't share your perspective.
>
> Sure, just as manual memory management is still valid and needed at the
> lowest levels of programming. It's just not valid in most applications.

I've not yet seen any serious discussion of the case for your point of view which bridges the gap of complexity in concurrency as automatic memory management magically does. Please illuminate us with specific and complete details of your proposal for such a breakthrough in concurrency complexity.

> Making the Squeak VM fully multi-threaded (natively) is going to be a lot
> of pain and hard to get right. Just ask the Java VM team.

Then either the hard work needs to be done, or the VM needs to be completely rethought.

> The payback of adding this obsolete (except in the lowest-level cases)
> method of dealing with threading just isn't going to be worth the pain to
> implement it.

What are you going on about? What techniques are you saying are obsolete exactly? How are they obsolete?

>> The reality of processor designs like the Tile 64 requires us to have
>> all available techniques at our disposal.
>
> Why?

Why? 64 processors on a single chip - with 128 coming next year and 1024 planned - that's why. With that many processors on a single chip it's important that systems and applications run smoothly, taking advantage of all the opportunities for parallelism. This has many implications, some of which work better with one method of concurrency than with another. One size of shoe doesn't fit all solutions.

>> Exactly my point. Thus the solutions proposed as being "simpler" are
>> just an illusion.
>
> ? They are unquestionably simpler to the programmer who is using them
> (which is what I meant).

You've missed the point. Even the simplest of the concurrency methods proposed so far by people in the Squeak thread lead to the most complex concurrency control error scenarios.
That's one of the points. Another is that the simplest of concurrency models can't handle all the scenarios. As asked above, please describe in detail and completely the proposed "simple" approach to concurrency. Links to appropriate descriptions, if they exist, would also be fine (unless they contain too much extraneous text).

>> They might be simpler in some cases but when you really need
>> complex concurrency controls sometimes you need the other "dirtier"
>> techniques at your disposal.
>
> This is like saying that Smalltalk is wrong to not expose manual memory
> management to you for when you need to get "down and dirty". It's simply
> not the case. You move to a higher level, just as we do with all
> abstractions.

Nonsense, it's not like saying that at all. Sometimes moving to a higher level of abstraction isn't the solution. Sometimes moving laterally provides the insight for the solution. Often moving down to the lowest levels and rethinking how they work provides the solution without higher levels of abstraction. A case in point for clarity: exokernels. They remove higher levels of abstraction so that we have access to the power of the real hardware. "The idea behind exokernels is to force as few abstractions as possible on developers, enabling them to make as many decisions as possible about hardware abstractions." - http://en.wikipedia.org/wiki/Exokernel

The problem with concurrency is that it's more complex than garbage collection by orders of magnitude - so much more so that the comparison breaks down. Stephen Wolfram's work on cellular automata (page 27 of A New Kind of Science, http://www.wolframscience.com/nksonline/page-27) proves, yes, proves that even (some) simple systems can generate results (i.e. behaviour) as complex as any generated by a complex system.
Wolfram states: "The picture [of cellular automaton rule 30] shows what happens when one starts with just one black cell and then applies this rule over and over again. And what one sees is something quite startling - and probably the single most surprising scientific discovery I have ever made. Rather than getting a simple regular pattern as we might expect, the cellular automaton instead produces a pattern that seems extremely irregular and complex." Most of the rest of his book explores just how complex this behavior really is.

Often this feature of simple systems is just what we want to take advantage of. Certainly Smalltalk leverages the power of this simplicity with its simple syntax. So if there is a way (or ways) to have simple concurrency that is effective, I'm all for it. However, there is a dark side to Stephen Wolfram's discovery as well, one that needs addressing and that I'm attempting to point out here. The dark side is that simple systems can generate complex results - and complex results (beyond comprehension) are just what we don't want when we enter the world of concurrency. The rub is that, as far as I can see, there isn't any way to avoid the complex results, for simple systems can generate complexity as complex as complex systems do. I fear that the solution space isn't as straightforward as having a set of simplified concurrency primitives, as proposed by some for Smalltalk. The reality is harsher than you think. The solution space requires more than a simple set of concurrency primitives.

>> Smalltalk is supposed to be a computer language with general power to
>> control the computer and access its true power and potential. Limiting
>> the solution space by only implementing a limited set of concurrency
>> primitives makes no sense. You'll just give the market to other lesser
>> systems like Erlang and Java type systems.
>
> This last sentence is quite odd, and to be frank not well reasoned at all.

Thank you for calling it "odd".
That's what happens when you think different; at first people think it odd. I often encourage people to think different, as Apple did in their marketing of a few years ago. As for how it's reasoned: yes, it is well reasoned, even if you don't get it at first or even if I wasn't clear about it. Let me attempt to clarify the reasoning for you.

> First of all Erlang is not lesser, it is in fact currently the leader in
> this area.

How is that?

> It's funny though, that you suggest we would "give the market over" to
> Erlang, since Erlang supports precisely *one* form of concurrency:
> share-nothing message passing.

Yes, but Erlang is a purely functional, non-object-oriented, non-keyword-message-passing language. While it has a form of message passing, it's not the same as Smalltalk's. It's simply passing parameters to functions that run in separate green or native threads. Yes, it is impressive what they have accomplished, but it isn't the be-all and end-all.

> Erlang can run in multiple threads, but only the interpreter does that,
> and it's transparent to the processes running in the VM.

Every system needs improvement.

> Second of all, do you seriously think adding fine-grained threading to
> Smalltalk automatically will cause it to take over the market?

No, I don't think it will "cause" Smalltalk to "automatically take over the market". Of course not! Nor did I imply that or intend to. I simply think that having all the tools at our disposal is important to maintaining and growing market share.

> Ironically, the fact of the matter is: the languages that make threading
> *simpler to implementers* are going to be the ones who win in the
> apparently coming multi-core world.

"Simpler to implement" concurrency leads to software systems that are just as difficult to manage as more complex, well-thought-out concurrency. In fact, I think that making it simplistic will lead many programmers to implement software that is impossible to debug without enormous effort.
The problem is that even the simplest concurrency leads to the nastiest and most complex bugs in software. For example, what could be simpler than Smalltalk's forking of blocks of code? It sure seems simple, doesn't it: just a simple message to a block of code, "[ ... ] fork". In many cases it is simple and no harm is done, as the code will execute and everything will be good and consistent afterwards. The problems begin when you "fork" code that wasn't designed to be forked and run concurrently. Then all hell can break loose and boom, your program crashes unexpectedly with a mystery. It gets even worse when it only happens occasionally - try figuring it out then.

I was just working on fixing a "fork"-happy approach in a major Smalltalk production application. In the end we took out many of the forking statements or fixed them in other ways. That application was running in a Smalltalk with ONE native thread for all the Smalltalk processes (aka green lightweight threads). So much for simple threading being easier. Part of the problem was the programmers being "fork" happy. Part of the problem is that the Smalltalk class library isn't designed to be thread safe. Very few class libraries are. Most of the problem is that the concurrency simply wasn't thought out well. I don't see how a simple concurrency threading model can solve the problems of when and how to use concurrency properly and avoid the many pitfalls of threading. If you can see that, please illuminate it for the rest of us.

> Just ask Tim Sweeny.

Do you mean Tim Sweeney the game developer? http://en.wikipedia.org/wiki/Tim_Sweeney_(game_developer) Alright, even though I don't know Tim I'll take the bait and see where it goes. Tim Sweeney (or Sweeny), what do you think? (If someone who knows him would be kind enough to pass this thread on to him, or post his thoughts on this topic, that would be great - thanks.)
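The "[ ... ] fork" hazard described above is easy to reproduce in any shared-memory language. Below is an illustrative Python sketch (Python threads standing in for forked Smalltalk processes; `Account`, `deposit`, and the loop counts are invented for the example): the unguarded read-modify-write can lose updates only occasionally, while the locked variant is always consistent - exactly the intermittent-bug pattern described above.

```python
import threading

class Account:
    """Shared mutable state, touched by code that was 'fork'ed.

    deposit_unsafe has the classic read-modify-write race; deposit
    guards the same operation with a lock. The race may or may not
    bite on any given run - which is exactly the 'only happens
    occasionally' debugging nightmare described above."""
    def __init__(self):
        self.balance = 0
        self._lock = threading.Lock()

    def deposit_unsafe(self, amount):
        current = self.balance           # read
        self.balance = current + amount  # write: another thread may have run in between

    def deposit(self, amount):
        with self._lock:                 # serialize the read-modify-write
            self.balance += amount

def run_forks(account, method, threads=8, deposits=10_000):
    """Stand-in for '[ ... ] fork': run the same block on many threads."""
    ts = [threading.Thread(
              target=lambda: [method(account, 1) for _ in range(deposits)])
          for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

acct = Account()
run_forks(acct, Account.deposit)
print(acct.balance)  # always 80000 with the lock; the unsafe version can lose updates
```

Swapping in `Account.deposit_unsafe` may still print 80000 on a lucky run - which is precisely why a fork-happy design can pass testing and then fail in production.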
>> For example, when building hard core operating systems.
>
> If you want to build a hard core operating system in Smalltalk you have
> other more pressing issues to deal with than how threading is
> accomplished.

Yes, there are many issues in implementing an operating system in a language such as Smalltalk. In exploring these issues ZokuScript was born. Real native processes with protected memory spaces and multiple threads are just one of these important issues. Performance is another. Threading, including native threading on one core or N cores (where N can be large), under existing operating systems is very important to the future of Smalltalk.

> I really don't see what it is you think you lose not having this old,
> outdated fine-grained threading model.

For clarity, please define in detail what you mean by the phrase "fine-grained threading model" so that we can make sure we are on the same page.

>> There are many paths. I'm excited about the path that you are forging.
>> All I ask is that you don't make that the only path to travel for people
>> using Smalltalk.
>
> Well, at the moment I'm forging nothing, only stating what I know of the
> situation. At some later point I do intend to look at what's required to
> make this happen in Squeak, but I have some other more pressing issues
> for the present.

>> While I support Smalltalk inventing the future, keeping it from
>> supporting valid concurrency techniques is ignoring the future (and the
>> past) of what works!
>
> We have very different definitions for "works". Here you are using it the
> same way someone would use for <insert crappy programming language>. It
> works in the same way you can paint a house with a toothbrush.

You seem to think that there is some magical breakthrough in the world of concurrency that is on par with the magic of automatic garbage collection. I'd sure love to know what that is, and how it avoids the pitfalls of even simple concurrency models and the issues that occur in real world projects.
If you could, please describe it in full detail and completely, with real world examples. Thanks very much.

All the best,

Peter William Lount
[hidden email]
Smalltalk.org Editor
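The rule 30 automaton cited above (from page 27 of A New Kind of Science) is small enough to reproduce in a few lines, which makes the "simple rules, complex behaviour" point concrete. A Python sketch - the thread's language is Smalltalk, but the rule is language-neutral, and the name `rule30_rows` is invented for the example:

```python
def rule30_rows(n):
    """Evolve cellular automaton rule 30 from a single black cell.

    New cell = left XOR (center OR right) - one line of 'program',
    yet the pattern it produces is famously irregular."""
    width = 2 * n + 1
    row = [0] * width
    row[n] = 1  # single black cell in the middle
    rows = [row]
    for _ in range(n):
        prev = rows[-1]
        padded = [0] + prev + [0]          # cells beyond the edge are white
        row = [padded[i - 1] ^ (padded[i] | padded[i + 1])
               for i in range(1, width + 1)]
        rows.append(row)
    return rows

for r in rule30_rows(5):
    print(''.join('#' if c else '.' for c in r))
```

Run it with n in the hundreds and the center column looks statistically random - the "complex results beyond comprehension" side of the argument, generated by a one-line rule.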
In reply to this post by K. K. Subramaniam
Hi,
I propose that any distributed object messaging system developed for inter-image communication meet a wide range of criteria and application needs before being considered as part of the next Smalltalk standard. These criteria would need to be elucidated from the literature and from the needs of members of the Smalltalk community and their clients.

>> 2) It's been mentioned that it would be straightforward to have Squeak
>> start up multiple copies of the image (or even multiple different
>> images) in one process (task) memory space, with each image having its
>> own native thread and keeping its object table and memory separate
>> within the larger memory space. This sounds like a very nice approach.
>
> I am not so sure. Squeak VM is a processor hog. Threads within the VM
> will need the processor for bytecode interpretation. So a VM process can
> only scale to a few threads before it starves for processor.

It's not the byte codes that cause a lot of cpu usage. It's how many processor instructions are being executed that causes that. If you run lots of code then you can expect higher cpu usage. The more densely capability is packed into the computer language's library of objects, the more processor instructions may be executed. To find out what Squeak is doing while chewing through the ~12% cpu you mentioned elsewhere, you'd have to trace the code. Then you'd see exactly what's going on. Tracing the code at two levels would be helpful: first at the Smalltalk level, and then at the VM primitive byte code level. The byte codes may be fine while the image you've deployed might be doing many things that you really don't need for your particular application.

Yes. The point that I'm making is that even with so-called simple concurrency models these errors can happen. Basically there is no such thing as hassle-free simple concurrency when it comes to computers!!! Simple concurrency is a myth and a lie.
Don't fall for it.

> On the downside, coding errors could trash object memory across threads,
> making testing and debugging difficult. Will the juice be worth the
> squeeze?

That depends on what you are using your computer for. If it's an application that benefits from massive parallelism then yes, it is worth the squeeze. If you have a very serial sort of application, like a series of complex dependent computations, then it might not be worth the squeeze at all. If you have a complex business application that is highly threaded - running, say, ten to twenty Smalltalk processes - on a single native thread, then it might be worth the squeeze if the users can work noticeably faster without incurring concurrency nightmares. Otherwise, no, it's not worth it, as users get very frustrated.

>> 3) A single image running on N cores with M native threads (M may be
>> larger than N) is the full generalization of course. This may be the
>> best way to take advantage of paradigm-shaking chips such as the Tile64
>> processor from Tilera.
>
> With single or few processors, we tend to "serialize" logic ourselves and
> create huge linear programs. When processors are aplenty, we are free to
> exploit inherent parallelism and create many small co-ordinating
> programs. So the N cores are a problem only for small N (around 8).

Eh? Why only "small N (around 8)"? Please illuminate further.

>> However, we may need to rethink the entire architecture of the Smalltalk
>> virtual machine, since the Tile 64 chip has capabilities that radically
>> alter the paradigm. Messages between processor nodes take less time to
>> pass between nodes than the same amount of data takes to be written into
>> memory. Think about that. It offers a new paradigm unavailable to other
>> N-core processors (at this current time).
>
> True. Squeak's VM could virtualize display/sensors and spawn each project
> in its own background process bound to a specific processor. The
> high-speed, low-latency paths are well-suited for UI events.
> Imagine running different projects on each face of a rotating
> hexecontahedron :-)

That would be cool.

The power of the Tile-64 processor from Tilera is that processors can form arbitrary "compute streams" on the fly, where data is computed in one processor and passed along to another without ever touching RAM. Oh, WOW! This means, for example, that the six typical stages of rendering could be implemented on six, or six * N, processors in a Tile-N (where N = 36, 64, 128, 512, 1024, 4096 or more processors). WOW! Now how would you have the Smalltalk system generate objects and messaging binary code from Smalltalk source code to model and program that? How? Let's do it! This requires a shift in paradigm. This requires a shift in your thinking. This requires a shift in my thinking. Think it through. What solutions can you come up with?

All the best,

Peter
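The "compute streams" idea above - data computed in one processor and handed directly to the next without parking it in shared RAM - has a familiar software analogue: pipeline stages connected by queues. A speculative Python sketch, with threads and queues standing in for Tile-64 cores and on-chip links (`stage`, `pipeline`, and `SENTINEL` are invented names for the example):

```python
import threading
import queue

SENTINEL = object()  # end-of-stream marker

def stage(fn, inbox, outbox):
    """One pipeline stage: take an item from the upstream core,
    transform it, and hand it directly downstream - the intermediate
    result never lands in a shared store."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            return
        outbox.put(fn(item))

def pipeline(stages, items):
    """Chain stages with queues, mimicking on-chip compute streams."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=stage,
                                args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

# Three "rendering stages" chained into one stream.
print(pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3],
               [1, 2, 3]))  # [1, 3, 5]
```

On real Tile-style hardware the queues would be the on-chip mesh links; the point of the sketch is only the shape of the programming model: each stage owns its work and data flows stage to stage.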
In reply to this post by pwl
> I've not yet seen any serious discussion of the case for your point of view
> which bridges the gap of complexity in concurrency as automatic memory
> management magically does. Please illuminate us with specific and
> complete details of your proposal for such a breakthrough in concurrency
> complexity.

Peter, Jason is not saying that eliminating shared memory will make concurrent programming as easy as automatic memory management. What he said is that, just as a system that mostly uses automatic memory management might use manual memory management in a few places, so a system that mostly uses message passing for concurrency might use threads and semaphores in a few places.

> > Making the Squeak VM fully multi-threaded (natively) is going to be a
> > lot of pain and hard to get right. Just ask the Java VM team.
>
> Then either the hard work needs to be done, or the VM needs to be
> completely rethought.

What Jason said was that, for any VM design, making the VM fully multi-threaded is hard. It has nothing to do with Squeak or with the Squeak VM.

> > The payback of adding this obsolete (except in the lowest level cases)
> > method of dealing with threading just isn't going to be worth the pain
> > to implement it.
>
> What are you going on about? What techniques are you saying are obsolete
> exactly? How are they obsolete?

He is saying that shared-memory parallel programming is obsolete. It doesn't scale. By the time we get to thousands of processors (which is only a decade away) it won't work at all. Experience shows that it doesn't work very well now even when the hardware can support it, because it is just too hard to make correct programs using that model. Jason's point, which I agree with, is that programming with threads in shared memory and using semaphores (or monitors, or critical sections) to eliminate interference is a bad idea. Parallel programming with no shared memory, i.e. by having processes communicate only by message passing, is much easier to program.

-Ralph
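Ralph's distinction can be made concrete with a small sketch. This is illustrative Python, not a proposal for Squeak's VM: each "process" owns its state and is reachable only through a mailbox, so user code needs no semaphores. The `Actor` class and its method names are invented for the example:

```python
import threading
import queue

class Actor:
    """A process that owns its state and is reachable only through its
    mailbox - no other thread ever touches _state directly, so user
    code needs no semaphores, monitors, or critical sections."""
    def __init__(self, handler, state):
        self.mailbox = queue.Queue()
        self._handler = handler
        self._state = state
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            msg, reply_to = self.mailbox.get()
            if msg is None:  # shutdown message
                return
            self._state, result = self._handler(self._state, msg)
            if reply_to is not None:
                reply_to.put(result)

    def send(self, msg):              # asynchronous, fire-and-forget
        self.mailbox.put((msg, None))

    def ask(self, msg):               # synchronous request/reply
        reply = queue.Queue()
        self.mailbox.put((msg, reply))
        return reply.get()

    def stop(self):
        self.mailbox.put((None, None))
        self._thread.join()

# A counter whose state is mutated only by its own thread.
def counting(state, msg):
    if msg == 'get':
        return state, state
    return state + msg, None

counter = Actor(counting, 0)
for _ in range(100):
    counter.send(1)
print(counter.ask('get'))  # 100 - all sends are processed in mailbox order
counter.stop()
```

The one lock in the whole design lives inside the queue; that is the sense in which most code "mostly uses message passing" while a few low-level places still use shared-memory primitives.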
Hi Ralph,
It's good to converse again with you. It's been many years.

>> I've not yet seen any serious discussion of the case for your point of
>> view which bridges the gap of complexity in concurrency as automatic
>> memory management magically does. Please illuminate us with specific and
>> complete details of your proposal for such a breakthrough in concurrency
>> complexity.
>
> Peter, Jason is not saying that eliminating shared memory will make
> concurrent programming as easy as automatic memory management.

That's good, for that makes no sense given real-world experience with systems that don't use shared memory as the basis for their concurrency control.

> What he said is that, just like a system that mostly uses automatic
> memory management might use manual memory management in a few places, so
> a system that mostly uses message passing for concurrency might use
> threads and semaphores in a few places.

Ok, that sounds nice and rosy, but so far that's all it is. Can someone please explain in full detail and completely how it would actually work? Thanks.

>> Making the Squeak VM fully multi-threaded (natively) is going to be a
>> lot of pain and hard to get right. Just ask the Java VM team.
>
>> Then either the hard work needs to be done, or the VM needs to be
>> completely rethought.
>
> What Jason said was that, for any VM design, making the VM fully
> multi-threaded is hard. It has nothing to do with Squeak or with the
> Squeak VM.

Yes, I'm clear about that. That's why I said that the hard work needs to be done by those of us who are more knowledgeable and experienced than the typical programmer using Smalltalk. We are supposed to be systems people, aren't we? We are supposed to do the hard work so that others have an easier time, aren't we? Of course, if we can avoid the hard work then I'm all for it. However, when it comes to concurrency control and program consistency it just isn't possible to avoid the hard work.
>> The payback of adding this obsolete (except in the lowest level cases)
>> method of dealing with threading just isn't going to be worth the pain
>> to implement it.
>
>> What are you going on about? What techniques are you saying are obsolete
>> exactly? How are they obsolete?
>
> He is saying that shared-memory parallel programming is obsolete. It
> doesn't scale. By the time we get to thousands of processors (which is
> only a decade away) it won't work at all. Experience shows that it
> doesn't work very well now even when the hardware can support it, because
> it is just too hard to make correct programs using that model.

So, two processes that share a chunk of RAM across their protected memory spaces are obsolete in your view? What about two or N lightweight threads (aka Smalltalk processes) in one memory space sharing objects in that single memory space? Is that obsolete as well?

> Jason's point, which I agree with, is that programming with threads in
> shared memory and using semaphores (or monitors, or critical sections) to
> eliminate interference is a bad idea.

Ok. I get that it's complex, and that there are scaling issues with some of the techniques when N-core is very large. I don't see how it's a bad idea though - I don't see how it's any worse than the alternative that's being suggested.

> Parallel programming with no shared memory, i.e. by having processes
> communicate only by message passing, is much easier to program.

So you mean one thread per protected memory space? No lightweight threads (since they use shared memory by definition)? No more than one Smalltalk process per protected memory space? Just one thread of execution for each operating system process/task? So if I want one hundred Smalltalk processes running in my application I will need one hundred operating system processes? All objects are to be copied across the memory space boundaries via serialized objects or via references (for copying later, or for return messages back to the originating node later)?
No "active" object running in its own operating system process can respond to more than one inbound message at once? Since it only has one thread/Smalltalk process, to avoid shared memory it must complete all the work that the current message send caused. What about deadlock avoidance in your model? Maybe I'm misunderstanding your definitions, but it seems to me that that is what is implied by what you are saying. To ensure clarity on this complex topic, please provide definitions and full explanations with examples. Please be very detailed. Thanks very much.

All the best,

Peter