Concurrent Futures


Re: Concurrent Futures

Jason Johnson-5
On 10/31/07, Rob Withers <[hidden email]> wrote:
>
> I am trying to ensure that objects in one Vat don't get directly manipulated
> by objects in other Processes.

I don't see this part as too difficult, since you can't modify what
you can't get a reference to.  But the issue is mutable globals.  The
most obvious example here is class-side variables: if multiple vats
can see the same class and that class has class-side variables that it
mutates, then that's an issue.


Re: Concurrent Futures

Jason Johnson-5
In reply to this post by Igor Stasenko
On 10/31/07, Igor Stasenko <[hidden email]> wrote:
> I don't know what to add to above. I just said that we should use
> approaches which is best fit for architecture where our project(s)
> will run on.
> Of course what is best fit is arguable. But i don't think we should
> drop a shared memory model support when we building a system on top of
> architecture which haves it.

So what we can build must be constrained by an implementation detail
that's not even visible to us [1]?  If I had seen this on a C++ list I
wouldn't be so surprised, but on a Smalltalk list? :)

[1]  Obviously we don't because Intel and AMD don't handle shared
memory access the same way.  AMD already does something a bit closer
to message passing:
http://www.digit-life.com/articles2/cpu/rmma-numa.html


Re: Concurrent Futures

Jason Johnson-5
In reply to this post by Jason Johnson-5
On 10/31/07, Jecel Assumpcao Jr <[hidden email]> wrote:
> I would like to mention some of my previous work in this area:
>
> - tinySelf 1 (1996)
> http://www.lsi.usp.br/~jecel/tiny.html#rel1
>
<snip>
>
> - 64 node Smalltalk machine (1992)
> http://www.lsi.usp.br/~jecel/ms8702.html
>
<snip>
>
> - operating system in an Objective-C like language (1988)
> http://www.lsi.usp.br/~jecel/atos.html (this page has download links but
> the text still hasn't been written)
>
<snip>
>
> -- current model --
>
<snip>
>

This thread is really getting good! :)

> Speaking of hardware, I would like to stress how fantastically slow
> (relatively speaking) main memory is these days. If I have a good
> network connecting processor cores in a single chip then I can probably
> send a message from one to another, get a reply, send a second message
> and get another reply in the time that it takes to read a byte from
> external RAM. So we should start thinking of DDR SDRAM as a really fast
> disk to swap objects to/from and not as a shared memory. We should start
> to take message passing seriously.
>
> -- Jecel

Wonderful point.  I suppose a big part of the problem I have had in
this thread has been not accepting that a lot of people don't realize
the reality of your comment.  I'm not proposing to abandon the
shared-state programming model *only* because it's too hard to program
in.  My *biggest* motivation is the fact that it *won't scale* to
tomorrow's systems.

I would just hate to see Squeak put a ton of work in just so we can
run out and announce that we have "real" concurrency, only to find out
that everyone else has already discovered it can't scale and moved on.

Smalltalk for me is about being *ahead* of other platforms.  Not decades behind.


Re: Concurrent Futures

Jason Johnson-5
In reply to this post by Igor Stasenko
On 10/31/07, Igor Stasenko <[hidden email]> wrote:
>
> Then i wonder, why they don't drop the idea of having shared memory at all?

Oh they will.  If you read up on all the incredible hoops they jump
through to make this model work on 2+ cores now, it has to be obvious
that this can't scale indefinitely.

Intel has a lot of resources and much invested in this model, so they
may try to push it further than it should go.  But I don't see that
having a good end for them.


Re: Concurrent Futures

Jason Johnson-5
In reply to this post by Joshua Gargus-2
On 10/31/07, Joshua Gargus <[hidden email]> wrote:
>
> It is unreasonable to assume that ad-hoc, fine-grained sharing of
> objects between processors will give you the fastest performance on
> the upcoming machines with 100s and 1000s of cores.  What about
> memory locality and cache coherency?  It is not cheap to juggle an
> object between processors now, and it will become more expensive as
> the number of cores increase.

Great point.


Re: Concurrent Futures

Jason Johnson-5
In reply to this post by Igor Stasenko
On 10/30/07, Igor Stasenko <[hidden email]> wrote:
>
> Most of reasons why CPU not utilized at 100% is using a blocking I/O
> calls. Then a simplest solution to not use them and instead of blowing
> up the number of threads use asynchronous I/O . Most major platforms
> support asynchronous I/O and there are many libraries which support
> async data handling almost in each area we need for. We just need to
> build on top of them.

Good point.  How much of the I/O in Squeak is currently blocking?  I
think I heard networking blocks; what about disk?


Re: Concurrent Futures

Jason Johnson-5
In reply to this post by Igor Stasenko
On 10/30/07, Igor Stasenko <[hidden email]> wrote:
> And if you read in previous
> discussions, they proven that there can't be a single generic solution
> for all problems which raising when we go in parallel world.

Uh, no such thing was *proven* in any of these discussions.  It was
strongly suggested, and I even personally conceded that it may be the
case.  But that is by no means *proof*.


Re: Concurrent Futures

Andreas.Raab
In reply to this post by Igor Stasenko
Igor Stasenko wrote:
> Then i wonder, why they don't drop the idea of having shared memory at all?

The major reason is cost, not performance. With a single shared memory
subsystem you can allocate memory dynamically to the cores as you need
it. Not using shared memory at all means you need to pre-allocate memory
for each core, which leaves you with two options: either over-allocate
memory for each core (expensive) or assume that the programmer can keep
relatively small caches utilized effectively. The PS2 took that approach
and failed miserably (this is one of the reasons why it took so long
before the games could actually utilize its full power - keeping those
caches filled was a major pain in the neck despite the bandwidth and
computational power available).

The same effect can be seen with GPUs - the cheapest (usually Intel)
GPUs utilize shared (main) memory to drive cost down. But that's all. It
doesn't mean that just because Intel likes cheap graphics they're the
fastest (in fact, the precise opposite is true - lots of VRAM and a fast
bus outperforms shared memory by far).

> Each CPU then could have own memory, and they could interact by
> sending messages in network-style fashion. And we then would write a
> code which uses such architecture in best way. But while this is not
> true, should we assume that such code will work faster than code which
> 'knows' that there is a single shared memory for all CPUs and uses
> such knowledge in best way?

No. But the opposite isn't necessarily true either. We shouldn't assume
either way, we should measure and compare. And not only cycles but also
programming effort, correctness and robustness.

> I thought that goals was pretty clear. We have a single image. And we
> want to run multiple native threads upon it to utilize all cores of
> multi-core CPU's.
> What we currently have is a VM, which can't do that. So, i think, any
> other , even naively implemented, which can do, is better than
> nothing.
> If you have any ideas how such VM would look like i'm glad to hear.

See my other post.

Cheers,
   - Andreas



Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Klaus D. Witzel
In reply to this post by Andreas.Raab
This is a very good plan, Andreas.

It will immediately make the power of multicore CPUs available to software
researchers and [perhaps] to production environments. I believe that can
bring a new class of users (and their experiments and their experience) to
the Squeak community.

For the primitive that forks a new instance of the interpreter running in
the same object memory, wouldn't it be necessary to do something on the GC
side as well?

Anyway, your suggestion is the top entry in the category of the best
*practical* solution.

/Klaus

On Wed, 31 Oct 2007 04:53:11 +0100, Andreas Raab wrote:

> Igor Stasenko wrote:
>> If you have any ideas how such VM would look like i'm glad to hear.
>
> Okay, so Josh convinced me to write up the ideas. The main problem as I  
> see it with a *practical* solution to the problem is that all of the  
> solutions so far require huge leaps and can't be implemented  
> step-by-step (which almost certainly dooms them to failure).
>
> So what do we know and what do we actually all pretty much agree on?  
> It's that we need to be able to utilize multiple cores and that we need  
> a practical way to get there (if you disagree with the latter this  
> message is not meant for you ;-) Running multiple processes is one  
> option but it is not always sufficient. For example, some OSes would  
> have trouble firing off a couple of thousand processes whereas the same  
> OS may have no problem at all with a couple of thousand threads in one  
> process. To give an example, starting a thread on Windows costs somewhere
> in the range of a millisecond, which is admittedly slow but still orders
> of magnitude faster than creating a new process. Then there are issues
> with resource sharing (like file handles) which are practically  
> guaranteed not to work across process boundaries etc. So while there are  
> perfectly good reasons to run multiple processes, there are reasons just
> as good for wanting to run multiple threads in one process.
>
> The question then is, can we find an easy way to extend the Squeak VM to  
> run multiple threads and if so how? Given the simplistic nature of the  
> Squeak interpreter, there is actually very little global state that is  
> not encapsulated in objects on the Squeak heap - basically all the  
> variables in class interpreter. So if we would put them into state that  
> is local to each thread, we could trivially run multiple instances of  
> the byte code interpreter in the same VM. This gets us to the two major  
> questions:
>
> * How do we encapsulate the interpreter state?
> * How do we deal with primitives and plugins?
>
> Let's start with the first one. Obviously, the answer is "make it an
> object". The way I would go about it is by modifying the CCodeGenerator
> such that it generates all functions with an argument of type "struct
> VM", that variable accesses are prefixed properly, and that all
> function calls pass the extra argument along. In short, what used to be
> translated as:
>
> sqInt primitiveAdd(void) {
>    integerResult = stackIntegerValue(1) + stackIntegerValue(0)
>    /* etc. */
> }
>
> will then become something like here:
>
> sqInt primitiveAdd(struct VM *vm) {
>    integerResult = stackIntegerValue(vm,1) + stackIntegerValue(vm,0)
>    /* etc. */
> }
>
> This is a *purely* mechanical step that can be done independent of  
> anything else. It should be possible to generate code that is entirely  
> equivalent to today's code, and with a bit of tweaking it should be
> possible to make that code roughly as fast as we have today (not that I  
> think it matters but understanding the speed difference between this and  
> the default interpreter is important for judging relative speed  
> improvements later).
>
> The above takes care of the interpreter, but there are still
> primitives and plugins that need to be dealt with. What I would do here  
> is define operations like ioLock(struct VM) and ioUnlock(struct VM) that  
> are the effective equivalent of Python's GIL (global interpreter lock)  
> and allow exclusive access to primitives that have not been converted to  
> multi-threading yet. How exactly this conversion should happen is  
> deliberately left open here; maybe changing the VM's major proxy version
> is the right thing to do to indicate the changed semantics. In any case,  
> the GIL allows us to readily reuse all existing plugins without having  
> to worry about conversion early on.
>
> So now we've taken care of the two major parts of Squeak: We have the  
> ability to run new interpreters and we have the ability to use  
> primitives. This is when the fun begins, because at this point we have  
> options:
>
> For example, if you are into shared-state concurrency, you might  
> implement a primitive that forks a new instance of the interpreter  
> running in the same object memory that your previous interpreter is  
> running in.
>
> Or, and that would be the path that I would take, implement a primitive  
> that loads an image into a new object memory (I can explain in more  
> detail how memory allocation needs to work for that; it is a fairly  
> straightforward scheme but a little too long for this message) and run  
> that interpreter.
>
> And at this point, the *real* fun begins because we can now start to  
> define the communication patterns we'd like to use (initially sockets,  
> later shared memory or event queues or whatever else). We can have tiny  
> worker images that only do minimal stuff but we can also do a Spoon-like  
> thing where we have a "master image" that contains all the code possibly  
> needed and fire off micro-images that (via imprinting) swap in just the  
> code they need to run.
>
> [Whoa! I just got interrupted by a little 5.6 quake some 50 miles away]
>
> Sorry but I lost my train of thought here. Happens at 5.6 Richter ;-)  
> Anyway, the main thing I'm trying to say in the above is that for a  
> *practical* solution to the problem there are some steps that are pretty  
> much required whichever way you look at it. And I think that regardless  
> of your interest in shared state or message passing concurrency we may  
> be able to define a road that leads to interesting experiments without  
> sacrificing the practical artifact. A VM built as described above
> would be strictly a superset of the current VM, so it would be able
> to run any current images and leave room for further experiments.
>
> Cheers,
>    - Andreas
>
>
>




Re: Concurrent Futures

Igor Stasenko
In reply to this post by Jason Johnson-5
On 31/10/2007, Jason Johnson <[hidden email]> wrote:

> On 10/31/07, Igor Stasenko <[hidden email]> wrote:
> > I don't know what to add to above. I just said that we should use
> > approaches which is best fit for architecture where our project(s)
> > will run on.
> > Of course what is best fit is arguable. But i don't think we should
> > drop a shared memory model support when we building a system on top of
> > architecture which haves it.
>
> So what we can build must be constrained by an implementation detail
> that's not even visible to us [1]?  If I had seen this on a C++ list I
> wouldn't be so surprised but Smalltalk? :)
>
But why Smalltalk? I'm talking about the VM, which is much closer to
the hardware than Smalltalk. As you know, the Squeak VM is compiled from
C sources.
You are free to use any model you want in Smalltalk, but for the VM?

> [1]  Obviously we don't because Intel and AMD don't handle shared
> memory access the same way.  AMD already does something a bit closer
> to message passing:
> http://www.digit-life.com/articles2/cpu/rmma-numa.html
>
>


--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Igor Stasenko
In reply to this post by Andreas.Raab
On 31/10/2007, Andreas Raab <[hidden email]> wrote:

> Igor Stasenko wrote:
> > Then i wonder, why they don't drop the idea of having shared memory at all?
>
> <snip>
>

If we look at the multi-core problem as a networking problem then, as
far as I can see, shared memory helps us minimize traffic between
cores, because we don't need to spend time serializing data and
transferring it between cores if it is located in shared memory and can
be easily accessed from both ends.
But the share-nothing model proposes not to use shared memory, which in
turn means that there will be much higher traffic between cores
compared to a model which uses shared memory.
So a balance should be found between network load and the use of shared
resources. We can't win by choosing one of the opposite sides, only
something in the middle.
Am I still wrong here?

--
Best regards,
Igor Stasenko AKA sig.


Re: Question about YAXO-XML and possible bug

Michael Rueger-4
In reply to this post by Philippe Marschall
Philippe Marschall wrote:
> 2007/10/30, Boris.Gaertner <[hidden email]>:

Finally managed to look into these long known issues.
Hopefully will have a new version soon :-)

Michael


Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Ralph Johnson
In reply to this post by Andreas.Raab
That is a very interesting plan, Andreas.  However, I don't see
garbage collection on the list.  Won't you have to make a concurrent
garbage collector?

-Ralph


Re: Concurrent Futures

Igor Stasenko
In reply to this post by Jason Johnson-5
On 31/10/2007, Jason Johnson <[hidden email]> wrote:

> On 10/31/07, Rob Withers <[hidden email]> wrote:
> >
> > I am trying to ensure that objects in one Vat don't get directly manipulated
> > by objects in other Processes.
>
> I don't see this part as too difficult, since you can't modify what
> you can't get a reference to.  But the issue is mutable globals.  The
> most obvious example here are class side variables: if multiple vats
> can see the same class and that class has class side variables that it
> mutates, then that's an issue.
>

This is not the only case. It's just a special case of two or more
processing units having access to the same mutable object.
There are other problems related to the reflective nature of Smalltalk.
A killer example is passing a reference to a context (thisContext) or a
stack to another vat/thread/processing unit.

--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Igor Stasenko
In reply to this post by Jason Johnson-5
On 31/10/2007, Jason Johnson <[hidden email]> wrote:

> On 10/30/07, Igor Stasenko <[hidden email]> wrote:
> >
> > Most of reasons why CPU not utilized at 100% is using a blocking I/O
> > calls. Then a simplest solution to not use them and instead of blowing
> > up the number of threads use asynchronous I/O . Most major platforms
> > support asynchronous I/O and there are many libraries which support
> > async data handling almost in each area we need for. We just need to
> > build on top of them.
>
> Good point.  How many kinds of I/O in Squeak is currently blocking?  I
> think I heard networking blocks, what about disk?
>
All socket/file I/O primitives use blocking calls.
As far as I can see, there is only one set of async primitives -
AsyncFilePlugin. But I'm not sure it is used in the first place (i.e.
whether it replaces the FilePlugin).
I think Andreas could answer this more precisely.

--
Best regards,
Igor Stasenko AKA sig.


Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Igor Stasenko
In reply to this post by Andreas.Raab
On 31/10/2007, Andreas Raab <[hidden email]> wrote:

> Igor Stasenko wrote:
> > If you have any ideas how such VM would look like i'm glad to hear.
>
> <snip>
>
> This is a *purely* mechanical step that can be done independent of
> anything else. It should be possible to generate code that is entirely
> equivalent to todays code and with a bit of tweaking it should be
> possible to make that code roughly as fast as we have today (not that I
> think it matters but understanding the speed difference between this and
> the default interpreter is important for judging relative speed
> improvements later).
>

There are already some steps done in this direction. The sources for
the RISC architecture generate a 'foo' struct, which holds all
interpreter globals.
Also, I made some changes in Exupery to create a single struct of all
VM globals (not only variables, but functions too).
This was done to make it easier to get the address of any global symbol
that Exupery needs.
I also experimented with replacing all direct function calls with
indirect ones (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This
caused about a 1% speed degradation in tinyBenchmarks :)
Also, moving forward on this renders the InterpreterProxy struct
useless, because we can just pass the address of our 'foo' struct to
plugins, which already contains everything a plugin can reach.

> The above takes care about the interpreter but there are still
> primitives and plugins that need to be dealt with. What I would do here
> is define operations like ioLock(struct VM) and ioUnlock(struct VM) that
> are the effective equivalent of Python's GIL (global interpreter lock)
> and allow exclusive access to primitives that have not been converted to
> multi-threading yet. How exactly this conversion should happen is
> deliberately left open here; maybe changing the VMs major proxy version
> is the right thing to do to indicate the changed semantics. In any case,
> the GIL allows us to readily reuse all existing plugins without having
> to worry about conversion early on.
>
Or, as I proposed in earlier posts, the other way could be to schedule
all primitive calls which currently don't support multi-threading onto
a single 'main' thread.
Then we don't need the GIL.

> <snip>


--
Best regards,
Igor Stasenko AKA sig.


RE: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Sebastian Sastre-2
In reply to this post by Andreas.Raab
 that leads">
> that leads to interesting experiments without sacrificing the
> practical artifact. A VM built like described in the above
> would be strictly a superset of the current VM so it would be
> able to run any current images and leave room for further experiments.

Hi Andreas,

        earthquakes apart, your proposal seems to me remarkably valuable for
the short and medium term. It's not a deadly earthquake for the VM; curiously,
it seems like just a little shake, with insignificant damage to the current
environment, that accommodates things to keep it working ;-)

        best regards,

Sebastian Sastre

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On
> behalf of Andreas Raab
> Sent: Wednesday, 31 October 2007 00:53
> To: The general-purpose Squeak developers list
> Subject: Thoughts on a concurrent Squeak VM (was: Re:
> Concurrent Futures)
>
> Igor Stasenko wrote:
> > If you have any ideas how such VM would look like i'm glad to hear.
>
> Okay, so Josh convinced me to write up the ideas. The main
> problem as I see it with a *practical* solution to the
> problem is that all of the solutions so far require huge
> leaps and can't be implemented step-by-step (which almost
> certainly dooms them to failure).
>
> So what do we know and what do we actually all pretty much agree on?
> It's that we need to be able to utilize multiple cores and
> that we need a practical way to get there (if you disagree
> with the latter this message is not meant for you ;-) Running
> multiple processes is one option but it is not always
> sufficient. For example, some OSes would have trouble firing
> off a couple of thousand processes whereas the same OS may
> have no problem at all with a couple of thousand threads in
> one process. To give an example, starting a thread on Windows
> cost somewhere in the range of a millisecond which is
> admittedly slow, but still orders of magnitude faster than
> creating a new process. Then there are issues with resource
> sharing (like file handles) which are practically guaranteed
> not to work across process boundaries etc. So while there are
> perfectly good reasons to run multiple processes, there are
> reasons just as good to wanting to run multiple threads in
> one process.
>
> The question then is, can we find an easy way to extend the
> Squeak VM to run multiple threads and if so how? Given the
> simplistic nature of the Squeak interpreter, there is
> actually very little global state that is not encapsulated in
> objects on the Squeak heap - basically all the variables in
> class interpreter. So if we would put them into state that is
> local to each thread, we could trivially run multiple
> instances of the byte code interpreter in the same VM. This
> gets us to the two major
> questions:
>
> * How do we encapsulate the interpreter state?
> * How do we deal with primitives and plugins?
>
> Let's start with the first one. Obviously, the answer is
> "make it an object". The way how I would go about is by
> modifying the CCodeGenerator such that it generates all
> functions with an argument of type "struct VM" and that
> variable accesses prefix things properly and that all
> functions calls pass the extra argument along. In short, what
> used to be translated as:
>
> sqInt primitiveAdd(void) {
>    integerResult = stackIntegerValue(1) + stackIntegerValue(0);
>    /* etc. */
> }
>
> will then become something like here:
>
> sqInt primitiveAdd(struct VM *vm) {
>    integerResult = stackIntegerValue(vm,1) + stackIntegerValue(vm,0);
>    /* etc. */
> }
>
> This is a *purely* mechanical step that can be done
> independent of anything else. It should be possible to
> generate code that is entirely equivalent to todays code and
> with a bit of tweaking it should be possible to make that
> code roughly as fast as we have today (not that I think it
> matters but understanding the speed difference between this
> and the default interpreter is important for judging relative
> speed improvements later).
>
> The above takes care about the interpreter but there are
> still primitives and plugins that need to be dealt with. What
> I would do here is define operations like ioLock(struct VM)
> and ioUnlock(struct VM) that are the effective equivalent of
> Python's GIL (global interpreter lock) and allow exclusive
> access to primitives that have not been converted to
> multi-threading yet. How exactly this conversion should
> happen is deliberately left open here; maybe changing the VMs
> major proxy version is the right thing to do to indicate the
> changed semantics. In any case, the GIL allows us to readily
> reuse all existing plugins without having to worry about
> conversion early on.
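A rough sketch of what ioLock/ioUnlock could look like on top of a POSIX mutex follows. Everything here is hypothetical; the real VM would hook this into the primitive dispatch machinery, and the struct VM argument is kept only for symmetry with the transformed interpreter.

```c
#include <pthread.h>

/* Placeholder for the per-thread interpreter state described above. */
struct VM { int id; };

/* One process-wide lock: the "GIL" that serializes primitives which
   have not yet been made thread-safe. */
static pthread_mutex_t gil = PTHREAD_MUTEX_INITIALIZER;

void ioLock(struct VM *vm)   { (void)vm; pthread_mutex_lock(&gil); }
void ioUnlock(struct VM *vm) { (void)vm; pthread_mutex_unlock(&gil); }

/* Stand-in for a not-yet-converted primitive touching shared state;
   it must only run between ioLock and ioUnlock. */
static long sharedCounter = 0;

void *runPrimitives(void *arg) {
    struct VM *vm = (struct VM *)arg;
    for (int i = 0; i < 100000; i++) {
        ioLock(vm);
        sharedCounter++;            /* serialized by the GIL */
        ioUnlock(vm);
    }
    return NULL;
}
```

Two threads hammering runPrimitives concurrently still produce a deterministic count, which is exactly the guarantee unconverted primitives need.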
>
> So now we've taken care of the two major parts of Squeak: We
> have the ability to run new interpreters and we have the
> ability to use primitives. This is when the fun begins,
> because at this point we have
> options:
>
> For example, if you are into shared-state concurrency, you
> might implement a primitive that forks a new instance of the
> interpreter running in the same object memory that your
> previous interpreter is running in.
>
> Or, and that would be the path that I would take, implement a
> primitive that loads an image into a new object memory (I can
> explain in more detail how memory allocation needs to work
> for that; it is a fairly straightforward scheme but a little
> too long for this message) and run that interpreter.
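The "one object memory per interpreter" direction can be sketched like this. The heap handling and all names are invented; the memory allocation scheme Andreas alludes to is more involved than a single malloc.

```c
#include <stdlib.h>
#include <pthread.h>

/* Each thread owns a private heap and a private VM instance, so no
   object mutation or GC ever crosses a thread boundary. */
struct VM {
    unsigned char *heap;
    size_t         heapSize;
    long           result;
};

static void *interpret(void *arg) {
    struct VM *vm = (struct VM *)arg;
    /* ... load an image into vm->heap and run the bytecode loop ... */
    vm->result = 42;               /* placeholder for real work */
    return NULL;
}

/* Fork a new interpreter in its own object memory. */
struct VM *startInterpreter(size_t heapSize, pthread_t *tid) {
    struct VM *vm = calloc(1, sizeof *vm);
    vm->heap = malloc(heapSize);
    vm->heapSize = heapSize;
    pthread_create(tid, NULL, interpret, vm);
    return vm;
}
```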
>
> And at this point, the *real* fun begins because we can now
> start to define the communication patterns we'd like to use
> (initially sockets, later shared memory or event queues or
> whatever else). We can have tiny worker images that only do
> minimal stuff but we can also do a Spoon-like thing where we
> have a "master image" that contains all the code possibly
> needed and fire off micro-images that (via imprinting) swap
> in just the code they need to run.
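As a toy version of the initial socket-based communication, two vats in one process could exchange bytes over a socketpair. This is purely illustrative; the function names are made up, and a real protocol would frame and serialize messages.

```c
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>

/* A private byte channel between two vats in one process; no network
   setup needed. */
int makeVatChannel(int fds[2]) {
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Send a NUL-terminated message to the peer vat. */
long sendToVat(int fd, const char *msg) {
    return (long)write(fd, msg, strlen(msg) + 1);
}

/* Receive bytes from the peer vat. */
long receiveFromVat(int fd, char *buf, long len) {
    return (long)read(fd, buf, (size_t)len);
}
```

The same three calls would work unchanged if the two vats later moved into separate OS processes connected by a real socket, which is part of the appeal of starting with this pattern.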
>
> [Whoa! I just got interrupted by a little 5.6 quake some 50
> miles away]
>
> Sorry but I lost my train of thought here. Happens at 5.6
> Richter ;-) Anyway, the main thing I'm trying to say in the
> above is that for a
> *practical* solution to the problem there are some steps that
> are pretty much required whichever way you look at it. And I
> think that regardless of your interest in shared state or
> message passing concurrency we may be able to define a road
> that leads to interesting experiments without sacrificing the
> practical artifact. A VM built like described in the above
> would be strictly a superset of the current VM so it would be
> able to run any current images and leave room for further experiments.
>
> Cheers,
>    - Andreas
>
>



Re: Concurrent Futures

Rob Withers
In reply to this post by Igor Stasenko

----- Original Message -----
From: "Igor Stasenko" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Wednesday, October 31, 2007 6:30 AM
Subject: Re: Concurrent Futures


> On 31/10/2007, Jason Johnson <[hidden email]> wrote:
>> On 10/31/07, Rob Withers <[hidden email]> wrote:
>> >
>> > I am trying to ensure that objects in one Vat don't get directly
>> > manipulated
>> > by objects in other Processes.
>>
>> I don't see this part as too difficult, since you can't modify what
>> you can't get a reference to.  But the issue is mutable globals.  The
>> most obvious example here are class side variables: if multiple vats
>> can see the same class and that class has class side variables that it
>> mutates, then that's an issue.
>>
>
> This is not the only case. It's just a special case of two or more
> processing units having access to the same mutable object.
> There are other problems related to the reflective nature of Smalltalk.
> A killer example is passing a reference to a context (thisContext) or
> a stack to another vat/thread/processing unit.

In both of these cases, the objects are owned by a particular Vat.  Any
messages to them from another Vat would be eventual.  If they were passed to
a different Vat (1->2) they would be VatRefs from 2 into 1.  Anyway, this is
the plan; it isn't working at this time.

A concern of mine is how well the development tools would work in a
different Vat.  So let's say we have the class browser open and we try to
define a new class.  The superclass is eventual, since it is owned by a
different Vat.  Where does the new class get created, in the current Vat or
in the superclass' Vat?  There are definitely issues here.



Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Rob Withers
In reply to this post by Igor Stasenko

----- Original Message -----
From: "Igor Stasenko" <[hidden email]>


> On 31/10/2007, Andreas Raab <[hidden email]> wrote:
>> sqInt primitiveAdd(void) {
>>    integerResult = stackIntegerValue(1) + stackIntegerValue(0);
>>    /* etc. */
>> }
>>
>> will then become something like here:
>>
>> sqInt primitiveAdd(struct VM *vm) {
>>    integerResult = stackIntegerValue(vm,1) + stackIntegerValue(vm,0);
>>    /* etc. */
>> }
>>
>> This is a *purely* mechanical step that can be done independent of
>> anything else. It should be possible to generate code that is entirely
>> equivalent to todays code and with a bit of tweaking it should be
>> possible to make that code roughly as fast as we have today (not that I
>> think it matters but understanding the speed difference between this and
>> the default interpreter is important for judging relative speed
>> improvements later).
>>
>
> There are already some steps done in this direction. The sources for
> the RISC architecture generate a foo struct, which holds all
> interpreter globals.
> Also, I made some changes in Exupery to create a single struct of all
> VM globals (not only variables, but functions too).
> This was done to make it easier to get the address of any global
> symbol that Exupery needs.
> I also experimented with replacing all direct function calls with
> indirect ones (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This
> caused about ~1% speed degradation in tinyBenchmarks :)
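Igor's indirect-call experiment in miniature might look like this. The names foo and primAdd follow his example; everything else is invented. The measured ~1% cost comes from the extra pointer load on every call.

```c
/* All VM functions gathered into one struct of function pointers, so
   every entry point has a single, discoverable address. */
struct VMFunctions {
    long (*primAdd)(long, long);
    long (*primSub)(long, long);
};

static long addImpl(long a, long b) { return a + b; }
static long subImpl(long a, long b) { return a - b; }

/* The single global table Exupery could take addresses from. */
static struct VMFunctions foo = { addImpl, subImpl };

long dispatchExample(void) {
    /* foo.primAdd(x, y) style indirect call instead of primAdd(x, y) */
    return foo.primAdd(2, 3) + foo.primSub(10, 4);
}
```

A side benefit of this layout is that a JIT (or a test harness) can swap individual entries in foo at runtime without relinking the VM.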

Wouldn't this mean that considering C++ isn't off the mark?  There would be
some performance degradation, but also the benefits of structuring the VM as
classes, of not having to pass the VM as an argument everywhere, and it
might even be possible to subclass Thread so we know where the thread-local
storage is.

Just a thought.



Re: Thoughts on a concurrent Squeak VM

Andreas.Raab
In reply to this post by Ralph Johnson
Ralph Johnson wrote:
> That is a very interesting plan, Andreas.  However, I don't see
> garbage collection on the list.  Won't you have to make a concurrent
> garbage collector?

I don't think so. First, you don't need a new garbage collector for the
direction that I would take. Since the threads don't operate on the same
object memory, no change to the collector is needed. And for shared
state concurrency, simple solutions (like a gc request flag per thread
which is checked on each send) can be used to ensure atomic operation of
the collector.
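The per-send GC request flag could be sketched roughly as follows. All names are hypothetical, and a real version would block on a condition variable while the collector runs rather than clearing the flag inline.

```c
#include <stdbool.h>

/* Per-thread state: a flag the collector sets when it wants all
   threads to stop at their next safe point. */
struct VMThread {
    volatile bool gcRequested;  /* set by whoever requests a collection */
    bool parked;                /* true while this thread sits out the GC */
};

/* Called at every message send, the interpreter's natural safe point. */
void checkGCRequest(struct VMThread *t) {
    if (t->gcRequested) {
        t->parked = true;
        /* ... in the real VM: wait here until the collector finishes ... */
        t->gcRequested = false;
        t->parked = false;
    }
}
```

Because sends happen constantly, every thread reaches the safe point quickly, which is what makes such a simple flag sufficient for atomic operation of the collector.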

Cheers,
   - Andreas
