Multiprocessing with Squeak


Levente Uzonyi-2
 
Hi,

I had an idea a few days ago and even though I don't have the time or
knowledge to try it myself, I just can't get it out of my head. The idea
is to let an interpreter use two images at once. One of them is a
read-only, "fully working" image; let's call it S (source). The other is
empty (contains no objects), writable, and possibly generated on the fly;
let's call it W (working). The VM knows whether an object is in S or W by
checking the object pointer. Whenever an object in S is about to be
modified, a copy is created in W and all references to it are changed to
point to the copy (which means that more than one object might have to be
copied). This means a slower startup, but once all the necessary objects
have been copied, performance would be normal.
(This approach is similar to the way sources are handled today: the
sources file is read only, new source code goes to the changes file.)
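
A minimal Python sketch of the scheme described above, not an actual VM design. The names `TwoSpaceHeap`, `in_source`, `resolve`, and `write` are hypothetical, pointers are modelled as plain indices, and a forwarding table stands in for the step where every reference to the copied object would be rewritten.

```python
class TwoSpaceHeap:
    """Toy model: S is read-only, W receives copies on first write."""

    def __init__(self, source):
        # S: immutable tuples, so nothing can modify it by accident.
        self.source = tuple(tuple(slots) for slots in source)
        # W: starts empty and grows as objects are copied out of S.
        self.working = []
        # Assumption: a forwarding table replaces reference rewriting.
        self.forward = {}

    def in_source(self, ptr):
        # The space is decided by looking at the pointer value alone.
        return ptr < len(self.source)

    def resolve(self, ptr):
        # Follow the forwarding entry if the object was already copied.
        return self.forward.get(ptr, ptr)

    def slots(self, ptr):
        ptr = self.resolve(ptr)
        if self.in_source(ptr):
            return self.source[ptr]
        return self.working[ptr - len(self.source)]

    def write(self, ptr, index, value):
        ptr = self.resolve(ptr)
        if self.in_source(ptr):
            # First write to an S object: copy it into W.
            self.working.append(list(self.source[ptr]))
            new_ptr = len(self.source) + len(self.working) - 1
            self.forward[ptr] = new_ptr
            ptr = new_ptr
        self.working[ptr - len(self.source)][index] = value
        return ptr


heap = TwoSpaceHeap([[1, 2], [3, 4]])
copied = heap.write(0, 1, 99)
```

After the write, object 0 is served from W while S itself is unchanged, and object 1 is still read straight from S.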

What are the benefits?
- the source image can be shared among interpreters (even vms)
- the garbage collector has much less work, since it only has to check the
   objects in W
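
A hedged sketch of why the collector's work shrinks, assuming that S objects never point into W (they are read-only, so any object that gains a reference to W has itself been copied). The function `mark_working` and its parameters are hypothetical; `working` maps each W pointer to the pointers it holds, and any pointer below `s_limit` is taken to be in S.

```python
def mark_working(roots, working, s_limit):
    """Return the set of live W pointers reachable from roots.

    Pointers below s_limit refer to S and are never traced,
    because S is assumed to hold no references into W.
    """
    live = set()
    stack = [p for p in roots if p >= s_limit]
    while stack:
        ptr = stack.pop()
        if ptr in live:
            continue
        live.add(ptr)
        # Only follow edges that stay inside W.
        stack.extend(p for p in working[ptr] if p >= s_limit)
    return live


working = {100: [5, 101], 101: [100], 102: [7]}
live = mark_working(roots=[3, 100], working=working, s_limit=100)
```

Here object 102 is garbage, and the objects in S (pointers 3, 5, and 7) are never visited at all.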

How does it help with multiprocessing?
- combine it with HydraVM, it might give Erlang-like capabilities (cheap
   and fast processes)
- reduces memory usage if multiple interpreters (vms) use the same source
   image

Possible caveats?
- too many objects might have to be copied after startup (this is
   solvable, see below)
- too many objects might have to be copied overall (this is unlikely, but
   who knows)
- ? (you name it)

Possible enhancements
- let the interpreter use a non-empty, possibly user-specified, W image
   for quick startup

Opinions? Ideas?


Levente

Re: Multiprocessing with Squeak

Colin Putney


On 2010-01-27, at 2:04 PM, Levente Uzonyi wrote:

> Hi,
>
> I had an idea a few days ago and even though I don't have the time or knowledge to try it myself, I just can't get it out of my head. The idea is to let an interpreter use two images at once. One of them is read only "fully working" image let's call it S (source), the other is empty (contains no objects), writeable, possibly generated on the fly, let's call it W (working). The vm knows if an object is in S or W by checking the object pointer. Whenever an object in S is about to be modified, a copy is created in W and all references to it are changed to the new one (which means that more than one object might have to be copied). This means a slower startup, but once all necessary objects are copied performance would be normal.
> (This approach is similar to the way sources are handled today: the sources file is read only, new source code goes to the changes file.)

I believe VW can do something like this - they call it "Shared Perm Space." There's a special section of memory that's immutable, not subject to garbage collection, and shared between several VM processes.

> - combine it with HydraVM, it might give Erlang-like capabilities (cheap
>  and fast processes)

Well, we already have cheap and fast processes. The overhead for creating a new instance of Process and scheduling it is very low. What we lack is isolation between them. Squeak seems to be drifting in that direction, though. Islands are a good start. Josh's recent contribution of futures to the trunk is another step away from shared-state concurrency.

My sense of it is that efficient use of memory isn't the most important problem to solve at the moment. Further steps toward event-loop concurrency would be more fruitful.

Colin

Re: Multiprocessing with Squeak

Igor Stasenko

On 29 January 2010 08:42, Colin Putney wrote:

> Well, we already have cheap and fast processes. The overhead for creating a new instance of Process and scheduling it is very low. What we lack is isolation between them. Squeak seems to be drifting in that direction, though. Islands are a good start. Josh's recent contribution of futures to the trunk are another step away from shared state concurrency.
>
> My sense of it is that efficient use of memory isn't the most important problem to solve at the moment. Further steps toward event-loop concurrency would be more fruitful.
>
Well, at some point we should start using some kind of native
concurrency, not just green threading.
Processes still run on top of a single object space, i.e. all objects
are equally reachable from any process, since they use a
non-concurrent memory model.
Oh, never mind, we had long talks about it in the past, let's not start
over again :)


--
Best regards,
Igor Stasenko AKA sig.

Re: Multiprocessing with Squeak

Josh Gargus


On Jan 29, 2010, at 12:51 AM, Igor Stasenko wrote:

> Well, at some point we should start using some kind of native-based
> concurrency, not just green threading.
> Processes still run on top of a single object space, i.e. all objects
> are equally reachable from any process since they are using
> non-concurrent memory model.
> Oh, nevermind, we had long talks about it in the past , lets not start
> over again :)

:-)

Modern multi-core/hyper-threaded CPUs present a lot of low-hanging fruit for us to harvest.  I was tickled to learn that my new desktop machine can compile a Squeak VM from scratch in 15 seconds if I let it use 10 threads ("make -j 10").

Hydra seems like the easiest way to do so.  Luckily, it's both orthogonal and complementary to efforts to facilitate event-loop concurrency.  For example, the receiver of a message can be an "eventual reference" (Mark Miller's terminology... I prefer "far-ref") to an object in a different Hydra image: "foo := aRef future bar: baz".  This would result in a Promise being assigned to "foo"; it would resolve once the message executed in the remote image, and communicated the result back via a Hydra channel.
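
The message flow described above can be sketched in Python, under loud assumptions: a single-threaded `ThreadPoolExecutor` stands in for the remote image's event loop, and `FarRef`, `Counter`, and `remote_loop` are hypothetical names, not Hydra or Squeak APIs. The returned `Future` plays the role of the Promise.

```python
from concurrent.futures import ThreadPoolExecutor


class FarRef:
    """Stand-in for an eventual reference to an object in another image."""

    def __init__(self, target, loop):
        self._target = target
        self._loop = loop  # one worker thread models one event loop

    def future(self, selector, *args):
        # Queue the message on the remote loop; the caller gets a
        # Future at once and never touches the target's state itself.
        return self._loop.submit(getattr(self._target, selector), *args)


class Counter:
    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total


remote_loop = ThreadPoolExecutor(max_workers=1)
a_ref = FarRef(Counter(), remote_loop)

foo = a_ref.future("add", 5)   # roughly: foo := aRef future add: 5
result = foo.result()          # blocks until the remote side has answered
remote_loop.shutdown()
```

Because the loop has exactly one worker, messages to the target run one at a time in arrival order, which is the property event-loop concurrency relies on.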

Cheers,
Josh





Re: Multiprocessing with Squeak

Igor Stasenko

On 29 January 2010 11:22, Josh Gargus wrote:

> Modern multi-core/hyper-threaded CPUs present a lot of low-hanging fruit for us to harvest.  I was tickled to learn that my new desktop machine can compile a Squeak VM from scratch in 15 seconds it I let it use 10 threads ("make -j 10").
>
> Hydra seems like the easiest way to do so.  Luckily, it's both orthogonal and complementary to efforts to facilitate event-loop concurrency.  For example, the receiver of a message can be an "eventual reference" (Mark Miller's terminology... I prefer "far-ref") to an object in a different Hydra image: "foo := aRef future bar: baz".  This would result in a Promise being assigned to "foo"; it would resolve once the message executed in the remote image, and communicated the result back via a Hydra channel.
>

If you remember, I recently added a primitive to Hydra that can
spawn a 'child' object memory from a hand-crafted
set of objects in the main one. Not much rocket science there: it just
clones a closed object graph that you specify.
But by proceeding with this approach, one could generate 'islands'
on the fly, each serving a small sub-task that can run in
parallel or on demand. This is much more space-efficient than spawning
a full-image object memory when all you need is to do a specific
set of tasks.
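
As a loose illustration of cloning a closed object graph, here is a Python sketch. It is not the Hydra primitive: `Node` and `spawn_child_memory` are hypothetical names, and `copy.deepcopy` is used because it copies everything reachable from the given roots while preserving sharing and cycles.

```python
import copy


class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []


def spawn_child_memory(roots):
    """Clone the graph reachable from roots into an independent copy."""
    return copy.deepcopy(roots)


a, b = Node("a"), Node("b")
a.refs.append(b)
b.refs.append(a)          # a cycle, so the graph is closed over {a, b}

child = spawn_child_memory([a])
child[0].name = "a-child"  # changes the clone only
```

The clone shares nothing with the original, so the child can be handed to another interpreter without further coordination.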




--
Best regards,
Igor Stasenko AKA sig.

Re: Multiprocessing with Squeak

Levente Uzonyi-2
In reply to this post by Colin Putney
 
On Thu, 28 Jan 2010, Colin Putney wrote:

>
>
> On 2010-01-27, at 2:04 PM, Levente Uzonyi wrote:
>
>> Hi,
>>
>> I had an idea a few days ago and even though I don't have the time or knowledge to try it myself, I just can't get it out of my head. The idea is to let an interpreter use two images at once. One of them is read only "fully working" image let's call it S (source), the other is empty (contains no objects), writeable, possibly generated on the fly, let's call it W (working). The vm knows if an object is in S or W by checking the object pointer. Whenever an object in S is about to be modified, a copy is created in W and all references to it are changed to the new one (which means that more than one object might have to be copied). This means a slower startup, but once all necessary objects are copied performance would be normal.
>> (This approach is similar to the way sources are handled today: the sources file is read only, new source code goes to the changes file.)
>
> I believe VW can do something like this - they call it "Shared Perm Space." There's a special section of memory that's immutable, not subject to garbage collection, and shared between several VM processes.
>

After a bit of googling I found this:
http://cincomsmalltalk.com/userblogs/runarj/blogView?showComments=true&printTitle=Parallel_Execution_using_Multiple_VisualWorks_Images&entry=3348279474
and it looks similar, though it doesn't describe what Shared Perm Space is.

>> - combine it with HydraVM, it might give Erlang-like capabilities (cheap
>>  and fast processes)
>
> Well, we already have cheap and fast processes. The overhead for creating a new instance of Process and scheduling it is very low. What we lack is isolation between them. Squeak seems to be drifting in that direction, though. Islands are a good start. Josh's recent contribution of futures to the trunk are another step away from shared state concurrency.

Our cheap processes can't do multiprocessing and futures won't help with
that.

>
> My sense of it is that efficient use of memory isn't the most important problem to solve at the moment. Further steps toward event-loop concurrency would be more fruitful.
>

I always see people saying: the image is large, we need a kernel image,
Squeak is bloated, I can't run this large image on a server, etc.

If a vm could do this, running 1000 images on a single server wouldn't
hurt much (assuming ~15MB source image and <1MB worker images).
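
A rough check of that arithmetic, taking the figures in the post as given and treating the "<1MB" worker size as an upper bound of 1 MB. The variable names are illustrative only.

```python
SOURCE_MB = 15   # size of the shared, read-only source image
WORKER_MB = 1    # upper bound assumed for each private working image
IMAGES = 1000

# Every interpreter loads its own full copy of the image.
unshared_mb = IMAGES * SOURCE_MB

# One shared source image plus a small private image per interpreter.
shared_mb = SOURCE_MB + IMAGES * WORKER_MB

print(f"unshared: {unshared_mb} MB, shared: {shared_mb} MB")
```

Under these assumptions the total drops from 15000 MB to at most 1015 MB.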


Levente


Re: Multiprocessing with Squeak

David T. Lewis
 
On Fri, Jan 29, 2010 at 02:17:46PM +0100, Levente Uzonyi wrote:

>
> On Thu, 28 Jan 2010, Colin Putney wrote:
> >
> >I believe VW can do something like this - they call it "Shared Perm
> >Space." There's a special section of memory that's immutable, not subject
> >to garbage collection, and shared between several VM processes.
>
> After a bit of googling I found this:
> http://cincomsmalltalk.com/userblogs/runarj/blogView?showComments=true&printTitle=Parallel_Execution_using_Multiple_VisualWorks_Images&entry=3348279474
> and it looks similar, though it doesn't describe what Shared Perm Space is.

I think this may be more or less the same thing I was trying to
demonstrate in the "poor man's multiprocessing" thread:
 http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-January/143841.html

The major difference seems to be that Cincom's implementation actually
works, whereas mine just crashed a lot of other people's Squeak images ;)

Dave
 

Re: Multiprocessing with Squeak

Eliot Miranda-2
In reply to this post by Colin Putney
 


On Thu, Jan 28, 2010 at 10:42 PM, Colin Putney wrote:


> On 2010-01-27, at 2:04 PM, Levente Uzonyi wrote:
>
>> Hi,
>>
>> I had an idea a few days ago and even though I don't have the time or knowledge to try it myself, I just can't get it out of my head. The idea is to let an interpreter use two images at once. One of them is read only "fully working" image let's call it S (source), the other is empty (contains no objects), writeable, possibly generated on the fly, let's call it W (working). The vm knows if an object is in S or W by checking the object pointer. Whenever an object in S is about to be modified, a copy is created in W and all references to it are changed to the new one (which means that more than one object might have to be copied). This means a slower startup, but once all necessary objects are copied performance would be normal.
>> (This approach is similar to the way sources are handled today: the sources file is read only, new source code goes to the changes file.)
>
> I believe VW can do something like this - they call it "Shared Perm Space." There's a special section of memory that's immutable, not subject to garbage collection, and shared between several VM processes.

It used to exist and then was broken when Barry Hayes and I added memory mapping of new heap segments back in the late 90's.  I was working on bringing it back when I left.

You're almost right (and I'm probably being pedantic; forgive me).  PermSpace (not shared) is a third generation that is not collected unless one does a global GC.  VW has a scavenger, a stop-the-world mark-sweep collector and an incremental mark-sweep collector.  The scavenger collects only new space.  The incremental collector, run in short bursts for a few milliseconds under image-level control, collects oldSpace.  The stop-the-world collector will collect oldSpace or oldSpace + permSpace.  So permSpace is only collected when one does a global stop-the-world collection (globalGarbageCollect) not an oldSpace collection (garbageCollect).  To populate permSpace one does a "perm save" which does an otherwise normal image save that sets a bit in the image header that causes the VM to load the entire image into permSpace.  One then does a globalGarbageCollect and saves, resulting in an image in which most objects are in permSpace (particularly all classes and methods) but where transient objects (font descriptions loaded at startup etc) are in oldSpace.  So the incremental collector, collecting oldSpace, doesn't waste time scan-marking classes and methods, and hence is much more effective.

Shared permSpace extends the scheme by memory mapping an image file's permSpace segment using copy-on-write.  So as objects in permSpace are written to, the pages of the permSpace part of the image file that hold them are copied into private memory.  No effort is made to do things like cluster class variables (which are the most likely targets of writes into permSpace) together on pages to reduce the amount of copying when writes do occur.  A tracer approach would do much better here.
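
The copy-on-write mapping itself can be tried out with Python's standard `mmap` module, whose `ACCESS_COPY` mode keeps writes in private memory and leaves the file alone. This is only a sketch of the operating-system mechanism, not of the VisualWorks implementation; the temporary file and its 16-byte content are made up for the demonstration.

```python
import mmap
import os
import tempfile

# Stand-in for the permSpace segment of an image file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"perm-space-page!")
    path = f.name

with open(path, "rb") as image:
    # ACCESS_COPY: writes go to private pages, never to the file.
    view = mmap.mmap(image.fileno(), 0, access=mmap.ACCESS_COPY)
    view[0:4] = b"PERM"            # triggers a private copy of the page
    private_bytes = bytes(view[:])
    view.close()

with open(path, "rb") as image:
    file_bytes = image.read()      # what other processes would still see

os.remove(path)
```

Several processes mapping the same file this way share the untouched pages, and each pays only for the pages it has written to.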

You can infer that memory mapping new oldSpace segments broke shared permSpace because shared permSpace was hacked to map the file at a hard-coded address.  I was trying to bring back shared permSpace for 64-bit images (where it would have more impact because 64-bit objects are bigger) by doing things like aligning the object headers of oldSpace objects on a 16-byte boundary and permSpace objects 8 bytes from a 16-byte boundary so that the permSpace test was a tag test (there being 3 bits of immediate tags).
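
A small sketch of the alignment trick, with the caveat that it only models the address arithmetic described above. `ALIGN`, `PERM_OFFSET`, `align_header`, and `is_perm_space` are hypothetical names, and addresses are plain integers.

```python
ALIGN = 16        # oldSpace headers sit on a 16-byte boundary
PERM_OFFSET = 8   # permSpace headers sit 8 bytes past such a boundary


def align_header(address, perm):
    """Round address up to the next boundary for the chosen space."""
    base = (address + ALIGN - 1) & ~(ALIGN - 1)
    return base + PERM_OFFSET if perm else base


def is_perm_space(header_address):
    # The low three bits are left for immediate tags; the next bit
    # (value 8) distinguishes permSpace, so the test is a mask and compare.
    return header_address & (ALIGN - 1) == PERM_OFFSET


old_header = align_header(0x1001, perm=False)
perm_header = align_header(0x1001, perm=True)
```

With this layout the space check needs no table lookup or address-range comparison, only the bits of the pointer.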

HTH
Eliot
 



Re: Multiprocessing with Squeak

Jecel Assumpcao Jr
In reply to this post by Levente Uzonyi-2
 
Levente Uzonyi wrote on Date: Wed, 27 Jan 2010 23:04:05 +0100 (CET)

> Opinions? Ideas?

If you haven't seen it already, you should really check out David
Ungar's work:

http://portal.acm.org/citation.cfm?id=1640149

My plan is to work on this from late March to the end of June.

-- Jecel


Re: Multiprocessing with Squeak

Levente Uzonyi-2
In reply to this post by Levente Uzonyi-2
 
On Fri, 29 Jan 2010, Jecel Assumpcao Jr wrote:

>
> Levente Uzonyi wrote on Date: Wed, 27 Jan 2010 23:04:05 +0100 (CET)
>
>> Opinions? Ideas?
>
> If you haven't seen it already, you should really check out David
> Ungar's work:
>
> http://portal.acm.org/citation.cfm?id=1640149

Thanks for the pointer to the paper, I saw a video about it but I can't
find it atm.

My idea is much simpler than this and (ideally) it would work with current
images. What I'm interested in is whether it's doable. If it is,
how much work would it take?

> My plan is to work on this from late March to the end of June.

You mean you are about to reimplement that vm?


Levente


Re: Multiprocessing with Squeak

Jecel Assumpcao Jr
 
Levente Uzonyi wrote on Fri, 29 Jan 2010 22:43:30 +0100 (CET)
> Thanks for the pointer to the paper, I saw a video about it but I can't
> find it atm.

David Ungar's 2008 Squeak BOF talk can be found at

http://goran.krampe.se/blog/OOPSLA/OOPSLA08-more.rdoc

I try to keep a reasonably up to date list of Smalltalk related movies
like these at

http://www.smalltalk.org.br/movies/

> My idea is much simpler than this and (ideally it) would work with current
> images. What I'm interrested in is if it's doable or not? If it's
> doable, how much work would it be to do it?

I had a similar suggestion for SqueakNOS. The way it is now, it is like
DOS in that any application can bring down the whole system. If you use
two images instead, one with access to the low level SqueakNOS stuff and
the second just running user level code then you would have something
closer to Unix in terms of security. Some primitives in the second image
(which could be an unmodified current one) would generate messages to
the first image with all the drivers and OS level code.

This isn't particularly original - in the "green book" there is a
description of the LOOM virtual memory system for Smalltalk that also
used two images to get the job done (chapter 14 - pages 251 to 271).

> > My plan is to work on this from late March to the end of June.
>
> You mean you are about to reimplement that vm?

I'll post more details later this week on squeak-dev, but the basic idea
is to replace the ObjectMemory part of the VM and patch the message
passing so that images can reference each other's objects and send
messages between themselves. The same scheme is also used as a virtual
memory, allowing binary blobs to be loaded and unloaded on demand. The
goal is to be able to run well on machines with up to thousands of
cores. And to be usable by children.

This is very similar to David Ungar's work, but is not really based on
it and is more an evolution of some of my previous projects (I built a
64-node Self/Smalltalk computer in 1992 with 68020 processors and
Transputers, though the software was only partly implemented, and I did
an extremely parallel version of Self in 1997).

-- Jecel