Coarse-grained multiprocessing with RemoteTask

David T. Lewis
RemoteTask provides partitioning of processing tasks at the level of
cooperating OS processes. For problems that can be partitioned into
independent tasks running under the supervision of a Squeak image, such
that each task performs a significant amount of processing and returns
a moderately sized result, RemoteTask can provide substantial improvements
in processing run time versus the equivalent serial processing when
running on multi-core hardware.

A task is scheduled in a working image with

  RemoteTask do: taskBlock whenComplete: aOneArgumentBlock

where taskBlock is the task to be scheduled in a forked Squeak image,
and aOneArgumentBlock handles the result object when the remote task
makes data available. Result data is returned through a ReferenceStream
on the stdout pipe from the remote Squeak image. The forked Squeak image
is headless and is quite memory efficient due to Unix copy-on-write
for forked processes.

RemoteTask may also be useful for evaluating a method that otherwise would
block the VM, such as an FFI call to a long-running external function.

RemoteTask is part of the latest CommandShell package on SqueakSource
(CommandShell 4.5.0), and requires OSProcess as well as a Unix or Mac VM
with OSProcessPlugin (it is helpful if the VM also has AioPlugin for process
completion notification, although polling will be used if this is not present).
I have tested only on Linux with standard VM and Cog, although I am hopeful
that this will work on Mac also (confirmation would be appreciated).

Following is an example of three processing tasks assigned to three Squeak
worker images with results returned to the supervisory image on task
completion.

threeParallelTasks
   "Find all primes in a range of large integers. Divide the problem into
   three tasks running in three child images, and return the results to
   the supervisory image. Answer a tasks array and a results array, where
   the results array will be populated on completion of the tasks."

   "RemoteTask threeParallelTasks"

   | p1 p2 p3 results task1 task2 task3 |
   results := Array new: 3.
   task1 := [(100000000000000000000000000000
               to: 100000000000000000000000019999)
            select: [:f | f isPrime] thenCollect: [:s | s asString]].
   task2 := [(100000000000000000000000020000
               to: 100000000000000000000000039999)
            select: [:f | f isPrime] thenCollect: [:s | s asString]].
   task3 := [(100000000000000000000000040000
               to: 100000000000000000000000059999)
            select: [:f | f isPrime] thenCollect: [:s | s asString]].
   "n.b. Assign task to a variable to prevent RemoteTask from being finalized"
   p1 := RemoteTask do: task1 whenComplete: [:result | results at: 1 put: result].
   p2 := RemoteTask do: task2 whenComplete: [:result | results at: 2 put: result].
   p3 := RemoteTask do: task3 whenComplete: [:result | results at: 3 put: result].
   ^ { #tasks -> { p1 . p2 . p3 } . #results -> results }
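
Because the method answers before the child images have finished, the caller has to wait for the results array to fill in. A minimal workspace sketch of one way to do that, assuming each completion block eventually runs and no task legitimately answers nil (the polling interval and the Transcript output are illustrative choices, not part of RemoteTask):

```smalltalk
| answer tasks results |
answer := RemoteTask threeParallelTasks.
tasks := answer first value.      "the #tasks association; keep a reference so the tasks are not finalized"
results := answer second value.   "the #results association; slots are nil until each task completes"

"Poll from a background process so the UI process is not blocked."
[[results includes: nil]
    whileTrue: [(Delay forMilliseconds: 250) wait].
 Transcript
    show: 'Tasks completed: ', tasks size printString; cr;
    show: 'Primes found: ',
        (results inject: 0 into: [:sum :each | sum + each size]) printString; cr] fork.
```

Referencing tasks inside the forked block keeps the RemoteTask instances reachable until the wait loop ends.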


Dave


Re: Coarse-grained multiprocessing with RemoteTask

Chris Muller-3
Really cool Dave!  May I ask a couple of questions?

Is the initial object state of the forked image a clone of the forking
image, so I have access to all of the variables and objects?  For
example:

    RemoteTask
        do: [ someObjectInThisImage calculateComplexValue ]
        whenComplete: [ :calculatedValue |
            someObjectInThisImage complexValue: calculatedValue ]

If some objects which were originally present in the forking image get
changed in the forked image, those changes are local to that image
memory only, is that correct?  I assume this is the case since you
mentioned "copy-on-write".

If the forked image returns one of the objects that was originally
present in the forking image (modified or not), I assume it is
returned as a "very deep copy" of the original, since you said it uses
ReferenceStream to instantiate the result answer.

Finally, you mentioned it "should" work on Mac, but is there any
reason it shouldn't work on Windows?

Thanks for this handy utility for easily distributing processing!

  Chris


Re: Coarse-grained multiprocessing with RemoteTask

David T. Lewis
On Sun, Nov 13, 2011 at 11:07:27AM -0600, Chris Muller wrote:

> Really cool Dave!  May I ask a couple of questions?
>
> Is the initial object state of the forked image a clone of the forking
> image, so I have access to all of the variables and objects?  For
> example:
>
>     RemoteTask
>         do: [ someObjectInThisImage calculateComplexValue ]
>         whenComplete: [ :calculatedValue |
>             someObjectInThisImage complexValue: calculatedValue ]

Yes, that's right. The child image is an exact copy of the parent
except for the return value of the #forkSqueak call, which will
be 0 in the child image and a positive integer (the PID of the child)
in the parent "supervisor" image.
 
The #forkSqueak operation is similar to #snapshot:andQuit: in the
sense that the image "wakes up" after the snapshot resume point
(or after the fork) and decides what to do. In the case of
#snapshot:andQuit: the image that wakes up is identical to the
image that was saved. In the case of #forkSqueak, the parent
and child images are identical except for the result value of
the fork call.

>
> If some objects which were originally present in the forking image get
> changed in the forked image, those changes are local to that image
> memory only, is that correct?  I assume this is the case since you
> mentioned "copy-on-write".

Yes that's right. Both images start with separate and identical object
memories (except for the #forkSqueak result value). Actual operating
system memory is consumed only when one or the other image changes
its object memory. In actual practice, the memory usage seems to be
quite modest, but this will of course vary depending on what the
images are doing. Calling a blocking FFI method in a child image
would be an example of something that would be very memory efficient.
Doing a full garbage collect in one of the images, probably not so
good.
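
The isolation described above can be checked with a small workspace sketch. It uses only RemoteTask do:whenComplete: from the original post; the variable names and the Transcript output are illustrative assumptions:

```smalltalk
| log task |
log := OrderedCollection new.
"Assign the task to a variable so the RemoteTask is not finalized."
task := RemoteTask
    do: [log add: #changedInChild.   "this change happens in the child image only"
        log size]
    whenComplete: [:childSize |
        Transcript
            show: 'size seen by child: ', childSize printString; cr;
            show: 'size in parent: ', log size printString; cr].
```

If changes stay local to the forked image as described, the child should report a size of 1 while the parent's collection remains empty.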

> If the forked image returns one of the objects that was originally
> present in the forking image (modified or not), I assume it is
> returned as a "very deep copy" of the original, since you said it uses
> ReferenceStream to instantiate the result answer.

Yes, it can be any object that can be passed through a reference
stream, so it might for example be a matrix of Float values if you
were doing some sort of numeric work.
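
A short sketch of the copy semantics, again using only RemoteTask do:whenComplete: from the original post (the variable names are illustrative). The child answers an object that already existed in the parent, and the parent compares what comes back with what it still holds:

```smalltalk
| original task |
original := OrderedCollection withAll: #(3 1 2).
"Assign the task to a variable so the RemoteTask is not finalized."
task := RemoteTask
    do: [original]
    whenComplete: [:returned |
        Transcript
            show: 'equal: ', (returned = original) printString; cr;
            show: 'identical: ', (returned == original) printString; cr].
```

Since the result is rebuilt from the reference stream, the returned collection should be equal to the original but not the same object.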

>
> Finally, you mentioned it "should" work on Mac, but is there any
> reason it shouldn't work on Windows?

The OSProcess support for Windows is not sufficient to support this.
But OS X is a Unix, and the Mac VMs use the Unix OSProcessPlugin, so
I expect (hope?) that it will work there. If somebody tries it, please
let me know.

Dave

Re: Coarse-grained multiprocessing with RemoteTask

David T. Lewis
In reply to this post by David T. Lewis
On Sun, Nov 13, 2011 at 10:21:30AM -0500, David T. Lewis wrote:
> RemoteTask provides partitioning of processing tasks at the level of
> cooperating OS processes. For problems that can be partitioned into
> independent tasks running under the supervision of a Squeak image, such
> that each task performs a significant amount of processing and returns
> a moderately sized result, RemoteTask can provide substantial improvements
> in processing run time versus the equivalent serial processing when
> running on multi-core hardware.

Some follow-up notes on RemoteTask:

- I collected some documentation and put it on the swiki at
    <http://wiki.squeak.org/squeak/6176>

- Some objects cannot be serialized on a reference stream and therefore
  cannot be used in the result of a remote task. I added a Mantis report
  to document the issue
    <http://bugs.squeak.org/view.php?id=7679>

- The RemoteTask examples fail on Pharo images, but this is due to
  an unrelated bug in Pharo
    <http://code.google.com/p/pharo/issues/detail?id=4997>

Dave


Re: Coarse-grained multiprocessing with RemoteTask

ccrraaiigg
In reply to this post by David T. Lewis

     Fun! I'd like to make a module of this stuff for Spoon.


-C

--
Craig Latta
www.netjam.org/resume
+31   6 2757 7177
+ 1 415  287 3547



Re: Coarse-grained multiprocessing with RemoteTask

LawsonEnglish
At some point, it might be a good thing to have a set of "pre-created"
object spaces defined for Spoon, so that you can easily and swiftly
create a minimal image dedicated to a specific task. A "clone me"
command could then leverage Unix-style forking to make multiple copies
very fast.

L.


On 2/18/12 11:43 AM, Craig Latta wrote:

>       Fun! I'd like to make a module of this stuff for Spoon.
>
>
> -C
>
> --
> Craig Latta
> www.netjam.org/resume
> +31   6 2757 7177
> + 1 415  287 3547
>
>
>
>