Concurrent programming (was: Would you start a new Smalltalk project today?)


Concurrent programming (was: Would you start a new Smalltalk project today?)

Michael van der Gulik-2


On Mon, Mar 23, 2009 at 12:18 AM, Igor Stasenko <[hidden email]> wrote:
2009/3/22 Igor Stasenko <[hidden email]>:
> 2009/3/22 Michael van der Gulik <[hidden email]>:
>>
>>
>> On Sat, Mar 21, 2009 at 8:38 AM, Janko Mivšek <[hidden email]>
>> wrote:
>>>
>>> Philippe Marschall pravi:
>>> >> Michael van der Gulik wrote:
>>>
>>> >> So now it seems that Gemstone is the only multi-core capable Smalltalk
>>> >> VM :-(.
>>>
>>> > AFAIK Gemstone isn't multi-core capable as well. You can just run
>>> > multiple gems and they share the same persistent memory. Which is
>>> > similar but different.
>>>
>>> Well, Gemstone can for sure be considered as multi-core capable. Every
>>> gem runs on its own process and therefore can run on its own CPU core.
>>> All gems then share a Shared Memory Cache. So, a typical multi-core
>>> scenario.
>>>
>> By multi-core, I mean that the following code would spread CPU usage over at
>> least two cores of a CPU or computer for a while:
>>
>> | sum1 sum2 |
>>
>> sum1 := 0. sum2 := 0.
>>
>> [ 1 to: 10000000 do: [ :i | sum1 := sum1 + 1 ] ] fork.
>>
>> [ 1 to: 10000000 do: [ :i | sum2 := sum2 + 1 ] ] fork.
>>
>> (I didn't try the above so there might be obvious bugs)
>>
>> If a VM can't distribute the load for the above over two or more CPU cores,
>> I consider its multi-core capabilities a hack. No offense intended to the
>> Hydra VM.
>>
>
> Michael, that would be too ideal to be true, especially for Smalltalk.
>
> Consider the following:
>
> | array sum1 sum2 |
>
> sum1 := 0. sum2 := 0.
> array := Array new: 10.
>
> [ 1 to: 10000000 do: [ :i | array at: 10 atRandom put: (Array new: 10) ] ] fork.
> [ 1 to: 10000000 do: [ :i | array at: 10 atRandom put: (Array new: 10) ] ] fork.
> 1 to: 10000000 do: [ :i | array at: 10 atRandom put: (Array new: 10) ].
>
> This code reveals the following problems:
> - concurrent access to same object


The problem here is that the code above wasn't written with concurrency in mind. I'd look at what I'm trying to achieve, and then try to keep each thread's data structures as disentangled as possible.

If, for example, you were trying to allocate an array of arrays concurrently, then I would write (using my namespaces :-) ):

array := Concurrent.Collections.Array new: 1000000000. " We ignore the cost of making this array to begin with :-) "
array withIndexDo: [ :each :i |
    " each will always be nil. "
    array at: i put: (Array new: 10).
].

Concurrent.Collections.Array>>withIndexDo: aBlock
    | nThreads chunkSize sema |
    nThreads := 1000. " Pull this from a system setting or something. "
    chunkSize := self size // nThreads. " Any remainder beyond nThreads * chunkSize is ignored here. "
    sema := Semaphore new. " Unused yet; see the follow-up message for the completion step. "
    1 to: nThreads do: [ :i |
        [ ((i - 1) * chunkSize + 1) to: (i * chunkSize) do: [ :j |
            aBlock value: (self at: j) value: j ] ] fork ].
    ^ self

There are probably bugs; I haven't tried running this. No synchronisation should be needed in this example, which keeps it fast. Currently, this wouldn't work on the Squeak VM because block contexts aren't, er... "re-entrant".
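To illustrate the re-entrancy point (a hedged sketch, not from the original message): pre-closure Squeak blocks share their home context's temporaries, so a second, re-entrant activation of the same block — recursion being the simplest case — clobbers the first activation's state. With real closures this works:

```smalltalk
| fib |
fib := nil.
fib := [ :n |
    n < 2
        ifTrue: [ n ]
        ifFalse: [ (fib value: n - 1) + (fib value: n - 2) ] ].
fib value: 10.
" With closures this answers 55; a pre-closure BlockContext cannot be
  activated re-entrantly, so the recursive calls corrupt each other. "
```

The same limitation bites forked blocks: two concurrent activations of one pre-closure block share a single set of temporaries.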


> - heavy memory allocation while running 3 processes, which at some
> point will force a GC.
> While the first is more or less in the hands of the developer (write
> proper code to avoid such things), the second is a problem you need to
> solve to be able to collect garbage in real time when there are
> multiple threads producing it.


Then... invent a better garbage collector. This might be a difficult problem to solve, but it isn't impossible.

http://gulik.pbwiki.com/Block-based+virtual+machine (ideas only)




But what strikes me is that there is a lot of code which never cares
about this; for instance, see
Symbol class>>intern:
In some magical fashion it works without problems under green threading;
I'm not sure it will keep working when you enable multiple native
threads.


I plan to find and fix these at some stage. Firstly, I would change the scheduler to get rid of its predictable yielding, so that these bugs are exposed. Doing so is required for the implementation of "Dominions" in my SecureSqueak project anyway, and I'll try to feed my changes back into the community.
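A sketch of the kind of bug a less predictable scheduler would expose (hedged; the yield is inserted by hand here to widen the race window): an unprotected check-then-insert on a shared Dictionary, which green threading's deterministic scheduling usually hides:

```smalltalk
| dict register |
dict := Dictionary new.
register := [ :key |
    (dict includesKey: key) ifFalse: [
        Processor yield. " A pre-emption here lets both processes pass the check. "
        dict at: key put: key asUppercase ] ].
[ register value: 'foo' ] fork.
[ register value: 'foo' ] fork.
" Both processes can pass the includesKey: test before either inserts.
  Under predictable yielding the window almost never opens, so code
  like Symbol class>>intern: appears to work. "
```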

I also plan to make the Squeak equivalent of java.util.concurrent at some stage.
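As a sketch of what such a library might offer (class and method names are hypothetical, not from any existing package; `Semaphore class>>forMutualExclusion` and `critical:` are standard Squeak): a lock-protected counter in the spirit of java.util.concurrent.atomic:

```smalltalk
Object subclass: #AtomicCounter
    instanceVariableNames: 'value lock'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Concurrent-Utilities'.

" initialize is invoked by hand here; hook it into #new in a real package. "
AtomicCounter >> initialize
    value := 0.
    lock := Semaphore forMutualExclusion.

AtomicCounter >> increment
    ^ lock critical: [ value := value + 1 ]

AtomicCounter >> value
    ^ lock critical: [ value ]
```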

 


There is another problem: Squeak Processes are cheap (a few bytes of
object memory), while allocating a native thread consumes a
considerable amount of memory and address space. So if you map
Processes to native threads, you lose the ability to have
millions of them; instead you will be limited to thousands.
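A back-of-envelope check of that claim (assuming a typical ~1 MB default native stack reservation; the exact figure is OS-dependent and not from the original message):

```smalltalk
| stackBytes threads |
stackBytes := 1024 * 1024. " ~1 MB reserved per native thread "
threads := 1000000.
(stackBytes * threads) // (1024 * 1024 * 1024)
" => 976, i.e. roughly a terabyte of address space for a million
  threads -- hopeless on a 32-bit VM, versus a few bytes per
  green Process. "
```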


Anthony answered this; I agree with him.

Gulik.


--
http://gulik.pbwiki.com/

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: Concurrent programming (was: Would you start a new Smalltalk project today?)

Michael van der Gulik-2
Argh. Stupid GMail sent it prematurely.

On Mon, Mar 23, 2009 at 10:45 AM, Michael van der Gulik <[hidden email]> wrote:

If, for example, you were trying to allocate an array of arrays concurrently, then I would write (using my namespaces :-) ):

 
array := Concurrent.Collections.Array new: 1000000000. " We ignore the cost of making this array to begin with :-) "
array withIndexDo: [ :each :i |
    " each will always be nil. "
    array at: i put: (Array new: 10).
].

Concurrent.Collections.Array>>withIndexDo: aBlock
    | nThreads chunkSize sema |
    nThreads := 1000. " Pull this from a system setting or something. "
    chunkSize := self size // nThreads. " Any remainder beyond nThreads * chunkSize is ignored here. "
    sema := Semaphore new. " Each worker signals once; we wait nThreads times below. "
    1 to: nThreads do: [ :i |
        [ ((i - 1) * chunkSize + 1) to: (i * chunkSize) do: [ :j |
            aBlock value: (self at: j) value: j ].
          sema signal ] fork ].
    nThreads timesRepeat: [ sema wait ].
    ^ self


Again, it probably won't just run without some bug fixing. And it won't work on non-closure Squeak anyway.

Gulik.

--
http://gulik.pbwiki.com/


Re: Concurrent programming (was: Would you start a new Smalltalk project today?)

Igor Stasenko
2009/3/22 Michael van der Gulik <[hidden email]>:

> Argh. Stupid GMail sent it prematurely.
>
> On Mon, Mar 23, 2009 at 10:45 AM, Michael van der Gulik <[hidden email]>
> wrote:
>>
>> If, for example, you were trying to allocate an array of arrays
>> concurrently, then I would write (using my namespaces :-) ):
> Again, it probably won't just run without some bug fixing. And it won't work
> on non-closure Squeak anyway.
>

Now consider the overhead of creating a fork versus the actual useful
code running within the block.
I presume this code will run about 10x slower on a single-core
processor compared to one without forks.
So you would need 10 cores just to match the computation time of a
single-core processor.

I think it's not wise to introduce parallelism at such low levels (in
Concurrent.Collections.Array>>withIndexDo:). It's like hammering nails
with a microscope :)

That's why I'm saying it's too good to be true.
Introducing parallelism at such low levels will be a waste. I'm leaning
toward the island model. This is a middle point between sharing nothing,
as in Hydra, and sharing everything, as in what you're proposing.

> Gulik.
>
> --
> http://gulik.pbwiki.com/
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>



--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent programming (was: Would you start a new Smalltalk project today?)

Michael van der Gulik-2


On Mon, Mar 23, 2009 at 12:34 PM, Igor Stasenko <[hidden email]> wrote:

Now consider the overhead of creating a fork versus the actual useful
code running within the block.
I presume this code will run about 10x slower on a single-core
processor compared to one without forks.
So you would need 10 cores just to match the computation time of a
single-core processor.

I think it's not wise to introduce parallelism at such low levels (in
Concurrent.Collections.Array>>withIndexDo:). It's like hammering nails
with a microscope :)

That's why I'm saying it's too good to be true.
Introducing parallelism at such low levels will be a waste. I'm leaning
toward the island model. This is a middle point between sharing nothing,
as in Hydra, and sharing everything, as in what you're proposing.

10 times slower? Sounds like a made-up number to me...

" Using 101 threads: "
c := ConcurrentArray new: 1000001.
Time millisecondsToRun: [c withIndexDo: [ :each :i | c at: i put: i asString. ]].     
5711
5626
6074

" Using 11 threads: "
c := ConcurrentArray new: 1000001.
Time millisecondsToRun: [c withIndexDo: [ :each :i | c at: i put: i asString. ]].
3086
3406
3256

" Using 1 thread: "
d := Array new: 1000001.
Time millisecondsToRun: [d withIndexDo: [ :each :i | d at: i put: i asString]].   
2426
2610
2599

My implementation is 1/2 to 1/3 the speed of the single-threaded Array. If the blocks did more work, then the overhead would be lower and some benefit would be gained from using multiple cores.

I don't have a good idea of where the overhead is going - maybe it's being lost in the block copying that is needed to work around Squeak's deficiencies? Or maybe it's waiting for the scheduler to do its stuff?
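One way to narrow it down (a hedged sketch, not measured here): time the fork machinery alone, with empty worker blocks, and compare that against the ConcurrentArray timings above:

```smalltalk
| sema n |
n := 1000. " Same worker count as the 1000-thread run above. "
sema := Semaphore new.
Time millisecondsToRun: [
    n timesRepeat: [ [ sema signal ] fork ].
    n timesRepeat: [ sema wait ] ].
" If this alone accounts for most of the gap, fork/scheduling overhead
  is the culprit; if it is tiny, suspect the per-element block copying. "
```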

Implementation attached, MIT license if you're picky.

Gulik.

--
http://gulik.pbwiki.com/


Attachment: ConcurrentArray.st (1K)

Re: Concurrent programming (was: Would you start a new Smalltalk project today?)

Michael van der Gulik-2


On Mon, Mar 23, 2009 at 2:21 PM, Michael van der Gulik <[hidden email]> wrote:



I just tried it on VisualWorks as well. I removed the block copying and renamed the method to "keysDo:" (I never thought of an array like that before... keys and values).

d := Array new: 1000001.
Time millisecondsToRun: [d keysDo: [ :i | d at: i put: i printString]].         
1180
982
1008

" Using 101 threads "
c := ConcurrentArray new: 1000001.
Time millisecondsToRun: [c keysDo: [ :i | c at: i put: i printString. ]].       
 1072
1120
962

At this stage, I'm suspicious about the results :-).

Gulik.

--
http://gulik.pbwiki.com/


Re: Concurrent programming (was: Would you start a new Smalltalk project today?)

Stéphane Ducasse
Strange, since Andreas was mentioning that Squeak's thread switching
costs far less than VisualWorks'.

Stef


On Mar 23, 2009, at 4:49 AM, Michael van der Gulik wrote:




Re: Concurrent programming (was: Would you start a new Smalltalk project today?)

Igor Stasenko
In reply to this post by Michael van der Gulik-2
2009/3/23 Michael van der Gulik <[hidden email]>:

> I just tried it on VisualWorks as well. I removed the block copying and
> renamed the method to "keysDo:" (I never thought of an array like that
> before... keys and values).
>
Yeah, an Array is just less generic than a Dictionary, in the sense
that it associates an index, as a key, with each of its values.

> At this stage, I'm suspicious about the results :-).
>
me too :)

> Gulik.
>
> --
> http://gulik.pbwiki.com/
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>



--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent programming (was: Would you start a new Smalltalk project today?)

Michael van der Gulik-2
In reply to this post by Michael van der Gulik-2


On Mon, Mar 23, 2009 at 4:49 PM, Michael van der Gulik <[hidden email]> wrote:




I tried it on SmalltalkMT on a dual-core system, but failed. I couldn't work out how to use its semaphores; they don't behave like semaphores in other Smalltalks and seem to be some MS Windows concoction. In fact, the whole environment feels like C++ with the Windows API.

I also failed to get SmalltalkMT to use 100% of both cores on my machine despite forking lots of busy blocks. Huemul Smalltalk crashed when I tried to fork something.

So... I remain saddened. It appears that Hydra and GemStone are the nearest we have to a multi-core Smalltalk, and they only do coarse-grained parallelism.

This is a sad state of affairs.

Gulik.

--
http://gulik.pbwiki.com/
