Concurrency-Oriented Programming in Pharo


Concurrency-Oriented Programming in Pharo

Prof. Andrew P. Black
I was looking at some old slides of Joe Armstrong on Concurrency-oriented programming.  He set the following challenge:

Put N processes in a ring:
Send a simple message round the ring M times.
Increase N until the system crashes.
How long did it take to start the ring?
How long did it take to send a message?
When did it crash?

He gave graphs comparing Erlang, Java and C#.  I decided to compare Pharo.  Here is what I got; the creation times are PER PROCESS and the messaging times are PER MESSAGE.

first run

procs    creation/µs    msg/µs

  200          0       7.0
  500          0       9.7
 1000          2      15.4
 2000          1      21.6
 5000         13      31.5
10000         19.9    40.7
20000         46.5    55.4
50000        130.9    98.0

second run

procs    creation/µs    msg/µs

  200          0.0     7.0
  500          0.0    10.12
 1000          0.0    16.53
 2000          1.5    24.26
 5000         12.8    32.15
10000         28.1    39.497
20000         58.15   52.0295
50000         75.1    95.581

third run

procs    creation/µs    msg/µs

  200          0.0     7.0
  500          0.0     8.6
 1000          2.0    11.0
 2000          1.0    16.55
 5000         10.2    21.76
10000         12.0    49.57
20000         52.35   65.035
50000         91.76  117.1

Each process is a Pharo object (an instance of ERringElement) that contains a counter, a reference to the next ERringElement, and an "ErlangProcess" that is a Process that contains a reference to an instance of SharedQueue (its "mailbox").
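
For concreteness, here is a minimal workspace sketch of that structure (the names here are hypothetical and this is not the actual ERringElement code, which is in the SqueakSource package mentioned below).  Each element's "mailbox" is a SharedQueue, and its process blocks on that queue, forwarding the hop count to the next element until it reaches zero:

   | n mailboxes done |
   n := 1000.                                      "ring size"
   done := Semaphore new.
   mailboxes := (1 to: n) collect: [:i | SharedQueue new].
   1 to: n do: [:i |
       | myBox nextBox |
       myBox := mailboxes at: i.
       nextBox := mailboxes at: (i \\ n) + 1.      "wrap around at the end of the ring"
       [[ | hops |
          hops := myBox next.                      "blocks on the queue's Semaphore"
          hops > 0
              ifTrue: [nextBox nextPut: hops - 1]
              ifFalse: [done signal]] repeat] forkAt: Processor userBackgroundPriority].
   (mailboxes at: 1) nextPut: n * 10.              "send the token round the ring 10 times"
   done wait.

(When it finishes, the n worker processes remain blocked on their mailboxes, just as the idle elements do in the benchmark.)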

The good news is that up to 50k processes, it didn't crash.  But it did run with increasing sloth.

I can imagine that the increasing process-creation time is due to beating on the memory manager.  But why does the message-sending time increase as the number of processes increases?  (Recall that exactly one process is runnable at any given time.)  I'm wondering whether the scheduler is somehow getting overwhelmed by all of the non-runnable processes that are blocked on Semaphores in SharedQueues.  Any ideas?

(My code is on SqueakSource in the project Erlang.  But be warned that there is a simulation of the Erlang "universal server" in there too.  To run this code, look for the class ErlangRingTest.)






Re: Concurrency-Oriented Programming in Pharo

Eliot Miranda-2
Hi Andrew,

On Wed, Aug 17, 2011 at 1:50 PM, Andrew P. Black <[hidden email]> wrote:
> I can imagine that the increasing process-creation time is due to beating on the memory manager.  But why does the message-sending time increase as the number of processes increases?  (Recall that exactly one process is runnable at any given time.)  I'm wondering whether the scheduler is somehow getting overwhelmed by all of the non-runnable processes that are blocked on Semaphores in SharedQueues.  Any ideas?

If you're using Cog, then one reason performance falls off with the number of processes is context-to-stack mapping; see "08 Under Cover Contexts and the Big Frame-Up".  Once there are more processes than stack pages, every process switch faults out a(t least one) frame to a heap context and faults in a heap context to a frame.  You can experiment by changing the number of stack pages (see vmAttributeAt:put:), but you can't have thousands of stack pages; it uses too much C stack memory.  I think the default is 64 pages and Teleplace uses ~112.  Each stack page can hold up to approximately 50 activations.
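
For example, one could try something like the following sketch (the accessor vmParameterAt:put: and the parameter indices are assumptions; check the "vm parameters" method category in your image first):

   "Sketch only: on Cog, parameter 42 is reported as the number of stack pages and
    43 as the desired number, which is saved in the image header and takes effect
    on the next startup."
   SmalltalkImage current vmParameterAt: 42.            "how many stack pages are there now?"
   SmalltalkImage current vmParameterAt: 43 put: 160.   "ask for more pages from the next startup on"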

But to be sure what the cause of the slowdown is, one could use my VMProfiler.  Has anyone ported this to Pharo yet?


--
best,
Eliot


Re: Concurrency-Oriented Programming in Pharo

Stéphane Ducasse
> But to be sure what the cause of the slowdown is one could use my VMProfiler.  Has anyone ported this to Pharo yet?

Not that I know of.
Where is the code? :)




Re: Concurrency-Oriented Programming in Pharo

Igor Stasenko
In reply to this post by Eliot Miranda-2
On 18 August 2011 00:02, Eliot Miranda <[hidden email]> wrote:

> If you're using Cog then one reason performance falls off with number of
> processes is context-to-stack mapping, see 08 Under Cover Contexts and the
> Big Frame-Up.  Once there are more processes than stack pages every process
> switch faults out a(t least one) frame to a heap context and faults in a
> heap context to a frame.  You can experiment by changing the number of stack
> pages (see vmAttributeAt:put:) but you can't have thousands of stack pages;
> it uses too much C stack memory.  I think the default is 64 pages and
> Teleplace uses ~ 112.  Each stack page can hold up to approximately 50
> activations.
> But to be sure what the cause of the slowdown is one could use my
> VMProfiler.  Has anyone ported this to Pharo yet?


Hmm, to me that doesn't explain why the per-message time degrades
linearly with the number of processes.
Once we hit a certain limit (all stack pages are full), the time per
message should stop degrading.
Given your explanation, I would expect something like the following:
10000   -   49.57
  ... somewhere around here we hit the stack-page limit ...
20000   -   65.035
30000   -   65.035
40000   -   65.035
50000   -   65.035




--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrency-Oriented Programming in Pharo

Stefan Marr-4
In reply to this post by Prof. Andrew P. Black
Hi Andrew:

On 17 Aug 2011, at 22:50, Andrew P. Black wrote:

> I can imagine that the increasing process-creation time is due to beating on the memory manager.  But why the increasing message-sending time as the number of processes increases?  (Recall that exactly one process is runnable at any given time).  I'm wondering if the scheduler is somehow getting overwhelmed by all of the non-runable processes that are blocked on Semaphores in SharedQueue.  Any ideas?
This is vague and from memory, and I might be confusing things (we have tried too many different things here ;)):
Shouldn't the standard interpreter VM remove the process from the scheduler list when it is waiting on a semaphore?
I think that when there is only a single runnable process at a given priority, the list contains only that one.

Best regards
Stefan




--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525



Re: Concurrency-Oriented Programming in Pharo

Igor Stasenko
In reply to this post by Igor Stasenko
On 18 August 2011 00:20, Igor Stasenko <[hidden email]> wrote:

> Hmm, to me that doesn't explain why the per-message time degrades
> linearly with the number of processes.
> Once we hit a certain limit (all stack pages are full), the time per
> message should stop degrading.
> Given your explanation, I would expect something like the following:
> 10000   -   49.57
>   ... somewhere around here we hit the stack-page limit ...
> 20000   -   65.035
> 30000   -   65.035
> 40000   -   65.035
> 50000   -   65.035
>

Yes... except for one thing: since contexts are allocated on the heap, that
means more work for the GC, which would explain why it degrades linearly.

Andrew, can you play with the GC parameters, e.g. increase the number of
allocations between incremental GCs?




--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrency-Oriented Programming in Pharo

Igor Stasenko
In reply to this post by Stefan Marr-4
On 18 August 2011 00:24, Stefan Marr <[hidden email]> wrote:
> Vague from memory, and might confused things (we have tried too many different things here ;)):
> Shouldn't the standard interpreter VM remove the process from the scheduler list when it is waiting on a semaphore?
> I think, when there is only a single runable process for a given priority, then the list only contains that one.
>
Yes, but as far as I remember, removing from the head of a list and adding to
its tail are O(1) operations.
They should not depend linearly on the size of the list.




--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrency-Oriented Programming in Pharo

Eliot Miranda-2
In reply to this post by Stefan Marr-4


On Wed, Aug 17, 2011 at 2:24 PM, Stefan Marr <[hidden email]> wrote:
> This is vague and from memory, and I might be confusing things (we have tried too many different things here ;)):
> Shouldn't the standard interpreter VM remove the process from the scheduler list when it is waiting on a semaphore?

Yes, you're right.  And any flavour of the VM will do this.  All the processes in the ring except the running one are waiting on a Semaphore.
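
A tiny workspace illustration of this:

   | s |
   s := Semaphore new.
   [s wait. Transcript show: 'signalled'; cr] fork.
   "the forked process is now on the Semaphore's list,
    not on any of the ProcessorScheduler's runnable lists"
   s signal.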

 





--
best,
Eliot


Re: Concurrency-Oriented Programming in Pharo

Prof. Andrew P. Black
In reply to this post by Igor Stasenko

On 17 Aug 2011, at 14:24 , Igor Stasenko wrote:

> Yes... except for one thing: since contexts are allocated on the heap, that
> means more work for the GC, which would explain why it degrades linearly.
>
> Andrew, can you play with the GC parameters, e.g. increase the number of
> allocations between incremental GCs?

How do I do that?

Is there a GC monitor that will show me how much time the GC is taking, or how many times the incremental GC is running?

        Andrew



Re: Concurrency-Oriented Programming in Pharo

Mariano Martinez Peck


On Tue, Aug 23, 2011 at 11:11 AM, Andrew P. Black <[hidden email]> wrote:

On 17 Aug 2011, at 14:24 , Igor Stasenko wrote:

> Yes... except for one thing: since contexts are allocated on the heap, that
> means more work for the GC, which would explain why it degrades linearly.
>
> Andrew, can you play with the GC parameters, e.g. increase the number of
> allocations between incremental GCs?

How do I do that?



Check method category "vm parameters" in SmalltalkImage.
 
Is there a GC monitor that will show me how much time the GC is taking, or how many times the incremental GC is running?


Yes, open "System" -> "Vm statistics"
 
       Andrew





--
Mariano
http://marianopeck.wordpress.com


Re: Concurrency-Oriented Programming in Pharo

Adrian Lienhard
For me,

   SmalltalkImage current vmStatisticsReportString

...works best. If you run this twice, the second time it prints the diff compared to the previous run.

So if you print the following three lines

   SmalltalkImage current vmStatisticsReportString.
   self runSomeCode.
   SmalltalkImage current vmStatisticsReportString

you'll get the number and time of full and incremental GC cycles while running your code (of course there may have been other processes active in between, which may affect the results). It's also a good idea to force a full GC beforehand.
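
Putting it together with the forced full GC (runSomeCode stands in for your benchmark, e.g. the ErlangRingTest entry point):

   Smalltalk garbageCollect.                        "force a full GC first"
   SmalltalkImage current vmStatisticsReportString.
   self runSomeCode.
   SmalltalkImage current vmStatisticsReportString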

Adrian
