Reducing the activity of the image

Re: Reducing the activity of the image

NorbertHartl

On 10.02.2015 at 11:23, Sven Van Caekenberghe <[hidden email]> wrote:


On 10 Feb 2015, at 11:19, Norbert Hartl <[hidden email]> wrote:

Sven,

On 10.02.2015 at 10:36, Sven Van Caekenberghe <[hidden email]> wrote:


On 10 Feb 2015, at 09:51, Norbert Hartl <[hidden email]> wrote:


On 10.02.2015 at 09:23, Clément Bera <[hidden email]> wrote:

Hello,

About the Morphic rendering loop, the delay between renderings is handled in WorldState>>#interCyclePause:. The best solution to reduce the cost of the Morphic rendering loop is to put it in server mode by executing in Pharo: WorldState serverMode: true. In Squeak you have to set that in the Preferences.

I'll play with it and see what can be gained.

I tried the following on an otherwise idle DigitalOcean VM running Ubuntu 13.10

$ mkdir pharo4
$ curl get.pharo.org/40+vm | bash
$ ./pharo Pharo.image save Server

First patch (slower event handling, extra delay of 50ms):

$ ./pharo Server.image eval --save 'WorldState serverMode: true'

Second patch (give time back to OS while idle for 10ms instead of for 1ms):

$ cat ProcessorScheduler-class-idleProcess.st 
'From Pharo4.0 of 18 March 2013 [Latest update: #40484] on 10 February 2015 at 9:49:15.412839 am'!

!ProcessorScheduler class methodsFor: 'background process' stamp: 'SvenVanCaekenberghe 2/10/2015'!
idleProcess
	"A default background process which is invisible."

	[true] whileTrue:
		[self relinquishProcessorForMicroseconds: 10000]! !

$ ./pharo Server.image eval "'ProcessorScheduler-class-idleProcess.st' asFileReference fileIn"
$ ./pharo Server.image eval '(ProcessorScheduler class>>#idleProcess) sourceCode'
'idleProcess
"A default background process which is invisible."

[true] whileTrue:
[self relinquishProcessorForMicroseconds: 10000]'

Run an image with a basic Zn HTTP server in the background:

$ ./pharo Server.image eval --no-quit 'ZnServer startDefaultOn: 1701' &
$ curl http://localhost:1701

Overall load is 0.01% but this is virtual/shared hardware, so who knows.

CPU load of the pharo process hovers around a couple of percent. I am not seeing much difference; maybe it is a bit lower, but that might be wishful thinking.

My findings are similar. I have a CPU usage of 6%. WorldState serverMode adds a Delay of 50ms. Setting a higher number in the idle process does not seem to have any effect until the number is too high; then the image does not start anymore.
I tuned all of these things and it is not faster; sometimes it even appears to take more CPU, which is probably not true.

I am afraid that we as a community do not fully understand what is happening or how we can control it.

On the other hand, on a machine with many images running, things are still totally fine, so we should not worry too much. It is only in specific cases like yours that it becomes a concern.

I can say that 

pharo-vm-nox --noevents --nohandlers  --notimer --headless -vm-sound-null /opt/nted/image/NTed.image --no-quit eval "RFBServer stop; reset. ZnServer managedServers do: #stop. UIManager default uiProcess suspend. WorldState serverMode: true. ProcessorScheduler class compile: 'idleProcess',String cr,'[true] whileTrue: [self relinquishProcessorForMicroseconds: 10000]'. ProcessorScheduler startUp"

does not make a difference at all. My idea here is to switch everything off, not use sockets, and try to sleep as much as possible. But… nothing.

Norbert


But as was discussed, the CPU consumption most probably does not come from Morphic but from the idle loop, which could be solved by making the VM event-driven.

I am particularly keen to have an event-driven VM because it would mean that the CPU consumption is directly proportional to the work the VM actually does. For example, theoretically, with an event-driven VM, making the VM twice as fast with Spur would also mean that it consumes half the energy. Go Green IT :-)

That is exactly my point. Since consumed energy is turned into heat, saving energy is the same as having a cool device (pun intended).

So I would like to put on my consortium hat and state my upvote on this.

Norbert

2015-02-10 8:00 GMT+01:00 Eliot Miranda <[hidden email]>:



On Feb 9, 2015, at 10:41 PM, Sven Van Caekenberghe <[hidden email]> wrote:


On 10 Feb 2015, at 01:55, Eliot Miranda <[hidden email]> wrote:

Hi Sven,

On Mon, Feb 9, 2015 at 1:43 PM, Sven Van Caekenberghe <[hidden email]> wrote:
There is a timer thread between the image and the VM that ticks every millisecond; that is the cause. I don't know what it does, but it is apparently needed.

Anyway, that is how I understood it from Igor and Eliot, long ago.

So basically, the VM is always slightly busy.

Yes, the VM is always slightly busy with the heartbeat thread, but this is very cheap. The actual idle cost comes from the idle loop in the background process that sends relinquishProcessorForMicroseconds:, which is a primitive that eventually calls the select system call. This is the source of the cost.
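
To make that concrete, here is a minimal illustrative sketch, in C, of a select()-based relinquish in the spirit of what is described above. It is not the actual VM source; collectWatchedDescriptors is a hypothetical helper standing in for the per-call fd_set setup.

#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper standing in for the fd_set setup the VM does on every call. */
extern int collectWatchedDescriptors(fd_set *readFds, fd_set *writeFds);

void relinquishForMicroseconds(long usecs)
{
    fd_set readFds, writeFds;
    struct timeval tv;
    int maxFd;

    FD_ZERO(&readFds);
    FD_ZERO(&writeFds);
    maxFd = collectWatchedDescriptors(&readFds, &writeFds);

    tv.tv_sec  = usecs / 1000000;
    tv.tv_usec = usecs % 1000000;

    /* Returns early if a watched descriptor becomes ready, otherwise after usecs.
       With the default 1ms argument this runs roughly a thousand times per second. */
    select(maxFd + 1, &readFds, &writeFds, 0, &tv);
}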

Can we change something about that ?
Maybe just as an experiment to prove your point ?

What do you think halving or doubling the argument to relinquishProcessorForMicroseconds: should do if this is the major source of overhead? Processor usage at idle should be closely inversely proportional, right?


On 09 Feb 2015, at 21:11, Norbert Hartl <[hidden email]> wrote:

I have an installation where Pharo-powered hardware is used in a closed case. Over time that collects quite some heat. One reason for this is that the Pharo VM is taking approx. 6% CPU all the time. The only thing that happens is network/sockets. I suspended the UI thread in the image, but on this platform it doesn't help.
Are there any tweaks to lower the polling and the activity of the image/vm even more?

thanks,

Norbert
--
best,
Eliot


Re: Reducing the activity of the image

johnmci
It's a bit more complicated, and what platform you are on does matter. Just hunt in the Squeak mailing list 10 years back for getNextWakeupTick.

Possibly the Mac VM still calls getNextWakeupTick(), which returns the next time the VM has to wake up to service a delay pop.
Normally that is less than 1/50 of a second away due to the Morphic polling cycle, say 16-20 milliseconds.

The idea I had was to sleep until the VM needs to wake up: when the ioRelinquishProcessorForMicroseconds call is made we know we can sleep, and the VM knows exactly when the next wake-up time is. Unfortunately we have to deal with user interrupts (I/O, sockets, UI).

Some platforms might use nanosleep() (#if defined(HAVE_NANOSLEEP)), which might wake when a socket interrupt arrives, but I've never confirmed that. Either way, it then goes on to call aioPoll(), which is where the bulk of the CPU is consumed. Obviously, avoiding calling aioPoll() will affect socket performance.
http://www.squeakvm.org/svn/squeak/branches/Cog/platforms/unix/vm/aio.c
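
The scheme described above, sleeping until the next delay pop instead of for a fixed interval and only then polling I/O, might look roughly like the following sketch. nowUsecs, nextWakeupUsecs (the getNextWakeupTick() idea) and pollPendingIO (standing in for aioPoll()) are hypothetical names; the real code paths are more involved.

#include <time.h>

extern long long nowUsecs(void);         /* hypothetical: current time in microseconds */
extern long long nextWakeupUsecs(void);  /* hypothetical: next delay expiry, 0 if none pending */
extern void pollPendingIO(void);         /* hypothetical stand-in for aioPoll() */

void idleUntilNextWakeup(void)
{
    long long now   = nowUsecs();
    long long next  = nextWakeupUsecs();
    long long usecs = (next > now) ? next - now : 1000;  /* fall back to 1ms if nothing is pending */

    struct timespec ts;
    ts.tv_sec  = usecs / 1000000;
    ts.tv_nsec = (usecs % 1000000) * 1000;

    /* nanosleep may return early if a signal (e.g. SIGIO from a socket) arrives */
    nanosleep(&ts, 0);

    pollPendingIO();  /* the aioPoll() step is where the bulk of the CPU goes */
}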

Note that you can't properly calculate the next wakeup tick in Smalltalk code due to the rather brittle Delay logic. Attempts I made a decade back always resulted in deadlocks, which is why that calculation is done in the VM. I last took a serious look at this back in 2010 and found very strange oddities, such as ioRelinquishProcessorForMicroseconds being called while the wakeup time was now, or in the past. One needed to explore the stack traces to understand why no process was runnable even though a process was scheduled to be woken...

Anyway, compare ioRelinquishProcessorForMicroseconds against whatever is being compiled for your target platform's VM, and check what exactly HAVE_NANOSLEEP is when the VM is compiled.
Also check idle CPU usage of, say, an OS X Squeak 4.2.5 VM against, I assume, a Unix VM flavor; you can run both on the same OS X machine for comparison using the same image, etc.






--
===========================================================================
John M. McIntosh <[hidden email]> https://www.linkedin.com/in/smalltalk
===========================================================================

Re: [Vm-dev] re: Reducing the activity of the image

johnmci
Craig, so how does using pthread_cond_timedwait affect socket processing? The promise of nanosleep was to wake up if an interrupt arrived, say on a socket (mind you, I never actually confirmed this is the case, complete hearsay...).

On Thu, Feb 12, 2015 at 2:40 AM, Craig Latta <[hidden email]> wrote:


Hoi Norbert--

     In 2003, while implementing remote messaging for what became the
Naiad distributed module system[1], I noticed excessive CPU usage during
idle by Squeak on MacOSX (and extremely poor remote messaging
performance). I prepared alternate versions of
ioRelinquishProcessorForMicroseconds, comparing:

-    select() (AKA aioSleepForUsecs in Ian's aio API, my starting point)
-    pthread_cond_timedwait()
-    nanosleep()

     pthread_cond_timedwait was the clear winner at the time. I wrote my
own relinquish primitive as part of the Flow external streaming
plugin[2], and I've been using it ever since. Still seems fine. I've
mentioned this before.


     thanks,

-C

[1] http://netjam.org/naiad
[2] http://netjam.org/flow

--
Craig Latta
netjam.org
<a href="tel:%2B31%20%20%206%202757%207177" value="+31627577177">+31 6 2757 7177 (SMS ok)
<a href="tel:%2B%201%20415%20%20287%203547" value="+14152873547">+ 1 415 287 3547 (no SMS)




--
===========================================================================
John M. McIntosh <[hidden email]> https://www.linkedin.com/in/smalltalk
===========================================================================

Re: [Vm-dev] re: Reducing the activity of the image

Eliot Miranda-2


On Thu, Feb 12, 2015 at 10:45 AM, John McIntosh <[hidden email]> wrote:
 
Craig, so how does using pthread_cond_timedwait affect socket processing? The promise of nanosleep was to wake up if an interrupt arrived, say on a socket (mind you, I never actually confirmed this is the case, complete hearsay...).

+1. What he said. The problem with pthread_cond_timedwait, or any other merely delaying call, is that, unless all file descriptors have been set up to send signals on read/writability and the blocking call is interruptible, the call may block for as long as it is asked, rather than until either the timeout or the read/writability of a file descriptor.

IMO a better solution here is to a) use epoll or its equivalent kqueue; these are like select, but the set of descriptors to examine is kept in kernel space, so the set-up overhead is vastly reduced, and b) wait for no longer than the next scheduled delay if one is in progress.


Of course, the VM can do both of these things, and then there's no need for a background process at all. Instead, when the VM scheduler finds there's nothing to run, it calls epoll or kqueue with either an infinite timeout (if no delay is in progress) or the time until the next delay expiration.
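
A minimal sketch of that epoll variant follows; epollFd and nextDelayExpiryMsecs are assumed names rather than existing VM functions, and kqueue on BSD/OS X would play the same role.

#include <sys/epoll.h>

extern int epollFd;                     /* created once at startup with epoll_create1(0) */
extern int nextDelayExpiryMsecs(void);  /* hypothetical: -1 when no delay is pending */

void idleWaitEventDriven(void)
{
    struct epoll_event events[16];

    /* For epoll_wait a timeout of -1 means "block until a descriptor is ready",
       so an idle image with no pending Delay consumes no CPU at all. */
    int n = epoll_wait(epollFd, events, 16, nextDelayExpiryMsecs());

    /* n > 0 : descriptors became ready -> signal the corresponding semaphores;
       n == 0: the timeout elapsed      -> a Delay is due, let the scheduler run it. */
    (void)n;
}

/* Each socket is registered with the kernel only once, e.g. when it is created:
     struct epoll_event ev = { .events = EPOLLIN | EPOLLOUT };
     ev.data.fd = socketFd;
     epoll_ctl(epollFd, EPOLL_CTL_ADD, socketFd, &ev);
*/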

Now, if only there was more time ;-)

It strikes me that the VM could have a flag that makes it behave like this, so that e.g. some time in the Spur release cycle we can set the flag, nuke the background process, and get on with our lives.







--
best,
Eliot

Re: [Vm-dev] re: Reducing the activity of the image

johnmci
I did look at using pthread_delay_np to delay the heartbeat thread, as my thought was: if the image is sleeping, why wake up to service the clock, etc.
It is difficult to measure the outcome, but one should consider that option too.
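
A sketch of that idea, with plain nanosleep standing in for the non-portable pthread_delay_np, and hypothetical vmIsIdle and updateVMClockAndCheckInterrupts hooks; the real Cog heartbeat is more involved.

#include <time.h>

extern int  vmIsIdle(void);                         /* hypothetical */
extern void updateVMClockAndCheckInterrupts(void);  /* hypothetical stand-in for the heartbeat body */

void *heartbeatLoop(void *unused)
{
    (void)unused;
    for (;;) {
        /* 1ms normally; back off to 20ms while the image has nothing runnable */
        long nsecs = vmIsIdle() ? 20 * 1000 * 1000L : 1 * 1000 * 1000L;
        struct timespec ts = { 0, nsecs };
        nanosleep(&ts, 0);
        updateVMClockAndCheckInterrupts();
    }
    return 0;
}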





--
===========================================================================
John M. McIntosh <[hidden email]> https://www.linkedin.com/in/smalltalk
===========================================================================