Use less CPU (improve battery life or reduce cost in the cloud)

Holger Freyther
 
Hi,

I did some early prototyping for the Unix VM at the end of 2015(?), and I have now improved and repeated that work for macOS and the thread-based heartbeat (now that it is the universal default). I won't make it to ESUG this year, but this might be something to play with?

The motivation is simple: polling increases CPU usage, which reduces battery life, takes resources away from other processes (e.g. more Pharo images) and, these days, increases your cloud computing bill. On top of that, it might increase network latency (the time from a socket becoming readable to the time the semaphore is signaled).

To complete this, work is needed both inside the image and in the VM. Some of it is under way; other parts might need more discussion.


The idle process:

ProcessorScheduler>>#idleProcess
        "A default background process which is invisible."

        [true] whileTrue:
                [self relinquishProcessorForMicroseconds: 1000]

Let's please yield the CPU for more than 1 ms. Unless I am missing something, an expired Delay or network I/O would wake us up earlier anyway?


The delay scheduler:

The VM supports sleeping indefinitely when the next wake-up time is set to 0. There is a pending patch to sleep "0" in our Delay scheduler. Currently we force a wake-up earlier than that. I think we should trust the VM to wake us up even if that is a second away.
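
A minimal sketch of the "0 means sleep indefinitely" convention, written in C for illustration. The function name pollTimeoutMsecs and the poll(2)-style return values (-1 to block until something happens) are assumptions made for this example; this is not the actual VM code.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the VM sources.
 * nextWakeupUsecs: absolute time of the next Delay expiry, 0 = none pending.
 * nowUsecs:        current time on the same clock.
 * Returns a timeout in milliseconds in the style of poll(2):
 *   -1  block until I/O or a signal arrives,
 *    0  do not block,
 *   >0  block for at most that many milliseconds. */
int pollTimeoutMsecs(uint64_t nextWakeupUsecs, uint64_t nowUsecs)
{
    if (nextWakeupUsecs == 0)
        return -1;                      /* nothing scheduled: sleep indefinitely */
    if (nextWakeupUsecs <= nowUsecs)
        return 0;                       /* already due: do not sleep at all */

    uint64_t remainingUsecs = nextWakeupUsecs - nowUsecs;
    uint64_t msecs = (remainingUsecs + 999) / 1000;   /* round up to whole ms */
    assert(msecs >= 1);                 /* a future wake-up never becomes a busy loop */
    return msecs > INT_MAX ? INT_MAX : (int)msecs;
}
```

With this shape the image no longer needs to force an early wake-up "just in case": a pending Delay bounds the sleep, and no pending Delay means the VM only wakes for I/O or signals.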


Morphic UI:

I don't understand the WorldState>>#interCyclePause but then I never looked at Morphic. Do we really need to poll like that? Under which circumstances does the world update? We get an event (where we have the event semaphore), we get some I/O (where we have a semaphore) or we have a timeout (where we sleep on a semaphore). Did anyone ever look at removing the tick?


VM I/O:

Currently we receive a SIGIO but from what I can see (and I still need to write a benchmark) the processing might be delayed 20ms? My hack removes the usage of nextPollUsecs and instead checks a variable that is set by the SIGIO handler. Besides missing memory barriers this should work(tm).

The biggest issue seems to be that on macOS/iOS the input is driven by polling; e.g. some wheel events seem to require pumping the event queue. Is this something we could trigger from the image in the future? I had hoped to get an fd for a Mach port that we could get SIGIO for, but that doesn't seem to exist. I have hacked out the honoring of the relinquish delay and added the polling to an iOS-specific routine; thanks to the Morphic Delay we bump the event loop frequently enough.


VM heartbeat thread:

The heartbeat thread keeps ticking even when the VM isn't running, e.g. while it sleeps and waits for an event. There is a cost in deciding when to halt the thread, so there must be a cut-off for which delays are long enough to bother disabling the heartbeat thread. I think the current code already allows the heartbeat to drift, so the new code might just make it a bit worse.
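
One way to think about the cut-off, sketched in C. Everything here is hypothetical: the HeartbeatCosts struct, its fields and the break-even rule are assumptions for illustration, and real numbers would have to be measured on the target platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cost model; none of these values come from the VM. */
typedef struct {
    uint64_t periodUsecs;           /* interval between heartbeat ticks */
    uint64_t tickCostUsecs;         /* CPU time burned by one tick */
    uint64_t pauseResumeCostUsecs;  /* CPU time to park and restart the thread */
} HeartbeatCosts;

/* Pause the heartbeat only if the ticks avoided during the sleep
 * outweigh the cost of stopping and restarting the thread. */
bool shouldPauseHeartbeat(const HeartbeatCosts *costs, uint64_t sleepUsecs)
{
    assert(costs->periodUsecs > 0 && costs->tickCostUsecs > 0);
    uint64_t ticksSaved = sleepUsecs / costs->periodUsecs;
    uint64_t breakEvenTicks = costs->pauseResumeCostUsecs / costs->tickCostUsecs;
    return ticksSaved > breakEvenTicks;
}
```

For example, with a 2 ms period, 5 us per tick and 50 us to pause and resume, only sleeps that skip more than 10 ticks (longer than about 20 ms) would disable the heartbeat.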


Where are we now?

I have pushed my changes to https://github.com/zecke/opensmalltalk-vm/tree/mac-use-less-cpu and would be happy to have people review them, check the memory synchronization, and maybe run the VM to see whether they notice extra delays or similar.


I started the same image with the plain VM and with my hacked one and let them run for about 20 min. The output below is from top.

COMMAND        %CPU                  TIME
Pharo          4.3                   00:48.49
Pharo          0.8                   00:10.20


Looking for comments and feedback.

regards
        holger

Re: Use less CPU (improve battery life or reduce cost in the cloud)

marcel.taeumel
 
Hi Holger,

The Morphic main loop empties the event queue (mouse & keyboard) in each cycle. It also checks for drawing requests and stepping. If that processing finishes too quickly, it tries to free CPU time via a delay (so that the idle process can run if no other process is ready).

No, it does not make sense to get rid of that pause/delay/tick. ;-) The system is NOT event-based at that lower level. It is just polling/looping. A single UI process, which sleeps occasionally. By design.

Note that if you want to run that in the cloud, just avoid busy morphs and Morphic should use very little CPU resources.

Best,
Marcel

On 29.08.2017 23:29:21, Holger Freyther <[hidden email]> wrote:



Re: [squeak-dev] Use less CPU (improve battery life or reduce cost in the cloud)

Michael Rueger
 


On 31/08/17 3:36 AM, Bert Freudenberg wrote:
> I think nowadays we could actually switch to a fully event/timer driven main loop for Morphic without major drawbacks.
> This was not true when Morphic was designed, since the VM back then was not event-based.
>
> Instead of a fixed interCyclePause we simply need to wait on a semaphore that should get signaled when a new event is
> available, and timed-out when the next Morphic alarm / step message is due. It probably also needs to be triggered when
> a deferred UI message is added.

About ten years ago I did some work for ESUG, where I changed the InputSensor to be fully event-driven, but then added a
polling process that faked an event-driven VM. I was supposed to do the VM side as well, but gave up after staring at it
for a while...
If I only remembered where I put that code...

Hmm, just looked at Pharo, looks somewhat similar to what I did, too long ago, don't remember the details...

Michael

Re: Use less CPU (improve battery life or reduce cost in the cloud)

johnmci
In reply to this post by Holger Freyther
 
self relinquishProcessorForMicroseconds: 1000

In ioRelinquishProcessorForMicroseconds it would call getNextWakeupUsecs() to get the time when the next Process had to wake up. This always would be the Morphic event loop. In Cuis Smalltalk this could be as much as 20 seconds btw.

I noticed someone changed it to limit the wait to the microseconds given, shrug. The version I have for testing on 64-bit iOS devices looks like this:

sqInt
ioRelinquishProcessorForMicroseconds(sqInt microSeconds)
{
    long realTimeToWait;
    extern usqLong getNextWakeupUsecs();
    usqLong nextWakeupUsecs = getNextWakeupUsecs();
    usqLong utcNow = get64(utcMicrosecondClock);

    if (nextWakeupUsecs <= utcNow) {
        /* if nextWakeupUsecs is non-zero the next wakeup time has already
         * passed and we should not wait.
         */
        if (nextWakeupUsecs != 0)
            return 0;
        realTimeToWait = microSeconds;
    }
    else {
        realTimeToWait = nextWakeupUsecs - utcNow;
        if (realTimeToWait < microSeconds)  // so wait 1000 microseconds or longer not just 1000 microseconds
            realTimeToWait = microSeconds;
    }

    aioSleepForUsecs(realTimeToWait);

    return 0;
}


After talking to Eliot a few months back I also changed aioSleepForUsecs to just do the aioPoll, as we thought the poll should return if there is any form of interrupt pending from the UI.

 

long
aioSleepForUsecs(long microSeconds)
{
    return aioPoll(microSeconds);
}


With those changes a Pharo image runs at about 1.6% CPU.
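
For readers who have not looked at the aio code: the point of sleeping inside a poll rather than a plain sleep is that the wait ends as soon as something happens. A minimal, self-contained sketch of that behaviour in C follows. It uses poll(2) on a single descriptor; sleepUnlessReadable and demoEarlyWake are hypothetical names for this example and are not the VM's aioPoll.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, not the VM's aioPoll(): sleep for at most
 * `microSeconds`, but return as soon as `fd` becomes readable.
 * Returns 1 if woken early (I/O or a signal), 0 on timeout, -1 on error. */
int sleepUnlessReadable(int fd, long microSeconds)
{
    assert(microSeconds >= 0);
    long msecs = microSeconds / 1000 + (microSeconds % 1000 != 0);  /* round up */
    int timeoutMsecs = msecs > INT_MAX ? INT_MAX : (int)msecs;

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ready = poll(&pfd, 1, timeoutMsecs);
    if (ready < 0)
        return errno == EINTR ? 1 : -1;   /* a signal is also a reason to wake up */
    return ready > 0 ? 1 : 0;
}

/* Usage: a byte already waiting in a pipe ends a 5-second sleep at once. */
int demoEarlyWake(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    int woken = -1;
    if (write(fds[1], "x", 1) == 1)
        woken = sleepUnlessReadable(fds[0], 5000000L);
    close(fds[0]);
    close(fds[1]);
    return woken;
}
```

Because pending input cuts the sleep short, asking for a long timeout does not by itself add latency for descriptors that are part of the poll set.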


I'm sure I changed the Mac VM to signal the event semaphore on pending events about 15 years ago.


BTW also lost over the years was a check for the interrupt keyboard sequence that would fire the interrupt semaphore. That code I think got moved into the image event processing logic. Perhaps not workable if that logic is busy processing bad (or too much) data.

--
===========================================================================
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
===========================================================================

Re: Use less CPU (improve battery life or reduce cost in the cloud)

tblanchard
 
Ok, since this is vaguely related...has there been any thought to trying to take advantage of GCD/libdispatch?

These days when writing iOS apps I am very careful to do the bare minimum of work on the main thread (event receipt/ui manipulation) and as much as possible on other queues.

I myself do not fully understand the way the VM works but my naive guess is it just does everything on the main thread.  Please correct me if that is wrong.
 
Re: Use less CPU (improve battery life or reduce cost in the cloud)

timrowledge
In reply to this post by johnmci
 

> On 05-09-2017, at 8:45 PM, John McIntosh <[hidden email]> wrote:
>
> BTW also lost over the years was a check for the interrupt keyboard sequence that would fire the interrupt semaphore. That code I think got moved into the image event processing logic. Perhaps not workable if that logic is busy processing bad (or too much) data.

Yeah, I’ve mentioned this a few times too. I still claim the VM needs to handle the lowest level of this, since I see too many opportunities for the image being a little distracted by the very problem that you want to interrupt.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
A computer's attention span is only as long as its extension cord.


Re: Use less CPU (improve battery life or reduce cost in the cloud)

Holger Freyther
In reply to this post by johnmci
 

> On 6. Sep 2017, at 05:45, John McIntosh <[hidden email]> wrote:
>

Hey!



> self relinquishProcessorForMicroseconds: 1000
>
> In ioRelinquishProcessorForMicroseconds it would call getNextWakeupUsecs() to get the time when the next Process had to wake up. This always would be the Morphic event loop. In Cuis Smalltalk this could be as much as 20 seconds btw.


I think Squeak and Pharo should increase the timeout parameter for the relinquish primitive as well (or, as proposed by Bert, add a new primitive to go to sleep).




> sqInt
> ioRelinquishProcessorForMicroseconds(sqInt microSeconds)
> {


>     else {
>         realTimeToWait = nextWakeupUsecs - utcNow;
>         if (realTimeToWait < microSeconds)  // so wait 1000 microseconds or longer not just 1000 microseconds
>             realTimeToWait = microSeconds;

So this is:

        timeToWait = MAX(MicroSecondsToNextDelayTimeout, microSeconds)


Does this mean we will sleep past the point where the next delay is due? Might that be the reason the system is consuming less CPU? Do I understand this correctly?
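
To make the question concrete, here is a small C sketch that isolates just the arithmetic. waitAsPosted mirrors the branch structure of the quoted function; waitClamped is my guess at what "limit to the microseconds given" would look like. Both names and the plain uint64_t parameters are assumptions for this example, and the actual sleep may still end earlier if the poll returns on I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic of the quoted version: MAX(time until next Delay, requested). */
uint64_t waitAsPosted(uint64_t nextWakeupUsecs, uint64_t nowUsecs,
                      uint64_t requestedUsecs)
{
    if (nextWakeupUsecs <= nowUsecs)
        return nextWakeupUsecs != 0 ? 0 : requestedUsecs;
    uint64_t remaining = nextWakeupUsecs - nowUsecs;
    return remaining < requestedUsecs ? requestedUsecs : remaining;
}

/* Assumed alternative: MIN(time until next Delay, requested). */
uint64_t waitClamped(uint64_t nextWakeupUsecs, uint64_t nowUsecs,
                     uint64_t requestedUsecs)
{
    if (nextWakeupUsecs <= nowUsecs)
        return nextWakeupUsecs != 0 ? 0 : requestedUsecs;
    uint64_t remaining = nextWakeupUsecs - nowUsecs;
    uint64_t result = remaining < requestedUsecs ? remaining : requestedUsecs;
    assert(result <= requestedUsecs);   /* never sleeps longer than asked */
    return result;
}
```

With a request of 1000 us and a Delay due in 200 us, waitAsPosted yields 1000 (800 us past the due time) while waitClamped yields 200. With a Delay due in 20000 us, waitAsPosted yields 20000 and waitClamped yields 1000. So the MAX form only overshoots when the next Delay is closer than the requested time, and otherwise sleeps exactly until the Delay.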



> I'm sure I changed the Mac VM to signal the event semaphore on pending events about 15 years ago.

Yes, it does signal the semaphore, but it seems that it requires the event loop to be "pumped" periodically? Can you confirm that? Is there a way to get an fd for the event loop?