DelayIdleScheduler experiment

DelayIdleScheduler experiment

Ben Coman
Cross-posting since at this level the systems are still very similar.

I've sometimes wondered how the idle process
relinquishing the CPU interacts with delay scheduling,
since they operate at extreme opposite ends of the process priorities.

Historically there was...
ProcessorScheduler class >> idleProcess
    "A default background process which is invisible."
    [true] whileTrue:
        [self relinquishProcessorForMicroseconds: 1000]

which had the following performance (recorded here for later comparison)...
    (1 to: 10) collect: [:i | [ (Delay forMilliseconds: 1) wait ] timeToRun asMilliSeconds].
    ==> "#(2 5 4 2 4 4 2 4 4 3)"

To improve battery life, Pharo changed this yield time to 50000 in...
* https://pharo.fogbugz.com/f/cases/20425/Yield-longer-in-ProcessorScheduler-idleProcess
* https://github.com/pharo-project/pharo/commit/0b0d12dc
but this had a negative impact on delays...
    (1 to: 10) collect: [:i | [ (Delay forMilliseconds: 1) wait ] timeToRun asMilliSeconds].
    ==> "#(36 51 50 51 50 51 51 30 50 50)"
as reported in... https://pharo.fogbugz.com/f/cases/22400/Delays-are-not-working-properly

The problem seems to be that #relinquishProcessorForMicroseconds: suspends the main VM thread for a fixed amount of time, during which expired delays cannot be dealt with.  I'm not sure of the exact mechanism in the VM, but this looks related...
Win: https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/e2fa2d1/platforms/win32/vm/sqWin32Window.c#L1593
Mac: https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/a8a1dc1/platforms/unix/vm/sqUnixHeartbeat.c#L263
I'm not sure about Linux.

One idea was to add a setting, but another path is to mix the idle-relinquish
into the delay-scheduling, dynamically relinquishing up to exactly when the activeDelay expires.
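
To make that concrete, the idle loop would look something like this (a minimal sketch only; #nextWakeUpUTCMicroseconds is an invented accessor for the activeDelay's expiry, and the real code is in the PR linked below)...
    runIdleLoop
        "Sketch: relinquish exactly until the activeDelay expires, instead of
         a fixed 1000 or 50000 microseconds."
        | now wakeup |
        [true] whileTrue:
            [now := Time utcMicrosecondClock.
             wakeup := self nextWakeUpUTCMicroseconds.   "hypothetical accessor"
             wakeup > now
                 ifTrue: [Processor relinquishProcessorForMicroseconds: wakeup - now]
                 ifFalse: [Processor relinquishProcessorForMicroseconds: 1000]]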

After loading...
    https://github.com/pharo-project/pharo/pull/1887
you can do...
    DelayIdleTicker installExperiment.
    (1 to: 10) collect: [:i | [ (Delay forMilliseconds: 1) wait ] timeToRun asMilliSeconds].
==>"#(1 1 2 2 2 2 2 2 2 2)" which btw is half the latency of the original
    DelayIdleTicker debug: true.  "CPU usage goes way up"
    DelayIdleTicker uninstall.
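
For finer-grained comparison, the same benchmark can be timed in nanoseconds, assuming your image's Duration understands #asNanoSeconds (recent Pharo and Squeak do); this helps resolve sub-millisecond differences between the schedulers...
    (1 to: 10) collect: [:i |
        [ (Delay forMilliseconds: 1) wait ] timeToRun asNanoSeconds // 1000 ].
    "each result is in microseconds"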

I'm now seeking:
* comments on how this might interact with parts of the VM I'm not familiar with.
* help to discover and stress-test corner cases.
* help to determine whether there actually is a benefit, and to characterize what that is,
especially from anyone familiar with headless images with reduced UI and I/O events.
I was surprised to see the existing system already often showed 0% CPU usage (on Windows).

cheers -ben



Re: DelayIdleScheduler experiment

timrowledge
I may be out of date on this but I think that the Cog setup makes it possible to have a timer check every 2ms by default. The relinquishProcessor stuff was a definite hack to try to reduce CPU usage on laptoppy things, and I don't think anyone ever expected it to be used for more than 1ms sleeps. It was one of those "I wonder if this will improve anything?" ideas that got left in place.

Surely the ideal would be to
a) as you point out, add this delay to the main delay list
b) find a way to provide an outside timer to trigger the delay at the top of the list, rather than relying on the VM eventually checking the tick etc.

Interestingly, we have the threaded ticker for many systems as part of the Cog VM. I don't know if it works on all systems; I do know it would be 'interesting' to implement on RISC OS, but that's life. A quick look at unix/sqUnixHeartbeat.c suggests to me that it might be plausible to use variable-length beats and tie the time to any sooner-than-2ms event wanted. We'd (probably) change Delay handling to behave as if there were always a 1 or 2ms (fake) delay, even if the nearest actual Delay were far beyond that.
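
Image-side, that clamping idea might look something like this (a sketch only; it reuses the existing #primSignal:atUTCMicroseconds: wakeup mechanism, and timingSemaphore is assumed to be the delay scheduler's wakeup semaphore)...
    scheduleWakeupAt: wakeupUTCMicroseconds
        "Sketch of the 'fake delay': never schedule the next VM wakeup more
         than 2ms away, even if the nearest real Delay expires much later."
        | capped |
        capped := wakeupUTCMicroseconds min: Time utcMicrosecondClock + 2000.
        self primSignal: timingSemaphore atUTCMicroseconds: capped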


> On 2018-10-07, at 1:11 AM, Ben Coman <[hidden email]> wrote:
>
> Cross posting since at this level systems are still very similar.
>
> I've sometimes wondered about how the idle process
> relinquishing the CPU interacted with delay scheduling,
> since they operate at the extreme opposite ends of process priorities.
>
> Historically there was...
> ProcessorScheduler class >> idleProcess
> [    "A default background process which is invisible."
>      [true] whileTrue:
>          [self relinquishProcessorForMicroseconds: 1000]    
>
> which had the following performance this for later comparison...
>     (1 to: 10) collect: [:i | [ (Delay forMilliseconds: 1) wait ] timeToRun asMilliSeconds].
>     ==> "#(2 5 4 2 4 4 2 4 4 3)"
>
> To improve battery life yield time Pharo changed this to 50000 in...
> * https://pharo.fogbugz.com/f/cases/20425/Yield-longer-in-ProcessorScheduler-idleProcess
> * https://github.com/pharo-project/pharo/commit/0b0d12dc
> but had a negative impact on delays...
>     (1 to: 10) collect: [:i | [ (Delay forMilliseconds: 1) wait ] timeToRun asMilliSeconds].
>    ==> "#(36 51 50 51 50 51 51 30 50 50)"
> as reported... https://pharo.fogbugz.com/f/cases/22400/Delays-are-not-working-properly
>
> The problem seems to be that #relinquishProcessorForMicroseconds: suspends the main VM thread for a fixed amount during which expired delays cannot be dealt with.  I'm not sure of the exact mechanism in the VM but this looks related...
> Win: https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/e2fa2d1/platforms/win32/vm/sqWin32Window.c#L1593
> Mac: https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/a8a1dc1/platforms/unix/vm/sqUnixHeartbeat.c#L263 
> I'm not sure about Linux
>
> One idea was to have a setting, but another path is to mix the idle-relinquish
> into the delay-scheduling to dynamically relinquish up to exactly when the activeDelay expires.
>
> After loading...
>    https://github.com/pharo-project/pharo/pull/1887
> you can do...
>     DelayIdleTicker installExperiment.
>     (1 to: 10) collect: [:i | [ (Delay forMilliseconds: 1) wait ] timeToRun asMilliSeconds].
> ==>"#(1 1 2 2 2 2 2 2 2 2)" which btw is half the latency of the original
>     DelayIdleTicker debug: true.  "CPU usage goes way up"
>     DelayIdleTicker uninstall.
>
> I'm now seeking:
> * comment on how this might interact with parts of the VM I'm not familiar with.
> * help to discover and stress test corner cases.  
> * help to determine if there actually is a benefit and characterize what that is,
> especially anyone familiar with headless images with reduced UI and I/O events.
> I was surprised to see the existing system already often showed 0% CPU usage (in Windows).
>
> cheers -ben
>


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Bayard(n): a person armed with the self-confidence of ignorance



Re: DelayIdleScheduler experiment

Ben Coman
Hi Tim, 

Thanks for your feedback.

On Mon, 8 Oct 2018 at 01:27, tim Rowledge <[hidden email]> wrote:
> I may be out of date on this but I think that the Cog setup makes it possible to have a timer check every 2ms by default. The relinquishProcessor stuff was a definite hack to try to reduce CPU usage on laptoppy things, and I don't think anyone ever expected it to be used for more than 1ms sleeps. It was one of those "I wonder if this will improve anything?" ideas that got left in place.

> Surely the ideal would be to
> a) as you point out, add this delay to the main delay list

It's not a separate delay.
The "activeDelay" is split into separate pieces, relinquished around asynchronous I/O like mouse events, etc.
 

> b) find a way to provide an outside timer to trigger the delay at the top of the list, rather than relying on the VM eventually checking the tick etc.

The key thing is to inform the VM that there are no runnable processes,
and to pause the main native thread and heartbeat thread until the next image-scheduled wakeup time.

One alternative may be to adapt #primSignal:atUTCMicroseconds:
to be #primSignal:atUTCMicroseconds:idle:
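
A hypothetical sketch of that adaptation (the signature is invented; no such primitive exists today, and the idle: flag would need matching VM support)...
    primSignal: aSemaphore atUTCMicroseconds: anInteger idle: aBoolean
        "Hypothetical adaptation of #primSignal:atUTCMicroseconds:. When
         aBoolean is true the image asserts it has no other runnable processes,
         so the VM may pause the main thread and heartbeat until anInteger
         (UTC microseconds) or until external I/O arrives."
        "<primitive: ...>  a new or extended VM primitive would go here"
        ^self primitiveFailed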

 

> Interestingly, we have the threaded ticker for many systems as part of the Cog VM. I don't know if it works on all systems; I do know it would be 'interesting' to implement on RISC OS, but that's life. A quick look at unix/sqUnixHeartbeat.c suggests to me that it might be plausible to use variable-length beats

Q: does the main native thread stop running if the heartbeat stops?

 
> and tie the time to any sooner-than-2ms event wanted.
> We'd (probably) change Delay handling to behave as if there were always a 1 or 2ms (fake) delay, even if the nearest actual Delay were far beyond that.

I can't follow where you are going with that.  If you're on a cloud server, you may want to idle for minutes or hours without processing delays,
restarting delay processing when I/O comes in.

cheers -ben

 

> On 2018-10-07, at 1:11 AM, Ben Coman <[hidden email]> wrote:
>
> Cross posting since at this level systems are still very similar.
>
> I've sometimes wondered about how the idle process
> relinquishing the CPU interacted with delay scheduling,
> since they operate at the extreme opposite ends of process priorities.
>
> Historically there was...
> ProcessorScheduler class >> idleProcess
> [    "A default background process which is invisible."
>      [true] whileTrue:
>          [self relinquishProcessorForMicroseconds: 1000]     
>
> which had the following performance this for later comparison...
>     (1 to: 10) collect: [:i | [ (Delay forMilliseconds: 1) wait ] timeToRun asMilliSeconds].
>     ==> "#(2 5 4 2 4 4 2 4 4 3)"
>
> To improve battery life yield time Pharo changed this to 50000 in...
> * https://pharo.fogbugz.com/f/cases/20425/Yield-longer-in-ProcessorScheduler-idleProcess
> * https://github.com/pharo-project/pharo/commit/0b0d12dc
> but had a negative impact on delays...
>     (1 to: 10) collect: [:i | [ (Delay forMilliseconds: 1) wait ] timeToRun asMilliSeconds].
>    ==> "#(36 51 50 51 50 51 51 30 50 50)"
> as reported... https://pharo.fogbugz.com/f/cases/22400/Delays-are-not-working-properly
>
> The problem seems to be that #relinquishProcessorForMicroseconds: suspends the main VM thread for a fixed amount during which expired delays cannot be dealt with.  I'm not sure of the exact mechanism in the VM but this looks related...
> Win: https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/e2fa2d1/platforms/win32/vm/sqWin32Window.c#L1593
> Mac: https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/a8a1dc1/platforms/unix/vm/sqUnixHeartbeat.c#L263
> I'm not sure about Linux
>
> One idea was to have a setting, but another path is to mix the idle-relinquish
> into the delay-scheduling to dynamically relinquish up to exactly when the activeDelay expires.
>
> After loading...
>    https://github.com/pharo-project/pharo/pull/1887
> you can do...
>     DelayIdleTicker installExperiment.
>     (1 to: 10) collect: [:i | [ (Delay forMilliseconds: 1) wait ] timeToRun asMilliSeconds].
> ==>"#(1 1 2 2 2 2 2 2 2 2)" which btw is half the latency of the original
>     DelayIdleTicker debug: true.  "CPU usage goes way up"
>     DelayIdleTicker uninstall.
>
> I'm now seeking:
> * comment on how this might interact with parts of the VM I'm not familiar with.
> * help to discover and stress test corner cases. 
> * help to determine if there actually is a benefit and characterize what that is,
> especially anyone familiar with headless images with reduced UI and I/O events.
> I was surprised to see the existing system already often showed 0% CPU usage (in Windows).
>
> cheers -ben
>


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Bayard(n): a person armed with the self-confidence of ignorance