Re: [Pharo-dev] Reducing the activity of the image


Re: [Pharo-dev] Reducing the activity of the image

philippeback
 

Can't the box be set up to do some WoL thing and go back to sleep when idling for a while?

This CPU usage is really annoying indeed.

Phil

On 9 Feb 2015 at 21:11, "Norbert Hartl" <[hidden email]> wrote:
I have an installation where Pharo-powered hardware is used in a closed case. Over time it collects quite some heat. One reason for this is that the Pharo VM is taking approx. 6% CPU all the time. The only thing that happens is network/sockets. I suspended the UI thread in the image, but on this platform it doesn't help.
Are there any tweaks to lower the polling and the activity of the image/VM even more?

thanks,

Norbert



Re: [Pharo-dev] Reducing the activity of the image

timrowledge


On 09-02-2015, at 12:33 PM, [hidden email] wrote:

> Can't the box be set up to do some WoL thing and go back to sleep when idling for a while?
>
> This CPU usage is really annoying indeed.

Assuming you are using a stack or Cog vm, that will mostly be the heartbeat that checks for inputs and process switches and GC limits etc. Plus any remaining morphic loop etc.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Two wrongs are only the beginning.



Re: [Pharo-dev] Reducing the activity of the image

NorbertHartl
In reply to this post by philippeback
 

On 09.02.2015 at 21:33, [hidden email] wrote:

Can't the box be set up to do some WoL thing and go back to sleep when idling for a while?

Nope, the device is an access point that serves Seaside and WebSockets to tablets. There are close to no options for having it switch itself off.

This CPU usage is really annoying indeed.

Yes, it is. It is just one of those things, like 32-bit, where we are way behind and have no resources available to fix it.

Norbert





Re: [Pharo-dev] Reducing the activity of the image

NorbertHartl
In reply to this post by timrowledge


> On 09.02.2015 at 21:39, tim Rowledge <[hidden email]> wrote:
>
>
>
> On 09-02-2015, at 12:33 PM, [hidden email] wrote:
>
>> Can't the box be set up to do some WoL thing and go back to sleep when idling for a while?
>>
>> This CPU usage is really annoying indeed.
>
> Assuming you are using a stack or Cog vm, that will mostly be the heartbeat that checks for inputs and process switches and GC limits etc. Plus any remaining morphic loop etc.
>
Thanks for the analysis. Not being an expert on _all_ those topics :) I'm asking myself if there are some tweaks (monkey patching is fine) to get rid of those. I certainly do not need any Morphic loop, as long as it doesn't steer the network reception *cough* :) Can the intervals for checking inputs and process switches be stretched without opening the door to hell because timing loops are tightly aligned?

thanks,

Norbert

Re: [Pharo-dev] Reducing the activity of the image

timrowledge


On 09-02-2015, at 1:00 PM, Norbert Hartl <[hidden email]> wrote:

>
>
>> On 09.02.2015 at 21:39, tim Rowledge <[hidden email]> wrote:
>>
>>
>>
>> On 09-02-2015, at 12:33 PM, [hidden email] wrote:
>>
>>> Can't the box be set up to do some WoL thing and go back to sleep when idling for a while?
>>>
>>> This CPU usage is really annoying indeed.
>>
>> Assuming you are using a stack or Cog vm, that will mostly be the heartbeat that checks for inputs and process switches and GC limits etc. Plus any remaining morphic loop etc.
>>
> Thanks for the analysis. Not being an expert on _all_ those topics :) I'm asking myself if there are some tweaks (monkey patching is fine) to get rid of those? I certainly do not need any morphic loop as long as it doesn't steer the network reception *cough* :)

We’d need some input from someone that really knows Morphic. You could try running in an MVC project and see if it makes any difference?

> Can the intervals for checking inputs and process switches be stretched without opening the door to hell because timing loops are tightly aligned?

Maybe. Eliot?


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
What passes for common sense is always revisable





Re: [Pharo-dev] Reducing the activity of the image

NorbertHartl
 

On 09.02.2015 at 22:05, tim Rowledge <[hidden email]> wrote:




We’d need some input from someone that really knows Morphic. You could try running in an MVC project and see if it makes any difference?

Pharo does not have MVC anymore. 

Norbert




Re: [Pharo-dev] Reducing the activity of the image

Eliot Miranda-2
In reply to this post by timrowledge
 
Hi Tim,

On Mon, Feb 9, 2015 at 12:39 PM, tim Rowledge <[hidden email]> wrote:


On 09-02-2015, at 12:33 PM, [hidden email] wrote:

> Can't the box be set up to do some WoL thing and go back to sleep when idling for a while?
>
> This CPU usage is really annoying indeed.

Assuming you are using a stack or Cog vm, that will mostly be the heartbeat that checks for inputs and process switches and GC limits etc. Plus any remaining morphic loop etc.

No.  The heartbeat is extremely cheap.  It is the idle loop that calls ioRelinquishProcessorForMicroseconds which in turn calls aioSleepForUsecs which calls select:

(Delay forSeconds: 60) wait

gc prior.  clear prior.  
60.002 seconds; sampling frequency 1385 hz
7 samples in the VM (83120 samples in the entire program)    0.01% of total

3 samples in generated vm code 42.86% of entire vm (  0.00% of total)
4 samples in vanilla vm code 57.14% of entire vm (  0.00% of total)

% of generated vm code (% of total) (samples) (cumulative)
100.0%    (  0.00%) ...others... (3) (100.0%)


% of vanilla vm code (% of total) (samples) (cumulative)
100.0%    (  0.00%) ...others... (4) (100.0%)


83113 samples in the rest  99.99% of total

% of rest (% of total) (samples) (cumulative)
99.98%    (99.97%) select$DARWIN_EXTSN (83095) (99.98%)
  0.01%    (  0.01%) mach_msg_trap (10) (99.99%)
  0.01%    (  0.01%) ...others... (8) (100.0%)

Now using epoll would make the select cheaper and I have changes for that.  But the real solution is to combine this with an event-driven VM.
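
For illustration, a minimal C sketch of what such a select()-based relinquish roughly looks like; the names and the descriptor bookkeeping are simplified assumptions, not the actual aio.c code:

#include <sys/select.h>

static fd_set registeredReadFds;  /* descriptors registered with aio for reading */
static int    highestFdPlusOne;   /* select() needs the highest descriptor + 1   */

void relinquishForMicroseconds(long microseconds)
{
    fd_set reads = registeredReadFds;          /* select() mutates its sets, so copy */
    struct timeval timeout;
    timeout.tv_sec  = microseconds / 1000000;
    timeout.tv_usec = microseconds % 1000000;
    /* Blocks until a descriptor becomes ready or the timeout elapses.
       Rebuilding and scanning the sets on every 1000-microsecond nap is
       where the select samples in the profile above come from. */
    select(highestFdPlusOne, &reads, 0, 0, &timeout);
}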

best,
Eliot

Re: [Pharo-dev] Reducing the activity of the image

timrowledge


On 09-02-2015, at 4:53 PM, Eliot Miranda <[hidden email]> wrote:
>
> No.  The heartbeat is extremely cheap.  It is the idle loop that calls ioRelinquishProcessorForMicroseconds which in turn calls aioSleepForUsecs which calls select:
>

Happy to be shown wrong. That means it most likely is morphic and/or any other tucked away processes. On a Pi it seems to be about 15% cpu time, so there is certainly some interest in reducing it!


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
There are no stupid questions. But, there are a lot of inquisitive idiots.



Re: [Pharo-dev] Reducing the activity of the image

Eliot Miranda-2
 
Hi Tim,

On Mon, Feb 9, 2015 at 5:02 PM, tim Rowledge <[hidden email]> wrote:


On 09-02-2015, at 4:53 PM, Eliot Miranda <[hidden email]> wrote:
>
> No.  The heartbeat is extremely cheap.  It is the idle loop that calls ioRelinquishProcessorForMicroseconds which in turn calls aioSleepForUsecs which calls select:
>

Happy to be shown wrong. That means it most likely is morphic and/or any other tucked away processes. On a Pi it seems to be about 15% cpu time, so there is certainly some interest in reducing it!

it is this one:

ProcessorScheduler>>idleProcess
"A default background process which is invisible."

[self relinquishProcessorForMicroseconds: 1000] repeat
 
If you recall, the VW VM got rid of the background process, and when the VM scheduling loop finds nothing to run it calls a blocking routine.

--
best,
Eliot

Re: [Pharo-dev] Reducing the activity of the image

timrowledge


On 09-02-2015, at 5:05 PM, Eliot Miranda <[hidden email]> wrote:
> it is this one:
>
> ProcessorScheduler>>idleProcess
> "A default background process which is invisible."
>
> [self relinquishProcessorForMicroseconds: 1000] repeat
>  
> If you recall the VW VM, that got rid of the background process and when the VM scheduling loop finds nothing to run it calls a blocking routine.

How embarrassingly obvious. Not having a good day today, pi camera stuff is going to drive me to hairlessness at this rate. Updating everything actually made it even worse! Sigh.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: JTC: Jump To Conclusions



Re: [Pharo-dev] Reducing the activity of the image

Eliot Miranda-2
 


On Mon, Feb 9, 2015 at 5:27 PM, tim Rowledge <[hidden email]> wrote:


On 09-02-2015, at 5:05 PM, Eliot Miranda <[hidden email]> wrote:
> it is this one:
>
> ProcessorScheduler>>idleProcess
>       "A default background process which is invisible."
>
>       [self relinquishProcessorForMicroseconds: 1000] repeat
>
> If you recall the VW VM, that got rid of the background process and when the VM scheduling loop finds nothing to run it calls a blocking routine.

How embarrassingly obvious. Not having a good day today, pi camera stuff is going to drive me to hairlessness at this rate. Updating everything actually made it even worse! Sigh.

Don't beat yourself up.  I'm sure you were in good company and now you know...
 

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: JTC: Jump To Conclusions





--
best,
Eliot

Re: [Pharo-dev] Reducing the activity of the image

johnmci
In reply to this post by philippeback
 
It's a bit more complicated, and what platform you are on does matter. Just hunt in the Squeak mailing list 10 years back for getNextWakeupTick.

Possibly the Mac VM still calls getNextWakeupTick(), which returns the next time the VM has to wake up to service a delay pop.
Normally that is less than 1/50 of a second out due to the Morphic polling cycle, say 16-20 milliseconds.

The idea I had was to sleep until the VM needs to wake up: when the ioRelinquishProcessorForMicroseconds call is made we know we can sleep, and the VM knows exactly when the next time to wake up is. Unfortunately we have to deal with user interrupts (I/O, sockets, UI).

Some platforms might use nanosleep() (#if defined(HAVE_NANOSLEEP)), which might wake when a socket interrupt arrives, but I've never confirmed that. Anyway, it then goes off to call aioPoll(), where the bulk of the CPU is consumed. Obviously, avoiding the call to aioPoll() will affect socket performance.
http://www.squeakvm.org/svn/squeak/branches/Cog/platforms/unix/vm/aio.c

Note that you can't properly calculate the next wakeup tick in Smalltalk code due to the rather brittle code base in the Delay logic. Attempts I made a decade back always resulted in a deadlock situation, which is why that calculation is done in the VM. I last took a serious look at this back in 2010 and found very strange oddities, such as ioRelinquishProcessorForMicroseconds being called while a wakeup time was now, or in the past. Obviously one needed to explore the stack traces to understand why no process was runnable, yet a process was scheduled to be woken...

Anyway, compare ioRelinquishProcessorForMicroseconds against whatever is being compiled for your target platform VM, and check exactly what HAVE_NANOSLEEP is when the VM is compiled.
Also check idle CPU usage of, say, an OS X Squeak 4.2.5 VM against (I assume) a Unix VM flavor; you can run both on the same OS X machine for comparison using the same image, etc.
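
As a rough sketch of the sleep-until-the-next-wakeup-tick idea; getNextWakeupTick() and ioUTCMicroseconds() stand in for the real VM functions, and the microsecond units and signatures are assumptions:

#include <time.h>

extern long long getNextWakeupTick(void);  /* next delay pop, assumed in microseconds */
extern long long ioUTCMicroseconds(void);  /* current time, assumed in microseconds   */

void sleepUntilNextWakeupTick(void)
{
    long long now  = ioUTCMicroseconds();
    long long next = getNextWakeupTick();
    if (next <= now)
        return;                            /* a delay is already due; don't sleep */
#if defined(HAVE_NANOSLEEP)
    struct timespec interval;
    interval.tv_sec  = (next - now) / 1000000;
    interval.tv_nsec = ((next - now) % 1000000) * 1000;
    /* nanosleep returns early if a signal is delivered (EINTR); whether
       socket activity raises such a signal depends on how async I/O is
       set up, which is the unconfirmed part mentioned above. */
    nanosleep(&interval, 0);
#endif
}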


On Tue, Feb 10, 2015 at 3:03 AM, Norbert Hartl <[hidden email]> wrote:

On 10.02.2015 at 11:23, Sven Van Caekenberghe <[hidden email]> wrote:


On 10 Feb 2015, at 11:19, Norbert Hartl <[hidden email]> wrote:

Sven,

On 10.02.2015 at 10:36, Sven Van Caekenberghe <[hidden email]> wrote:


On 10 Feb 2015, at 09:51, Norbert Hartl <[hidden email]> wrote:


On 10.02.2015 at 09:23, Clément Bera <[hidden email]> wrote:

Hello,

About the Morphic rendering loop: the delay between renderings is handled in WorldState>>#interCyclePause:. The best solution to reduce the cost of the Morphic rendering loop is to put it in server mode by executing in Pharo: WorldState serverMode: true. In Squeak you have to set that in the Preferences.

I'll play with it and see what can be gained.

I tried the following on an otherwise idle DigitalOcean VM running Ubuntu 13.10

$ mkdir pharo4
$ curl get.pharo.org/40+vm | bash
$ ./pharo Pharo.image save Server

First patch (slower event handling, extra delay of 50ms):

$ ./pharo Server.image eval --save 'WorldState serverMode: true'

Second patch (give time back to OS while idle for 10ms instead of for 1ms):

$ cat ProcessorScheduler-class-idleProcess.st 
'From Pharo4.0 of 18 March 2013 [Latest update: #40484] on 10 February 2015 at 9:49:15.412839 am'!

!ProcessorScheduler class methodsFor: 'background process' stamp: 'SvenVanCaekenberghe 2/10/2015'!
idleProcess
"A default background process which is invisible."

[true] whileTrue:
[self relinquishProcessorForMicroseconds: 10000]! !

$ ./pharo Server.image eval "'ProcessorScheduler-class-idleProcess.st' asFileReference fileIn"
$ ./pharo Server.image eval '(ProcessorScheduler class>>#idleProcess) sourceCode'
'idleProcess
"A default background process which is invisible."

[true] whileTrue:
[self relinquishProcessorForMicroseconds: 10000]'

Run an image with a basic Zn HTTP server in background:

$ ./pharo Server.image eval --no-quit 'ZnServer startDefaultOn: 1701' &
$ curl http://localhost:1701

Overall load is 0.01% but this is virtual/shared hardware, so who knows.

CPU load of the Pharo process hovers around a couple of percent. I am not seeing much difference; maybe it is a bit lower, but that might be wishful thinking.

My findings are similar. I have a CPU usage of 6%. WorldState serverMode adds a Delay of 50ms. Setting a higher number in the idle process does not seem to have any effect until the number is too high; then the image does not start anymore.
I tuned all of these things and it is not faster; sometimes it appears to take more CPU, which probably is not true.

I am afraid that we as a community do not fully understand what is happening or how we can control it.

On the other hand, on a machine with many images running, things are still totally fine, so we should not worry too much. It is only in a specific case like yours that it becomes a concern.

I can say that 

pharo-vm-nox --noevents --nohandlers  --notimer --headless -vm-sound-null /opt/nted/image/NTed.image --no-quit eval "RFBServer stop; reset. ZnServer managedServers do: #stop. UIManager default uiProcess suspend. WorldState serverMode: true. ProcessorScheduler class compile: 'idleProcess',String cr,'[true] whileTrue: [self relinquishProcessorForMicroseconds: 10000]'. ProcessorScheduler startUp"

does not make a difference at all. My approach here is to switch everything off, not use sockets, and try to sleep as much as possible. But... nothing.

Norbert


But as was discussed, the CPU consumption most probably does not come from Morphic but from the idle loop, which can be solved by an event-driven VM.

I am particularly keen to have an event-driven VM because it means that VM performance would be directly proportional to CPU consumption. For example, theoretically, with an event-driven VM, making the VM twice as fast with Spur would also mean that the VM consumes half the energy. Go Green IT :-)

That is exactly my point. As consumed energy is turned into heat, saving energy is the same as having a cool device (pun intended).

So I would like to put on my consortium hat and state my upvote on this.

Norbert

2015-02-10 8:00 GMT+01:00 Eliot Miranda <[hidden email]>:



On Feb 9, 2015, at 10:41 PM, Sven Van Caekenberghe <[hidden email]> wrote:


On 10 Feb 2015, at 01:55, Eliot Miranda <[hidden email]> wrote:

Hi Sven,

On Mon, Feb 9, 2015 at 1:43 PM, Sven Van Caekenberghe <[hidden email]> wrote:
There is some timer thread between the image and the VM that ticks every millisecond; that is the cause. I don't know what it does, but it is apparently needed.

Anyway, that is how I understood it from Igor and Eliot, long ago.

So basically, the VM is always slightly busy.

Yes, the VM is always slightly busy with the heartbeat thread, but this is very cheap. The actual idle cost comes from the idle loop in the background process that sends relinquishProcessorForMicroseconds:, which is a primitive that eventually calls the select system call. This is the source of the cost.

Can we change something about that ?
Maybe just as an experiment to prove your point ?

What do you think halving or doubling the argument to relinquishProcessorForMicroseconds: should do if this is the major source of overhead?  Processor usage at idle should be closely inversely proportional right?


--
best,
Eliot




--
===========================================================================
John M. McIntosh <[hidden email]>  https://www.linkedin.com/in/smalltalk
===========================================================================

re: Reducing the activity of the image

ccrraaiigg
In reply to this post by NorbertHartl
 

Hoi Norbert--

     In 2003, while implementing remote messaging for what became the
Naiad distributed module system[1], I noticed excessive CPU usage during
idle by Squeak on MacOSX (and extremely poor remote messaging
performance). I prepared alternate versions of
ioRelinquishProcessorForMicroseconds, comparing:

-    select() (AKA aioSleepForUsecs in Ian's aio API, my starting point)
-    pthread_cond_timedwait()
-    nanosleep()

     pthread_cond_timedwait was the clear winner at the time. I wrote my
own relinquish primitive as part of the Flow external streaming
plugin[2], and I've been using it ever since. Still seems fine. I've
mentioned this before.
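
     For illustration, a minimal sketch of a condition-variable-based relinquish in that spirit; the names and the deadline arithmetic are assumptions, not the actual flow.c code:

#include <pthread.h>
#include <sys/time.h>

static pthread_mutex_t activityMutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  activityCond  = PTHREAD_COND_INITIALIZER;

void relinquishForMicroseconds(long microseconds)
{
    struct timeval  now;
    struct timespec deadline;
    gettimeofday(&now, 0);
    deadline.tv_sec  = now.tv_sec + (now.tv_usec + microseconds) / 1000000;
    deadline.tv_nsec = ((now.tv_usec + microseconds) % 1000000) * 1000;

    pthread_mutex_lock(&activityMutex);
    /* Returns as soon as an I/O thread signals activityCond, or when the
       deadline passes, whichever comes first. */
    pthread_cond_timedwait(&activityCond, &activityMutex, &deadline);
    pthread_mutex_unlock(&activityMutex);
}

     An I/O-servicing thread then signals activityCond under the same mutex whenever a socket or other external resource becomes ready, so the wait ends early instead of burning the full timeout.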


     thanks,

-C

[1] http://netjam.org/naiad
[2] http://netjam.org/flow

--
Craig Latta
netjam.org
+31   6 2757 7177 (SMS ok)
+ 1 415  287 3547 (no SMS)


re: Reducing the activity of the image

johnmci
 
Craig, so how does using pthread_cond_timedwait affect socket processing? The promise of nanosleep was to wake up if an interrupt arrived, say on a socket (mind, I never actually confirmed this was the case; complete hearsay...)





--
===========================================================================
John M. McIntosh <[hidden email]>  https://www.linkedin.com/in/smalltalk
===========================================================================

re: Reducing the activity of the image

Eliot Miranda-2
 


On Thu, Feb 12, 2015 at 10:45 AM, John McIntosh <[hidden email]> wrote:
 
Craig, so how does using pthread_cond_timedwait affect socket processing? The promise of nanosleep was to wake up if an interrupt arrived, say on a socket (mind, I never actually confirmed this was the case; complete hearsay...)

+1. What he said. The problem with pthread_cond_timed_wait, or any other merely delaying call, is that, unless all file descriptors have been set up to send signals on read/writability and the blocking call is interruptible, the call may block for as long as it is asked, rather than until the earlier of that timeout or the read/writeability of a file descriptor.

IMO a better solution here is to a) use epoll or its equivalent kqueue; these are like select but the state of which selectors to examine is kept in kernel space, so the set-up overhead is vastly reduced, and b) wait for no longer than the next scheduled delay if one is in progress.


Of course, the VM can do both of these things, and then there's no need for a background process at all.  Instead, when the VM scheduler finds there's nothing to run it calls epoll or kqueue with either an infinite timeout (if no delay is in progress) or the time until the next delay expiration.
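
For illustration, a rough sketch of that idle wait, assuming an epoll descriptor already populated with the registered sockets; epollFd and nextDelayExpiryMilliseconds() are illustrative names, not existing VM code:

#include <sys/epoll.h>

extern int  epollFd;                           /* epoll_create1() descriptor holding the sockets */
extern long nextDelayExpiryMilliseconds(void); /* msecs until the next delay pops, -1 if none    */

void waitForWorkWhenNothingIsRunnable(void)
{
    struct epoll_event events[64];
    int timeout = (int)nextDelayExpiryMilliseconds();  /* -1 makes epoll_wait block indefinitely */
    /* Wakes on socket readiness or delay expiry; no per-call rebuilding of
       descriptor sets as with select(), and no 1000-microsecond idle loop. */
    int ready = epoll_wait(epollFd, events, 64, timeout);
    (void)ready;  /* the real VM would dispatch its aio handlers for the ready descriptors */
}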

Now, if only there was more time ;-)

It strikes me that the VM can have a flag that makes it behave like this so that e.g. some time in the Spur release cycle we can set the flag, nuke the background process and get on with our lives.







--
best,
Eliot

re: Reducing the activity of the image

johnmci
 
I did look at using pthread_delay_np to delay the heartbeat thread, as my thought was: if the image is sleeping, why wake up to service the clock, etc.
Difficult to measure the outcome, but one should consider that option too. 





--
===========================================================================
John M. McIntosh <[hidden email]>  https://www.linkedin.com/in/smalltalk
===========================================================================

re: Reducing the activity of the image

ccrraaiigg
In reply to this post by Eliot Miranda-2
 

Hi all--

     Apologies, my newsreader's thread database got trashed, and I
missed the responses to my previous message until now.

     John McIntosh writes:

> Craig so how does using pthread_cond_timedwait affect socket
> processing?

     It makes it actually work well. :)  This was the whole point of
using pthread_cond_timedwait. Please read the manpage at [1]. It waits
until either a condition is met (hence the "cond") or a timeout elapses.

     In the Flow virtual machine plugin, I have a
synchronizedSignalSemaphoreWithIndex function that calls the usual
signalSemaphoreWithIndex provided by the virtual machine, and also sets
the activity condition that the relinquish primitive cares about. The
host threads which service external I/O requests from primitives use
synchronizedSignalSemaphoreWithIndex when signalling the semaphores on
which Smalltalk-level code is waiting. This includes not only the
semaphores for reading and writing sockets, but also those for
activities with other external resources entirely, like MIDI ports.

     So you get a generalized scheme which is not tied to the arcana of
any particular kind of external resource, and it works the same way on
any platform which supports the POSIX API (which now is all the Unix-ish
ones). This has seemed the obvious way to go for over ten years now.

     Until I implemented this scheme, remote messaging throughput (and
MIDI throughput) was horrible. Believe me, I tried all the other schemes
that everyone has mentioned in the Squeak community and its descendants
since 1996, and none of them were anything better than deeply embarrassing.

     From the Flow plugin, check out flow.c[2], which implements
synchronizedSignalSemaphoreWithIndex, the activity condition, and the
relinquish primitive, and ip.c[3] which creates host threads to do
background work for external resource primitives and uses
synchronizedSignalSemaphoreWithIndex to coordinate with the
Smalltalk-level code and the relinquish primitive.
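
     As a rough sketch of the signalling side of that scheme (the wiring and the simplified signalSemaphoreWithIndex signature are assumptions based on the description above, not a copy of flow.c):

#include <pthread.h>

extern int signalSemaphoreWithIndex(int semaphoreIndex);  /* the usual VM entry point */

extern pthread_mutex_t activityMutex;   /* the pair the relinquish primitive waits on, */
extern pthread_cond_t  activityCond;    /* as in the earlier condition-variable sketch */

/* Called from the host threads that service external I/O: signal the
   Smalltalk-level semaphore as usual, and also wake the relinquish
   primitive blocked in pthread_cond_timedwait. */
int synchronizedSignalSemaphoreWithIndex(int semaphoreIndex)
{
    int result = signalSemaphoreWithIndex(semaphoreIndex);
    pthread_mutex_lock(&activityMutex);
    pthread_cond_signal(&activityCond);
    pthread_mutex_unlock(&activityMutex);
    return result;
}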

     It's so frustrating and weird that we're still talking about this
in 2015.

> The promise of nanosleep was to wake up if an interrupt arrived say
> on a socket (Mind I never actually confirmed this the case, complete
> hearsay...)

     Right, nanosleep promises this and doesn't deliver on MacOS, so I
say forget it. pthread_cond_timedwait works as advertised on MacOS and
Linux (all distros).

     Eliot writes:

> +1.  What [John] said.

     ...except John admitted himself that he hadn't verified his
suggestion, and you both assumed for some reason that I didn't have the
same goals in mind.

> The problem with pthread_cond_timed_wait, or any other merely
> delaying call...

     But pthread_cond_timedwait is *not* a "merely delaying call". It
does exactly what we want (wait until *either* a condition is met or a
timeout elapses), and it actually works, and the code is the same across
POSIX platforms.

     What you go on to say is based on a false premise.

> ...is that, unless all file descriptors have been set up to send
> signals on read/writability and unless the blocking call is
> interruptible, the call may block for as long as it is asked, not
> until that or the read/writeability of the file descriptor.

     In the scheme I described above, we can do what we need without
using formal Unix signals at all (happily avoiding that whole can of
worms). The notion of interruptible blocking calls is a red herring
generally. All the blocking calls in Flow happen in host threads which
are decoupled from any function call a Smalltalk primitive would make.

> IMO a better solution here is to a) use epoll or its equivalent
> kqueue; these are like select but the state of which selectors to
> examine is kept in kernel space, so the set-up overhead is vastly
> reduced, and b) wait for no longer than the next scheduled delay if
> one is in progress.

     I claim they are not better solutions, because they don't work for
all kinds of external resources (e.g., MIDI ports). Also, I found that
"waiting for no longer than the next scheduled delay" is often still far
too long, when there is external resource activity before that time comes.

> Of course, the VM can do both of these things, and then there's no
> need for a background [Smalltalk] process at all.  Instead, when the
> VM scheduler finds there's nothing to run it calls epoll or kqueue
> with either an infinite timeout (if no delay is in progress) or the
> time until the next delay expiration.

     This would still leave us with poor performance when using new
kinds of external resources that don't use selectors. (That is, the
external resource access would perform poorly; I'm sure the main virtual
machine would scream right along, blissfully oblivious to it all. :)

> It strikes me that the VM can have a flag that makes it behave like
> this so that e.g. some time in the Spur release cycle we can set the
> flag, nuke the background process and get on with our lives.

     If the only external resources in our lives were selector-using
ones, I might agree.


     thanks,

-C

[1] http://linux.die.net/man/3/pthread_cond_timedwait
[2] https://github.com/ccrraaiigg/flow/blob/master/flow.c
[3] https://github.com/ccrraaiigg/flow/blob/master/ip.c

--
Craig Latta
netjam.org
+31   6 2757 7177 (SMS ok)
+ 1 415  287 3547 (no SMS)


re: Reducing the activity of the image

Ben Coman
 





(Sorry for this late response. I discovered it sitting in my Draft folder.)

Finding this an interesting topic, I googled around to learn more and bumped into a few things that may be of random interest to some.

* Condition variables performance of boost, Win32, and the C++11 standard library
https://codesequoia.wordpress.com/2013/03/27/condition-variables-performance-of-boost-win32-and-the-c11-standard-library/



* Fast Event Processing in SDL (since Pharo is getting SDL)

cheers -ben