Suspend


Suspend

Michael Rueger-6
 
Hi all,

Bert and I were discussing the implications of power usage (or rather
non-usage) of Squeak on the OLPC.

IIRC Squeak still does some polling (the event tickler?).
This has also caused quite a bit of pain for people running Squeak on a server:
Squeak will always stay in the working set, using a few percent of CPU
(= power!) constantly, even if it has effectively been idle for a long time.

What would it take to change the VM and Squeak to make it truly event
driven?

Michael



Re: Suspend

Ian Piumarta
 
Mike,

There are two things to investigate.  The first is running with
'-notimer' to see if it's the millisecond clock interrupts that are
keeping you at a few %.
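
(For context, a sketch of what the timer in question does, as I
understand it; the names are illustrative rather than the VM's actual
ones.  A periodic SIGALRM heartbeat like this is enough to keep a
process from ever being truly idle:)

#include <signal.h>
#include <sys/time.h>

static volatile sig_atomic_t lowResMSecs = 0;

static void heartbeat(int sig)
{
  lowResMSecs += 1;                /* the process wakes once per ms */
}

static void startHeartbeat(void)
{
  struct itimerval t;
  signal(SIGALRM, heartbeat);
  t.it_interval.tv_sec  = 0;
  t.it_interval.tv_usec = 1000;    /* fire every millisecond */
  t.it_value = t.it_interval;
  setitimer(ITIMER_REAL, &t, 0);   /* '-notimer' would skip this */
}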

The second may be much more complicated than the following suggests,
but here are some vague clues for you as I understand them.

> IIRC Squeak still does some polling (event tickler?).

While it's 'polling', the support code is happy to go to sleep (in
select()) for as long as the image tells it to (or until something
happens on a file descriptor: network, display, etc.).  The CPU
should be pinned at 0.0% when idle (with no loss of UI or network
reactivity), but not, I suspect, with the image/Interpreter behaving
the way it does.
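
(To make that concrete, a minimal sketch of the shape of such an idle
wait; fdSet and maxFd are assumed to be maintained elsewhere by the
aio support code:)

#include <sys/select.h>

extern fd_set fdSet;  /* descriptors of interest: display, sockets, ... */
extern int    maxFd;  /* highest descriptor in fdSet */

void idleWait(long microSeconds)
{
  struct timeval tv;
  fd_set rd = fdSet;
  tv.tv_sec  = microSeconds / 1000000;
  tv.tv_usec = microSeconds % 1000000;
  /* blocks consuming no CPU; returns early on descriptor activity */
  select(maxFd + 1, &rd, 0, 0, &tv);
}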

> This has also caused quite some pain for people running Squeak on a  
> server:
> Squeak will always stay in the working set using a few percent CPU  
> (=power!) constantly, even if it effectively has been idle for a  
> long time.

This started many years ago when something surreptitiously changed in
the image and/or Interpreter to cause relinquishProcessor to be
called with very small arguments (around the millisecond mark).  This
is undoubtedly essential for good performance on some platform,
somewhere, but on Unix it is a disaster; there is no portable way to
sleep for so little time while also responding to changes on
descriptors in a timely (read: immediate) fashion.  Depending on the
make and model of your kernel, a sub-timeslice timeout in select() is
either rounded up (maybe implicitly by the process scheduler) to an
unpredictable (but almost always large) fraction of a timeslice, or it
is quantized down to zero.  The first causes the famous Delay
inaccuracies, the second causes the famous 100% CPU usage.  That's
the reason for the byzantine checks and adjustments of the timeout
argument that someone commented on a few weeks ago.
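
(You can observe the rounding directly with a standalone test along
these lines; ask select() for 1ms and measure what you actually get.
Depending on the kernel the answer ranges from zero to a large
fraction of a timeslice:)

#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

int main(void)
{
  struct timeval t0, t1, tv = { 0, 1000 };      /* request 1ms */
  gettimeofday(&t0, 0);
  select(0, 0, 0, 0, &tv);                      /* no descriptors: pure sleep */
  gettimeofday(&t1, 0);
  printf("asked for 1000us, slept %ldus\n",
         (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec));
  return 0;
}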

> What would it take to change the VM and Squeak to make it truly  
> event driven?

First try the notimer thing.  If that doesn't work, try multiplying  
the argument to ioRelinquishProcessor by 100.  If that doesn't work,  
we have to resort to science and engineering: profile the VM and find  
out empirically where it spends its time while sitting idle at 2% for  
an hour or two.

HTH,
Ian




Re: Suspend

johnmci
 

On Jul 18, 2007, at 10:01 AM, Ian Piumarta wrote:

> This started many years ago when something surreptitiously changed  
> in the image and/or Interpreter to cause relinquishProcessor to be  
> called with very small arguments (around the millisecond mark).  
> This is undoubtedly essential for good performance on some  
> platform, somewhere, but on Unix it is a disaster; there is no  
> portable way to sleep for such little time while also responding to  
> changes on descriptors in a timely (read: immediate) fashion.  
> Depending on the make and model of your kernel, a sub-timeslice  
> timeout in select() is either rounded up (maybe implicitly by the  
> process scheduler) to an unpredictable (but almost always large)  
> fraction of a timeslice, or it is quantized down to zero.  The first
> causes the famous Delay inaccuracies, the second causes the famous  
> 100% CPU usage.  That's the reason for the byzantine checks and  
> adjustments of the timeout argument that someone commented on a few  
> weeks ago.

OK, well, let me give a bit of history here. The call as coded by John
Maloney in '97 says:

idleProcess
        "A default background process which is invisible."

        [true] whileTrue:
                [self relinquishProcessorForMicroseconds: 1000]

It had occurred to me that we should be able to sleep up to the next
wakeup tick if you ignored the issue of incoming interrupts; since
incoming interrupts would terminate the sleep, this is not an issue.

In the late 90's I changed the logic here to go to the Delay class
and calculate where the next wakeup tick was, to provide a different
value than 1000.  This was pushed out into the update stream and
lasted about an hour, until Scott Wallace found out the hard way (by
toasting his day's work) that on restarting an image, even if
everything was correct, you would enter a deadly embrace in the Delay
logic which would make the idleProcess unrunnable, and the process
scheduler quits because no process is runnable.

In re-evaluating this I pushed the logic into the VM, so in the Mac
VM I have:

    setInterruptCheckCounter(0);
    now = (ioMSecs() & MillisecondClockMask);
    if (getNextWakeupTick() <= now) {
        if (getNextWakeupTick() == 0)
            realTimeToWait = 16;  /* could be higher I guess; likely it's
                                     never zero, though -- actually I doubt
                                     getNextWakeupTick() is ever zero */
        else
            return 0;
    }
    else
        realTimeToWait = getNextWakeupTick() - now;

    aioSleep(realTimeToWait * 1000);


At some point in the past the UNIX code read:
sqInt ioRelinquishProcessorForMicroseconds(sqInt us)
{
   int nwt= getNextWakeupTick();
   int ms=  0;

   if (nwt)
     {
       int now= (ioMSecs() & 0x1fffffff);
       ms= ((nwt <= now) ? (1000/60) : nwt - now);
     }

   if (ms < (1000/60)) /* < 1 timeslice? */
     {
#    if defined(__MACH__) /* can sleep with 1ms resolution */
       if (!aioPoll(0))
        {
          struct timespec rqtp= { 0, ms * 1000*1000 };
          struct timespec rmtp;
          while ((nanosleep(&rqtp, &rmtp) < 0) && (errno == EINTR))
            rqtp= rmtp;
        }
#    endif
       ms= 0; /* poll but don't block */
     }
   dpy->ioRelinquishProcessorForMicroseconds(ms*1000);
   setInterruptCheckCounter(0);
   return 0;
}


But currently it reads as below, which takes the bogus 1000
microsecond value, thus waking up more often and not really sleeping
much.

I'm not sure why you can't again do the getNextWakeupTick()
calculation, or was there some other problem that was being hidden here?
Perhaps the Unix system wouldn't properly service sleep times < 100
ms? Could it be a startup param to turn the logic on or off? (See the
sketch after the listings below.)

Current Unix Code:

sqInt ioRelinquishProcessorForMicroseconds(sqInt us)
{
   int now;
   dpy->ioRelinquishProcessorForMicroseconds(us);
   now= ioLowResMSecs();
   if (now - lastInterruptCheck > (1000/25)) /* avoid thrashing intr checks from 1ms loop in idle proc */
     {
       setInterruptCheckCounter(-1000); /* ensure timely poll for semaphore activity */
       lastInterruptCheck= now;
     }
   return 0;
}

X11 display has

static sqInt display_ioRelinquishProcessorForMicroseconds(sqInt microSeconds)
{
   aioSleep(handleEvents() ? 0 : microSeconds);
   return 0;
}


/* sleep for microSeconds or until i/o becomes possible, avoiding
    sleeping in select() if the timeout is too small */

int aioSleep(int microSeconds)
{
#if defined(HAVE_NANOSLEEP)
   if (microSeconds < (1000000/60)) /* < 1 timeslice? */
     {
       if (!aioPoll(0))
        {
          struct timespec rqtp= { 0, microSeconds * 1000 };
          struct timespec rmtp;
          nanosleep(&rqtp, &rmtp);
          microSeconds= 0; /* poll but don't block */
        }
     }
#endif
   return aioPoll(microSeconds);
}
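
(On the startup-param question above: one possibility would be to
gate the wakeup-tick calculation behind a flag, roughly as below.
The flag name and its handling are invented for illustration; the
rest reuses the names from the code above:)

static int useLongSleep = 0;  /* hypothetical: set by a '-longsleep' option */

sqInt ioRelinquishProcessorForMicroseconds(sqInt us)
{
  if (useLongSleep)
    {
      int nwt = getNextWakeupTick();
      int now = (ioMSecs() & 0x1fffffff);
      if (nwt > now)
        us = (nwt - now) * 1000;  /* sleep up to the next Delay wakeup */
      /* if nwt is 0 or already due, fall through with the caller's us */
    }
  dpy->ioRelinquishProcessorForMicroseconds(us);
  setInterruptCheckCounter(0);
  return 0;
}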





>
>> What would it take to change the VM and Squeak to make it truly  
>> event driven?
>
> First try the notimer thing.  If that doesn't work, try multiplying  
> the argument to ioRelinquishProcessor by 100.  If that doesn't  
> work, we have to resort to science and engineering: profile the VM  
> and find out empirically where it spends its time while sitting  
> idle at 2% for an hour or two.
>
> HTH,
> Ian
>
>
>

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================



Re: Suspend

timrowledge
 
While the subject is in the air, perhaps it would be worth
considering what the ideal situation would be.

For example, I would suggest we might drop the idea of a
suspend-for-a-fixed-time and have a suspend-until-next-alarm. That
would be called by the background process when nothing else wants
time and would suspend the VM until the time specified by the timer
queue; perhaps some emergency backstop value might be used in case
some twerp gets us into a state with no timer active and no Process
doing anything. That ought to remove the problem of looping on a tiny
delay, though if certain OSs can't sleep for very short periods I
have no idea how they can do much at that point.
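
(A sketch of what such a primitive might look like, reusing
getNextWakeupTick(), ioMSecs() and aioSleep() from John's code above;
the name and the backstop value are illustrative only:)

#define BACKSTOP_MSECS 1000  /* emergency backstop: wake at least once a second */

static void ioSuspendUntilNextAlarm(void)
{
  int nwt = getNextWakeupTick();
  int now = ioMSecs();
  int ms;

  if (nwt == 0)          /* twerp insurance: no timer active at all */
    ms = BACKSTOP_MSECS;
  else if (nwt <= now)   /* next Delay already due: don't sleep */
    ms = 0;
  else
    ms = nwt - now;

  /* aioSleep() returns early on descriptor activity, so input is
     still serviced promptly */
  aioSleep(ms * 1000);
}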

OSs that have suitable asynchronous or interrupt-driven input can
trigger the return from the suspend as needed. OSs that don't (the
only one I know of is RISC OS right now, and I'll almost certainly be
retiring from maintaining that soon) can probably set up some
VM-internal polling.

I'm sure it would require some changes in the image to make this work
properly, but so what. Older images can run on older VMs.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Shift to the left!  Shift to the right!  Pop up, push down, byte,  
byte, byte!



Re: Suspend

johnmci
 
That is what getNextWakeupTick() does: it returns the time the VM
has to wake up to service the next Delay, since at that point all
processes are either:

(a) waiting on a semaphore, or
(b) waiting on a Delay to terminate in the future, which is found via
getNextWakeupTick()


On Jul 18, 2007, at 12:15 PM, tim Rowledge wrote:

> While the subject is in the air, perhaps it would be worth  
> considering what the ideal situation would be.
>
> For example, I would suggest we might drop the idea of a
> suspend-for-a-fixed-time and have a suspend-until-next-alarm. That would be
> called by the background process when nothing else wants time and  
> would suspend the vm until the time specified by the timer queue;  
> perhaps some emergency backstop value might be used in case some  
> twerp gets us into a state with no timer active and no Process  
> doing anything. That ought to remove the problem of looping on a  
> tiny delay, though if certain OSs can't sleep for very short  
> periods I have no idea how they can do much at that point.
>
> OSs that have suitable asynchronous or interrupt driven input stuff  
> can trigger the return from the suspend at need. OSs that don't  
> (the only one I know of is RISC OS right now and I'll almost  
> certainly be retiring from maintaining that soon) can probably set  
> some vm-internal polling.
>
> I'm sure it would require some changes in the image to make this  
> work properly but so what. Older images can run on older vms.
>
> tim
> --
> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> Shift to the left!  Shift to the right!  Pop up, push down, byte,  
> byte, byte!
>
>

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================



Re: Suspend

timrowledge
 

On 18-Jul-07, at 12:27 PM, John M McIntosh wrote:

> That is what the getNextWakeupTick() does, it returns the time the  
> vm has to wakeup to service the next delay since at that point all  
> processes
> are:
>
> (a) waiting on a semaphore
> (b) waiting on a Delay to terminate in the future which is found  
> via getNextWakeupTick()
Well yes; so perhaps one might change the relinquish so that it no
longer effectively adds a fake wakeup tick.

Whatever the details, let's see if we can work out a good mechanism
so that when there is nothing to do, nothing is done, and when stuff
needs doing it gets done promptly. I'm sure there will be some
platforms that can do it beautifully and others that have to fudge
it, but that's life.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: BNE: Buy Non-IBM Equipment



Re: Suspend

johnmci
 

On Jul 18, 2007, at 1:09 PM, tim Rowledge wrote:

>
> On 18-Jul-07, at 12:27 PM, John M McIntosh wrote:
>
>> That is what the getNextWakeupTick() does, it returns the time the  
>> vm has to wakeup to service the next delay since at that point all  
>> processes
>> are:
>>
>> (a) waiting on a semaphore
>> (b) waiting on a Delay to terminate in the future which is found  
>> via getNextWakeupTick()
> Well yes; so perhaps one might change the relinquish so that it no  
> longer effectively adds a fake wakeup tick.


I'm not sure what you mean by adding a fake wakeup tick?


--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================