Suspend


Suspend

Michael Rueger-6
 
Hi all,

Bert and I were discussing the implications of power usage (or rather
non-usage) of Squeak on the OLPC.

IIRC Squeak still does some polling (the event tickler?).
This has also caused quite a bit of pain for people running Squeak on a server:
Squeak will always stay in the working set, using a few percent of CPU
(= power!) constantly, even if it has effectively been idle for a long time.

What would it take to change the VM and Squeak to make it truly event
driven?

Michael



Re: Suspend

Ian Piumarta
 
Mike,

There are two things to investigate.  The first is running with
'-notimer' to see if it's the millisecond clock interrupts that are
keeping you at a few %.
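
(For context, a sketch of what the timer in question does, as I
understand it; the names are illustrative rather than the VM's actual
ones.  A periodic SIGALRM heartbeat like this is enough to keep a
process from ever being truly idle:)

#include <signal.h>
#include <sys/time.h>

static volatile sig_atomic_t lowResMSecs = 0;

static void heartbeat(int sig)
{
  lowResMSecs += 1;                /* the process wakes once per ms */
}

static void startHeartbeat(void)
{
  struct itimerval t;
  signal(SIGALRM, heartbeat);
  t.it_interval.tv_sec  = 0;
  t.it_interval.tv_usec = 1000;    /* fire every millisecond */
  t.it_value = t.it_interval;
  setitimer(ITIMER_REAL, &t, 0);   /* '-notimer' would skip this */
}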

The second may be much more complicated than the following suggests,
but here are some vague clues for you as I understand them.

> IIRC Squeak still does some polling (event tickler?).

While it's 'polling', the support code is happy to go to sleep (in
select()) for as long as the image tells it to (or until something
happens on a file descriptor: network, display, etc.).  The CPU
should be pinned at 0.0% when idle (with no loss of UI or network
reactivity), but not, I suspect, with the image/Interpreter behaving
the way it does.
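
(To make that concrete, a minimal sketch of the shape of such an idle
wait; fdSet and maxFd are assumed to be maintained elsewhere by the
aio support code:)

#include <sys/select.h>

extern fd_set fdSet;  /* descriptors of interest: display, sockets, ... */
extern int    maxFd;  /* highest descriptor in fdSet */

void idleWait(long microSeconds)
{
  struct timeval tv;
  fd_set rd = fdSet;
  tv.tv_sec  = microSeconds / 1000000;
  tv.tv_usec = microSeconds % 1000000;
  /* blocks consuming no CPU; returns early on descriptor activity */
  select(maxFd + 1, &rd, 0, 0, &tv);
}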

> This has also caused quite some pain for people running Squeak on a  
> server:
> Squeak will always stay in the working set using a few percent CPU  
> (=power!) constantly, even if it effectively has been idle for a  
> long time.

This started many years ago when something surreptitiously changed in
the image and/or Interpreter to cause relinquishProcessor to be
called with very small arguments (around the millisecond mark).  This
is undoubtedly essential for good performance on some platform,
somewhere, but on Unix it is a disaster; there is no portable way to
sleep for so little time while also responding to changes on
descriptors in a timely (read: immediate) fashion.  Depending on the
make and model of your kernel, a sub-timeslice timeout in select() is
either rounded up (maybe implicitly by the process scheduler) to an
unpredictable (but almost always large) fraction of a timeslice, or it
is quantized down to zero.  The first causes the famous Delay
inaccuracies, the second causes the famous 100% CPU usage.  That's
the reason for the byzantine checks and adjustments of the timeout
argument that someone commented on a few weeks ago.
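
(You can observe the rounding directly with a standalone test along
these lines; ask select() for 1ms and measure what you actually get.
Depending on the kernel the answer ranges from zero to a large
fraction of a timeslice:)

#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

int main(void)
{
  struct timeval t0, t1, tv = { 0, 1000 };      /* request 1ms */
  gettimeofday(&t0, 0);
  select(0, 0, 0, 0, &tv);                      /* no descriptors: pure sleep */
  gettimeofday(&t1, 0);
  printf("asked for 1000us, slept %ldus\n",
         (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec));
  return 0;
}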

> What would it take to change the VM and Squeak to make it truly  
> event driven?

First try the notimer thing.  If that doesn't work, try multiplying  
the argument to ioRelinquishProcessor by 100.  If that doesn't work,  
we have to resort to science and engineering: profile the VM and find  
out empirically where it spends its time while sitting idle at 2% for  
an hour or two.

HTH,
Ian




Re: Suspend

johnmci
 

On Jul 18, 2007, at 10:01 AM, Ian Piumarta wrote:

> This started many years ago when something surreptitiously changed  
> in the image and/or Interpreter to cause relinquishProcessor to be  
> called with very small arguments (around the millisecond mark).  
> This is undoubtedly essential for good performance on some  
> platform, somewhere, but on Unix it is a disaster; there is no  
> portable way to sleep for such little time while also responding to  
> changes on descriptors in a timely (read: immediate) fashion.  
> Depending on the make and model of your kernel, a sub-timeslice  
> timeout in select() is either rounded up (maybe implicitly by the  
> process scheduler) to an unpredictable (but almost always large)  
> fraction of a timeslice, or it is quantized down to zero.  The first
> causes the famous Delay inaccuracies, the second causes the famous  
> 100% CPU usage.  That's the reason for the byzantine checks and  
> adjustments of the timeout argument that someone commented on a few  
> weeks ago.

OK, well, let me give a bit of history here. The call as coded by John
Maloney in '97 says:

idleProcess
        "A default background process which is invisible."

        [true] whileTrue:
                [self relinquishProcessorForMicroseconds: 1000]

It had occurred to me that we should be able to sleep up to the next
wakeup tick if you ignored the issue of incoming interrupts; since
incoming interrupts would terminate the sleep, this is not an issue.

In the late 90's I changed the logic here to go to the Delay class
and calculate where the next wakeup tick was, to provide a different
value than 1000.  This was pushed out into the update stream and
lasted about an hour, until Scott Wallace found out the hard way (by
toasting his day's work) that on restarting an image, even if
everything was correct, you would enter a deadly embrace in the Delay
logic which would make the idleProcess unrunnable, and the process
scheduler quits because no process is runnable.

In re-evaluating this I pushed the logic into the VM, so in the Mac
VM I have:

    setInterruptCheckCounter(0);
    now = (ioMSecs() & MillisecondClockMask);
    if (getNextWakeupTick() <= now) {
        if (getNextWakeupTick() == 0)
            realTimeToWait = 16;  /* could be higher I guess; likely it's
                                     never zero, though -- actually I doubt
                                     getNextWakeupTick() is ever zero */
        else
            return 0;
    }
    else
        realTimeToWait = getNextWakeupTick() - now;

    aioSleep(realTimeToWait * 1000);


At some point in the past the UNIX code read:
sqInt ioRelinquishProcessorForMicroseconds(sqInt us)
{
   int nwt= getNextWakeupTick();
   int ms=  0;

   if (nwt)
     {
       int now= (ioMSecs() & 0x1fffffff);
       ms= ((nwt <= now) ? (1000/60) : nwt - now);
     }

   if (ms < (1000/60)) /* < 1 timeslice? */
     {
#    if defined(__MACH__) /* can sleep with 1ms resolution */
       if (!aioPoll(0))
        {
          struct timespec rqtp= { 0, ms * 1000*1000 };
          struct timespec rmtp;
          while ((nanosleep(&rqtp, &rmtp) < 0) && (errno == EINTR))
            rqtp= rmtp;
        }
#    endif
       ms= 0; /* poll but don't block */
     }
   dpy->ioRelinquishProcessorForMicroseconds(ms*1000);
   setInterruptCheckCounter(0);
   return 0;
}


But currently it reads as below, which takes the bogus 1000
microsecond value, thus waking up more often and not really sleeping
much.

I'm not sure why you can't again do the getNextWakeupTick()
calculation, or was there some other problem that was being hidden here?
Perhaps the Unix system wouldn't properly service sleep times < 100
ms? Could it be a startup param to turn the logic on or off? (See the
sketch after the listings below.)

Current Unix Code:

sqInt ioRelinquishProcessorForMicroseconds(sqInt us)
{
   int now;
   dpy->ioRelinquishProcessorForMicroseconds(us);
   now= ioLowResMSecs();
   if (now - lastInterruptCheck > (1000/25)) /* avoid thrashing intr checks from 1ms loop in idle proc */
     {
       setInterruptCheckCounter(-1000); /* ensure timely poll for semaphore activity */
       lastInterruptCheck= now;
     }
   return 0;
}

X11 display has

static sqInt display_ioRelinquishProcessorForMicroseconds(sqInt microSeconds)
{
   aioSleep(handleEvents() ? 0 : microSeconds);
   return 0;
}


/* sleep for microSeconds or until i/o becomes possible, avoiding
    sleeping in select() if the timeout is too small */

int aioSleep(int microSeconds)
{
#if defined(HAVE_NANOSLEEP)
   if (microSeconds < (1000000/60)) /* < 1 timeslice? */
     {
       if (!aioPoll(0))
        {
          struct timespec rqtp= { 0, microSeconds * 1000 };
          struct timespec rmtp;
          nanosleep(&rqtp, &rmtp);
          microSeconds= 0; /* poll but don't block */
        }
     }
#endif
   return aioPoll(microSeconds);
}
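
(On the startup-param question above: one possibility would be to
gate the wakeup-tick calculation behind a flag, roughly as below.
The flag name and its handling are invented for illustration; the
rest reuses the names from the code above:)

static int useLongSleep = 0;  /* hypothetical: set by a '-longsleep' option */

sqInt ioRelinquishProcessorForMicroseconds(sqInt us)
{
  if (useLongSleep)
    {
      int nwt = getNextWakeupTick();
      int now = (ioMSecs() & 0x1fffffff);
      if (nwt > now)
        us = (nwt - now) * 1000;  /* sleep up to the next Delay wakeup */
      /* if nwt is 0 or already due, fall through with the caller's us */
    }
  dpy->ioRelinquishProcessorForMicroseconds(us);
  setInterruptCheckCounter(0);
  return 0;
}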





>
>> What would it take to change the VM and Squeak to make it truly  
>> event driven?
>
> First try the notimer thing.  If that doesn't work, try multiplying  
> the argument to ioRelinquishProcessor by 100.  If that doesn't  
> work, we have to resort to science and engineering: profile the VM  
> and find out empirically where it spends its time while sitting  
> idle at 2% for an hour or two.
>
> HTH,
> Ian
>
>
>

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================



Re: Suspend

timrowledge
 
While the subject is in the air, perhaps it would be worth
considering what the ideal situation would be.

For example, I would suggest we might drop the idea of a
suspend-for-a-fixed-time and have a suspend-until-next-alarm. That
would be called by the background process when nothing else wants
time and would suspend the VM until the time specified by the timer
queue; perhaps some emergency backstop value might be used in case
some twerp gets us into a state with no timer active and no Process
doing anything. That ought to remove the problem of looping on a tiny
delay, though if certain OSs can't sleep for very short periods I
have no idea how they can do much at that point.
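
(A sketch of what such a primitive might look like, reusing
getNextWakeupTick(), ioMSecs() and aioSleep() from John's code above;
the name and the backstop value are illustrative only:)

#define BACKSTOP_MSECS 1000  /* emergency backstop: wake at least once a second */

static void ioSuspendUntilNextAlarm(void)
{
  int nwt = getNextWakeupTick();
  int now = ioMSecs();
  int ms;

  if (nwt == 0)          /* twerp insurance: no timer active at all */
    ms = BACKSTOP_MSECS;
  else if (nwt <= now)   /* next Delay already due: don't sleep */
    ms = 0;
  else
    ms = nwt - now;

  /* aioSleep() returns early on descriptor activity, so input is
     still serviced promptly */
  aioSleep(ms * 1000);
}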

OSs that have suitable asynchronous or interrupt-driven input can
trigger the return from the suspend as needed. OSs that don't (the
only one I know of is RISC OS right now, and I'll almost certainly be
retiring from maintaining that soon) can probably set up some
VM-internal polling.

I'm sure it would require some changes in the image to make this work
properly, but so what. Older images can run on older VMs.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Shift to the left!  Shift to the right!  Pop up, push down, byte,  
byte, byte!



Re: Suspend

johnmci
 
That is what getNextWakeupTick() does: it returns the time the VM
has to wake up to service the next Delay, since at that point all
processes are either:

(a) waiting on a semaphore, or
(b) waiting on a Delay to terminate in the future, which is found via
getNextWakeupTick()


On Jul 18, 2007, at 12:15 PM, tim Rowledge wrote:

> While the subject is in the air, perhaps it would be worth  
> considering what the ideal situation would be.
>
> For example, I would suggest we might drop the idea of a
> suspend-for-a-fixed-time and have a suspend-until-next-alarm. That would be
> called by the background process when nothing else wants time and  
> would suspend the vm until the time specified by the timer queue;  
> perhaps some emergency backstop value might be used in case some  
> twerp gets us into a state with no timer active and no Process  
> doing anything. That ought to remove the problem of looping on a  
> tiny delay, though if certain OSs can't sleep for very short  
> periods I have no idea how they can do much at that point.
>
> OSs that have suitable asynchronous or interrupt driven input stuff  
> can trigger the return from the suspend at need. OSs that don't  
> (the only one I know of is RISC OS right now and I'll almost  
> certainly be retiring from maintaining that soon) can probably set  
> some vm-internal polling.
>
> I'm sure it would require some changes in the image to make this  
> work properly but so what. Older images can run on older vms.
>
> tim
> --
> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> Shift to the left!  Shift to the right!  Pop up, push down, byte,  
> byte, byte!
>
>

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================



Re: Suspend

timrowledge
 

On 18-Jul-07, at 12:27 PM, John M McIntosh wrote:

> That is what the getNextWakeupTick() does, it returns the time the  
> vm has to wakeup to service the next delay since at that point all  
> processes
> are:
>
> (a) waiting on a semaphore
> (b) waiting on a Delay to terminate in the future which is found  
> via getNextWakeupTick()
Well yes; so perhaps one might change the relinquish so that it no
longer effectively adds a fake wakeup tick.

Whatever the details, let's see if we can work out a good mechanism
so that when there is nothing to do, nothing is done, and when stuff
needs doing it gets done promptly. I'm sure there will be some
platforms that can do it beautifully and others that have to fudge
it, but that's life.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: BNE: Buy Non-IBM Equipment



Re: Suspend

johnmci
 

On Jul 18, 2007, at 1:09 PM, tim Rowledge wrote:

>
> On 18-Jul-07, at 12:27 PM, John M McIntosh wrote:
>
>> That is what the getNextWakeupTick() does, it returns the time the  
>> vm has to wakeup to service the next delay since at that point all  
>> processes
>> are:
>>
>> (a) waiting on a semaphore
>> (b) waiting on a Delay to terminate in the future which is found  
>> via getNextWakeupTick()
> Well yes; so perhaps one might change the relinquish so that it no  
> longer effectively adds a fake wakeup tick.


I'm not sure what you mean by adding a fake wakeup tick?


--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================