Profiling and delay accuracy

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Profiling and delay accuracy

Andreas.Raab
 
Hi Folks -

For a long time now I had this feeling that MessageTally reports pretty
bogus results in many situations without being able to precisely track
down why this would be the case. Today I was looking at
primitiveResponse() and, much to my surprise, noticed that the
checkForInterrupts() that used to be there for years is gone (see
Squeak3.6 for example which still has it). The sole reason for this
check was to get both high delay accuracy and good profiling behavior.
Consider this example:

fc := FormCanvas on: Display.
MessageTally spyOn:[
        1 to: 100 do:[:i|
                fc fillColor: Color white.
                fc printString.
        ].
].

Using 3.6 this will be measured correctly as:

**Tree**
87.5% {243ms} FormCanvas>>fillColor:
   |87.5% {243ms} GrafPort>>fillRect:offset:
   |  87.5% {243ms} GrafPort>>copyBits
12.5% {35ms} FormCanvas(Object)>>printString
   12.5% {35ms} FormCanvas(Object)>>printStringLimitedTo:

whereas the 3.10 reports it as

**Tree**
99.6% {670ms} FormCanvas(Object)>>printString
   94.9% {639ms} primitives
   4.6% {31ms} FormCanvas(Object)>>printStringLimitedTo:

I think you can see the problem and just imagine what this does to any
cumulative results. I also wonder what that change does to platforms
that don't have high-resolution asynchronous timers. This used to be the
one place where we would catch timer changes at high resolution and
unless the platform forces high resolution timer interrupts
asynchronously (by resetting the interrupt counter) we must be loosing a
lot in terms of delay accuracy. (it just so happens that the Windows VM
*does* do asynchronous timer interrupts so I never noticed this but I'm
very curious about the difference on Linux which typically still has
pretty bad timer granularity).

Any opinions? My feeling is that the timer check should go back in for
all of the above reasons. At the very least a quickCheckForInterrupt()
is required but I actually think putting the full check back in is worth it.

Cheers,
   - Andreas
Reply | Threaded
Open this post in threaded view
|

Re: Profiling and delay accuracy

timrowledge
 
The last big change in timer related stuff that I can recall was  
about 3 years ago when I removed the extraordinarily expensive pre-  
and post- primitive call timer checking and reworked many prims to  
explicitly do timer checks if we thought they were plausibly long-
running. It's when the messing around with the timer feedback stuff  
got done to avoid the catastrophic feedback runaway I discovered.

See the thread on 'primitive timing issues' starting march 16 2004  
and other threads around the same time such as 'User interrupts no  
longer interrupt locked UI process'

To quote from my VMMaker 3.7 beta announcement
"The cumbersome timing of primitives is removed from  
primitiveResponse since
most internal numbered prims are quick. Those that are not use
#forceInterruptCheck to require an ASAP checkForInterrupts. fIC is
essentially idempotent and can be called several times in a prim  
without harm.
I use this instead of simply setting the interruptCheckCounter to 0  
because
that seems to mess up the feedback mechanism. The external call prim
now has the primitive timing code since many named prims are quite long
operations; I'm still looking for a nice way to improve that. Maybe
changing the codegen to do something clever. I guess the ffi prim ought
to do something similar. Note that this change could be backed out if
the latter two points cause a problem for anyone.  Allowing plugin code
to call forceInterruptCheck requires a step in sqVirtualMachine - but
then Andreas' isArray code had already caused that. Internal code like
ioGetNextEvent(), ioShowDisplay() etc that takes a long time can freely
call fIC if it seems warranted. The evidence so far is that the timer
change is very worthwhile on RISC OS and mildly beneficial on other
platforms. It also allows the more direct usage of function pointers in
the prim table which will have other uses later. :-)"

Around about the same time we did a lot of work to get the delay  
accuracy reasonable and at least plausibly montonic on all platforms.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Fractured Idiom:- ICH LIEBE RICH - I'm really crazy about having dough


Reply | Threaded
Open this post in threaded view
|

Re: Profiling and delay accuracy

Andreas.Raab
 
Hi Tim -

tim Rowledge wrote:
> See the thread on 'primitive timing issues' starting march 16 2004 and
> other threads around the same time such as 'User interrupts no longer
> interrupt locked UI process'

Hm ... I don't really remember those discussions and unfortunately
Google comes up empty when looking for any of the following:

http://www.google.com/search?q=primitive+timing+issues
http://www.google.com/search?q=User+interrupts+no+longer+interrupt+locked+UI+process

Do you by any chance have a pointer to those discussions?

> To quote from my VMMaker 3.7 beta announcement
> "The cumbersome timing of primitives is removed from primitiveResponse
> since most internal numbered prims are quick. Those that are not use
> #forceInterruptCheck to require an ASAP checkForInterrupts.

With my profiling hat on this decision seems questionable. Even though
the quick primitives may be quick, once you start profiling you *really*
want them measured.

But more importantly, even if internal numbered primitves are quick,
shouldn't at least primitiveExternalCall and primitiveCalloutToFFI to
have timer checks?

Cheers,
   - Andreas
Reply | Threaded
Open this post in threaded view
|

Re: Profiling and delay accuracy

David T. Lewis
In reply to this post by Andreas.Raab
 
I need to ask some guidance. The problem that Andreas describes below has
been left unresolved for an embarassingly long time, so I would like to
bring some closure to it. Unfortunately I'm not an expert here, so I need
to ask advice.

The problem is tracked on Mantis here:
  http://bugs.squeak.org/view.php?id=7653

And I think that this one is basically the same issue:
  http://bugs.squeak.org/view.php?id=7515

I have tried three different approaches to fixing this:

1) Re-apply the logic from Andreas' original version of primitiveResponse
(stamped 'ar 2/20/2001). I looked at the old 2001 version and merged that
timer check logic into the current primitiveResponse.

2) Add a call to checkForInterrupts in primitiveResponse. This is just adding
a 'self checkForInterrupts' before leaving the primitiveResponse method.

3) Add a call to quickCheckForInterrupts in primitiveResponse. Same as above,
but call quickCheckForInterrupts, which reduces the number of times that
checkForInterrupts gets called.

None of these changes seems to result in significant performance reductions
that I could measure. Option 2 (add primitiveResponse) seems to be most
beneficial in terms of TimeProfileBrowser (but I am uncertain of this).

My concern would be that I am testing with a slow interpreter and relatively
fast hardware, so the performance characteristics might be quite different
when running on a slow machine or when running under Cog.

Any advice would be welcome. If there are no objections or further comment,
I'll update the trunk VMMaker with the second option (add checkForInterrupts).

Thanks,
Dave

On Thu, Jun 14, 2007 at 09:21:59PM -0700, Andreas Raab wrote:

>
> Hi Folks -
>
> For a long time now I had this feeling that MessageTally reports pretty
> bogus results in many situations without being able to precisely track
> down why this would be the case. Today I was looking at
> primitiveResponse() and, much to my surprise, noticed that the
> checkForInterrupts() that used to be there for years is gone (see
> Squeak3.6 for example which still has it). The sole reason for this
> check was to get both high delay accuracy and good profiling behavior.
> Consider this example:
>
> fc := FormCanvas on: Display.
> MessageTally spyOn:[
> 1 to: 100 do:[:i|
> fc fillColor: Color white.
> fc printString.
> ].
> ].
>
> Using 3.6 this will be measured correctly as:
>
> **Tree**
> 87.5% {243ms} FormCanvas>>fillColor:
>   |87.5% {243ms} GrafPort>>fillRect:offset:
>   |  87.5% {243ms} GrafPort>>copyBits
> 12.5% {35ms} FormCanvas(Object)>>printString
>   12.5% {35ms} FormCanvas(Object)>>printStringLimitedTo:
>
> whereas the 3.10 reports it as
>
> **Tree**
> 99.6% {670ms} FormCanvas(Object)>>printString
>   94.9% {639ms} primitives
>   4.6% {31ms} FormCanvas(Object)>>printStringLimitedTo:
>
> I think you can see the problem and just imagine what this does to any
> cumulative results. I also wonder what that change does to platforms
> that don't have high-resolution asynchronous timers. This used to be the
> one place where we would catch timer changes at high resolution and
> unless the platform forces high resolution timer interrupts
> asynchronously (by resetting the interrupt counter) we must be loosing a
> lot in terms of delay accuracy. (it just so happens that the Windows VM
> *does* do asynchronous timer interrupts so I never noticed this but I'm
> very curious about the difference on Linux which typically still has
> pretty bad timer granularity).
>
> Any opinions? My feeling is that the timer check should go back in for
> all of the above reasons. At the very least a quickCheckForInterrupt()
> is required but I actually think putting the full check back in is worth it.
>
> Cheers,
>   - Andreas