A gprof listing of memory access functions in the interpreter


A gprof listing of memory access functions in the interpreter

David T. Lewis
 
I don't know if this is of any general interest, but attached is a
gprof output listing of a Squeak VM with the memory access routines
from sqMemoryAccess.h recoded in Slang.

I now have the Slang inlining working so that it can fully inline all
of these methods. With the Slang inlining activated, performance is
essentially identical to that of the normal macros in sqMemoryAccess.h.
By turning the Slang inlining off, the functions are all called
individually, which is what I used to generate the attached profile.

The host is 64-bit Linux AMD. The profile was run by opening a largish
image, running "0 tinyBenchmarks" a half dozen times, and exiting the
image without saving.

So far, the advantages of putting the memory access routines into
the image as Slang seem to be:
- You can step into the methods in a debugger
- The methods can be profiled
- Exposes type declaration problems previously hidden by the macros
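
To illustrate the last point, here is a minimal sketch in C of how a casting macro can hide a type problem that a typed function would expose. It is not the code from sqMemoryAccess.h or the Slang output; heapWords, LONG_AT_MACRO, longAtFunction and longAtPutFunction are hypothetical names chosen for illustration, and sqInt is assumed to be a word-sized integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t sqInt;            /* assumption: an oop is a word-sized integer */

static sqInt heapWords[8];         /* hypothetical stand-in for object memory */
#define HEAP_BASE ((char *)heapWords)

/* Macro style: the explicit casts accept an integer, a double or a pointer
 * argument without complaint, so a wrongly declared variable goes unnoticed. */
#define LONG_AT_MACRO(offset) (*(sqInt *)(HEAP_BASE + (sqInt)(offset)))

/* True if offset is word-aligned and lies inside the stand-in heap. */
static int validOffset(sqInt offset)
{
    return offset >= 0
        && offset % (sqInt)sizeof(sqInt) == 0
        && (size_t)offset + sizeof(sqInt) <= sizeof heapWords;
}

/* Function style: a typed signature and a real symbol, so a debugger can
 * step into it and a profiler can attribute time to it. Passing a pointer
 * here without a cast draws a compiler diagnostic. */
sqInt longAtFunction(sqInt offset)
{
    assert(validOffset(offset));
    return *(sqInt *)(HEAP_BASE + offset);
}

void longAtPutFunction(sqInt offset, sqInt value)
{
    assert(validOffset(offset));
    *(sqInt *)(HEAP_BASE + offset) = value;
}
```

Calling longAtPutFunction(sizeof(sqInt), 42) and then reading the same offset back through either longAtFunction or LONG_AT_MACRO returns 42. Once the function form is inlined by the compiler (for example by declaring it static inline), it would be expected to cost about the same as the macro.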

Is anyone interested in this?

Dave


Attachment: gprof.out.gz (49K)

Re: A gprof listing of memory access functions in the interpreter

Andrew Gaylard
 
On Sun, Jul 20, 2008 at 11:49 PM, David T. Lewis <[hidden email]> wrote:
 
> I don't know if this is of any general interest, but attached is a
> gprof output listing of a Squeak VM with the memory access routines
> from sqMemoryAccess.h recoded in Slang.
>
> I now have the Slang inlining working so that it can fully inline all
> of these methods. With the Slang inlining activated, performance is
> essentially identical to that of the normal macros in sqMemoryAccess.h.
> By turning the Slang inlining off, the functions are all called
> individually, which is what I used to generate the attached profile.
>
> The host is 64-bit Linux AMD. The profile was run by opening a largish
> image, running "0 tinyBenchmarks" a half dozen times, and exiting the
> image without saving.
>
> So far, the advantages of putting the memory access routines into
> the image as Slang seem to be:
> - You can step into the methods in a debugger
> - The methods can be profiled
> - Exposes type declaration problems previously hidden by the macros
>
> Is anyone interested in this?
>
> Dave

Yeah, I'm interested in this.  I'm always interested in speeding up the interpreter.

What I find interesting is that there's no particular hot spot in the profile. I find
that profiling typically reveals a single function taking up 50+% of the runtime,
but that isn't the case here.  There's a fairly even tail.

Even if we *doubled* the speed of interpret() and pointerForOop(), we'd still
only gain a 20% speed-up.
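
The arithmetic behind that estimate is Amdahl's law. Below is a small sketch in C; the function names are hypothetical, and no figures from the attached profile are reproduced. If "a 20% speed-up" is read as an overall factor of 1.2 (one possible reading), then doubling the speed of the two functions implies they account for roughly a third of the total runtime.

```c
#include <assert.h>

/* Amdahl's law: overall speed-up when `fraction` of the runtime
 * (0.0 to 1.0) is made `factor` times faster and the rest is unchanged. */
double overallSpeedup(double fraction, double factor)
{
    assert(fraction >= 0.0 && fraction <= 1.0 && factor > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}

/* Inverse question: what fraction of the runtime must be made `factor`
 * times faster to reach a given overall speed-up? */
double fractionForSpeedup(double speedup, double factor)
{
    assert(speedup >= 1.0 && factor > 1.0);
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / factor);
}
```

For example, overallSpeedup(1.0 / 3.0, 2.0) evaluates to 1.2, and fractionForSpeedup(1.2, 2.0) to about 0.33. Even an unboundedly large factor applied to a third of the runtime cannot take the overall speed-up past 1.5.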

So there isn't any easy improvement, unless I misunderstand the data
(which is quite possible!).  To gain significant speed-ups, we'd have to make
hundreds of micro-optimisations throughout the code-base, which would
probably complicate the code (which is pretty clean at the moment).

Andrew