CogVM Benchmarking


CogVM Benchmarking

Stefan Marr

Hi Eliot:

When exactly is the JIT compilation triggered in Cog?

I would like to provide a configuration for the SMark benchmarking framework that ensures that everything has been jitted, without doing too much unnecessary warmup.

All the benchmark code is implemented in standard methods and executed much as SUnit executes tests; that is, benchmarks are invoked like 'suite perform: aBenchmarkSelector'.


How many iterations should I run the code before I can be sure that it is jitted?
From your blog, I assume it should be jitted after its second execution (see snippet below).

And I suppose a 'Smalltalk garbageCollect' should not interfere with this, and can safely be performed before a timed run?
Is there anything else I should cover in the framework? Any particular heuristics/mechanisms that should be taken into account when trying to reach a stable state?


What I found on your blog is the following:
<<<
So a simple way to implement the interpret on single-use policy is to only compile to machine code when finding a method in the first-level method lookup cache. We avoid compiling large methods, which are typically initializers, and rarely performance-critical inner loops, by refusing to compile methods whose number of literals exceeds a limit, settable via the command line that defaults to 60 literals, which excludes a mere 0.3% of methods in my image.
>>>

Thanks a lot
Stefan




--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525


Re: CogVM Benchmarking

Eliot Miranda
 


On Mon, May 16, 2011 at 8:17 AM, Stefan Marr <[hidden email]> wrote:

> Hi Eliot:
>
> When exactly is the JIT compilation triggered in Cog?

There are currently four triggers and one caveat.

1. when a method is found in the method lookup cache.  This translates to the second send of a message.  Of course there are caveats: cache collisions could cause one method to mask another, etc.  But the method cache uses three probes to avoid collisions, and it is less heavily used in the JIT because inline caches relieve cache pressure.

2. when two successive block evaluations are of blocks in the same method (i.e. the interpreted value primitive remembers the method for the previous block evaluation, and if the next block evaluation is in the same method, it compiles the method).

3. evaluation of a method via withArgs:executeMethod: (i.e. doits).  Hence a doit (but not necessarily the methods it calls) will always run jitted (but see caveat).

4. the Nth evaluation of a loop in an interpreted method, where N defaults to 10.  This has a command-line argument to alter the value (-cogminjumps). The value should be tuned, e.g. to optimize start-up time.  But that's work that remains to be done.

The caveat is that no method will be jitted if it has too many literals.  The literal count is used to avoid jitting very large methods.  It defaults to 60, and there is a command-line switch to control it (-cogmaxlits).
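
For example, a quick sanity check, assuming suite is your benchmark suite instance as in your description, is to verify that each benchmark method is under the default 60-literal limit, so this caveat doesn't silently keep it interpreted (lookupSelector: answers nil if the selector isn't understood, so this assumes the selector exists):

    | method |
    method := suite class lookupSelector: aBenchmarkSelector.
    method numLiterals < 60 ifFalse:
        [Transcript show: aBenchmarkSelector , ' exceeds the literal limit and will stay interpreted'; cr]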

So to be sure that you're measuring pure performance, and not any overhead, you really need to measure the third evaluation of any benchmark.  The first evaluation will load the method lookup caches.  The second will compile everything, but compilation and linking might introduce some overhead.  The third evaluation should be running at full speed.  But you might want to time the successive evaluations.  In my experience compilation and linking overhead is very low, so I expect you'll be hard-pressed to see much difference between the second and third runs.

Of course, running the GC before each evaluation is necessary to sync the GC and avoid unequal GC activity in each run.
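
A minimal sketch of that policy (this is not SMark's actual API; suite and aBenchmarkSelector are bound as in your description) might look like:

    | timings |
    "Runs 1 and 2: fill the lookup caches and trigger jitting."
    2 timesRepeat: [suite perform: aBenchmarkSelector].
    "Timed runs: sync the GC before each so collector activity is equal."
    timings := (1 to: 5) collect:
        [:run |
         Smalltalk garbageCollect.
         Time millisecondsToRun: [suite perform: aBenchmarkSelector]].
    Transcript show: timings printString; cr

Note that if you evaluate this as a doit, the doit itself runs jitted immediately (trigger 3 above), but the benchmark methods it calls still need the two warm-up sends.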

Tricky :)


> I would like to provide a configuration for the SMark benchmarking framework that ensures that everything has been jitted, without doing too much unnecessary warmup.
>
> All the benchmark code is implemented in standard methods and executed much as SUnit executes tests; that is, benchmarks are invoked like 'suite perform: aBenchmarkSelector'.


> How many iterations should I run the code before I can be sure that it is jitted?
> From your blog, I assume it should be jitted after its second execution (see snippet below).
>
> And I suppose a 'Smalltalk garbageCollect' should not interfere with this, and can safely be performed before a timed run?
> Is there anything else I should cover in the framework? Any particular heuristics/mechanisms that should be taken into account when trying to reach a stable state?

Forcing finalization?  Again, in my experience having a still machine is very important.  You'll see variations in timing caused by other activities on the machine (Time Machine, your mailer downloading mail from the server, etc.).  So this area is quite difficult.  These are issues that Alexandre Bergel has articulated well, and they are good reasons for going with his method count approach.  Of course method counting doesn't apply when trying to profile specific activities in the VM; for that you do need a traditional profiler.  But for tuning Smalltalk applications Alexandre's approach seems to make the most sense.

HTH
Eliot


> What I found on your blog is the following:
> <<<
> So a simple way to implement the interpret on single-use policy is to only compile to machine code when finding a method in the first-level method lookup cache. We avoid compiling large methods, which are typically initializers, and rarely performance-critical inner loops, by refusing to compile methods whose number of literals exceeds a limit, settable via the command line that defaults to 60 literals, which excludes a mere 0.3% of methods in my image.
> >>>
>
> Thanks a lot
> Stefan







Re: CogVM Benchmarking

Stefan Marr

Hi Eliot:


On 16 May 2011, at 17:42, Eliot Miranda wrote:

> So to be sure that you're measuring pure performance, and not any overhead, you really need to measure the third evaluation of any benchmark.  The first evaluation will load the method lookup caches.  The second will compile everything, but compilation and linking might introduce some overhead.  The third evaluation should be running at full speed.  But you might want to time the successive evaluations.  In my experience compilation and linking overhead is very low, so I expect you'll be hard-pressed to see much difference between the second and third runs.

Great, thanks. That's what I wanted to verify.


> Forcing finalization?  Again, in my experience having a still machine is very important.  You'll see variations in timing caused by other activities on the machine (Time Machine, your mailer downloading mail from the server, etc.).
Sure, that's what the framework, a few iterations, and some statistics will take care of.
And a dedicated benchmark machine...

Now let's see how many cores I need to overtake the CogVM ;)

Thanks
Stefan




--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525