Comparing benchmarks

Comparing benchmarks

Chris Muller-3
Benchmarking is ultimately about finding what performs best, and this invariably involves a lot of manual benching, recording, and comparing of different individual expressions.  I've been wanting to optimize this process for a long time, and today I finally took the time to bang out a first draft of this new feature in Chronology-Core-cmm.52.

Here's an example inspired by my discovery about creating Dictionaries with a pre-allocated size:

   { [Dictionary new].
     [Dictionary new: 4] } benchCompare

answers an Array with corresponding measurements, along with an extra field at the beginning:

   {'[Dictionary new]'->'100% of baseline rate, 26,300,000 per second. 38.1 nanoseconds per run. 7.85843 % GC time.' . 
   '[Dictionary new: 4]'->'13% of baseline rate, 3,510,000 per second. 285 nanoseconds per run. 1.18 % GC time.'}

The first one is the baseline, and so will always report "100%".  The subsequent ones are relative to that baseline.
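
For readers curious how such relative figures can be computed, here is a minimal sketch (illustrative only, not the actual Chronology-Core code) that counts how many times each block runs in one second and reports each rate against the first block as baseline:

   | blocks rates baseline |
   blocks := { [Dictionary new]. [Dictionary new: 4] }.
   rates := blocks collect: [:aBlock | | n deadline |
      n := 0.
      deadline := Time millisecondClockValue + 1000.
      [Time millisecondClockValue < deadline]
         whileTrue: [aBlock value. n := n + 1].
      n "approximate runs per second"].
   baseline := rates first.
   rates collect: [:r | (r * 100 // baseline) printString, '% of baseline']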

Since this is a new feature, it won't make it into 5.3, but I would like to put it into trunk after the release.  Please let me know your thoughts.

Best,
  Chris



Re: Comparing benchmarks

Eliot Miranda-2
Hi Chris,

On Dec 17, 2019, at 4:36 PM, Chris Muller <[hidden email]> wrote:


Benchmarking is ultimately about finding what performs best, and this invariably involves a lot of manual benching, recording, and comparing of different individual expressions.  I've been wanting to optimize this process for a long time, and today I finally took the time to bang out a first draft of this new feature in Chronology-Core-cmm.52.

Here's an example inspired by my discovery about creating Dictionaries with a pre-allocated size:

   { [Dictionary new].
     [Dictionary new: 4] } benchCompare

answers an Array with corresponding measurements, along with an extra field at the beginning:

   {'[Dictionary new]'->'100% of baseline rate, 26,300,000 per second. 38.1 nanoseconds per run. 7.85843 % GC time.' . 
   '[Dictionary new: 4]'->'13% of baseline rate, 3,510,000 per second. 285 nanoseconds per run. 1.18 % GC time.'}

The first one is the baseline, and so will always report "100%".  The subsequent ones are relative to that baseline.

Since this is a new feature, it won't make it into 5.3, but I would like to put it into trunk after the release.  Please let me know your thoughts.

The two key things with something like this, given our current VM technology (especially now that we're moving to full blocks), are:
- evaluate each block a few times (if it is short-running) to ensure it has been jitted
- subtract the cost of a simple block.  For example, in the above, [Dictionary] is a useful null block: its cost is just evaluating the block and indirecting through the #Dictionary -> Dictionary association.
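
The subtraction described here could be sketched as follows (illustrative only; benchCompareWithNullCase: is merely proposed below, not an existing method):

   | n nullTime caseTime |
   n := 1000000.
   "cost of block evaluation plus the global lookup alone"
   nullTime := [n timesRepeat: [[Dictionary] value]] timeToRun.
   "cost of the case actually being measured"
   caseTime := [n timesRepeat: [[Dictionary new] value]] timeToRun.
   "net milliseconds per run, with block-evaluation overhead subtracted"
   (caseTime - nullTime) / n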

So if you changed your scheme to 

  { [Dictionary new].
   [Dictionary new: 4]} benchCompareWithNullCase: [Dictionary]

you could get more meaningful results.

There are other repeatability concerns: running the scavenger before each individual case is measured, so that each runs more fairly with an empty eden, or even running a full GC.  These are useful parameterisations.  So e.g.

 { [Dictionary new].
   [Dictionary new: 4]}
        benchCompareWithNullCase: [Dictionary]
        beforeEachDo: [Smalltalk garbageCollectMost]

and automatically compiling a benchmark block with more repetitions, to avoid results getting swamped by block-evaluation overhead (an issue for micro-benchmarks).  So e.g. if any element of the initial array were an association from a block to an integer, the method would construct a block with that many repetitions (which could be done textually or by bytecode magic) and measure that instead.
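
Constructing such a repeated block textually might look roughly like this (hypothetical sketch; Compiler evaluate: is standard Squeak, the rest is illustrative):

   | body repetitions source |
   body := 'Dictionary new: 4.'.
   repetitions := 10.
   source := String streamContents: [:s |
      s nextPutAll: '['.
      repetitions timesRepeat: [s nextPutAll: body; space].
      s nextPutAll: ']'].
   "answers a block that evaluates the body ten times per run"
   Compiler evaluate: source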

