Performance Testing Tools


Performance Testing Tools

Evan Donahue
Hi,

I've been doing a lot of performance testing lately, and I've found myself wanting to upgrade my methods from ad hoc use of bench and MessageTally. Is there any kind of framework for, say, statistically comparing improvements in performance benchmarks across different versions of code, or anything that generally helps manage the test-tweak-test loop? Just curious what's out there before I go writing something. Too many useful little libraries to keep track of!
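
For concreteness, the ad hoc approach I mean is roughly this, straight from a playground:

"bench runs the block repeatedly for a fixed interval and answers a
rough runs-per-second estimate."
[ 1000 factorial ] bench.

"MessageTally spyOn: runs the block under the sampling profiler and
shows where the time goes."
MessageTally spyOn: [ 100 timesRepeat: [ 1000 factorial ] ].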

Evan

Re: Performance Testing Tools

Luke Gorrie
Hi Evan,

I am also really interested in this topic and have been doing a bunch of work on automating statistical benchmarks. I don't have a background in statistics or formal QA, but I am learning as I go along :).

The tools I'm building are outside Smalltalk. Our full performance test suite takes about a week of machine time to run because it tests ~15,000 QEMU VMs with different software versions / configurations / workloads. A CI server runs all those tests, getting pretty fast turnaround by distributing them across a cluster of servers and reusing results from unmodified software branches, and spits out a CSV with one row per test result (giving the benchmark score and the parameters of the test).

Then what to do with that ~15,000-line CSV file? Just now I run Rmarkdown to make a report on the distribution of results and then manually inspect it to check for interesting differences. At the moment I lump all of the different configurations together and treat them as one population. Here is an example report:
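
If you wanted to poke at the same kind of data in-image instead, a minimal sketch could look like this (assuming NeoCSV is loaded; the file name and the benchmark/config/score column layout here are made up for illustration):

| rows groups |
'results.csv' asFileReference readStreamDo: [ :stream |
    rows := (NeoCSVReader on: stream) skipHeader; upToEnd ].
"Group the scores by benchmark/config pair."
groups := Dictionary new.
rows do: [ :row |
    (groups at: row first , '/' , row second
            ifAbsentPut: [ OrderedCollection new ])
        add: row third asNumber ].
"Print a one-line summary per group."
groups keysAndValuesDo: [ :key :scores |
    Transcript show: key; show: ' n='; show: scores size printString;
        show: ' mean='; show: scores average printString; cr ].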

It's a bit primitive, but it is getting the job done for release engineering. I'm reasonably confident that new software releases don't break or slow down in obscure configurations. We are building network equipment, and performance regressions are generally not acceptable.

I'm looking into more clever ways to automatically interpret the results, e.g. fumbling around at https://stats.stackexchange.com/questions/288416/non-parametric-test-if-two-samples-are-drawn-from-the-same-distribution.
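
One standard answer to that question is the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the two empirical CDFs, which is simple enough to compute by hand. A rough sketch (you would still need critical values or a permutation test to turn D into an actual significance decision):

"Two-sample KS statistic: D = max over x of |F1(x) - F2(x)|,
where F1 and F2 are the empirical CDFs of samples a and b."
| ksStatistic |
ksStatistic := [ :a :b | | d |
    d := 0.
    (a , b) asSet do: [ :x |
        | f1 f2 |
        f1 := (a count: [ :v | v <= x ]) / a size.
        f2 := (b count: [ :v | v <= x ]) / b size.
        d := d max: (f1 - f2) abs ].
    d ].
ksStatistic value: #(9.8 10.1 10.0 9.9) value: #(10.4 10.6 10.5 10.3).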

Could this relate to your ambitions somehow?



Re: Performance Testing Tools

Mariano Martinez Peck
The ones I remember are SMark [1] and CalipeL [2].
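
From memory (so treat the details as assumptions rather than gospel), SMark usage is roughly: subclass SMarkSuite, write benchmark methods whose selectors start with 'bench', and ask the class to run itself:

"Hedged sketch of SMark conventions, from memory."
SMarkSuite subclass: #MyBenchmarks
    instanceVariableNames: ''
    classVariableNames: ''
    package: 'MyBenchmarks'.

"An instance-side benchmark method on MyBenchmarks:"
benchFactorial
    1000 factorial.

"Running the suite reports timings per bench* method."
MyBenchmarks run.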

Cheers, 

