Performance Testing Tools


Performance Testing Tools

Evan Donahue
Hi,

I've been doing a lot of performance testing lately, and I've found myself wanting to upgrade my methods from ad hoc use of bench and MessageTally. Is there any kind of framework for, say, statistically comparing improvements in performance benchmarks across different versions of code, or anything that generally helps manage the test-tweak-test loop? Just curious what's out there before I go writing something. Too many useful little libraries to keep track of!
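
For concreteness, the ad hoc approach I mean is roughly this, straight from a playground:

"bench runs the block repeatedly for a fixed interval and answers a
rough runs-per-second estimate."
[ 1000 factorial ] bench.

"MessageTally spyOn: runs the block under the sampling profiler and
shows where the time goes."
MessageTally spyOn: [ 100 timesRepeat: [ 1000 factorial ] ].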

Evan

Re: Performance Testing Tools

Luke Gorrie
Hi Evan,

I am also really interested in this topic and have been doing a bunch of work on automating statistical benchmarks. I don't have a background in statistics or formal QA, but I am learning as I go along :).

The tools I'm building are outside Smalltalk. Our full performance test suite takes about a week of machine time to run because it tests ~15,000 QEMU VMs with different software versions / configurations / workloads. A CI server runs all those tests, getting pretty fast turnaround by distributing them across a cluster of servers and reusing results from unmodified software branches, and spits out a CSV with one row per test result (giving the benchmark score and the parameters of the test).

Then what to do with that ~15,000-line CSV file? Just now I run Rmarkdown to make a report on the distribution of results and then manually inspect it to check for interesting differences. At the moment I lump all of the different configurations together and treat them as one population. Here is an example report:
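
If you wanted to poke at the same kind of data in-image instead, a minimal sketch could look like this (assuming NeoCSV is loaded; the file name and the benchmark/config/score column layout here are made up for illustration):

| rows groups |
'results.csv' asFileReference readStreamDo: [ :stream |
    rows := (NeoCSVReader on: stream) skipHeader; upToEnd ].
"Group the scores by benchmark/config pair."
groups := Dictionary new.
rows do: [ :row |
    (groups at: row first , '/' , row second
            ifAbsentPut: [ OrderedCollection new ])
        add: row third asNumber ].
"Print a one-line summary per group."
groups keysAndValuesDo: [ :key :scores |
    Transcript show: key; show: ' n='; show: scores size printString;
        show: ' mean='; show: scores average printString; cr ].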

It's a bit primitive, but it is getting the job done for release engineering. I'm reasonably confident that new software releases don't break or slow down in obscure configurations. We are building network equipment, and performance regressions are generally not acceptable.

I'm looking into more clever ways to automatically interpret the results, e.g. fumbling around at https://stats.stackexchange.com/questions/288416/non-parametric-test-if-two-samples-are-drawn-from-the-same-distribution.
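
One standard answer to that question is the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the two empirical CDFs, which is simple enough to compute by hand. A rough sketch (you would still need critical values or a permutation test to turn D into an actual significance decision):

"Two-sample KS statistic: D = max over x of |F1(x) - F2(x)|,
where F1 and F2 are the empirical CDFs of samples a and b."
| ksStatistic |
ksStatistic := [ :a :b | | d |
    d := 0.
    (a , b) asSet do: [ :x |
        | f1 f2 |
        f1 := (a count: [ :v | v <= x ]) / a size.
        f2 := (b count: [ :v | v <= x ]) / b size.
        d := d max: (f1 - f2) abs ].
    d ].
ksStatistic value: #(9.8 10.1 10.0 9.9) value: #(10.4 10.6 10.5 10.3).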

Could this relate to your ambitions somehow?



Re: Performance Testing Tools

Mariano Martinez Peck
The ones I remember are SMark [1] and CalipeL [2].
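
From memory (so treat the details as assumptions rather than gospel), SMark usage is roughly: subclass SMarkSuite, write benchmark methods whose selectors start with 'bench', and ask the class to run itself:

"Hedged sketch of SMark conventions, from memory."
SMarkSuite subclass: #MyBenchmarks
    instanceVariableNames: ''
    classVariableNames: ''
    package: 'MyBenchmarks'.

"An instance-side benchmark method on MyBenchmarks:"
benchFactorial
    1000 factorial.

"Running the suite reports timings per bench* method."
MyBenchmarks run.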

Cheers, 

