Hi, I'm having trouble doing benchmarks in Pharo, or at least I don't understand the results (maybe I'm missing something). I want to compare the method lookup speed of anonymous subclasses with the lookup speed of standard Pharo classes. I have a method m and a method m2 that both execute the following code: ^ 100 printString. I have implemented these two methods in an ordinary class and in an anonymous subclass. I thought that measuring the execution speed of the same code in these two cases would give me different times if the lookup speed was not the same. So I did the following:

    | o results repetitions |
    o := theClass new.
    results := OrderedCollection new.
    repetitions := 100.
    repetitions timesRepeat: [
        results add: [ 1000000 timesRepeat: [ o m ] ] timeToRun ].

I do the same for m2, and then I compute the average of all the values in results. What I don't understand is that, for example, for an average over 100 measurements, m is 1% slower and m2 is 2% faster in the Pharo class case than with anonymous subclasses, but for 1000 measurements, m is 11% faster and m2 is 3% slower. The results keep varying as I change the number of measurements, but they do not grow with it (the relation does not seem to be linear).
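For reference, here is roughly the complete setup I use, as a sketch: MyClass stands in for my real class, and I assume the anonymous subclass is created with newAnonymousSubclass and gets its own copy of the method via compile:.

    | anonClass regularInstance anonInstance measure |
    anonClass := MyClass newAnonymousSubclass.
    anonClass compile: 'm ^ 100 printString'.   "same source as MyClass>>m"
    regularInstance := MyClass new.
    anonInstance := anonClass new.
    "one sample = time to perform one million sends of #m"
    measure := [ :receiver | [ 1000000 timesRepeat: [ receiver m ] ] timeToRun ].
    Transcript show: 'regular: ', (measure value: regularInstance) printString; cr.
    Transcript show: 'anonymous: ', (measure value: anonInstance) printString; cr.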
I don't have enough knowledge about how to benchmark code, or about what can make a difference in Pharo. For now the only explanations I have are that maybe the differences are too small to be significant, so the results vary, or that I have done something wrong in how I measure.
How would you measure the lookup speed of a method?
Steven.
Steven

The question you need to consider is how much the results vary from one repetition to another. If they are very variable, then the average will still be variable. In the case of your 100 repetitions, for instance, you can work out a confidence interval for the mean quite easily. Find the mean and standard deviation of the 100 repetitions; the standard error of the mean is the sample standard deviation divided by the square root of the number of values averaged (i.e. 10 in this case); the approximate 95% confidence interval is the mean +/- twice the standard error of the mean. (This makes a lot of simplifying assumptions, but should be sufficient for your purposes.)

If, as I suspect, the width of the confidence interval is quite large in relation to the mean, this means that you cannot consistently measure the speed of operations like this. You could try greatly increasing the number of repetitions, but the square-root law is working against you: 10,000 repetitions would give you an interval about 10% of the width you get with 100 repetitions, and 1,000,000 repetitions would reduce it to 1%.

Hope this is helpful

Peter Kenny
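In Pharo, that calculation could look like the following sketch, assuming results is the OrderedCollection of measured times stored as plain numbers (e.g. milliseconds):

    | mean variance stdErr |
    mean := results sum / results size.
    variance := (results inject: 0 into: [ :acc :x | acc + (x - mean) squared ])
        / (results size - 1).
    "standard error of the mean = sample standard deviation / sqrt(n)"
    stdErr := variance sqrt / results size sqrt.
    Transcript show: 'mean: ', mean asFloat printString; cr.
    Transcript show: 'approx. 95% CI: +/- ', (2 * stdErr) asFloat printString; cr.

If the +/- value is large compared with the mean, single runs of this size cannot reliably distinguish the two lookup cases.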
Thanks Peter :) I will make these computations tomorrow at the lab and see what the confidence interval looks like. Steven.
On 06/14/2017 09:12 PM, Steven Costiou wrote:
> repetitions timesRepeat:[ results add: [1000000 timesRepeat:[o m]] timeToRun].

Hi Steven,

the influence of the garbage collector is often somewhat difficult to forecast in cases like these. I would change the code to:

    repetitions timesRepeat: [
        Smalltalk garbageCollect.
        results add: [ 1000000 timesRepeat: [ o m ] ] timeToRun ].

And if I wanted to see the influence of the garbage collector, I would enlarge repetitions and diminish the 1000000 in such a way that the product remains constant. Of course, one would not want to destroy that influence completely, as in the real world it always takes its toll.
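For example, a sketch of that second variant, keeping the total work at 100 * 1000000 = 10^8 sends while taking ten times more, ten times shorter samples (o and results as in the original snippet):

    | repetitions inner |
    repetitions := 1000.     "10x more samples"
    inner := 100000.         "10x fewer sends per sample, so the product stays the same"
    repetitions timesRepeat: [
        Smalltalk garbageCollect.
        results add: [ inner timesRepeat: [ o m ] ] timeToRun ].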
Hi, I added the garbage collect and the numbers are better! There are still fluctuations, but no more "random-like" variations of the measured values. I computed the standard error, which is below 0.5% of the mean for all the measurements I did (so the confidence interval is narrow).
Thank you both :)
Steven.