On Sat, Dec 24, 2016 at 8:13 PM, Tim Felgentreff <[hidden email]> wrote:
> We run benchmarks every day on
> http://speed.squeak.org/.

Looking at the timeline http://speed.squeak.org/timeline/
I am curious about some of the performance improvements.

Several significant improvements seem aligned with Cog commit 2016120519,
for example AStar...
http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=AStar&env=2&revs=50&equid=off&quarts=on&extr=on

which seems to be "Merge pull request #105 from estebanlm/Cog"
https://github.com/OpenSmalltalk/opensmalltalk-vm/network

But aligned with the same Cog commit there is also a
corresponding improvement in the rsqueak performance, for example
ArrayAccess...
http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=ArrayAccess&env=2&revs=50&equid=off&quarts=on&extr=on

...which seems to indicate a common cause from an in-image
improvement. Between 2016120322 and 2016120519 I see "The
various scanFor: and scanForEmptySlotFor: implementations only need to
access the size of their array once."
* Trunk: Kernel-eem.1050.mcz (MethodDictionary)
  http://forum.world.st/The-Trunk-Kernel-eem-1050-mcz-td4925618.html
* Trunk: System-eem.920.mcz (SystemDictionary)
  http://forum.world.st/The-Trunk-System-eem-920-mcz-td4925619.html

So I'm curious: do the benchmarks track both image and VM changes?
Perhaps it would be useful to also benchmark Pharo to control for
image changes (now that it has returned to the fold using the mainline
opensmalltalk-vm).

Now I'm further curious. The benchmarks below show a massive jump down
at 2016120519 for all data series, but all results are relatively
very close to zero, so I wonder whether these are valid results?
ByteStringHash
ClassVarBinding
Compiler
EqualBytes
Fib
FillArray
FillByteArray
FillString
Graphsearch
HashBytes
HashWords
InstVarAccess
IntLoop
IntegerByteCodes
ModularConvolutionBytes
ModularConvolutionWords
ModularDotProductBytes
ModularDotProductWords
ModularSumBytes
ModularSumWords
PermutationCompositionArray
PermutationCompositionWords
Richards
Send
SendPrimitive
SendWithManyArguments
Slopstone
WideStringHash

Here all series jump down, and the result range seems valid...
FloatLoop

Here all series jump, and the result range seems valid. RSqueak improves more...
ShootoutSpectraNorm

Here cog32, cog64 & rsqueakvm32 have a small jump down, but it is
very close to zero, so are they valid? rsqueakvm64 shows no change...
Blowfish

Here only Cog jumps down, RSqueak stays much higher, seems valid...
OrderedCollectionRandomInsert
Nbody

Here only Cog jumps down, RSqueak being already pretty low. The
results seem valid...
AStar
ArrayAccess
BinaryTree
BitBltExampleOne
DeltaBlue
DoesNotUnderstand
Json
Mandala
MandelbrotIterative1Thread
MandelbrotIterative2Thread
MandelbrotIterative4Thread
MandelbrotIterative8Thread
MandelbrotRecursive1Thread
MandelbrotRecursive2Thread
MandelbrotRecursive4Thread
MandelbrotRecursive8Thread
OrderedCollectionInsertFirst

Here only Cog jumps down, RSqueak is unchanged or not present, seems valid...
Smopstone
SplayTree
ToolInteraction

Here only cog32 jumps down, cog64, rsqueakvm32 & rsqueakvm64 show no
change, seems valid...
Fannkuck

Here RSqueak improves, Cog stays the same, seems valid...
ShootoutMandelbrot3
ShootoutNBody

The following have no significant change around 5 Dec...
BitBltColorMapping - all already low
DSAGen - all already low
KMeans
LRUCachePrintString
Mandelbrot
Polymorphy
RaiseToLargeNumber
RenderFont
ShaLongString
ShootoutBinarytrees
ShootoutChameneosRedux
ShootoutFannkuchRedux
ShootoutFasta
ShootoutFastaRedux
ShootoutKnucleotide
ShootoutMeteor
ShootoutPidigits
ShootoutRegexDNA
ShootoutReverseComplement
ShootoutThreadring

I also see that around that time, on 2 Dec, Fabio says "I have fixed the
Squeak-trunk pipeline and we finally get daily updates again." So
maybe a bundle of improvements suddenly showed up in
one go, but it seems the 2016120322 build should have picked those up
and didn't.
http://forum.world.st/Squeak-trunk-images-td4925570.html
|
Probably the tests were run on different hardware.
Levente
|
No, sorry, my bad. There was no change in hardware.

The jumps you are seeing stem from a change in how we report our measurements. Before, we scaled all benchmarks to find a number of iterations that took longer than 600 ms on Cog, and then reported the time it took for those iterations to run on each VM.

Now we do a first pass to find a number of iterations that takes 600 ms, for each VM separately and on every run, and when we report we divide by the number of iterations. This is, for example, why we are now seeing such low numbers for simple loops. Before, we were reporting the time for something like a few million iterations. Now we still run those millions of iterations, but we then divide to get the time each iteration took.

I should probably just delete the older results, because they are no longer comparable.

And we are already running Pharo tests: the "nocounters" VM is the exact same Cog VM running the benchmarks in a Pharo image.

Also, I should update the website to show the image version, because we update that sporadically, too.

Levente Uzonyi <[hidden email]> wrote on Sun, Dec 25, 2016, 16:10:
> Probably the tests were run on different hardware.
|
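The reporting change described above can be sketched roughly as follows. This is only an illustration, not the actual speed.squeak.org harness: the function names, the doubling strategy used to find the iteration count, and the injectable `clock` parameter are all assumptions made for the example. Only the 600 ms target and the "divide by the number of iterations" step come from the thread.

```python
import time

TARGET_SECONDS = 0.6  # the 600 ms threshold mentioned in the thread


def run(workload, iterations, clock=time.perf_counter):
    """Run the workload `iterations` times and return the total elapsed time."""
    start = clock()
    for _ in range(iterations):
        workload()
    return clock() - start


def calibrate(workload, target=TARGET_SECONDS, clock=time.perf_counter):
    """First pass: find an iteration count whose run takes at least `target`.

    Doubling is an assumption; the thread does not say how the count is found.
    """
    iterations = 1
    while run(workload, iterations, clock) < target:
        iterations *= 2
    return iterations


def report_old(workload, fixed_iterations, clock=time.perf_counter):
    """Old reporting: total time for a count fixed in advance (calibrated on Cog)."""
    return run(workload, fixed_iterations, clock)


def report_new(workload, clock=time.perf_counter):
    """New reporting: calibrate on the current VM, then report time per iteration."""
    iterations = calibrate(workload, clock=clock)
    return run(workload, iterations, clock) / iterations
```

For a tiny loop body, `report_old` returns a total on the order of the 600 ms target, while `report_new` returns that total divided by a very large iteration count, which would explain why the simple-loop benchmarks now sit close to zero on the timeline.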
This gets more confusing when we look at RSqueak, because there we still see massive changes in performance due to ongoing refactoring and continued changes to the JIT. I will just delete the older results.

Tim Felgentreff <[hidden email]> wrote on Sun, Dec 25, 2016, 21:33:
|
One reason for the relative changes between RSqueak and Cog, as far as I can tell, is that previously we were sometimes favoring the RSqueak JIT, especially for the tiny loops. When the number of iterations was a constant in the compiled method (rather than an instance variable determined in the first pass), that constant would end up in our assembler, and thus the (already tiny) loop got even shorter because we no longer even read the literal.

Tim Felgentreff <[hidden email]> wrote on Sun, Dec 25, 2016, 21:36:
|
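The difference between an iteration count that is a literal in the compiled method and one that is read from an instance variable can be illustrated by analogy in Python. This is not RSqueak or Cog code, and CPython does not specialize the loop the way a tracing JIT would; the class names and the helper are hypothetical. The sketch only shows that in the first case the count is baked into the compiled code object, where a JIT can treat it as a constant, while in the second it must be loaded from the receiver at run time.

```python
class ConstantCountLoop:
    """Iteration count written as a literal inside the method."""

    def run(self):
        total = 0
        for i in range(1_000_000):  # literal stored in the code object's constants
            total += i
        return total


class CalibratedCountLoop:
    """Iteration count determined earlier (e.g. by a first pass) and stored on the instance."""

    def __init__(self, iterations):
        self.iterations = iterations

    def run(self):
        total = 0
        for i in range(self.iterations):  # attribute read at run time
            total += i
        return total


def literal_is_embedded(method, value):
    """Return True if `value` appears among the constants of the compiled method."""
    return value in method.__code__.co_consts
```

Here `literal_is_embedded(ConstantCountLoop.run, 1_000_000)` holds, while the same check on `CalibratedCountLoop.run` does not, since that method only refers to the name `iterations`.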
In reply to this post by timfelgentreff
On Sun, 25 Dec 2016, Tim Felgentreff wrote:
> No, sorry, my bad. There was no change in hardware.
>
> The jumps you are seeing stem from a change in how we report our measurements. Before, we had scaled all benchmarks to find a number of iterations that took longer than 600ms on Cog and then reported the time
> it took for those to run.
>
> Now we do a first pass to find a number of iterations that take 600ms for each VM separately each time, and when we report we divide by the number of iterations. This is, for example, why we are now seeing
> such low numbers for simple loops. Before we were reporting something like a few million iterations, now we still run those millions of iterations, but we then divide to get the time each iteration took.
>
> I should probably just delete older results, because they are no longer comparable.
>
> And we are already running Pharo tests, the "nocounters" VM is the exact same Cog VM running the benchmarks in a Pharo image.

Why is it called nocounters? What is counters?

> Also, I should update the website to show the image version, too, because we update that sporadically, too.

Yes, that would be helpful.

Levente
|
The nocounters VM does not have the activation counters for Sista; the counters VM does (but without Sista active). We run both to see what impact the counters have. Those two and the Sista set of benchmarks all run on Pharo.

Levente Uzonyi <[hidden email]> wrote on Sun, Dec 25, 2016, 23:02:
> On Sun, 25 Dec 2016, Tim Felgentreff wrote:
|
On Mon, Dec 26, 2016 at 6:51 AM, Tim Felgentreff <[hidden email]> wrote:
> Nocounters does not have the activation counters for Sista, counters does
> (but without Sista active). We have both to see what the impact of the
> counters is. Those two and the Sista set of benchmarks all run on Pharo

It would be useful to have that info on the About page, if not elsewhere.

cheers -ben
|