Squeak VM Speed Centre - validity and basis of improvements 5 Dec

Ben Coman

Squeak VM Speed Centre - validity and basis of improvements 5 Dec

On Sat, Dec 24, 2016 at 8:13 PM, Tim Felgentreff
<[hidden email]> wrote:
>We run benchmarks every day on
> http://speed.squeak.org/.

Reviewing the timeline at http://speed.squeak.org/timeline/
I am curious about some of the performance improvements.

Several significant improvements seem aligned with Cog commit 2016120519
for example AStar...
http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=AStar&env=2&revs=50&equid=off&quarts=on&extr=on

which seems to be "Merge pull request #105 from estebanlm/Cog"
https://github.com/OpenSmalltalk/opensmalltalk-vm/network

But then also aligned with the same Cog commit, there is a
corresponding improvement in the rsqueak performance, for example
ArrayAccess...
http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=ArrayAccess&env=2&revs=50&equid=off&quarts=on&extr=on

...which seems to indicate a common cause from an in-Image
improvement, for which between 2016120322 and 2016120519 I see "The
various scanFor: and scanForEmptySlotFor: implementations only need to
access the size of their array once."
* Trunk: Kernel-eem.1050.mcz (MethodDictionary)
http://forum.world.st/The-Trunk-Kernel-eem-1050-mcz-td4925618.html
* Trunk: System-eem.920.mcz (SystemDictionary)
http://forum.world.st/The-Trunk-System-eem-920-mcz-td4925619.html
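
To illustrate the kind of change that commit message describes, here is a minimal Python sketch. It is not the Squeak source: `scan_for`, the linear wrap-around probing, and the -1 "not found" result are assumptions modelled loosely on an open-addressing lookup. The point is only that the array size is read once before the probe loop instead of on every step.

```python
def scan_for(slots, key):
    """Return the index of the slot holding `key`, or of the first empty
    (None) slot on its probe path; return -1 if the table is full and
    `key` is absent. Hypothetical stand-in for a scanFor:-style lookup."""
    size = len(slots)            # size is fetched once, outside the loop
    start = hash(key) % size
    index = start
    while True:
        entry = slots[index]
        if entry is None or entry == key:
            return index
        index = (index + 1) % size   # wrap around using the cached size
        if index == start:
            return -1                # probed every slot without a match


table = [None] * 8
table[3] = 11                    # 11 hashes to slot 3 in a table of 8
print(scan_for(table, 11))       # slot that holds 11
print(scan_for(table, 3))        # 3 collides with 11, so the next empty slot
```

Whether hoisting the size access is measurable depends on the VM; on a JIT that already caches it, the difference may be small.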

So I'm curious: do the benchmarks track both Image and VM changes?
Perhaps it would be useful to also benchmark Pharo to control for
Image changes (now that it has returned to the fold using the mainline
opensmalltalk-vm).

Now I'm further curious: the benchmarks below show a massive jump down
at 2016120519 for all data series, but all results are now
very close to zero, so I wonder whether these are valid results?
ByteStringHash
ClassVarBinding
Compiler
EqualBytes
Fib
FillArray
FillByteArray
FillString
Graphsearch
HashBytes
HashWords
InstVarAccess
IntLoop
IntegerByteCodes
ModularConvolutionBytes
ModularConvolutionWords
ModularDotProductBytes
ModularDotProductWords
ModularSumBytes
ModularSumWords
PermutationCompositionArray
PermutationCompositionWords
Richards
Send
SendPrimitive
SendWithManyArguments
Slopstone
WideStringHash

Here all series jump down, and the result range seems valid...
FloatLoop

Here all series jump, and the result range seems valid. Rsqueak improves more...
ShootoutSpectraNorm

Here cog32, cog64 & rsqueakvm32 have a small jump down, but it is
very close to zero, so are they valid? rsqueakvm64 shows no change...
Blowfish

Here only Cog jumps down, RSqueak stays much higher, seems valid...
OrderedCollectionRandomInsert
Nbody

Here only Cog jumps down, RSqueak being already pretty low. The
results seem valid...
AStar
ArrayAccess
BinaryTree
BitBltExampleOne
DeltaBlue
DoesNotUnderstand
Json
Mandala
MandelbrotIterative1Thread
MandelbrotIterative2Thread
MandelbrotIterative4Thread
MandelbrotIterative8Thread
MandelbrotRecursive1Thread
MandelbrotRecursive2Thread
MandelbrotRecursive4Thread
MandelbrotRecursive8Thread
OrderedCollectionInsertFirst

Here only Cog jumps down, Rsqueak is unchanged or not present, seems valid...
Smopstone
SplayTree
ToolInteraction

Here only cog32 jumps down; cog64, rsqueakvm32 & rsqueakvm64 show no
change, seems valid...
Fannkuck

Here RSqueak improves, Cog stays the same, seems valid...
ShootoutMandelbrot3
ShootoutNBody

The following have no significant change around 5 Dec...
BitBltColorMapping - all already low
DSAGen - all already low
KMeans
LRUCachePrintString
Mandelbrot
Polymorphy
RaiseToLargeNumber
RenderFont
ShaLongString
ShootoutBinarytrees
ShootoutChameneosRedux
ShootoutFannkuchRedux
ShootoutFasta
ShootoutFastaRedux
ShootoutKnucleotide
ShootoutMeteor
ShootoutPidigits
ShootoutRegexDNA
ShootoutReverseComplement
ShootoutThreadring

I also see that around that time, on 2 Dec, Fabio says "I have fixed the
Squeak-trunk pipeline and we finally get daily updates again." So
maybe a bundle of improvements suddenly showed up in
one go, but it seems the 2016120322 build should have picked those up
and didn't.
http://forum.world.st/Squeak-trunk-images-td4925570.html

Levente Uzonyi

Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

Probably the tests were run on different hardware.

Levente


timfelgentreff

Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

No, sorry, my bad. There was no change in hardware.

The jumps you are seeing stem from a change in how we report our measurements. Previously, we scaled each benchmark to a number of iterations that took longer than 600ms on Cog, and then reported the total time those iterations took to run.

Now we do a first pass to find a number of iterations that takes 600ms, separately for each VM and on every run, and when we report we divide the time by the number of iterations. This is, for example, why we are now seeing such low numbers for simple loops. Before, we were reporting the time for something like a few million iterations; now we still run those millions of iterations, but we then divide to get the time each iteration took.
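
A minimal sketch of the reporting scheme described above, in Python rather than the actual benchmark harness. The names `calibrate` and `measure`, the doubling strategy, and the injectable `clock` are assumptions for illustration; only the 600ms target and the division by the iteration count come from the description.

```python
import time

TARGET_SECONDS = 0.6  # the 600ms threshold mentioned above


def run(benchmark, iterations, clock):
    """Run `benchmark` `iterations` times and return the elapsed seconds."""
    start = clock()
    for _ in range(iterations):
        benchmark()
    return clock() - start


def calibrate(benchmark, target=TARGET_SECONDS, clock=time.perf_counter):
    """First pass: double the iteration count until one run reaches `target`.
    Done separately on each VM, so each VM gets its own count."""
    iterations = 1
    while run(benchmark, iterations, clock) < target:
        iterations *= 2
    return iterations


def measure(benchmark, iterations, clock=time.perf_counter):
    """Return (total time, time per iteration).
    The old reports correspond to the first value, the new ones to the second."""
    total = run(benchmark, iterations, clock)
    return total, total / iterations


if __name__ == "__main__":
    def tiny_loop():
        sum(range(100))

    n = calibrate(tiny_loop)
    total, per_iteration = measure(tiny_loop, n)
    print(n, total, per_iteration)
```

For a tiny loop the total stays around the target while the per-iteration figure is many orders of magnitude smaller, which would be consistent with the plotted values dropping close to zero without any real speedup.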

I should probably just delete older results, because they are no longer comparable.

And we are already running Pharo tests: the "nocounters" VM is the exact same Cog VM running the benchmarks in a Pharo image.

Also, I should update the website to show the image version as well, because we only update that sporadically.


timfelgentreff

Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

This gets more confusing when we look at RSqueak, because we still see massive changes in performance there due to ongoing refactorings and continued changes to the JIT.

I will just delete the older results.


timfelgentreff

Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

One reason for the relative changes between RSqueak and Cog, as far as I can tell, is that previously we were sometimes favoring the RSqueak JIT, especially for the tiny loops. When the number of iterations was a constant in the compiled method (rather than an instance variable determined in the first pass), that constant would end up in our assembler, and thus the (already tiny) loop got even shorter because we no longer even read the literal.


Levente Uzonyi

Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

In reply to this post by timfelgentreff

On Sun, 25 Dec 2016, Tim Felgentreff wrote:

>
> No, sorry, my bad. There was no change in hardware.
>
> The jumps you are seeing stem from a change in how we report our measurements. Before, we had scaled all benchmarks to find a number of iterations that took longer than 600ms on Cog and then reported the time
> it took for those to run.
>
> Now we do a first pass to find a number of iterations that take 600ms for each VM separately each time, and when we report we divide by the number of iterations. This is, for example, why we are now seeing
> such low numbers for simple loops. Before we were reporting something like a few million iterations, now we still run those millions of iterations, but we then divide to get the time each iteration took.
>
> I should probably just delete older results, because they are no longer comparable.

That sounds reasonable.

>
> And we are already running Pharo tests, the "nocounters" VM is the exact same Cog VM running the benchmarks in a Pharo image.

Why is it called nocounters? What is counters?

>
> Also, I should update the website to show the image version, too, because we update that sporadically, too.

Yes, that would be helpful.

Levente

>
>
> Levente Uzonyi <[hidden email]> schrieb am So., 25. Dez. 2016, 16:10:
> Probably the tests were run on different hardware.
>
> Levente
>
> On Sun, 25 Dec 2016, Ben Coman wrote:
>
> > On Sat, Dec 24, 2016 at 8:13 PM, Tim Felgentreff
> > <[hidden email]> wrote:
> >> We run benchmarks every day on
> >> http://speed.squeak.org/.
> >
> > Reviewing at the timeline http://speed.squeak.org/timeline/
> > I am curious about some of the performance improvements.
> >
> > Several significant improvements seem aligned with Cog commit 2016120519
> > for example AStar...
> > http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=AStar&env=2&revs=50&equid=off&quarts=on&extr=on
> >
> > which seems to be "Merge pull request #105 from estebanlm/Cog"
> > https://github.com/OpenSmalltalk/opensmalltalk-vm/network
> >
> > But then also aligned with the same Cog commit, there is a
> > corresponding improvement in the rsqueak performance, for example
> > ArrayAccess...
> > http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=ArrayAccess&env=2&revs=50&equid=off&quarts=on&extr=on
> >
> > ...which seems to indicate a common cause from an in-Image
> > improvement, for which between 2016120322 and 2016120519 I see "The
> > various scanFor: and scanForEmptySlotFor: implementations only need to
> > access the size of their array once."
> > * Trunk: Kernel-eem.1050.mcz (MethodDictionary)
> > http://forum.world.st/The-Trunk-Kernel-eem-1050-mcz-td4925618.html
> > * Trunk: System-eem.920.mcz (SystemDictionary)
> > http://forum.world.st/The-Trunk-System-eem-920-mcz-td4925619.html
> >
> >
> > So I'm curious do the benchmarks track both Image and VM changes?
> > Perhaps it would be useful to also benchmark Pharo to control for
> > Image changes (now that its returned to the fold using the mainline
> > opensmalltalk-vm)
> >
> > Now I'm further curious, the benchmarks below see a massive jump down
> > for 2016120519 for all data series, but all results are relatively
> > very close to zero, so I wonder are these valid results?
> > ByteStringHash
> > ClassVarBinding
> > Compiler
> > EqualBytes
> > Fib
> > FillArray
> > FillByteArray
> > FillString
> > Graphsearch
> > HashBytes
> > HashWords
> > InstVarAccess
> > IntLoop
> > IntegerByteCodes
> > ModularConvolutionBytes
> > ModularConvolutionWords
> > ModularDotProductBytes
> > ModularDotProductWords
> > ModularSumBytes
> > ModularSumWords
> > PermutationCompositionArray
> > PermutationCompositionWords
> > Richards
> > Send
> > SendPrimitive
> > SendWithManyArguments
> > Slopstone
> > WideStringHash
> >
> > Here all series jump down, and the result range seems valid...
> > FloatLoop
> >
> > Here all series jump, and the result range seems valid. Rsqueak improves more...
> > ShootoutSpectraNorm
> >
> > Here cog32, cog64 & rsqueakvm32 have a small jump down, but its is
> > very close to zero, so are they valid? rsqueakvm64 shows no change...
> > Blowfish
> >
> > Here only Cog jumps down, RSqueak stays much higher, seems valid..
> > OrderedCollectionRandomInsert
> > Nbody
> >
> > Here only Cog jumps down, RSqueak being already pretty low. The
> > results seem valid
> > AStar
> > ArrayAccess
> > BinaryTree
> > BitBltExampleOne
> > DeltaBlue
> > DoesNotUnderstand
> > Json
> > Mandala
> > MandelbrotIterative1Thread
> > MandelbrotIterative2Thread
> > MandelbrotIterative4Thread
> > MandelbrotIterative8Thread
> > MandelbrotRecursive1Thread
> > MandelbrotRecursive2Thread
> > MandelbrotRecursive4Thread
> > MandelbrotRecursive8Thread
> > OrderedCollectionInsertFirst
> >
> > Here only Cog jumps down, Rsqueak is unchanged or not present, seems valid...
> > Smopstone
> > SplayTree
> > ToolInteraction
> >
> > Here only cog32 jumps down, cog64, rsqueakvm32 & rsqueakvm64 no
> > change, seems valid...
> > Fannkuck
> >
> > Here RSqueak improves, Cog stays the same, seems valid...
> > ShootoutMandelbrot3
> > ShootoutNBody
> >
> > The follow have no significant change around 5 Dec...
> > BitBltColorMapping - all already low
> > DSAGen - all already low
> > KMeans
> > LRUCachePrintString
> > Mandelbrot
> > Polymorphy
> > RaiseToLargeNumber
> > RenderFont
> > ShaLongString
> > ShootoutBinarytrees
> > ShootoutChameneosRedux
> > ShootoutFannkuchRedux
> > ShootoutFasta
> > ShootoutFastaRedux
> > ShootoutKnucleotide
> > ShootoutMeteor
> > ShootoutPidigits
> > ShootoutRegexDNA
> > ShootoutReverseComplement
> > ShootoutThreadring
> >
> >
> > I also see around that time on 2 Dec Fabio says "I have fixed the
> > Squeak-trunk pipeline and we finally get daily updates again." So
> > maybe there was suddenly a bundle of improvements that showed up in
> > one go - but it seems the 2016120322 build should have picked those up
> > and didn't.
> > http://forum.world.st/Squeak-trunk-images-td4925570.html
>
>
>

timfelgentreff

Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

Nocounters does not have the activation counters for Sista; counters does (but without Sista active). We have both to see what the impact of the counters is. Those two and the Sista set of benchmarks all run on Pharo.

Levente Uzonyi <[hidden email]> wrote on Sun, 25 Dec 2016, 23:02:

On Sun, 25 Dec 2016, Tim Felgentreff wrote:

>
> No, sorry, my bad. There was no change in hardware.
>
> The jumps you are seeing stem from a change in how we report our measurements. Before, we scaled all benchmarks to find a number of iterations that took longer than 600ms on Cog, and then reported the total time it took for those iterations to run.
>
> Now we do a first pass to find a number of iterations that takes 600ms for each VM separately each time, and when we report we divide by the number of iterations. This is, for example, why we are now seeing such low numbers for simple loops. Before, we were reporting the time for something like a few million iterations; now we still run those millions of iterations, but we then divide to get the time each iteration took.
>
> I should probably just delete older results, because they are no longer comparable.

That sounds reasonable.
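
To illustrate the reporting change described in the quoted explanation above, here is a minimal Python sketch. It is not the actual speed.squeak.org harness: the function names (`time_batch`, `calibrate`, `report_total`, `report_per_iteration`) are hypothetical, and the doubling search used to find the iteration count is an assumption, since the thread does not say how that number is found. Only the 600ms threshold and the divide-by-iterations step come from the thread.

```python
import time

TARGET_SECONDS = 0.6  # the 600ms threshold mentioned in the thread


def time_batch(run_once, iterations, timer=time.perf_counter):
    """Return the elapsed time (in seconds) for `iterations` calls of run_once."""
    start = timer()
    for _ in range(iterations):
        run_once()
    return timer() - start


def calibrate(run_once, target=TARGET_SECONDS, timer=time.perf_counter):
    """First pass: find an iteration count whose batch takes at least `target`.

    Assumption: the count is found by doubling; the real harness may differ.
    """
    iterations = 1
    while time_batch(run_once, iterations, timer) < target:
        iterations *= 2
    return iterations


def report_total(run_once, fixed_iterations, timer=time.perf_counter):
    """Old style: total time for a fixed count (calibrated once, on Cog)."""
    return time_batch(run_once, fixed_iterations, timer)


def report_per_iteration(run_once, target=TARGET_SECONDS,
                         timer=time.perf_counter):
    """New style: calibrate on the VM under test, then divide by the count."""
    iterations = calibrate(run_once, target, timer)
    return time_batch(run_once, iterations, timer) / iterations
```

With a cheap `run_once`, `report_total` yields a number on the order of the whole batch, while `report_per_iteration` yields the batch time divided by a count that may be in the millions. That is consistent with the simple-loop benchmarks dropping to values very close to zero on the timeline, and it is why results from before and after the change cannot be compared directly.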

>
> And we are already running Pharo tests, the "nocounters" VM is the exact same Cog VM running the benchmarks in a Pharo image.

Why is it called nocounters? What is counters?

>
> Also, I should update the website to show the image version as well, because we update that sporadically, too.

Yes, that would be helpful.

Levente

>
>
> Levente Uzonyi <[hidden email]> wrote on Sun, 25 Dec 2016, 16:10:
> Probably the tests were run on different hardware.
>
> Levente
>

Ben Coman

Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

On Mon, Dec 26, 2016 at 6:51 AM, Tim Felgentreff
<[hidden email]> wrote:
> Nocounters does not have the activation counters for Sista, counters does
> (but without Sista active). We have both to see what the impact of the
> counters is. Those two and the Sista set of benchmarks all run on Pharo

It would be useful to have that info on the About page, if not elsewhere.

cheers -ben