Squeak VM Speed Centre - validity and basis of improvements 5 Dec


Ben Coman
On Sat, Dec 24, 2016 at 8:13 PM, Tim Felgentreff
<[hidden email]> wrote:
> We run benchmarks every day on
> http://speed.squeak.org/.

Reviewing the timeline at http://speed.squeak.org/timeline/
I am curious about some of the performance improvements.

Several significant improvements seem aligned with Cog commit 2016120519,
for example AStar...
http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=AStar&env=2&revs=50&equid=off&quarts=on&extr=on

which seems to be "Merge pull request #105 from estebanlm/Cog"
https://github.com/OpenSmalltalk/opensmalltalk-vm/network

But then also aligned with the same Cog commit, there is a
corresponding improvement in the rsqueak performance, for example
ArrayAccess...
http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=ArrayAccess&env=2&revs=50&equid=off&quarts=on&extr=on

...which seems to indicate a common cause from an in-Image
improvement, for which between 2016120322 and 2016120519 I see "The
various scanFor: and scanForEmptySlotFor: implementations only need to
access the size of their array once."
* Trunk: Kernel-eem.1050.mcz  (MethodDictionary)
   http://forum.world.st/The-Trunk-Kernel-eem-1050-mcz-td4925618.html
* Trunk: System-eem.920.mcz (SystemDictionary)
   http://forum.world.st/The-Trunk-System-eem-920-mcz-td4925619.html
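The commit message quoted above describes a micro-optimization to the open-addressing lookup used by hashed collections. A minimal Python analogue follows, assuming a linear-probing table in which `None` marks an empty slot. The class and method names are hypothetical, and the real `scanFor:` is Smalltalk code in the image. The point it shows is that the array size is read once, before the probe loop, and then reused for the wrap-around.

```python
class OpenAddressingSet:
    """Hypothetical Python analogue of a Squeak-style hashed set.

    Linear probing; None marks an empty slot (so None itself cannot be stored).
    """

    def __init__(self, capacity=8):
        self.array = [None] * capacity
        self.tally = 0

    def scan_for(self, obj):
        """Return the index of the slot holding obj, or of the first empty
        slot on its probe path, or -1 if the table is full and obj is absent."""
        size = len(self.array)            # size is read once, outside the loop
        start = index = hash(obj) % size
        while True:
            element = self.array[index]
            if element is None or element == obj:
                return index
            index = (index + 1) % size    # wrap-around reuses the cached size
            if index == start:
                return -1

    def add(self, obj):
        index = self.scan_for(obj)
        if index < 0:
            raise RuntimeError("table full")  # a real set would grow first
        if self.array[index] is None:
            self.array[index] = obj
            self.tally += 1
        return obj

    def __contains__(self, obj):
        index = self.scan_for(obj)
        return index >= 0 and self.array[index] is not None
```

Whether hoisting the size read matters in practice depends on the VM; the thread only suggests it as a possible in-image cause for improvements seen on both Cog and RSqueak.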


So I'm curious: do the benchmarks track both Image and VM changes?
Perhaps it would be useful to also benchmark Pharo to control for
Image changes (now that it has returned to the fold using the mainline
opensmalltalk-vm).

I'm further curious about the benchmarks below. They show a massive
jump down at 2016120519 for all data series, but all the results are
very close to zero, so I wonder whether these results are valid.
  ByteStringHash
  ClassVarBinding
  Compiler
  EqualBytes
  Fib
  FillArray
  FillByteArray
  FillString
  Graphsearch
  HashBytes
  HashWords
  InstVarAccess
  IntLoop
  IntegerByteCodes
  ModularConvolutionBytes
  ModularConvolutionWords
  ModularDotProductBytes
  ModularDotProductWords
  ModularSumBytes
  ModularSumWords
  PermutationCompositionArray
  PermutationCompositionWords
  Richards
  Send
  SendPrimitive
  SendWithManyArguments
  Slopstone
  WideStringHash

Here all series jump down, and the result range seems valid...
  FloatLoop

Here all series jump, and the result range seems valid. RSqueak improves more...
  ShootoutSpectraNorm

Here cog32, cog64 & rsqueakvm32 have a small jump down, but it is
very close to zero, so are they valid? rsqueakvm64 shows no change...
  Blowfish

Here only Cog jumps down, RSqueak stays much higher, seems valid...
  OrderedCollectionRandomInsert
  Nbody

Here only Cog jumps down, RSqueak being already pretty low. The
results seem valid...
  AStar
  ArrayAccess
  BinaryTree
  BitBltExampleOne
  DeltaBlue
  DoesNotUnderstand
  Json
  Mandala
  MandelbrotIterative1Thread
  MandelbrotIterative2Thread
  MandelbrotIterative4Thread
  MandelbrotIterative8Thread
  MandelbrotRecursive1Thread
  MandelbrotRecursive2Thread
  MandelbrotRecursive4Thread
  MandelbrotRecursive8Thread
  OrderedCollectionInsertFirst

Here only Cog jumps down, RSqueak is unchanged or not present, seems valid...
  Smopstone
  SplayTree
  ToolInteraction

Here only cog32 jumps down; cog64, rsqueakvm32 & rsqueakvm64 show no
change, seems valid...
  Fannkuck

Here RSqueak improves, Cog stays the same, seems valid...
  ShootoutMandelbrot3
  ShootoutNBody

The following have no significant change around 5 Dec...
  BitBltColorMapping  - all already low
  DSAGen - all already low
  KMeans
  LRUCachePrintString
  Mandelbrot
  Polymorphy
  RaiseToLargeNumber
  RenderFont
  ShaLongString
  ShootoutBinarytrees
  ShootoutChameneosRedux
  ShootoutFannkuchRedux
  ShootoutFasta
  ShootoutFastaRedux
  ShootoutKnucleotide
  ShootoutMeteor
  ShootoutPidigits
  ShootoutRegexDNA
  ShootoutReverseComplement
  ShootoutThreadring


I also see that around that time, on 2 Dec, Fabio says "I have fixed the
Squeak-trunk pipeline and we finally get daily updates again." So
maybe a bundle of improvements suddenly showed up in one go, but it
seems the 2016120322 build should have picked those up and didn't.
http://forum.world.st/Squeak-trunk-images-td4925570.html


Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

Levente Uzonyi
Probably the tests were run on different hardware.

Levente

On Sun, 25 Dec 2016, Ben Coman wrote:

> On Sat, Dec 24, 2016 at 8:13 PM, Tim Felgentreff
> <[hidden email]> wrote:
>> We run benchmarks every day on
>> http://speed.squeak.org/.
>
> Reviewing at the timeline   http://speed.squeak.org/timeline/
> I am curious about some of the performance improvements.
>
> Several significant improvements seem aligned with Cog commit 2016120519
> for example AStar...
> http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=AStar&env=2&revs=50&equid=off&quarts=on&extr=on
>
> which seems to be "Merge pull request #105 from estebanlm/Cog"
> https://github.com/OpenSmalltalk/opensmalltalk-vm/network
>
> But then also aligned with the same Cog commit, there is a
> corresponding improvement in the rsqueak performance, for example
> ArrayAccess...
> http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=ArrayAccess&env=2&revs=50&equid=off&quarts=on&extr=on
>
> ...which seems to indicate a common cause from an in-Image
> improvement, for which between 2016120322 and 2016120519 I see "The
> various scanFor: and scanForEmptySlotFor: implementations only need to
> access the size of their array once."
> * Trunk: Kernel-eem.1050.mcz  (MethodDictionary)
>   http://forum.world.st/The-Trunk-Kernel-eem-1050-mcz-td4925618.html
> * Trunk: System-eem.920.mcz (SystemDictionary)
>   http://forum.world.st/The-Trunk-System-eem-920-mcz-td4925619.html
>
>
> So I'm curious do the benchmarks track both Image and VM changes?
> Perhaps it would be useful to also benchmark Pharo to control for
> Image changes (now that its returned to the fold using the mainline
> opensmalltalk-vm)
>
> Now I'm further curious, the benchmarks below see a massive jump down
> for 2016120519 for all data series, but all results are relatively
> very close to zero, so I wonder are these valid results?
>  ByteStringHash
>  ClassVarBinding
>  Compiler
>  EqualBytes
>  Fib
>  FillArray
>  FillByteArray
>  FillString
>  Graphsearch
>  HashBytes
>  HashWords
>  InstVarAccess
>  IntLoop
>  IntegerByteCodes
>  ModularConvolutionBytes
>  ModularConvolutionWords
>  ModularDotProductBytes
>  ModularDotProductWords
>  ModularSumBytes
>  ModularSumWords
>  PermutationCompositionArray
>  PermutationCompositionWords
>  Richards
>  Send
>  SendPrimitive
>  SendWithManyArguments
>  Slopstone
>  WideStringHash
>
> Here all series jump down, and the result range seems valid...
>  FloatLoop
>
> Here all series jump, and the result range seems valid. Rsqueak improves more...
>  ShootoutSpectraNorm
>
> Here cog32, cog64 & rsqueakvm32 have a small jump down, but its is
> very close to zero, so are they valid?  rsqueakvm64 shows no change...
>  Blowfish
>
> Here only Cog jumps down, RSqueak stays much higher, seems valid..
>  OrderedCollectionRandomInsert
>  Nbody
>
> Here only Cog jumps down, RSqueak being already pretty low. The
> results seem valid
>  AStar
>  ArrayAccess
>  BinaryTree
>  BitBltExampleOne
>  DeltaBlue
>  DoesNotUnderstand
>  Json
>  Mandala
>  MandelbrotIterative1Thread
>  MandelbrotIterative2Thread
>  MandelbrotIterative4Thread
>  MandelbrotIterative8Thread
>  MandelbrotRecursive1Thread
>  MandelbrotRecursive2Thread
>  MandelbrotRecursive4Thread
>  MandelbrotRecursive8Thread
>  OrderedCollectionInsertFirst
>
> Here only Cog jumps down, Rsqueak is unchanged or not present, seems valid...
>  Smopstone
>  SplayTree
>  ToolInteraction
>
> Here only cog32 jumps down, cog64, rsqueakvm32 & rsqueakvm64 no
> change, seems valid...
>  Fannkuck
>
> Here RSqueak improves, Cog stays the same, seems valid...
>  ShootoutMandelbrot3
>  ShootoutNBody
>
> The follow have no significant change around 5 Dec...
>  BitBltColorMapping  - all already low
>  DSAGen - all already low
>  KMeans
>  LRUCachePrintString
>  Mandelbrot
>  Polymorphy
>  RaiseToLargeNumber
>  RenderFont
>  ShaLongString
>  ShootoutBinarytrees
>  ShootoutChameneosRedux
>  ShootoutFannkuchRedux
>  ShootoutFasta
>  ShootoutFastaRedux
>  ShootoutKnucleotide
>  ShootoutMeteor
>  ShootoutPidigits
>  ShootoutRegexDNA
>  ShootoutReverseComplement
>  ShootoutThreadring
>
>
> I also see around that time on 2 Dec Fabio says "I have fixed the
> Squeak-trunk pipeline and we finally get daily updates again." So
> maybe there were suddenly a bundle of improvements that showed up in
> one go - but it seems the 2016120322 build should have picked those up
> and didn't.
> http://forum.world.st/Squeak-trunk-images-td4925570.html


Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

timfelgentreff

No, sorry, my bad. There was no change in hardware.

The jumps you are seeing stem from a change in how we report our measurements. Before, we had scaled all benchmarks to find a number of iterations that took longer than 600ms on Cog and then reported the time it took for those to run.

Now, on every run, we do a first pass to find a number of iterations that takes 600ms, separately for each VM, and when we report, we divide the measured time by the number of iterations. This is, for example, why we now see such low numbers for simple loops. Before, we were reporting the time for something like a few million iterations; now we still run those millions of iterations, but we then divide to get the time each iteration took.
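The change in reporting described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Speed Centre's actual harness: the function names are hypothetical, the calibration strategy (doubling until a run reaches the target) is an assumption, and only the 600ms target and the divide-by-iterations step come from the thread. It shows why per-iteration numbers for tiny loops end up near zero while the old total-time numbers did not.

```python
import time


def run_iterations(bench, iterations):
    """Call bench() `iterations` times; return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        bench()
    return time.perf_counter() - start


def calibrate(bench, target_seconds=0.6):
    """First pass (assumed strategy): double the iteration count until one
    run takes at least target_seconds (0.6 s mirrors the 600ms in the thread)."""
    iterations = 1
    while run_iterations(bench, iterations) < target_seconds:
        iterations *= 2
    return iterations


def report_total_time(bench, fixed_iterations):
    """Old scheme as described: the iteration count was fixed in advance
    (calibrated on Cog) and the total time for all iterations was reported."""
    return run_iterations(bench, fixed_iterations)


def report_time_per_iteration(bench, target_seconds=0.6):
    """New scheme as described: calibrate for the VM under test on every run,
    then report the total time divided by the iteration count."""
    iterations = calibrate(bench, target_seconds)
    return run_iterations(bench, iterations) / iterations, iterations


if __name__ == "__main__":
    def tiny_loop():
        total = 0
        for i in range(100):
            total += i
        return total

    per_iteration, n = report_time_per_iteration(tiny_loop, target_seconds=0.05)
    total = report_total_time(tiny_loop, n)
    print(f"{n} iterations: total {total:.4f} s, per iteration {per_iteration:.2e} s")
```

Both schemes execute the same work; only the reported quantity differs, which is why results before and after the switch cannot be compared on one chart.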

I should probably just delete older results, because they are no longer comparable.

And we are already running the benchmarks on Pharo: the "nocounters" VM is the exact same Cog VM, running the benchmarks in a Pharo image.

Also, I should update the website to show the image version as well, because we update that sporadically, too.


Levente Uzonyi <[hidden email]> wrote on Sun, 25 Dec 2016, 16:10:

Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

timfelgentreff

This gets more confusing when we look at RSqueak, because we still see massive changes in performance there due to ongoing refactorings and continued changes to the JIT.

I will just delete the older results.


Tim Felgentreff <[hidden email]> wrote on Sun, 25 Dec 2016, 21:33:


Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

timfelgentreff

One reason for the relative changes between RSqueak and Cog, as far as I can tell, is that previously we were sometimes favoring the RSqueak JIT, especially for the tiny loops. When the number of iterations was a constant in the compiled method (rather than an instance variable determined in the first pass), that constant would end up in our assembler, and thus the (already tiny) loop got even shorter because we no longer even read the literal.
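To make the two benchmark shapes concrete, here is a hypothetical Python sketch; the class names are illustrative and not taken from the actual benchmark suite. CPython does not JIT-compile, so this demonstrates no speed difference. It only shows the structural distinction described above: in the first shape the count is a literal fixed when the method is compiled, which a JIT such as RSqueak's can fold into generated code; in the second the count is an instance variable filled in by the calibration pass and read from the object at run time.

```python
class LiteralCountBench:
    """Hypothetical 'old' shape: the iteration count is a literal in the
    method body, so a JIT compiling run() could fold it into machine code."""

    def run(self):
        total = 0
        for _ in range(100_000):  # literal, known when the method is compiled
            total += 1
        return total


class CalibratedCountBench:
    """Hypothetical 'new' shape: the count is an instance variable that a
    calibration pass fills in, so run() must read it from the object."""

    def __init__(self, iterations):
        self.iterations = iterations  # set from the first (calibration) pass

    def run(self):
        total = 0
        for _ in range(self.iterations):  # read from the object at run time
            total += 1
        return total
```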


Tim Felgentreff <[hidden email]> wrote on Sun, 25 Dec 2016, 21:36:


Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

Levente Uzonyi
In reply to this post by timfelgentreff
On Sun, 25 Dec 2016, Tim Felgentreff wrote:

>
> No, sorry, my bad. There was no change in hardware.
>
> The jumps you are seeing stem from a change in how we report our measurements. Before, we had scaled all benchmarks to find a number of iterations that took longer than 600ms on Cog and then reported the time
> it took for those to run.
>
> Now we do a first pass to find a number of iterations that take 600ms for each VM separately each time, and when we report we divide by the number of iterations. This is, for example, why we are now seeing
> such low numbers for simple loops. Before we were reporting something like a few million iterations, now we still run those millions of iterations, but we then divide to get the time each iteration took.
>
> I should probably just delete older results, because they are no longer comparable.
That sounds reasonable.

>
> And we are already running Pharo tests, the "nocounters" VM is the exact same Cog VM running the benchmarks in a Pharo image.

Why is it called nocounters? What are counters?

>
> Also, I should update the website to show the image version, too, because we update that sporadically, too.

Yes, that would be helpful.

Levente

>
>
> Levente Uzonyi <[hidden email]> schrieb am So., 25. Dez. 2016, 16:10:
>       Probably the tests were run on different hardware.
>
>       Levente
>
>       On Sun, 25 Dec 2016, Ben Coman wrote:
>
>       > On Sat, Dec 24, 2016 at 8:13 PM, Tim Felgentreff
>       > <[hidden email]> wrote:
>       >> We run benchmarks every day on
>       >> http://speed.squeak.org/.
>       >
>       > Reviewing at the timeline   http://speed.squeak.org/timeline/
>       > I am curious about some of the performance improvements.
>       >
>       > Several significant improvements seem aligned with Cog commit 2016120519
>       > for example AStar...
>       > http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=AStar&env=2&revs=50&equid=off&quarts=on&extr=on
>       >
>       > which seems to be "Merge pull request #105 from estebanlm/Cog"
>       > https://github.com/OpenSmalltalk/opensmalltalk-vm/network
>       >
>       > But then also aligned with the same Cog commit, there is a
>       > corresponding improvement in the rsqueak performance, for example
>       > ArrayAccess...
>       > http://speed.squeak.org/timeline/#/?exe=2,4,1,5,6,7,8,9&ben=ArrayAccess&env=2&revs=50&equid=off&quarts=on&extr=on
>       >
>       > ...which seems to indicate a common cause from an in-Image
>       > improvement, for which between 2016120322 and 2016120519 I see "The
>       > various scanFor: and scanForEmptySlotFor: implementations only need to
>       > access the size of their array once."
>       > * Trunk: Kernel-eem.1050.mcz  (MethodDictionary)
>       >   http://forum.world.st/The-Trunk-Kernel-eem-1050-mcz-td4925618.html
>       > * Trunk: System-eem.920.mcz (SystemDictionary)
>       >   http://forum.world.st/The-Trunk-System-eem-920-mcz-td4925619.html
>       >
>       >
>       > So I'm curious do the benchmarks track both Image and VM changes?
>       > Perhaps it would be useful to also benchmark Pharo to control for
>       > Image changes (now that its returned to the fold using the mainline
>       > opensmalltalk-vm)
>       >
>       > Now I'm further curious, the benchmarks below see a massive jump down
>       > for 2016120519 for all data series, but all results are relatively
>       > very close to zero, so I wonder are these valid results?
>       >  ByteStringHash
>       >  ClassVarBinding
>       >  Compiler
>       >  EqualBytes
>       >  Fib
>       >  FillArray
>       >  FillByteArray
>       >  FillString
>       >  Graphsearch
>       >  HashBytes
>       >  HashWords
>       >  InstVarAccess
>       >  IntLoop
>       >  IntegerByteCodes
>       >  ModularConvolutionBytes
>       >  ModularConvolutionWords
>       >  ModularDotProductBytes
>       >  ModularDotProductWords
>       >  ModularSumBytes
>       >  ModularSumWords
>       >  PermutationCompositionArray
>       >  PermutationCompositionWords
>       >  Richards
>       >  Send
>       >  SendPrimitive
>       >  SendWithManyArguments
>       >  Slopstone
>       >  WideStringHash
>       >
>       > Here all series jump down, and the result range seems valid...
>       >  FloatLoop
>       >
>       > Here all series jump, and the result range seems valid. Rsqueak improves more...
>       >  ShootoutSpectraNorm
>       >
>       > Here cog32, cog64 & rsqueakvm32 have a small jump down, but it is
>       > very close to zero, so are they valid?  rsqueakvm64 shows no change...
>       >  Blowfish
>       >
>       > Here only Cog jumps down, RSqueak stays much higher, seems valid...
>       >  OrderedCollectionRandomInsert
>       >  Nbody
>       >
>       > Here only Cog jumps down, RSqueak being already pretty low. The
>       > results seem valid
>       >  AStar
>       >  ArrayAccess
>       >  BinaryTree
>       >  BitBltExampleOne
>       >  DeltaBlue
>       >  DoesNotUnderstand
>       >  Json
>       >  Mandala
>       >  MandelbrotIterative1Thread
>       >  MandelbrotIterative2Thread
>       >  MandelbrotIterative4Thread
>       >  MandelbrotIterative8Thread
>       >  MandelbrotRecursive1Thread
>       >  MandelbrotRecursive2Thread
>       >  MandelbrotRecursive4Thread
>       >  MandelbrotRecursive8Thread
>       >  OrderedCollectionInsertFirst
>       >
>       > Here only Cog jumps down, Rsqueak is unchanged or not present, seems valid...
>       >  Smopstone
>       >  SplayTree
>       >  ToolInteraction
>       >
>       > Here only cog32 jumps down, cog64, rsqueakvm32 & rsqueakvm64 no
>       > change, seems valid...
>       >  Fannkuck
>       >
>       > Here RSqueak improves, Cog stays the same, seems valid...
>       >  ShootoutMandelbrot3
>       >  ShootoutNBody
>       >
>       > The following have no significant change around 5 Dec...
>       >  BitBltColorMapping  - all already low
>       >  DSAGen - all already low
>       >  KMeans
>       >  LRUCachePrintString
>       >  Mandelbrot
>       >  Polymorphy
>       >  RaiseToLargeNumber
>       >  RenderFont
>       >  ShaLongString
>       >  ShootoutBinarytrees
>       >  ShootoutChameneosRedux
>       >  ShootoutFannkuchRedux
>       >  ShootoutFasta
>       >  ShootoutFastaRedux
>       >  ShootoutKnucleotide
>       >  ShootoutMeteor
>       >  ShootoutPidigits
>       >  ShootoutRegexDNA
>       >  ShootoutReverseComplement
>       >  ShootoutThreadring
>       >
>       >
>       > I also see around that time on 2 Dec Fabio says "I have fixed the
>       > Squeak-trunk pipeline and we finally get daily updates again." So
>       > maybe there were suddenly a bundle of improvements that showed up in
>       > one go - but it seems the 2016120322 build should have picked those up
>       > and didn't.
>       > http://forum.world.st/Squeak-trunk-images-td4925570.html
>
>
>


Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

timfelgentreff

The "nocounters" VM does not have the Sista activation counters; the "counters" VM does (but without Sista active). We run both to see what impact the counters have. Those two and the Sista set of benchmarks all run on Pharo.


Levente Uzonyi <[hidden email]> wrote on Sun, 25 Dec 2016, 23:02:
On Sun, 25 Dec 2016, Tim Felgentreff wrote:

>
> No, sorry, my bad. There was no change in hardware.
>
> The jumps you are seeing stem from a change in how we report our measurements. Before, we had scaled all benchmarks to find a number of iterations that took longer than 600ms on Cog and then reported the time
> it took for those to run.
>
> Now we do a first pass, separately for each VM and on every run, to find a number of iterations that take 600 ms, and when we report, we divide the measured time by that number of iterations. This is, for example, why we are now seeing
> such low numbers for simple loops. Before, we were reporting the total time for something like a few million iterations; now we still run those millions of iterations, but we then divide to get the time each iteration took.
>
> I should probably just delete older results, because they are no longer comparable.

That sounds reasonable.

>
> And we are already running Pharo tests: the "nocounters" VM is the exact same Cog VM, running the benchmarks in a Pharo image.

Why is it called "nocounters"? What is "counters"?

>
> Also, I should update the website to show the image version as well, because we update that sporadically, too.

Yes, that would be helpful.

Levente

>
>
> Levente Uzonyi <[hidden email]> wrote on Sun, 25 Dec 2016, 16:10:
>       Probably the tests were run on different hardware.
>
>       Levente
>
>       On Sun, 25 Dec 2016, Ben Coman wrote:

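Tim's description of the new reporting scheme (a per-VM calibration pass to roughly 600 ms, then dividing the measured time by the iteration count) is what makes simple loops now plot so close to zero. Below is a minimal Python sketch contrasting the old and new schemes. It assumes a doubling search for the iteration count; the thread does not say how the count is actually found. All names (`run_iterations`, `old_style_report`, `calibrate`, `per_iteration_time`, `TARGET_SECONDS`) are hypothetical and are not the speed.squeak.org harness.

```python
import time

TARGET_SECONDS = 0.6  # the ~600 ms target mentioned in the thread


def run_iterations(benchmark, n):
    """Run `benchmark` n times; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(n):
        benchmark()
    return time.perf_counter() - start


def old_style_report(benchmark, cog_calibrated_n):
    """Old scheme: one count, calibrated on Cog, reused for every VM.

    The reported value is the total time for all of those iterations.
    """
    return run_iterations(benchmark, cog_calibrated_n)


def calibrate(benchmark, target=TARGET_SECONDS):
    """New scheme, first pass (per VM, every run): find a count reaching `target`.

    Assumption: a simple doubling search; the real harness may differ.
    """
    n = 1
    while run_iterations(benchmark, n) < target:
        n *= 2
    return n


def per_iteration_time(benchmark, target=TARGET_SECONDS):
    """New scheme, second pass: time the calibrated run, divide by the count."""
    n = calibrate(benchmark, target)
    total = run_iterations(benchmark, n)
    return total / n, n, total


if __name__ == "__main__":
    per_iter, n, total = per_iteration_time(lambda: sum(range(100)))
    print(f"{n} iterations, {total:.3f} s total, {per_iter * 1e9:.1f} ns/iteration")
```

As a worked example: if a trivial loop needs 4,000,000 iterations to fill 0.6 s on Cog, the old scheme reported roughly 0.6 s, while the new one reports 0.6 / 4,000,000 = 1.5e-7 s (150 ns) per iteration. The two y-axes therefore measure different quantities, which is why results from before and after the change are not comparable.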


Re: Squeak VM Speed Centre - validity and basis of improvements 5 Dec

Ben Coman
On Mon, Dec 26, 2016 at 6:51 AM, Tim Felgentreff
<[hidden email]> wrote:
> Nocounters does not have the activation counters for Sista, counters does
> (but without Sista active). We have both to see what the impact of the
> counters is. Those two and the Sista set of benchmarks all run on Pharo.

It would be useful to have that info on the About page, if not elsewhere.

cheers -ben