SqueakCI Benchmarking

SqueakCI Benchmarking

Jeff Gonis-2
Hi Everyone,

So with a lot of help from Frank Shearar and Nicolas Cellier, I have
introduced performance benchmarking to the SqueakCI server.  You can
see our current performance trends at the following link:
http://build.squeak.org/job/SqueakTrunk/performance/

I grabbed the shootout code from Nicolas Cellier, and tried to pick
input values that would allow the tests to run in 5-10 seconds, enough
to allow room for significant improvements without making the build
take forever.  I would like to encourage anyone who is interested to
take a look at the benchmarks.st script that is part of the SqueakCI
setup and send me advice if you have it.  I haven't done much
benchmarking so I am sure there are many places I could improve what
is done.  In the future I would also like to maybe add python or ruby
benchmarks to the server, so that we can compare Squeak's performance
vs other current languages.
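
To give a flavour of the script: each benchmark boils down to timing one
shootout run and reporting the result, something like this (an
illustrative sketch only; the selector and input value here are made up,
not the ones benchmarks.st actually uses):

  | runtime |
  runtime := Time millisecondsToRun: [ShootoutTests nbody: 200000].
  Transcript show: 'NBody: ', runtime printString, ' ms'; cr.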

I would also like to encourage anyone who is interested to grab
Nicolas' code and try to speed up Squeak itself, either on the image
or the VM side.  Now you can see the results of your work right away!

So, that is where things stand right now.  This is just the first
step.  Much as Frank has greatly expanded the CI server, I hope to
eventually add in benchmarks looking at graphical performance in
squeak (Balloon vs Gezira vs Cairo, etc), maybe network performance,
etc, etc.  If anyone has any ideas or suggestions I would love to hear
them.

Thank you all for your time, and for any feedback you can provide.
Jeff


Re: SqueakCI Benchmarking

Nicolas Cellier
Just a detail: this is not my code. It's code from various authors
found on the shootout benchmarking site and ported to Squeak by Eliot.
I just took care to extract it from VMMaker and publish the package on
its own, because it has its own value.

Otherwise, that's great to see it in action!

What will trigger an evaluation: a newly published VM?
Do you plan to support several VM flavours (Cog, Stack, Interpreter)?
If so, it would be great to compare them.

Nicolas

2013/2/26 Jeff Gonis <[hidden email]>:

> Hi Everyone,
>
> So with a lot of help from Frank Shearar and Nicolas Cellier, I have
> introduced performance benchmarking to the SqueakCI server.  You can
> see our current performance trends at the following link:
> http://build.squeak.org/job/SqueakTrunk/performance/
>
> [...]


Re: SqueakCI Benchmarking

Jeff Gonis-2
On Tue, Feb 26, 2013 at 2:16 PM, Nicolas Cellier
<[hidden email]> wrote:
> Just a detail: this is not my code. It's code from various authors
> found on the shootout benchmarking site and ported to Squeak by Eliot.
> I just took care to extract it from VMMaker and publish the package on
> its own, because it has its own value.
>
> Otherwise, that's great to see it in action!

Thank you, I appreciate that.  I decided that I'd spent too much time
following open source projects, and not enough time contributing to
them, so I figured this would be a place I could start.

>
> What will trigger an evaluation: a newly published VM?
> Do you plan to support several VM flavours (Cog, Stack, Interpreter)?
> If so, it would be great to compare them.
>
> Nicolas

Right now the benchmarks are run when the "SqueakTrunk" project is run
on the CI server.  I could look at other projects that build the
different VMs and run the benchmarks on those projects as well, but
comparing benchmarks in an inter-project fashion would be more
complicated, just due to the mechanics of the Jenkins performance
plugin.

I will look into creating a new CI project that contains all three VMs,
runs the tests on each, and then publishes the results under that
project.  I should mention that another hitch is that I haven't found
a convenient way of getting the results onto a single graph so you
can compare the lines, so it would be one graph for each of the VMs.
But that shouldn't be too big a downside.
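
For the curious: if I understand the performance plugin correctly, it
can parse JUnit-style reports, so each benchmark just has to end up as
a <testcase> whose time attribute carries the runtime.  A rough sketch
of the image-side emitter (file, suite, and benchmark names invented):

  | runtime stream |
  runtime := 5210.  "milliseconds, as measured for one benchmark"
  stream := FileStream newFileNamed: 'shootout-results.xml'.
  [stream
      nextPutAll: '<testsuite name="Shootout">'; cr;
      nextPutAll: '  <testcase name="NBody" time="';
      nextPutAll: (runtime / 1000.0) printString;
      nextPutAll: '"/>'; cr;
      nextPutAll: '</testsuite>'; cr]
      ensure: [stream close].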

Thanks for your feedback!
Jeff


Re: SqueakCI Benchmarking

Stefan Marr-3
In reply to this post by Jeff Gonis-2
Hi Jeff:

On 26 Feb 2013, at 21:58, Jeff Gonis wrote:

> Hi Everyone,
>
> So with a lot of help from Frank Shearar and Nicolas Cellier, I have
> introduced performance benchmarking to the SqueakCI server.  You can
> see our current performance trends at the following link:
> http://build.squeak.org/job/SqueakTrunk/performance/

> [..]

> Thank you all for your time, and for any feedback you can provide.
> Jeff

Looks very interesting.

I do something similar for the RoarVM at http://soft.vub.ac.be/~ppp/codespeed/ (page loading times aren't great...)

That's all based on SMark [1], a benchmarking framework in the SUnit style,
which also properly warms up the JIT on the Cog VMs.
You might want to look into it; it also has a number of the Benchmarks Game benchmarks.
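
A minimal suite looks roughly like this (sketched from memory, so check
the package for the exact names; benchNBody and its helper are invented):

  SMarkSuite subclass: #MyShootoutSuite
      instanceVariableNames: ''
      classVariableNames: ''
      poolDictionaries: ''
      category: 'MyBenchmarks'.

  MyShootoutSuite >> benchNBody
      "methods with the 'bench' prefix are picked up and run as benchmarks"
      self runNBodyWith: 200000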

Another thing: what's your rationale for choosing 5-10 second runtimes?

I typically try to make the runtime just long enough to avoid any impact from imprecise time measurement.
Most of the time, the goal is to keep the runtime low and avoid triggering the GC, except when the GC is supposed to be measured.
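
In Squeak terms, that means forcing a collection before the timed
region, so that any GC that does happen during the run belongs to the
benchmark itself, and keeping the run long relative to the clock's
resolution.  A sketch:

  | benchmark runtime |
  benchmark := [1 to: 1000000 do: [:i | i sqrt]].  "stand-in workload"
  Smalltalk garbageCollect.  "clean heap before the timed region"
  runtime := Time millisecondsToRun: benchmark.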

Best regards
Stefan


[1] http://smalltalkhub.com/#!/~StefanMarr/SMark

--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525



Re: SqueakCI Benchmarking

Jeff Gonis-2
On Tue, Feb 26, 2013 at 2:28 PM, Stefan Marr <[hidden email]> wrote:

> Hi Jeff:
>
> [...]
>
> I do something similar for the RoarVM at http://soft.vub.ac.be/~ppp/codespeed/ (page loading times aren't great...)
>
> That's all based on SMark [1], a benchmarking framework in the SUnit style,
> which also properly warms up the JIT on the Cog VMs.
>
> Another thing: what's your rationale for choosing 5-10 second runtimes?
>
> [...]

Wow, holy smokes, that benchmarking page is impressive.  What sort of
setup is that running on? Maybe something similar to that for Squeak
could be my long-term goal.

As for my rationale for a 5-10 second running time: I wanted something
that would allow the image and the VM to become much faster
without having to adjust the benchmarks too much, giving us plenty of
headroom as it were.  That way we could have a single unbroken chain
of progress on the graph as things speed up.  I didn't mind
triggering the GC and such, because I felt that improvements there
should be reflected as part of our speed, and I didn't have a specific
GC benchmark that I could use instead.

I have no idea if this rationale is completely wrong-headed; as I said,
I haven't really done this before, so I just went with my gut.  If
many people feel I have headed in the wrong direction, it won't be much
effort to change.

Thanks for your feedback, I appreciate it.
Jeff


Re: SqueakCI Benchmarking

Frank Shearar-3
In reply to this post by Jeff Gonis-2
On 26 February 2013 20:58, Jeff Gonis <[hidden email]> wrote:
> Hi Everyone,
>
> So with a lot of help from Frank Shearar and Nicolas Cellier, I have
> introduced performance benchmarking to the SqueakCI server.  You can
> see our current performance trends at the following link:
> http://build.squeak.org/job/SqueakTrunk/performance/

Thanks for the time & effort, Jeff! Pretty graphs are always pretty!

frank


Re: SqueakCI Benchmarking

Jeff Gonis-2
On Tue, Feb 26, 2013 at 3:04 PM, Frank Shearar <[hidden email]> wrote:

> On 26 February 2013 20:58, Jeff Gonis <[hidden email]> wrote:
>> [...]
>
> Thanks for the time & effort, Jeff! Pretty graphs are always pretty!
>
> frank
>

Graphs are always nice, but if you want the raw numbers you can also
visit: http://build.squeak.org/job/SqueakTrunk/lastBuild/performance/


Re: SqueakCI Benchmarking

Stefan Marr-3
In reply to this post by Jeff Gonis-2
Hi Jeff:

On 26 Feb 2013, at 22:54, Jeff Gonis wrote:
> Wow, holy smokes, that benchmarking page is impressive.  What sort of
> setup is that running on? Maybe something similar to that for Squeak
> could be my long-term goal.

Well, the setup is a bit more elaborate.
I track the RoarVM performance on a number of machines, including a little 64-core Tilera manycore chip.

The webapp is Codespeed, a spinoff of the PyPy project [1].

The benchmarks are written with SMark, as mentioned earlier.

But they aren't executed by SMark alone; instead,
ReBench [2] is used to make sure I don't measure just noise.
ReBench drives the benchmark execution and starts the VM with the different benchmarks independently.
So I don't get any funky interactions between different benchmarks, which could happen if they were executed during the same VM run.


> As for my rationale for a 5-10 second running time: I wanted something
> that would allow the image and the VM to become much faster
> without having to adjust the benchmarks too much, giving us plenty of
> headroom as it were.  That way we could have a single unbroken chain
> of progress on the graph as things speed up.  I didn't mind
> triggering the GC and such, because I felt that improvements there
> should be reflected as part of our speed, and I didn't have a specific
> GC benchmark that I could use instead.

Ah, ok, very optimistic ;)
Well, for long-term performance tracking that makes sense.

To test for performance regressions, as I do it, more specific/focused benchmarks are, I think, another option. Here the long-term aspect isn't as crucial; instead, the results only need to be comparable to the previous few runs. That way, I get an indication of whether I messed something up.

By the way, is the slave the benchmarks are run on idle, and only used for those, i.e., it doesn't run any build jobs in parallel?

Best regards
Stefan


[1] https://github.com/tobami/codespeed/
[2] https://github.com/smarr/ReBench

--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525



Re: SqueakCI Benchmarking

Frank Shearar-3
On 26 February 2013 22:16, Stefan Marr <[hidden email]> wrote:

> Hi Jeff:
>
> [...]
>
> By the way, is the slave the benchmarks are run on idle, and only used for those, i.e., it doesn't run any build jobs in parallel?

The slave can run two concurrent builds (so yes, the benchmark
accuracy is compromised). Otherwise, as far as I know at least, the
box only serves up Jenkins. Chris Cunnington or Ken Causey could give
more detailed answers: I just keep adding CI jobs to the thing.

frank



Re: SqueakCI Benchmarking

Jeff Gonis-2
> The slave can run two concurrent builds (so yes, the benchmark
> accuracy is compromised). Otherwise, as far as I know at least, the
> box only serves up Jenkins. Chris Cunnington or Ken Causey could give
> more detailed answers: I just keep adding CI jobs to the thing.
>
> frank

Frank, you probably know the CI best: is the SqueakTrunk project a
"root" project, as it were?  I know that it triggers builds in a number
of other projects, but doesn't it have to build first, before the
other projects start?  If that is true, wouldn't it be the only
job running at the time, since it runs first?  If so, we might be able
to get away without too much disruption to the benchmarks from
other projects running.

Also, if the other projects are built on their own thread, won't that
also help to isolate SqueakTrunk, since the VM is still single-threaded
and the benchmarks don't grab any data from the disk after
startup?

Thanks,
Jeff


Re: SqueakCI Benchmarking

Ken Causey-3
In reply to this post by Frank Shearar-3
On 02/26/2013 04:29 PM, Frank Shearar wrote:
> The slave can run two concurrent builds (so yes, the benchmark
> accuracy is compromised). Otherwise, as far as I know at least, the
> box only serves up Jenkins. Chris Cunnington or Ken Causey could give
> more detailed answers: I just keep adding CI jobs to the thing.
>
> frank

Hmm, yes, I believe it is currently the case that very little will
interfere with accurate benchmarks.  But that is not going to be true
for very long; the plan is to set up the mailing lists on the same server
as build.squeak.org.  That's going to throw a serious wrench into
accurate benchmarking.

Disclaimer: I know very little about Jenkins and not much about how we
are using it.

That said, I seem to remember some work was done to make it possible to
do builds (or whatever) remotely?  Perhaps benchmarking could be done
remotely (adding the benefit of including platform-specific info)?  Of
course, the same problem holds if the user of the remote system is doing
much on their machine.

Ken


Re: SqueakCI Benchmarking

Bert Freudenberg
In reply to this post by Jeff Gonis-2

On 2013-02-26, at 21:58, Jeff Gonis <[hidden email]> wrote:

> Hi Everyone,
>
> So with a lot of help from Frank Shearar and Nicolas Cellier, I have
> introduced performance benchmarking to the SqueakCI server.  You can
> see our current performance trends at the following link:
> http://build.squeak.org/job/SqueakTrunk/performance/

Interesting. Why is there already such a large difference between builds #182 and #183? Or are the results simply too noisy? To be useful, they should be fairly consistent.

> I hope to
> eventually add in benchmarks looking at graphical performance in
> squeak (Balloon vs Gezira vs Cairo, etc), maybe network performance,
> etc, etc.  If anyone has any ideas or suggestions I would love to hear
> them.


Maybe some macro benchmarks would be useful: opening a browser, finding senders, accepting methods, parsing an xml file, etc.

- Bert -




Re: SqueakCI Benchmarking

Jeff Gonis-2
Hi Bert,

On Tue, Feb 26, 2013 at 4:06 PM, Bert Freudenberg <[hidden email]> wrote:

> Interesting. Why is there already such a large difference between builds #182 and #183? Or are the results simply too noisy? To be useful, they should be fairly consistent.

So I think that, at a minimum, I can change the benchmarks to run three
times each and then kick out the lowest and highest result, which should
also allow the JIT in Cog to warm up.  After that I can investigate
additional sources of noise in the results. I will also look at using
the SMark framework to see if that provides more consistency. That
might take longer though.
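
Concretely, I'm picturing something like this (an untested sketch;
runBenchmark stands in for whatever shootout call is being timed):

  | runs |
  runs := (1 to: 3) collect: [:i |
      Time millisecondsToRun: [self runBenchmark]].
  ^(runs asSortedCollection asArray) at: 2  "the median: min and max fall away"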

> Maybe some macro benchmarks would be useful: opening a browser, finding senders, accepting methods, parsing an xml file, etc.
>
> - Bert -

This is a great idea, and along the lines of what I had in mind. I
would also like to do some benchmarks for various Canvas operations to
see if we can get some speedups there, and also to compare against
different implementations, such as Gezira or Athens.
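
For the Canvas side I'm picturing something along these lines (just a
sketch; the extent, depth, and repeat count are pulled out of the air):

  | form canvas runtime |
  form := Form extent: 400@400 depth: 32.
  canvas := form getCanvas.
  runtime := Time millisecondsToRun:
      [1000 timesRepeat:
          [canvas fillRectangle: (10@10 corner: 390@390) color: Color blue]].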

Thanks,
Jeff


Re: SqueakCI Benchmarking

Eliot Miranda-2
In reply to this post by Bert Freudenberg


On Tue, Feb 26, 2013 at 3:06 PM, Bert Freudenberg <[hidden email]> wrote:

> On 2013-02-26, at 21:58, Jeff Gonis <[hidden email]> wrote:
>
>> [...]
>
> Interesting. Why is there already such a large difference between builds #182 and #183? Or are the results simply too noisy? To be useful, they should be fairly consistent.
>
> Maybe some macro benchmarks would be useful: opening a browser, finding senders, accepting methods, parsing an xml file, etc.

Be very careful about reflective benchmarks.  The speed of finding senders depends on the host hardware, the VM, *and* the number of classes and methods in the system.  So any such benchmark results need to be scaled by the number of classes and methods in some way to normalize.

Please *don't* add any such benchmarks to the shootout tests.  These are designed to compare the performance of specific algorithms across languages.  And one reason I cherry-picked these was to use benchmarks that stressed the VM, not C libraries or plugins.  So I'm not particularly interested in adding anything to Shootout that depends on externally-compiled code such as a regexp benchmark that simply tests the compilation of a regexp plugin.

By all means add Squeak-specific benchmarks, but do so in a different package, please.  If you look in Smalltalk-80.sources you'll find the standard set of Smalltalk benchmarks used back in the day (and I expect no one would mind if we stole these).  But the find-implementors and find-senders benchmarks therein are not yet scaled by the size of the system.
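
As a first cut, scaling could be as simple as dividing by a method
count, along these lines (a sketch only; #printOn: is an arbitrary probe):

  | methodCount runtime |
  methodCount := Smalltalk allClasses inject: 0 into:
      [:sum :class | sum + class selectors size + class class selectors size].
  runtime := Time millisecondsToRun:
      [SystemNavigation default allCallsOn: #printOn:].
  ^runtime / methodCount  "milliseconds per method searched"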

--
best,
Eliot



Re: SqueakCI Benchmarking

Frank Shearar-3
In reply to this post by Jeff Gonis-2
On 26 February 2013 22:35, Jeff Gonis <[hidden email]> wrote:

>> [...]
>
> Frank, you probably know the CI best: is the SqueakTrunk project a
> "root" project, as it were?  I know that it triggers builds in a number
> of other projects, but doesn't it have to build first, before the
> other projects start?  If that is true, wouldn't it be the only
> job running at the time, since it runs first?  If so, we might be able
> to get away without too much disruption to the benchmarks from
> other projects running.

Yes, SqueakTrunk is a root project, so in the general case ought to
trigger on its own. It's just not _guaranteed_ to run on its own.

frank

> Also, if the other projects are built on their own thread, won't that
> also help to isolate SqueakTrunk, since the VM is still single-threaded
> and the benchmarks don't grab any data from the disk after
> startup?
>
> Thanks,
> Jeff
>