A Benchmarking tool for the trunk?


A Benchmarking tool for the trunk?

timfelgentreff
Hi,

We have a proposal for a tool that we think might be useful to have in trunk.

We spent some time pulling together benchmarks from various sources (papers, the mailing list, projects on squeaksource, ...) and combining them with an extended version of Stefan Marr's benchmarking framework SMark. The tool and framework are modeled after SUnit, and include different execution suites as well as code to compute confidence variations over multiple runs. It also draws graphs over multiple runs so you can look at things like warmup and GC behavior, and see how much time is spent doing incremental GCs and full GCs vs. plain execution. As part of this I fixed the EPS export so these graphs can be exported in a scalable format.
Here is a picture of the tool: https://dl.dropboxusercontent.com/u/26242153/screenshot.jpg

As I said, it's modeled after TestRunner and SUnit: benchmarks subclass the "Benchmark" class, any method starting with "bench" is a benchmark, and you can have setUp and tearDown methods as usual. By default the benchmarks are run under an Autosize runner that re-executes each benchmark until the combined runtime reaches 600ms (to smooth out any noise). Beyond that, you can specify a number of iterations for which the runner repeats this process, giving you multiple averaged runs. The graph shows the execution times split between running code (gray), incremental GCs (yellow), and full GCs (red). There are popups, and you can scroll to zoom in and out. There is also a history of benchmark runs stored on the class side of benchmark classes for later reference.
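
For illustration, a minimal benchmark class might look like this (a sketch: the class name and the fib workload are invented; the Benchmark superclass, the "bench" prefix, and the setUp/tearDown hooks are as described above):

	Benchmark subclass: #FibBenchmark
		instanceVariableNames: ''
		classVariableNames: ''
		poolDictionaries: ''
		category: 'Benchmarks-Examples'

	FibBenchmark >> setUp
		"prepare fixtures before the runs, as in SUnit"

	FibBenchmark >> benchFib
		"picked up automatically because the selector starts with 'bench'"
		self fib: 25

	FibBenchmark >> fib: n
		^ n < 2
			ifTrue: [n]
			ifFalse: [(self fib: n - 1) + (self fib: n - 2)]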

The code currently lives here: http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/BenchmarkRunner

Considering how often we discuss benchmark results here, I think it would be useful to share an execution framework for them.

cheers,
Tim



Re: A Benchmarking tool for the trunk?

Eliot Miranda-2
Hi Tim,


On Apr 27, 2016, at 12:37 PM, Tim Felgentreff <[hidden email]> wrote:

> [...] By default the benchmarks are run under an Autosize runner that
> re-executes each benchmark until the combined runtime reaches 600ms
> (to smooth out any noise). [...]

IMO 600ms is about 500 times too short ;-). Is this parameterised?



> Considering how often we discuss benchmark results here, I think it would be useful to share an execution framework for them.

That would be fabulous.  Hence
- can it be controlled from the command line?
- is it portable to Pharo?






Re: A Benchmarking tool for the trunk?

David T. Lewis
In reply to this post by timfelgentreff
On Wed, Apr 27, 2016 at 09:37:29PM +0200, Tim Felgentreff wrote:
> Hi,
>
> We have a proposal for a tool that we think might be useful to have in
> trunk.

This looks very nice! May I suggest that you (or we) create a SqueakMap
entry for this, so that it can be easily located and loaded in Squeak
5.0 and trunk images? If it also works in Pharo, someone will probably
volunteer to make a ConfigurationOfBenchmark also.

There are lots of advantages to maintaining a package like this as an
external package, just as long as the package is easy to find and easy
to load. It seems to me that this would be especially important for a
benchmarking suite, because we would want to encourage people to use
the same suite in Cuis, Pharo, and other images in the Squeak family.

Dave


> [...]



Re: A Benchmarking tool for the trunk?

timfelgentreff
In reply to this post by Eliot Miranda-2
Hi Eliot,

On 28 April 2016 at 00:44, Eliot Miranda <[hidden email]> wrote:
>
> IMO 600ms is about 500 times too short ;-). Is this parameterised?
>

This is parameterised, but let me elaborate on why I think this is ok.
There are three levels to each benchmark:

1. Problem size.
This depends on the benchmark, e.g. a linear progression in
BinaryTrees will give you exponentially growing runtime. It should
be chosen by the benchmark implementer to be reasonably small while
still providing meaningful results. For example, the fibonacci
benchmark shouldn't set its problem size to only 4 or 5, because e.g.
on RSqueak the JIT could simply unroll it completely and generate
machine code that just checks that the relevant method dicts haven't
changed and returns the constant. So the problem size should be large
enough to prevent that.

2. Autosize iterations
These are chosen dynamically per machine: the benchmark is executed
with a fixed problem size repeatedly to average out any noise from
e.g. OS-level scheduling. I think 500-1000ms is usually fine for this,
because OS-level interruptions are then distributed fairly evenly. The
autosizer simply finds a small number of iterations to run the inner
benchmarks and then averages the runs to get rid of the small noise.

3. Benchmark iterations
This is something that you choose when you use the tool, to do enough
runs to warm up the JIT and such. With an autosize time of about
600ms, I usually choose 100 benchmark iterations, so that the overall
benchmarking time will be about 60 seconds. In the tool, we measure
times and GC stats between these iterations to get the bar chart.

All three of these levels are configurable, but the UI just asks you
for the third.

> - can it be controlled from the command line?

Yes, provided you mean "use a .st file argument". To run a benchmark
you can write e.g.

	BenchmarkAutosizeSuite run: {'BenchmarkSimpleStatisticsReporter'. 'SMarkShootout'. 100}.
		"runs all shootout benchmarks for 100 outer iterations,
		 reporting statistics in the autosize suite"

	BenchmarkCogSuite run: {'BenchmarkSimpleStatisticsReporter'. 'SMarkShootout.benchBinaryTrees'. 100}.
		"runs the binarytrees benchmark for 100 outer iterations without
		 autosizing, but with one extra iteration for warmup"

...

Output is printed to stdout.


> - is it portable to Pharo?

I don't see why it shouldn't work immediately, unless they don't have
the ToolBuilder anymore. It might be that they removed the PostScript
canvas; in that case you cannot export your benchmark graphs as easily.



Re: A Benchmarking tool for the trunk?

Stefan Marr-3
Hi Tim:

> On 28 Apr 2016, at 13:01, Tim Felgentreff <[hidden email]> wrote:
>
>> - can it be controlled from the command line?
>
> Yes, provided you mean "use a .st file argument". [...]

I haven't looked at your changes to the code, but if you didn't remove any SMark features, there is also a proper command-line interface.

See: http://forum.world.st/Convention-to-build-cmd-line-interfaces-with-Pharo-td3524056.html

$ squeak-vm.sh Pharo-1.2.image --help
SMark Benchmark Framework, version: SMark-StefanMarr.12

Usage: <vm+image> SMarkHarness [runner] [reporter] <suiteOrBenchmark>
                               [iterations [processes [problemSize]]]

Arguments:
 runner             optional, a SMarkRunner class that executes the benchmarks
 reporter           optional, a SMarkReporter class that processes
                              and displays the results
 suiteOrBenchmark   required, either a SMarkSuite with benchmarks,
                              or a benchmark denoted by Suite.benchName
 iterations         optional, number of times the benchmarks are repeated
 processes          optional, number of processes/threads used by the benchmarks
 problemSize        optional, depending on benchmark for instance number of
                              inner iterations or size of used data set
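
For instance, following the usage line above, an invocation that leaves the optional runner and reporter at their defaults might look like this (a sketch, with a benchmark name taken from earlier in the thread):

$ squeak-vm.sh my.image SMarkHarness SMarkShootout.benchBinaryTrees 100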


Best regards
Stefan

--
Stefan Marr
Johannes Kepler Universität Linz
http://stefan-marr.de/research/





Re: A Benchmarking tool for the trunk?

Eliot Miranda-2
Hi Tim,

    the below is lovely and makes it easy to run from the command line.  Please can we keep it?  The Mac VM's command line support is broken (but being fixed), so test on Windows using a console VM and/or on Linux.

One more request: it would be great if the package were loadable and runnable in VW in some form, so that one can at least gather a complete baseline set of results from VW.

_,,,^..^,,,_ (phone)

> On Apr 28, 2016, at 5:19 AM, Stefan Marr <[hidden email]> wrote:
>
> [...] if you didn’t remove any SMark features, there is also a proper command-line interface.
>
> $ squeak-vm.sh Pharo-1.2.image --help
> SMark Benchmark Framework, version: SMark-StefanMarr.12
>
> Usage: <vm+image> SMarkHarness [runner] [reporter] <suiteOrBenchmark>
>                                [iterations [processes [problemSize]]]
> [...]


Re: A Benchmarking tool for the trunk?

Levente Uzonyi
In reply to this post by timfelgentreff
On Wed, 27 Apr 2016, Tim Felgentreff wrote:

> [...]
> The code currently lives here: http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/BenchmarkRunner
The link seems to be broken.

Levente



Re: A Benchmarking tool for the trunk?

Tobias Pape

On 28.04.2016, at 22:29, Levente Uzonyi <[hidden email]> wrote:

> On Wed, 27 Apr 2016, Tim Felgentreff wrote:
>
>> [...]
>> The code currently lives here: http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/BenchmarkRunner
>
> The link seems to be broken.
>

In what way?
It works from here :)





Re: A Benchmarking tool for the trunk?

Stephan Eggermont-3
In reply to this post by Eliot Miranda-2
On 28/04/16 00:44, Eliot Miranda wrote:
> That would be fabulous.  Hence
> - is it portable to Pharo?

Pharo has a class BenchmarkResult in Kernel-Chronology
Pharo has no DummyStream
Pharo has no asOop
TimeStamp now -> DateAndTime now
no , for numbers; use asString
Smalltalk getVMParameters -> Smalltalk vm getParameters
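
To illustrate two of those substitutions: a snippet that reads like this in Squeak (variable names invented for the example)

	lastRun := TimeStamp now.
	params := Smalltalk getVMParameters.

would have to read like this in Pharo:

	lastRun := DateAndTime now.
	params := Smalltalk vm getParameters.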

Where is BenchmarkTestRunnerSuite?

Stephan



Re: A Benchmarking tool for the trunk?

Levente Uzonyi
In reply to this post by Tobias Pape
On Thu, 28 Apr 2016, Tobias Pape wrote:

>
> On 28.04.2016, at 22:29, Levente Uzonyi <[hidden email]> wrote:
>
>>> [...]
>>> The code currently lives here: http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/BenchmarkRunner
>>
>> The link seems to be broken.
>
> In what way?
> It works from here :)

HTTPS Everywhere turns it into an https URL, but that's a 404. It works
via http though.

Levente



Re: A Benchmarking tool for the trunk?

Tobias Pape

On 29.04.2016, at 00:02, Levente Uzonyi <[hidden email]> wrote:

> On Thu, 28 Apr 2016, Tobias Pape wrote:
>
>>> The link seems to be broken.
>>
>> In what way?
>> It works from here :)
>
> HTTPS Everywhere turns it into an https URL, but that's a 404. It works via http though.

Yeah sorry for that.
But since we're not HTTPS-ready with Monticello, this probably has to wait a tiny bit more...

Best regards
        -Tobias




Re: A Benchmarking tool for the trunk?

timfelgentreff
In reply to this post by Stefan Marr-3
Hi Stefan,

what does your squeak-vm.sh script do? Because on Squeak, I cannot simply type --help and get output. The mailing list thread you linked refers to something Pharo specific that I don't think we have in Squeak.

Stefan Marr-3 wrote
> [...] if you didn't remove any SMark features, there is also a proper command-line interface.
>
> See: http://forum.world.st/Convention-to-build-cmd-line-interfaces-with-Pharo-td3524056.html
>
> $ squeak-vm.sh Pharo-1.2.image --help
> [...]

Re: A Benchmarking tool for the trunk?

Stefan Marr-3
Hi Tim:

> On 29 Apr 2016, at 12:10, timfelgentreff <[hidden email]> wrote:
>
> what does your squeak-vm.sh script do? Because on Squeak, I cannot simply
> type --help and get output. The mailing list thread you linked refers to
> something Pharo specific that I don’t think we have in Squeak.

At least in Pharo there was/is a way to register a handler for the startup.
SMark used to do that; it then processes the command-line arguments.

I don’t remember the details, sorry, and currently don’t have access to the code to check.

Best regards
Stefan

> [...]



Re: A Benchmarking tool for the trunk?

Chris Muller-3
Squeak always processes command-line arguments.  If the
#readDocumentAtStartup Preference in the image is set (the default),
it will treat the first image argument as a URL referring to a
Smalltalk script to execute, and the subsequent ones as arguments to
that script:

   squeak -vm [vmArgs] myImage.image [urlToSmalltalkScript] [scriptArg1 scriptArg2 ...]

There's a convenience method that provides easy access to those
arguments and basic error handling for headless running:

      "This code goes in a text file and is referred to by the urlToSmalltalkScript"
      Smalltalk run: [ :scriptArg1 :scriptArg2 | "... your script..." ]

If readDocumentAtStartup is not set, then each image argument is
simply passed in as an Array of Strings.

   squeak -vm [vmArgs] myImage.image [imageArg1 imageArg2 imageArg3 ...]
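
As a sketch, a script file tying this to the benchmark expression from earlier in the thread could look like this (the file name and the quit call are illustrative, not prescribed):

      "bench.st"
      Smalltalk run: [ :suiteName :iterations |
            BenchmarkAutosizeSuite run: {
                  'BenchmarkSimpleStatisticsReporter'.
                  suiteName.
                  iterations asNumber }.
            Smalltalk snapshot: false andQuit: true ].

invoked as

      squeak -vm [vmArgs] myImage.image bench.st SMarkShootout 100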

On Fri, Apr 29, 2016 at 8:45 AM, Stefan Marr <[hidden email]> wrote:

> [...]


Re: A Benchmarking tool for the trunk?

timfelgentreff

Hi Chris and Stefan,

yes, the Squeak cmdline arg processing through a file is what I'm using, but Pharo supports different (more 'traditional looking') stuff afaict. If there was code specific to the Pharo way of doing it in SMark, I don't know about it, since I haven't used Pharo for a few years (and RSqueak doesn't work with it, because they removed some of the fallback code for primitives that we don't implement).

I can look at the code, but I would be against a command line interface that doesn't work the same across different Smalltalk distributions. But we can certainly think about how to improve it.

cheers,
Tim

On 30 April 2016 at 18:12, Chris Muller <[hidden email]> wrote:
> Squeak always processes command-line arguments. [...]




Re: A Benchmarking tool for the trunk?

Stefan Marr-3
Hi Tim:

Ok, I checked what I did. See http://smalltalkhub.com/#!/~StefanMarr/SMark/packages/Scripting

This implements support for command-line scripting by registering a startup handler, which then calls SMarkHarness class>>#run:.

Last time I checked, this was compatible with Squeak and Pharo, because I was using it even with Squeak 3.9 images. All this infrastructure comes out of the RoarVM project. So, mind you, the code dates back a while…

The `ScriptStarter` class should be the code to look at.  The #initialize/#install methods on the class side do the relevant setup.
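
Roughly, the registration pattern is the following (a sketch, not the actual code; #commandLineArguments stands in for whatever dialect-specific way the arguments are fetched):

      ScriptStarter class >> install
            "register the class with the image's startup list"
            Smalltalk addToStartUpList: self

      ScriptStarter class >> startUp: resuming
            "on image startup, hand the command line over to the harness"
            resuming ifTrue: [SMarkHarness run: self commandLineArguments]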

Hope that helps
Stefan



> On 01 May 2016, at 00:24, Tim Felgentreff <[hidden email]> wrote:
>
> [...]