Call for big benchmarks


melkyades
 
Hi everybody! When working on performance I usually face the problem of finding good workloads to measure. At present I'm using the are-we-fast-yet benchmarks and also the SMark ones, among others. Unfortunately, for some purposes those benchmarks are considered too low-level, so I'd like to collect a set of "real world" fat workloads. I'm asking for contributions of computing problems you have. I'll create a public repo as a common place, explaining what each benchmark does, so that we can all benefit from the result.

The code should have the following properties:

- The executed code should be > 20KLOC.
- It should have compute-intensive or memory-intensive models, or both.
- Ideally it should run for a few minutes, but at least a few seconds.
- The results should be verifiable.
- The more portable, the better, as I need these benchmarks running in another dialect.
- It should be fully automatable; it should be possible to suppress output.
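To make the last two properties concrete, here is a minimal, language-neutral sketch of such a harness, written in Python (the names and the toy workload are purely illustrative; a Smalltalk version would follow the same shape). In a real suite the expected value would be a published checksum per benchmark.

```python
import contextlib
import io
import time

def run_benchmark(name, workload, expected, iterations=3):
    """Run `workload` several times, suppressing its output and verifying its result."""
    timings = []
    for _ in range(iterations):
        sink = io.StringIO()                   # swallow any prints from the workload
        with contextlib.redirect_stdout(sink):
            start = time.perf_counter()
            result = workload()
            timings.append(time.perf_counter() - start)
        if result != expected:                 # verifiable: compare against a known answer
            raise ValueError(f"{name}: got {result!r}, expected {expected!r}")
    return min(timings)                        # best-of-n is a common reporting choice

# Toy workload with a known checksum to verify against.
best = run_benchmark("sum-squares",
                     lambda: sum(i * i for i in range(10_000)),
                     333283335000)
```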


Some examples: Moose models, source code analysis, parallelization, web apps, parser generators and their corresponding parsers, XML processing, report generation, refactoring tools, type inference, etc.

Feel free to forward this mail to other lists you consider appropriate.

Cheers,
Pocho

-- 
Javier Pimás
Ciudad de Buenos Aires

Re: Call for big benchmarks

Ben Coman
 
Good initiative, but I guess satisfying both "> 20KLOC" and "portable" at the same time may be a tough ask.

Just a very random thought: I wonder if any insight might be gained by combining several of the low-level benchmarks. Like running a compute-intensive benchmark at the same time as a memory-intensive one?

cheers -ben

On Wed, Mar 22, 2017 at 7:53 AM, Javier Pimás
<[hidden email]> wrote:


Re: Call for big benchmarks

timrowledge
In reply to this post by melkyades
 

> On 21-03-2017, at 4:53 PM, Javier Pimás <[hidden email]> wrote:
>
> Hi everybody! While measuring performance I usually face the problem of assessing performance.

Have you tried the benchmarks package - CogBenchmarks - included in the source.squeak.org/VMMaker repository?

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: BOMB: Burn Out Memory Banks



Re: Call for big benchmarks

timfelgentreff
 

Yes, big benchmarks would be nice. Those on speed.squeak.org or in VMMaker are all somewhat small.

Note that the Ruby community, for example, has benchmarks such as a NES emulator (optcarrot) that runs for a few thousand frames with predefined input. It's definitely possible.

Maybe some of the projects from HPI students could be made to work; there was a Chip8 emulator in Squeak, for example, that seems big enough. Or maybe the DCPU emulator at github.com/fniephaus/BroDCPU without a frame limit would work as a decent CPU-bound benchmark.

Cross-dialect could be hard. Pharo and Squeak are fairly easy to do, but with larger programs staying compatible across different dialects is harder.


tim Rowledge <[hidden email]> schrieb am Mi., 22. März 2017, 21:40:





Re: Call for big benchmarks

Eliot Miranda-2
 
Hi Tim,

On Thu, Mar 23, 2017 at 1:31 AM, Tim Felgentreff <[hidden email]> wrote:
 

> Yes, big benchmarks would be nice. Those on speed.squeak.org or in VMMaker are all somewhat small.
>
> Note the Ruby community, for example, has benchmarks such as a NES emulator (optcarrot) that can run for a few thousand frames with predefined input as benchmarks. It's definitely possible.
>
> Maybe some of the projects from HPI students could be made to work, there was a Chip8 emulator in Squeak, for example, that seems big enough. Or maybe the DCPU emulator at github.com/fniephaus/BroDCPU without a frame limit would work as a decent CPU bound benchmark.

I've discussed with Clément doing something like cloning the Opal compiler, or the Squeak compiler, so that it uses a fixed set of classes that won't change over time (excepting the collections), and using as a benchmark this compiler recompiling all its own methods. This is a nice mix of string processing (in the tokenizer) and symbolic processing (in the building and optimizing of the parse tree).
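For flavour, here is a rough analogue of that idea sketched in Python, with the stdlib tokenize module standing in for "the compiler's own methods" (the choice of module is arbitrary; this shows the shape of the workload, not the proposed benchmark itself):

```python
import inspect
import time
import tokenize  # an arbitrary pure-Python stdlib module to recompile

def recompile_module(module):
    """Scan, parse and compile a module's own source, discarding the result.

    Like the proposed benchmark, this mixes string processing (scanning
    the source text) with symbolic processing (building and lowering the
    syntax tree); nothing is installed back into the running system.
    """
    source = inspect.getsource(module)
    return compile(source, module.__name__, "exec")

start = time.perf_counter()
code = recompile_module(tokenize)
elapsed = time.perf_counter() - start
```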

> Cross-dialect could be hard. Pharo and Squeak are fairly easy to do, but with larger programs staying compatible across different dialects is harder.


Again, extracting a compiler from its host system would make it possible to maintain a cross-platform version.  It could be left as an exercise to the reader to port it to one's favorite non-Smalltalk dynamic language.







--
_,,,^..^,,,_
best, Eliot

Re: Call for big benchmarks

melkyades
In reply to this post by Ben Coman
 


On Tue, Mar 21, 2017 at 9:05 PM, Ben Coman <[hidden email]> wrote:

> Good initiative, but I guess satisfying both "> 20KLOC" and "portable" at the same time may be a tough ask.

You are right; what I meant is code that is not _too_ dependent on the dialect. Anyway, my experience porting the SMark benchmarks from Pharo to Bee is that I had to add ~20 methods and change some selectors here and there.





--
Javier Pimás
Ciudad de Buenos Aires

Re: Call for big benchmarks

melkyades
In reply to this post by timfelgentreff
 


On Thu, Mar 23, 2017 at 5:31 AM, Tim Felgentreff <[hidden email]> wrote:
 

> Yes, big benchmarks would be nice. Those on speed.squeak.org or in VMMaker are all somewhat small.
>
> Note the Ruby community, for example, has benchmarks such as a NES emulator (optcarrot) that can run for a few thousand frames with predefined input as benchmarks. It's definitely possible.
>
> Maybe some of the projects from HPI students could be made to work, there was a Chip8 emulator in Squeak, for example, that seems big enough. Or maybe the DCPU emulator at github.com/fniephaus/BroDCPU without a frame limit would work as a decent CPU bound benchmark.

Those kinds of things are nice, I'll take a look.








--
Javier Pimás
Ciudad de Buenos Aires

Re: Call for big benchmarks

melkyades
In reply to this post by Eliot Miranda-2
 


On Thu, Mar 23, 2017 at 1:18 PM, Eliot Miranda <[hidden email]> wrote:
 

> I've discussed with Clément doing something like cloning the Opal compiler, or the Squeak compiler, so that it uses a fixed set of classes that won't change over time, excepting the collections, and using as a benchmark this compiler recompiling all its own methods. This is a nice mix of string processing (in the tokenizer) and symbolic processing (in the building and optimizing of the parse tree).

Also nice, count on me for helping!





--
Javier Pimás
Ciudad de Buenos Aires

Re: Call for big benchmarks

timfelgentreff
In reply to this post by Eliot Miranda-2
 
Hi Eliot, 

the question for me is: how indicative is this workload of real-world performance? Creating compiled methods may not be something that is highly optimized, simply because it doesn't need to be in real applications. One would have to be careful about what is being measured, or whether the benchmark is just measuring how fast we can blow out the caches...

If we're just talking about running parsing and optimizing something, then maybe some real-world applications use that, but even then some JSON or HTML parsing library, implementing e.g. Apache mod_rewrite, would be more realistic, I think. Dynamically parsing and patching HTML and then pretty-printing or minimizing it seems a more common problem.
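As a toy illustration of that last workload (parse HTML, patch it, re-emit it), here is a minimal minifier using only Python's stdlib html.parser; the class name is made up and this is nowhere near mod_rewrite, just the shape of the problem:

```python
from html.parser import HTMLParser

class Minifier(HTMLParser):
    """Parse HTML and re-emit it with inter-tag whitespace stripped.

    Void elements and entities are ignored for brevity; this is only a sketch.
    """
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
    def handle_starttag(self, tag, attrs):
        rendered = "".join(f' {k}="{v}"' for k, v in attrs)
        self.out.append(f"<{tag}{rendered}>")
    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
    def handle_data(self, data):
        stripped = data.strip()   # drop purely-layout whitespace between tags
        if stripped:
            self.out.append(stripped)

m = Minifier()
m.feed("<div class='x'>\n  <p> hello </p>\n</div>")
minified = "".join(m.out)   # '<div class="x"><p>hello</p></div>'
```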

I know, you're trying to argue that the Opal compiler may show common workloads equally well, but we could argue that for some of the Shootout benchmarks, too. It's an argument that doesn't seem to convince some people.



Re: Call for big benchmarks

Eliot Miranda-2
 
Hi Tim,

On Fri, Mar 24, 2017 at 1:10 AM, Tim Felgentreff <[hidden email]> wrote:
 
> Hi Eliot,
>
> the question for me is, how indicative is this workload of real world performance? Creating compiled methods may not be something that is highly optimized, simply because it doesn't need to be in real applications. One would have to be careful about what is being measured, or if the benchmark is just measuring how fast we can blow out the caches...
>
> If we're just talking about running parsing and optimizing something, then maybe some real world applications are using that, but even then some JSON or HTML parsing library that implements e.g. Apache mod_rewrite would be more realistic, I think. Dynamically parsing and patching HTML and then pretty-printing or minimizing it seems a more common problem.
>
> I know, you're trying to argue that the Opal compiler may show common workloads equally well, but we could argue that for some of the Shootout benchmarks, too. It's an argument that doesn't seem to convince some people.

I don't care in this case. I'm happy to include those other benchmarks. I just want to include one Smalltalk-centric benchmark that exemplifies excellent Smalltalk style and that uses the language to its full extent. One is likely to find that in a Smalltalk compiler; it's written by experts in the language, and I know that the Opal compiler is particularly clean. The point here is to have a benchmark that shows how well Scorch/Sista optimizes an exemplary Smalltalk workload, not a generic workload to enable comparisons between languages. I'm as interested in seeing how fast Scorch/Sista is w.r.t. the Interpreter, the StackInterpreter, the V3 Cog VM and the Spur Cog VM as in seeing how fast generic benchmarks are compared to other language implementations.

In any case I wouldn't be interested in installing the methods that the benchmark compiler would generate, only in its source -> compiled-method transformation. Installing measures all sorts of JIT-related hacks that are irrelevant to compute performance.

And of course, taking a snapshot of a specific Smalltalk compiler at a given point in time and sticking with it gives us much less of a moving target (collections etc. will still affect things).







--
_,,,^..^,,,_
best, Eliot

Re: Call for big benchmarks

John Dougan
In reply to this post by timfelgentreff
 
I don't know if this qualifies, but I ported John Walker's fbench floating point accuracy benchmark (https://www.fourmilab.ch/fbench/fbench.html) to a variety of Smalltalk platforms. The numerical code is written in the standard Numerical Recipes style, which isn't very Smalltalky, but is very common. Probably lots of opportunities for optimizations. The included code tries to write to stdout as it was designed to be called from the command line, but that is pretty trivial to change.

Cheers,
 -  John





--
John Dougan
[hidden email]

Re: Call for big benchmarks

Eliot Miranda-2
 
Hi John,

On Fri, Mar 24, 2017 at 7:29 PM, John Dougan <[hidden email]> wrote:
 
> I don't know if this qualifies, but I ported John Walker's fbench floating point accuracy benchmark (https://www.fourmilab.ch/fbench/fbench.html) to a variety of Smalltalk platforms. The numerical code is written in the standard Numerical Recipes style, which isn't very Smalltalky, but is very common. Probably lots of opportunities for optimizations. The included code tries to write to stdout as it was designed to be called from the command line, but that is pretty trivial to change.

I'd love to see this contributed.  How old is that page?  I'm curious about these relative results:

  C          1     GCC 3.2.3 -O3, Linux
  ...
  Smalltalk  7.59  GNU Smalltalk 2.3.5, Linux
I'd like to see if Spur Cog can beat VW and Gnu St.
 





--
_,,,^..^,,,_
best, Eliot

Re: Call for big benchmarks

Eliot Miranda-2
 


On Fri, Mar 24, 2017 at 7:36 PM, Eliot Miranda <[hidden email]> wrote:
> I'd love to see this contributed. How old is that page?

(I mean: when were the results computed? The page says last updated 2016, but no dates are given for the individual times. Were they all computed at the same time, or are some historical?)
 
I'm curious about these relative results:

C 1 GCC 3.2.3 -O3, Linux
...
Smalltalk 7.59 GNU Smalltalk 2.3.5, Linux

I'd like to see if Spur Cog can beat VW and Gnu St.
 

Cheers,
 -  John

On Fri, Mar 24, 2017 at 1:10 AM, Tim Felgentreff <[hidden email]> wrote:
 
Hi Eliot, 

the question for me is, how indicative is this workload of real world performance? Creating compiled methods may not be something that is highly optimized, simply because it doesn't need to be in real applications. One would have to be careful about what is being measured, or if the benchmark is just measuring how fast we can blow out the caches... 

If we're just talking about running parsing and optimizing something, then maybe some real world applications are using that, but even then some JSON or HTML parsing library that implements e.g. Apache mod_rewrite would be more realistic, I think. Dynamically parsing and patching HTML and then pretty-printing or minimizing it seems a more common problem.

I know, you're trying to argue that the Opal compiler may show common workloads equally well, but we could argue that for some of the Shootout benchmarks, too. It's an argument that doesn't seem to convince some people.


Eliot Miranda <[hidden email]> schrieb am Do., 23. März 2017, 17:18:
 
Hi Tim,

On Thu, Mar 23, 2017 at 1:31 AM, Tim Felgentreff <[hidden email]> wrote:
 

Yes, big benchmarks would be nice. Those on speed.squeak.org or in VMMaker are all somewhat small.

Note the Ruby community, for example, has benchmarks such as a NES emulator (optcarrot) that can run for a few thousand frames with predefined input as benchmarks. It's definitely possible.

Maybe some of the projects from HPI students could be made to work, there was a Chip8 emulator in Squeak, for example, that seems big enough. Or maybe the DCPU emulator at github.com/fniephaus/BroDCPU without a frame limit would work as a decent CPU bound benchmark.


I've discussed with Clément doing something like cloning the Opal compiler, or the Squeak compiler, so that it uses a fixed set of classes that won't change over time, excepting the collections, and using as a benchmark this compiler recompiling all its own methods.  This is a nice mix of string processing (in the tokenizer) and symbolic processing (in the building and optimizing of the parse tree).

Cross - dialect could be hard. Pharo and Squeak are fairly easy to do, but with larger programs staying compatible across different dialects is harder.


Again, extracting a compiler from its host system would make it possible to maintain a cross-platform version.  It could be left as an exercise to the reader to port it to one's favorite non-Smalltalk dynamic language.

tim Rowledge <[hidden email]> schrieb am Mi., 22. März 2017, 21:40:


> On 21-03-2017, at 4:53 PM, Javier Pimás <[hidden email]> wrote:
>
> Hi everybody! While measuring performance I usually face the problem of assessing performance.

Have you tried the benchmarks package - CogBenchmarks - included in the source.squeak.org/VMMaker repository?

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: BOMB: Burn Out Memory Banks






--
_,,,^..^,,,_
best, Eliot




--
John Dougan
[hidden email]





Re: Call for big benchmarks

John Dougan
 
The benchmarks on the fourmilab page were done over a period of time, and when I last ran the benchmark in Smalltalk in late 2014, there were discrepancies in the relative times. I have a folder with a spreadsheet of the results I got then, along with an mcz of the benchmark for Squeak 4.

Cheers,
 -- John

On Fri, Mar 24, 2017 at 7:37 PM, Eliot Miranda <[hidden email]> wrote:
 


On Fri, Mar 24, 2017 at 7:36 PM, Eliot Miranda <[hidden email]> wrote:
Hi John,

On Fri, Mar 24, 2017 at 7:29 PM, John Dougan <[hidden email]> wrote:
 
I don't know if this qualifies, but I ported John Walker's fbench floating point accuracy benchmark (https://www.fourmilab.ch/fbench/fbench.html) to a variety of Smalltalk platforms. The numerical code is written in the standard Numerical Recipes style, which isn't very Smalltalky, but is very common. Probably lots of opportunities for optimizations. The included code tries to write to stdout as it was designed to be called from the command line, but that is pretty trivial to change.

I'd love to see this contributed.  How old is that page? 

(I mean: when were the results computed? The page says it was last updated in 2016, but no dates are given for the individual times; were they all measured at the same time, or are some of them historical?)
 
I'm curious about these relative results:

C 1 GCC 3.2.3 -O3, Linux
...
Smalltalk 7.59 GNU Smalltalk 2.3.5, Linux

I'd like to see if Spur Cog can beat VW and Gnu St.
 

Cheers,
 -  John

On Fri, Mar 24, 2017 at 1:10 AM, Tim Felgentreff <[hidden email]> wrote:
 
Hi Eliot, 

the question for me is, how indicative is this workload of real world performance? Creating compiled methods may not be something that is highly optimized, simply because it doesn't need to be in real applications. One would have to be careful about what is being measured, or if the benchmark is just measuring how fast we can blow out the caches... 

If we're just talking about running parsing and optimizing something, then maybe some real world applications are using that, but even then some JSON or HTML parsing library that implements e.g. Apache mod_rewrite would be more realistic, I think. Dynamically parsing and patching HTML and then pretty-printing or minimizing it seems a more common problem.
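As a rough illustration of the parse/patch/re-serialize workload described here, a harness might look like the sketch below. The parser class and selectors (HtmlParser, allElementsNamed:do:, prettyPrinted) are placeholder names, not a specific library's API:

```smalltalk
"Sketch of an HTML round-trip workload: parse a corpus, patch it, pretty-print it.
 HtmlParser and the selectors below are assumed names; a real harness would use an
 actual HTML library and verify a checksum of the output."
| doc |
doc := HtmlParser parse: 'corpus.html' asFileReference contents.
doc allElementsNamed: 'a' do: [:anchor |
    anchor attributeAt: 'href' put: '#'].   "patch step: rewrite every link"
^doc prettyPrinted   "re-serialize; the result should be verifiable, per the original call"
```

Verifying a checksum of the serialized output would also satisfy the "results should be verifiable" requirement from the original call.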





--
John Dougan
[hidden email]