Cog 2776 on Intel i7-4790K @ 4GHz

Chris Muller-4

Cog 2776 on Intel i7-4790K @ 4GHz

0 tinyBenchmarks '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'

[] bench '131,000,000 per second.'

ccrraaiigg

re: Cog 2776 on Intel i7-4790K @ 4GHz

> 0 tinyBenchmarks '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'
> [] bench '131,000,000 per second.'

I'll take ten, please.

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

Bert Freudenberg

Re: Cog 2776 on Intel i7-4790K @ 4GHz

In reply to this post by Chris Muller-4

On 29.08.2014, at 05:03, Chris Muller <[hidden email]> wrote:

> 0 tinyBenchmarks '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'

Nice.

> [] bench '131,000,000 per second.'

Hmm, this mostly measures the millisecondClockValue primitive.

How about we replace this

count := 0.
endTime := Time millisecondClockValue + 5000.
startTime := Time millisecondClockValue.
[ Time millisecondClockValue > endTime ] whileFalse: [ self value. count := count + 1 ].
endTime := Time millisecondClockValue.

with

count := 0.
repeat := true.
[(Delay forSeconds: 5) wait. repeat := false] forkAt: Processor activePriority + 1.
startTime := Time millisecondClockValue.
[ self value. count := count + 1. repeat ] whileTrue.
endTime := Time millisecondClockValue.

which on my machine makes it go from

'70,800,000 per second.'

to

'168,000,000 per second.'

- Bert -

Chris Muller-3

Re: Cog 2776 on Intel i7-4790K @ 4GHz

>> 0 tinyBenchmarks '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'
>
> Nice.
>
>> [] bench '131,000,000 per second.'
>
> Hmm, this mostly measures the millisecondClockValue primitive.
>
> How about we replace this
>
> count := 0.
> endTime := Time millisecondClockValue + 5000.
> startTime := Time millisecondClockValue.
> [ Time millisecondClockValue > endTime ] whileFalse: [ self value. count := count + 1 ].
> endTime := Time millisecondClockValue.
>
> with
>
> count := 0.
> repeat := true.
> [(Delay forSeconds: 5) wait. repeat := false] forkAt: Processor activePriority + 1.
> startTime := Time millisecondClockValue.
> [ self value. count := count + 1. repeat ] whileTrue.
> endTime := Time millisecondClockValue.
>
> which on my machine makes it go from
>
> '70,800,000 per second.'
>
> to
>
> '168,000,000 per second.'

Wow, I didn't realize millisecondClockValue had that much of an
impact! Yours is definitely less intrusive; we should update #bench.

Ben Coman

Re: Cog 2776 on Intel i7-4790K @ 4GHz

Chris Muller wrote:

>>> 0 tinyBenchmarks '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'
>>
>> Nice.
>>
>>> [] bench '131,000,000 per second.'
>>
>> Hmm, this mostly measures the millisecondClockValue primitive.
>>
>> How about we replace this
>>
>> count := 0.
>> endTime := Time millisecondClockValue + 5000.
>> startTime := Time millisecondClockValue.
>> [ Time millisecondClockValue > endTime ] whileFalse: [ self value. count := count + 1 ].
>> endTime := Time millisecondClockValue.
>>
>> with
>>
>> count := 0.
>> repeat := true.
>> [(Delay forSeconds: 5) wait. repeat := false] forkAt: Processor activePriority + 1.
>> startTime := Time millisecondClockValue.
>> [ self value. count := count + 1. repeat ] whileTrue.
>> endTime := Time millisecondClockValue.
>>
>> which on my machine makes it go from
>>
>> '70,800,000 per second.'
>>
>> to
>>
>> '168,000,000 per second.'
>
> Wow, I didn't realize millisecondClockValue had that much of an
> impact! Yours is definitely less intrusive; we should update #bench.

Would you leave #bench as it is to avoid invalidating comparisons with previous results, and add some kind of #bench2 ?
cheers -ben

Chris Muller-4

Re: Cog 2776 on Intel i7-4790K @ 4GHz

Do you actually have persistent bench results? That sounds
interesting. Please, do tell.

I'm curious how you would be able to keep a consistent baseline with
improving hardware and VMs and even image performance improvements.

On Sun, Aug 31, 2014 at 7:30 PM, Ben Coman <[hidden email]> wrote:

> Chris Muller wrote:
>
>>>> 0 tinyBenchmarks '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'
>>>
>>> Nice.
>>>
>>>> [] bench '131,000,000 per second.'
>>>
>>> Hmm, this mostly measures the millisecondClockValue primitive.
>>>
>>> How about we replace this
>>>
>>> count := 0.
>>> endTime := Time millisecondClockValue + 5000.
>>> startTime := Time millisecondClockValue.
>>> [ Time millisecondClockValue > endTime ] whileFalse: [ self value. count := count + 1 ].
>>> endTime := Time millisecondClockValue.
>>>
>>> with
>>>
>>> count := 0.
>>> repeat := true.
>>> [(Delay forSeconds: 5) wait. repeat := false] forkAt: Processor activePriority + 1.
>>> startTime := Time millisecondClockValue.
>>> [ self value. count := count + 1. repeat ] whileTrue.
>>> endTime := Time millisecondClockValue.
>>>
>>> which on my machine makes it go from
>>>
>>> '70,800,000 per second.'
>>>
>>> to
>>>
>>> '168,000,000 per second.'
>>
>> Wow, I didn't realize millisecondClockValue had that much of an
>> impact! Yours is definitely less intrusive; we should update #bench.
>
> Would you leave #bench as it is to avoid invalidating comparisons with
> previous results, and add some kind of #bench2 ?
> cheers -ben

Bert Freudenberg

Re: Cog 2776 on Intel i7-4790K @ 4GHz

In reply to this post by Ben Coman

On 01.09.2014, at 02:30, Ben Coman <[hidden email]> wrote:

> Would you leave #bench as it is to avoid invalidating comparisons with previous results, and add some kind of #bench2 ?
> cheers -ben

I wouldn't think that's necessary. #bench itself is supposed to have a negligible impact on the numbers, so keeping it as low as possible seems appropriate.

There is an argument to be made that if this change impacts the numbers, then we're not measuring anything useful anyway. E.g. cost of the block activation is still in there for each iteration, so maybe it's not worth changing after all:

[3+4] bench ==> '150,000,000 per second.'

[1 to: 150000000 do: [:i | 3 + 4]] timeToRun ==> 386

... which suggests that the block activation has an almost 200% overhead in this case. But that is a fallacy in itself:

[1 to: 150000000 do: [:i | 3 + 4. 3 + 4]] timeToRun ==> 373

... which suggests that the iteration has a 3000% overhead. At least in current Cog, whereas an optimizing JIT might reduce the whole thing into a no-op.

Yes, micro benchmarks are pretty meaningless.

Optimizing #bench does not make them more meaningful, but since it reduces the measurement error, it might still be worth doing?

- Bert -

Göran Krampe

Wondering about tinyBenchmarks... (was Re: [squeak-dev] Cog 2776 on Intel i7-4790K @ 4GHz)

Hi guys!

While on the subject of tinyBenchmarks (toying with comparing to
LuaJIT2), can someone explain a few things to me:

- Why do we take "500000 / <the-time-to-run-benchmark>" to mean
bytecodes/sec? I presume it's because someone made a count at some point
that it takes 500000 bytecodes to find those primes? Is that still a
correct estimation/presumption?

- Why is benchFib not a correct Fibonacci sequence? The implementation
as it stands (seems to have been like this ever since 1998 when John
Maloney (?) wrote it - I checked in a Squeak 2.5) is not a correct
Fibonacci:
#(1 1 3 5 9 15 25 41 67 109 177)

...while correct Fibonacci is (returning self, not 1, and not adding 1
in the recursion):
#(0 1 1 2 3 5 8 13 21 34 55)

It almost seems like an odd optimization gone wrong - returning 1
instead of "self" when < 2 - and then trying to compensate for the fact
that "0 benchFib" should actually be 0 - by adding 1 to the result, but
missing the fact that this will add 1 on every recursive call?

I presume there is something smart going on here - that makes this count
"sends" better this way?

And if we just want to count sends - isn't there a better way?

Come on Bert - enlighten me! :)

Curious.

regards, Göran

Eliot Miranda-2

Re: Wondering about tinyBenchmarks... (was Re: [squeak-dev] Cog 2776 on Intel i7-4790K @ 4GHz)

Hi Göran,

On Sep 1, 2014, at 6:06 AM, Göran Krampe <[hidden email]> wrote:

> Hi guys!
>
> While on the subject of tinyBenchmarks (toying with comparing to LuaJIT2), can someone explain a few things to me:
>
> - Why do we take "500000 / <the-time-to-run-benchmark>" to mean bytecodes/sec? I presume it's because someone made a count at some point that it takes 500000 bytecodes to find those primes? Is that still a correct estimation/presumption?

That's right. It's probably still close. One can count the actual number by simulating the expression using (IIRC) run:atEachStep: which is in the class side of ContextPart.
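
A minimal workspace sketch of such a count, offered as an illustration rather than a tested recipe. It assumes the simulator entry point in the image is ContextPart class>>runSimulated:contextAtEachStep: (the selector above is quoted from memory, so the exact name is an assumption and may differ between Squeak versions), and that one run of the sieve is `1 benchmark`, as used by tinyBenchmarks. Simulation is much slower than normal execution, and the number of simulated steps may not match exactly what the VM executes.

```smalltalk
"Count simulated steps for one run of the sieve behind tinyBenchmarks.
 Assumption: ContextPart class>>runSimulated:contextAtEachStep: exists in this image."
| steps |
steps := 0.
ContextPart
	runSimulated: [1 benchmark]
	contextAtEachStep: [:context | steps := steps + 1].
steps
```

Printing the last expression gives the step count, which can be compared against the 500000 constant.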

> - Why is benchFib not a correct Fibonacci sequence?

BTW Fibonacci sequences have been generalized. See Lucas Numbers on Wikipedia. For example the classic one is close to 2^N, but one which added the previous three results would be close to 3^N, etc ("tribonacci").

But the point of benchFib is that it adds one for each and every invocation whereas the classic one adds one for each leaf activation. Hence benchFib's result is the number of activations required to evaluate it and hence dividing the result by the time in seconds taken to compute it gives a rough measure of activations per second. This really should be in the comment.
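
The activation-counting property can be checked in a workspace. The sketch below is only an illustration: `calls` and `fib` are hypothetical names, and the block is a classic Fibonacci that bumps a counter on every activation. It assumes Integer>>benchFib is defined as described in this thread (answer 1 below 2, otherwise add 1 to the sum of the two recursive results).

```smalltalk
"Classic Fibonacci as a recursive block, instrumented to count its own activations."
| calls fib |
calls := 0.
fib := nil.
fib := [:n |
	calls := calls + 1.
	n < 2
		ifTrue: [n]
		ifFalse: [(fib value: n - 1) + (fib value: n - 2)]].
fib value: 10.
"The activation count of the classic version equals benchFib's result: both are 177 for 10."
calls = 10 benchFib
```

The 177 matches the last element of the benchFib sequence quoted below, which lists the results for 0 through 10.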

> The implementation as it stands (seems to have been like this ever since 1998 when John Maloney (?) wrote it - I checked in a Squeak 2.5) is not a correct Fibonacci:
> #(1 1 3 5 9 15 25 41 67 109 177)
>
> ...while correct Fibonacci is (returning self, not 1, and not adding 1 in the recursion):
> #(0 1 1 2 3 5 8 13 21 34 55)
>
> It almost seems like an odd optimization gone wrong - returning 1 instead of "self" when < 2 - and then trying to compensate for the fact that "0 benchFib" should actually be 0 - by adding 1 to the result, but missing the fact that this will add 1 on every recursive call?
>
> I presume there is something smart going on here - that makes this count "sends" better this way?
>
> And if we just want to count sends - isn't there a better way?
>
> Come on Bert - enlighten me! :)
>
> Curious.
>
> regards, Göran

Eliot (phone)

Göran Krampe

Re: Wondering about tinyBenchmarks... (was Re: [squeak-dev] Cog 2776 on Intel i7-4790K @ 4GHz)

Hi Eliot!

On 09/01/2014 03:39 PM, Eliot Miranda wrote:

> Hi Göran,
>
> On Sep 1, 2014, at 6:06 AM, Göran Krampe <[hidden email]> wrote:
>> Hi guys!
>>
>> While on the subject of tinyBenchmarks (toying with comparing to
>> LuaJIT2), can someone explain a few things to me:
>>
>> - Why do we take "500000 / <the-time-to-run-benchmark>" to mean
>> bytecodes/sec? I presume it's because someone made a count at some
>> point that it takes 500000 bytecodes to find those primes? Is that
>> still a correct estimation/presumption?
>
> That's right. It's probably still close. One can count the actual
> number by simulating the expression using (IIRC) run:atEachStep:
> which is in the class side of ContextPart.

Ah, good. So I am not entirely stupid. :)

>> - Why is benchFib not a correct Fibonacci sequence?
>
> BTW Fibonacci sequences have been generalized. See Lucas Numbers on
> Wikipedia. For example the classic one is close to 2^N, but one
> which added the previous three results would be close to 3^N, etc
> ("tribonacci").
>
> But the point of benchFib is that it adds one for each and every
> invocation whereas the classic one adds one for each leaf activation.
> Hence benchFib's result is the number of activations required to
> evaluate it and hence dividing the result by the time in seconds
> taken to compute it gives a rough measure of activations per second.
> This really should be in the comment.

Ah, great! Thank you, I knew it must be something "smart" :)

regards, Göran