Cog 2776 on Intel i7-4790K @ 4GHz


Chris Muller-4
0 tinyBenchmarks    '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'

[] bench    '131,000,000 per second.'


re: Cog 2776 on Intel i7-4790K @ 4GHz

ccrraaiigg

> 0 tinyBenchmarks    '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'
> [] bench    '131,000,000 per second.'

     I'll take ten, please.


-C

--
Craig Latta
netjam.org
+31   6 2757 7177 (SMS ok)
+ 1 415  287 3547 (no SMS)



Re: Cog 2776 on Intel i7-4790K @ 4GHz

Bert Freudenberg
In reply to this post by Chris Muller-4
On 29.08.2014, at 05:03, Chris Muller <[hidden email]> wrote:

> 0 tinyBenchmarks    '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'

Nice.

> [] bench    '131,000,000 per second.'

Hmm, this mostly measures the millisecondClockValue primitive.

How about we replace this

        count := 0.
        endTime := Time millisecondClockValue + 5000.
        startTime := Time millisecondClockValue.
        [ Time millisecondClockValue > endTime ] whileFalse: [ self value.  count := count + 1 ].
        endTime := Time millisecondClockValue.

with

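        "Proposed: a higher-priority process clears the flag after 5 seconds,
         so the loop tests a variable instead of reading the clock each time.
         (Assumes repeat is declared alongside the method's other temporaries.)"
        count := 0.
        repeat := true.
        [(Delay forSeconds: 5) wait. repeat := false] forkAt: Processor activePriority + 1.
        startTime := Time millisecondClockValue.
        [ self value.  count := count + 1. repeat ] whileTrue.
        endTime := Time millisecondClockValue.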

which on my machine makes it go from

 '70,800,000 per second.'

to

 '168,000,000 per second.'
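A rough Python model of the two loop shapes, for readers who want to see the effect outside Squeak. It is an analogy under stated assumptions, not the #bench source: `time.monotonic()` stands in for `Time millisecondClockValue`, `threading.Timer` stands in for the forked higher-priority process waiting on a Delay, and the names `bench_polling` and `bench_flag` are hypothetical. Python's absolute rates say nothing about Cog.

```python
import threading
import time


def bench_polling(block, seconds=5.0):
    """Current #bench shape: read the clock on every iteration."""
    count = 0
    start = time.monotonic()
    end_time = start + seconds
    while not time.monotonic() > end_time:  # one clock read per iteration
        block()
        count += 1
    return count / (time.monotonic() - start)


def bench_flag(block, seconds=5.0):
    """Proposed shape: a timer clears a flag; the loop only tests that flag."""
    repeat = True

    def stop():
        nonlocal repeat
        repeat = False

    timer = threading.Timer(seconds, stop)  # stands in for the forked Delay process
    timer.start()
    count = 0
    start = time.monotonic()
    while True:  # mirrors [ self value. count := count + 1. repeat ] whileTrue
        block()
        count += 1
        if not repeat:
            break
    elapsed = time.monotonic() - start
    timer.join()
    return count / elapsed


if __name__ == "__main__":
    noop = lambda: None
    print(f"polling: {bench_polling(noop, 0.5):,.0f} per second")
    print(f"flag:    {bench_flag(noop, 0.5):,.0f} per second")
```

With a trivial block the flag version typically reports the higher rate, since the per-iteration clock read is gone; how large the gap is depends on the platform.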

- Bert -







Re: Cog 2776 on Intel i7-4790K @ 4GHz

Chris Muller-3
> Hmm, this mostly measures the millisecondClockValue primitive.
> [...]
> which on my machine makes it go from
>
>  '70,800,000 per second.'
>
> to
>
>  '168,000,000 per second.'

Wow, I didn't realize millisecondClockValue had that much of an
impact!  Yours is definitely less intrusive; we should update #bench.


Re: Cog 2776 on Intel i7-4790K @ 4GHz

Ben Coman
Chris Muller wrote:
> Wow, I didn't realize millisecondClockValue had that much of an
> impact!  Yours is definitely less intrusive; we should update #bench.

Would you leave #bench as it is to avoid invalidating comparisons with previous results, and add some kind of #bench2 ?
cheers -ben



Re: Cog 2776 on Intel i7-4790K @ 4GHz

Chris Muller-4
Do you actually have persistent bench results?  That sounds
interesting.  Please, do tell.

I'm curious how you would be able to keep a consistent baseline with
improving hardware and VMs and even image performance improvements.

On Sun, Aug 31, 2014 at 7:30 PM, Ben Coman <[hidden email]> wrote:

> Would you leave #bench as it is to avoid invalidating comparisons with
> previous results, and add some kind of #bench2 ?
> cheers -ben


Re: Cog 2776 on Intel i7-4790K @ 4GHz

Bert Freudenberg
In reply to this post by Ben Coman
On 01.09.2014, at 02:30, Ben Coman <[hidden email]> wrote:

> Would you leave #bench as it is to avoid invalidating comparisons with previous results, and add some kind of #bench2 ?
> cheers -ben

I wouldn't think that's necessary. #bench itself is supposed to have a negligible impact on the numbers, so keeping it as low as possible seems appropriate.

There is an argument to be made that if this change impacts the numbers, then we're not measuring anything useful anyway. For example, the cost of the block activation is still included in every iteration, so maybe it's not worth changing after all:

        [3+4] bench ==> '150,000,000 per second.'

        [1 to: 150000000 do: [:i | 3 + 4]] timeToRun ==> 386

... which suggests that the block activation has an almost 200% overhead in this case. But that is a fallacy in itself:

        [1 to: 150000000 do: [:i | 3 + 4.  3 + 4]] timeToRun ==> 373

... which suggests that the iteration has a 3000% overhead. That holds at least in current Cog; an optimizing JIT might reduce the whole thing to a no-op.
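Converting the quoted timings into a per-iteration cost makes them easier to compare. This is plain arithmetic on the three figures in this message (the names are hypothetical): roughly 6.7 ns per `[3+4] bench` evaluation against roughly 2.6 ns per iteration of the inline loop. The two-addition loop came out slightly faster than the one-addition loop, so the cost of the extra 3 + 4 apparently disappears below the run-to-run variation.

```python
ITERATIONS = 150_000_000  # loop count used in the measurements above


def ns_per_iteration(total_ms, iterations=ITERATIONS):
    """Convert a total wall time in milliseconds into nanoseconds per iteration."""
    return total_ms * 1_000_000 / iterations


via_bench = ns_per_iteration(1000)  # [3+4] bench: 150,000,000 per second
one_add = ns_per_iteration(386)     # to:do: loop with one 3 + 4
two_adds = ns_per_iteration(373)    # to:do: loop with two 3 + 4
print(f"{via_bench:.2f} {one_add:.2f} {two_adds:.2f} ns")  # 6.67 2.57 2.49 ns
```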

Yes, micro benchmarks are pretty meaningless.

Optimizing #bench does not make them more meaningful, but since it reduces the measurement error, it might still be worth doing?

- Bert -







Wondering about tinyBenchmarks... (was Re: [squeak-dev] Cog 2776 on Intel i7-4790K @ 4GHz)

Göran Krampe
Hi guys!

While on the subject of tinyBenchmarks (toying with comparing to
LuaJIT2), can someone explain a few things to me:

- Why do we take "500000 / <the-time-to-run-benchmark>" to mean
bytecodes/sec? I presume it's because someone made a count at some point
that it takes 500000 bytecodes to find those primes? Is that still a
correct estimation/presumption?

- Why is benchFib not a correct Fibonacci sequence? The implementation
as it stands (seems to have been like this ever since 1998 when John
Maloney (?) wrote it - I checked in a Squeak 2.5) is not a correct
Fibonacci:
    #(1 1 3 5 9 15 25 41 67 109 177)

...while correct Fibonacci is (returning self, not 1, and not adding 1
in the recursion):
    #(0 1 1 2 3 5 8 13 21 34 55)

It almost seems like an odd optimization gone wrong - returning 1
instead of "self" when < 2 - and then trying to compensate for the fact
that "0 benchFib" should actually be 0 - by adding 1 to the result, but
missing the fact that this will add 1 on every recursive call?
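The two sequences in the question can be reproduced with a short Python sketch. `bench_fib` mirrors the recurrence as described here (return 1 when the argument is below 2, otherwise add both recursive results plus 1); it is a model, not the Squeak source. It also shows the two are tightly linked: benchFib(n) = 2 * fib(n + 1) - 1, which follows by induction because (2F(n) - 1) + (2F(n-1) - 1) + 1 = 2F(n+1) - 1. So the extra 1 is not a base-case correction; it is added once per call.

```python
def bench_fib(n):
    """benchFib-style recurrence: 1 below 2, else both sub-results plus 1."""
    if n < 2:
        return 1
    return bench_fib(n - 1) + bench_fib(n - 2) + 1


def fib(n):
    """Classic Fibonacci: n below 2, else the sum of the two previous values."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print([bench_fib(n) for n in range(11)])  # [1, 1, 3, 5, 9, 15, 25, 41, 67, 109, 177]
print([fib(n) for n in range(11)])        # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# The two are related exactly: bench_fib(n) == 2 * fib(n + 1) - 1
print(all(bench_fib(n) == 2 * fib(n + 1) - 1 for n in range(20)))  # True
```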

I presume there is something smart going on here - that makes this count
"sends" better this way?

And if we just want to count sends - isn't there a better way?

Come on Bert - enlighten me! :)

Curious.

regards, Göran


Re: Wondering about tinyBenchmarks... (was Re: [squeak-dev] Cog 2776 on Intel i7-4790K @ 4GHz)

Eliot Miranda-2
Hi Göran,

On Sep 1, 2014, at 6:06 AM, Göran Krampe <[hidden email]> wrote:

> Hi guys!
>
> While on the subject of tinyBenchmarks (toying with comparing to LuaJIT2), can someone explain a few things to me:
>
> - Why do we take "500000 / <the-time-to-run-benchmark>" to mean bytecodes/sec? I presume it's because someone made a count at some point that it takes 500000 bytecodes to find those primes? Is that still a correct estimation/presumption?

That's right.  It's probably still close.  One can count the actual number by simulating the expression using (IIRC) run:atEachStep: which is in the class side of ContextPart.
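Once the per-run bytecode count is fixed, the reported figure is simple arithmetic. A minimal Python sketch, assuming (as described above) a fixed estimate of 500,000 bytecodes per run of the sieve and a result in whole bytecodes per second; `bytecodes_per_second` is a hypothetical name, not the Squeak selector. For illustration only, 4096 runs in 1199 ms reproduce the figure from the first post exactly; the actual run count and time were not posted.

```python
BYTECODES_PER_RUN = 500_000  # estimated bytecodes executed by one run of the sieve


def bytecodes_per_second(runs, elapsed_ms):
    """Total estimated bytecodes, scaled from milliseconds to seconds (integer result)."""
    return runs * BYTECODES_PER_RUN * 1000 // elapsed_ms


print(f"{bytecodes_per_second(4096, 1199):,}")  # 1,708,090,075
```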

> - Why is benchFib not a correct Fibonacci sequence?

BTW Fibonacci sequences have been generalized.  See Lucas Numbers on Wikipedia.  For example the classic one is close to 2^N, but one which added the previous three results would be close to 3^N, etc ("tribonacci").
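A quick numerical check of the growth rates in this aside, in plain Python (`ratios` is a hypothetical helper). The ratio of successive terms settles near 1.618 (the golden ratio) for the two-term sequence and near 1.839 for the three-term "tribonacci" sequence, so 2^N and 3^N are upper bounds rather than close estimates. The same golden-ratio growth applies to benchFib, which is why raising its argument by one makes a run take roughly 1.6 times longer.

```python
def ratios(order, terms):
    """Ratios a[n+1] / a[n] for the sequence whose terms sum the previous `order` terms."""
    seq = [0] * (order - 1) + [1]  # seed: 0, ..., 0, 1
    for _ in range(terms):
        seq.append(sum(seq[-order:]))  # next term = sum of the last `order` terms
    return [b / a for a, b in zip(seq, seq[1:]) if a != 0]


fib_ratio = ratios(2, 60)[-1]   # classic Fibonacci
trib_ratio = ratios(3, 60)[-1]  # "tribonacci"
print(round(fib_ratio, 3), round(trib_ratio, 3))  # 1.618 1.839
```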

But the point of benchFib is that it adds one for each and every invocation whereas the classic one adds one for each leaf activation. Hence benchFib's result is the number of activations required to evaluate it, and hence dividing the result by the time in seconds taken to compute it gives a rough measure of activations per second.  This really should be in the comment.
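This can be checked directly: the activation count and the benchFib-style result follow the same recurrence, c(n) = 1 for n < 2 and c(n) = c(n-1) + c(n-2) + 1 otherwise, so they are equal for every n. A minimal Python sketch; the `calls` counter argument and `activations_per_second` are hypothetical additions for illustration, not part of Squeak's method.

```python
import time


def bench_fib(n, calls):
    """benchFib-style recursion; calls[0] counts every activation."""
    calls[0] += 1  # one activation of this function
    if n < 2:
        return 1
    return bench_fib(n - 1, calls) + bench_fib(n - 2, calls) + 1


def activations_per_second(n):
    """Time one evaluation and divide the result (the activation count) by the seconds taken."""
    calls = [0]
    start = time.perf_counter()
    result = bench_fib(n, calls)
    elapsed = time.perf_counter() - start
    assert result == calls[0]  # the result is the number of activations
    return result / elapsed


for n in range(10):
    calls = [0]
    print(n, bench_fib(n, calls), calls[0])  # the last two columns always match
```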


Eliot (phone)

Re: Wondering about tinyBenchmarks... (was Re: [squeak-dev] Cog 2776 on Intel i7-4790K @ 4GHz)

Göran Krampe
Hi Eliot!

On 09/01/2014 03:39 PM, Eliot Miranda wrote:

> Hi Göran,
>
> On Sep 1, 2014, at 6:06 AM, Göran Krampe <[hidden email]> wrote:
>> Hi guys!
>>
>> While on the subject of tinyBenchmarks (toying with comparing to
>> LuaJIT2), can someone explain a few things to me:
>>
>> - Why do we take "500000 / <the-time-to-run-benchmark>" to mean
>> bytecodes/sec? I presume it's because someone made a count at some
>> point that it takes 500000 bytecodes to find those primes? Is that
>> still a correct estimation/presumption?
>
> That's right.  It's probably still close.  One can count the actual
> number by simulating the expression using (IIRC) run:atEachStep:
> which is in the class side of ContextPart.

Ah, good. So I am not entirely stupid. :)

>> - Why is benchFib not a correct Fibonacci sequence?
>
> BTW Fibonacci sequences have been generalized.  See Lucas Numbers on
> Wikipedia.  For example the classic one is close to 2^N, but one
> which added the previous three results would be close to 3^N, etc
> ("tribonacci").
>
> But the point of benchFib is that it adds one for each and every
> invocation whereas the classic one adds one for each leaf activation.
> Hence benchFib's result is the number of activations required to
> evaluate it, and hence dividing the result by the time in seconds
> taken to compute it gives a rough measure of activations per second.
> This really should be in the comment.

Ah, great! Thank you, I knew it must be something "smart" :)

regards, Göran