bitXor: is slower than - on Spur32 (MacOSX) r3684


Nicolas Cellier
 
Hi,
For large integers (> 64 bits), a bitClear: implementation using bitXor: is faster than one using -; indeed, with bitXor: we don't have to take care of the carry.

{
[(12398767868759866578644 bitOr: 567812564128976768553) - 567812564128976768553] bench.
[(12398767868759866578644 bitOr: 567812564128976768553) bitXor: 567812564128976768553] bench.
}
#('3,480,000 per second. 288 nanoseconds per run.'
 '4,950,000 per second. 202 nanoseconds per run.')
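
As a sanity check on the premise, the two expressions being timed should both compute the same thing as bitClear: for non-negative operands, because every bit of the mask is already set in the result of bitOr:, so the subtraction never has to borrow. The following is only a workspace sketch, assuming a Squeak image where Integer>>bitClear: is available; the temporary names are made up for illustration.

```smalltalk
"Sketch: check that both formulations agree with bitClear: (names are illustrative)."
| a mask viaMinus viaXor |
a := 12398767868759866578644.
mask := 567812564128976768553.
"Every bit of mask is set in (a bitOr: mask), so subtracting mask just clears those bits."
viaMinus := (a bitOr: mask) - mask.
"XOR with mask flips exactly the bits of mask, which are all 1, so they become 0."
viaXor := (a bitOr: mask) bitXor: mask.
"Both elements are expected to be true."
{ viaMinus = (a bitClear: mask). viaXor = (a bitClear: mask) }
```

The same check should hold for the smaller operand pairs used in the benchmarks further down.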

For large integers under 64 bits and for small integers, I would expect roughly similar speed, but bitXor: is slower:

{
[(1234 bitOr: 5678) - 5678] bench.
[(1234 bitOr: 5678) bitXor: 5678] bench.
}

#('170,000,000 per second. 5.88 nanoseconds per run.'
  '133,000,000 per second. 7.53 nanoseconds per run.')

{
[(12398767868754 bitOr: 5678125641253) - 5678125641253] bench.
[(12398767868754 bitOr: 5678125641253) bitXor: 5678125641253] bench.
}
#('7,100,000 per second. 141 nanoseconds per run.'
'5,110,000 per second. 196 nanoseconds per run.')

OK, (Smalltalk specialSelectors includes: #bitXor:) is false, but is it the sole explanation?
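
To see at a glance which of the arithmetic and bit selectors get the special-selector treatment, the same query can be run over a list of candidates. This is a sketch only: it reuses the Smalltalk specialSelectors includes: expression from above, and the candidate list is an arbitrary choice for illustration.

```smalltalk
"Sketch: pair each candidate selector with whether it is a special selector."
| candidates |
candidates := #(#+ #- #bitAnd: #bitOr: #bitXor: #bitShift:).
candidates collect: [:selector |
	selector -> (Smalltalk specialSelectors includes: selector)]
```

Selectors that answer false here are sent as ordinary messages rather than through the dedicated special-selector bytecodes.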


Re: bitXor: is slower than - on Spur32 (MacOSX) r3684

Eliot Miranda-2

Hi Nicolas,

> On Apr 23, 2016, at 2:22 PM, Nicolas Cellier <[hidden email]> wrote:
> OK, (Smalltalk specialSelectors includes: #bitXor:) is false, but is it the sole explanation?


I expect so.  You can test the primitive performance by invoking both via perform:with:, using e.g. #== or #= as a control.
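
A minimal sketch of that experiment, assuming a Squeak workspace: going through perform:with: bypasses the special-selector bytecodes, so all three sends take the same ordinary path and what remains is mostly the cost of the primitives themselves. The temporary names are illustrative, and the timings will of course vary by VM and machine.

```smalltalk
"Sketch: time the primitives via perform:with:, with #== as a cheap control."
| x y |
x := 1234 bitOr: 5678.
y := 5678.
{
[x perform: #- with: y] bench.
[x perform: #bitXor: with: y] bench.
[x perform: #== with: y] bench.
}
```

If the first two timings come out close to each other, the difference seen in the direct sends is unlikely to come from the primitives.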

I would suggest using the VM profiler, which will show you immediately, but
a) you need the latest VM to run it on Mac OS X
b) it is currently dog slow because it gets confused by CoreAUC, which it thinks contains a 2.5Mb text symbol, and it spends all its time tallying it
c) somewhere along the line, Morphic changes resulted in its UI hiding behind an alignment morph the size of the inner window, which has to be manually removed
So I have some maintenance to do before it's usable.  I fixed the VM.  The rest will follow soon.

Re: bitXor: is slower than - on Spur32 (MacOSX) r3684

Eliot Miranda-2


> On Apr 23, 2016, at 2:44 PM, Eliot Miranda <[hidden email]> wrote:
>
> I expect so.  You can test the primitive performance by invoking both via perform:with:, using e.g. #== or #= as a control.

Oh, and you can test the difference between inlined and non-inlined #+ by, e.g., copying a test method that uses bitXor: and replacing the #bitXor: literal with #+.  #+ is only inlined for the special selector send bytecodes (176, etc.), not for ordinary sends.

_,,,^..^,,,_ (phone)




Re: bitXor: is slower than - on Spur32 (MacOSX) r3684

Levente Uzonyi
In reply to this post by Nicolas Cellier
 
My guess is that it is just the JIT optimizing #- for a constant argument.
There's probably no such optimization for #bitXor:.
There's no difference when the argument is not a constant:

| a b |
a := 1234.
b := 5678.
{
[(a bitOr: b) - b] bench.
[(a bitOr: b) bitXor: b] bench.
}
#('55,200,000 per second. 18.1 nanoseconds per run.'
 '55,600,000 per second. 18 nanoseconds per run.')

Levente

Re: bitXor: is slower than - on Spur32 (MacOSX) r3684

Eliot Miranda-2



> On Apr 23, 2016, at 5:43 PM, Levente Uzonyi <[hidden email]> wrote:
>
> My guess is that it is just the JIT optimizing #- for constant argument. There's probably no such optimization for #bitXor:.

Right.

> There's no difference when the argument is not a constant:
>
> | a b |
> a := 1234.
> b := 5678.
> {
> [(a bitOr: b) - b] bench.
> [(a bitOr: b) bitXor: b] bench.
> }
> #('55,200,000 per second. 18.1 nanoseconds per run.' '55,600,000 per second. 18 nanoseconds per run.')

>
> Levente