bitXor: is slower than - on Spur32 (MacOSX) r3684

Nicolas Cellier

Hi,
for large integers (> 64 bits), a bitClear: implementation using bitXor: is faster than one using -, since with bitXor: we don't have to care about the carry.

{
[(12398767868759866578644 bitOr: 567812564128976768553) - 567812564128976768553] bench.
[(12398767868759866578644 bitOr: 567812564128976768553) bitXor: 567812564128976768553] bench.
}
#('3,480,000 per second. 288 nanoseconds per run.'
'4,950,000 per second. 202 nanoseconds per run.')
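Why the two expressions in that benchmark are interchangeable: after bitOr:, every bit set in the mask is also set in the receiver, so subtracting the mask never borrows, and XOR-ing with the mask simply turns those same bits off. Both therefore compute "receiver AND NOT mask". The workspace snippet below is a sketch that checks this on the value pairs used in this thread; it assumes the image provides Integer>>bitInvert (answering -1 - self) and Integer>>bitClear:, and the variable names are illustrative only.

```smalltalk
"Sketch: check that the minus form, the xor form, an explicit AND-NOT mask and
 Integer>>bitClear: (assumed present in the image) agree on the thread's values."
| pairs |
pairs := {
	1234 -> 5678.
	12398767868754 -> 5678125641253.
	12398767868759866578644 -> 567812564128976768553 }.
pairs collect: [:each |
	| a mask viaMinus viaXor viaAndNot |
	a := each key.
	mask := each value.
	viaMinus := (a bitOr: mask) - mask.
	viaXor := (a bitOr: mask) bitXor: mask.
	"bitInvert binds tighter than the keyword message, so this is a AND (NOT mask)"
	viaAndNot := a bitAnd: mask bitInvert.
	viaMinus = viaXor and: [viaXor = viaAndNot and: [viaXor = (a bitClear: mask)]]]
```

Printing the last expression should answer an Array containing three trues. For multi-digit LargeIntegers the xor form can treat each digit independently, which is consistent with Nicolas's observation that it avoids dealing with the carry.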

For large integers under 64 bits, or for small integers, I would expect roughly similar speed, but bitXor: is slower:

{
[(1234 bitOr: 5678) - 5678] bench.
[(1234 bitOr: 5678) bitXor: 5678] bench.
}

#('170,000,000 per second. 5.88 nanoseconds per run.'
'133,000,000 per second. 7.53 nanoseconds per run.')

{
[(12398767868754 bitOr: 5678125641253) - 5678125641253] bench.
[(12398767868754 bitOr: 5678125641253) bitXor: 5678125641253] bench.
}
#('7,100,000 per second. 141 nanoseconds per run.'
'5,110,000 per second. 196 nanoseconds per run.')

OK, (Smalltalk specialSelectors includes: #bitXor:) is false, but is it the sole explanation?

Eliot Miranda-2

Hi Nicolas,

> On Apr 23, 2016, at 2:22 PM, Nicolas Cellier wrote:
>
> OK, (Smalltalk specialSelectors includes: #bitXor:) is false, but is it the sole explanation?

I expect so. You can test the primitive performance by invoking both via perform:with:, using e.g. #== or #= as a control.
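As a sketch of that suggestion: sending each operator through perform:with: means none of them is compiled as a special selector send, and keeping the operands in temporaries avoids any constant-argument specialization, so the remaining differences should mostly reflect the primitives themselves. Using #== as the control and the particular operand values are choices made for this example, not part of the original advice.

```smalltalk
"Sketch: call each selector through perform:with: so none of them is sent
 as a special selector; #== is the control for the cost of perform itself."
| a b |
a := 1234 bitOr: 5678.
b := 5678.
{
	[a perform: #== with: b] bench.
	[a perform: #- with: b] bench.
	[a perform: #bitXor: with: b] bench.
}
```

If #- and #bitXor: come out close to each other here, that would support the view that the gap in the original benchmarks comes from how the sends are compiled rather than from the primitives.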

I would suggest using the VM profiler, which would show you immediately, but:
a) you need the latest VM to run it on Mac OS X
b) it is currently dog slow because it gets confused by CoreAUC, which it thinks contains a 2.5 MB text symbol, and spends all its time tallying it
c) somewhere along the line, Morphic changes resulted in its UI hiding behind an alignment morph the size of the inner window, which has to be removed manually
So I have some maintenance to do before it's usable. I have fixed the VM; the rest will follow soon.

Eliot Miranda-2

> On Apr 23, 2016, at 2:44 PM, Eliot Miranda wrote:
>
> I expect so. You can test the primitive performance by invoking both via perform:with:, using e.g. #== or #= as a control.

Oh, and you can test the difference between inlined and non-inlined #+ by, e.g., copying a test method that uses bitXor: and replacing the #bitXor: literal with #+. #+ is only inlined when it is sent via the special selector send bytecode (176, etc.), not when it is sent as an ordinary send.
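A sketch of that experiment, assuming a hypothetical class BitBench with two ordinary instance methods, xor: a with: b (^a bitXor: b) and plus: a with: b (^a + b). The compiler emits the #+ in plus:with: as a special selector send, while the #bitXor: in xor:with: is an ordinary send whose selector sits in the method's literal frame; copying xor:with: and overwriting that literal with #+ yields a method that sends #+ the ordinary way. The class, the selector plusAsOrdinarySend:with:, and the exact reflective calls are assumptions for illustration, not code from the thread.

```smalltalk
"Sketch, assuming a hypothetical class BitBench defining
	xor: a with: b   ^a bitXor: b
	plus: a with: b  ^a + b"
| copy tester |
copy := (BitBench >> #xor:with:) copy.
"Overwrite the #bitXor: literal with #+, so #+ is sent as an ordinary send.
 The copy still reports xor:with: as its own selector; acceptable for a throwaway test."
copy literalAt: (copy literals indexOf: #bitXor:) put: #+.
BitBench addSelector: #plusAsOrdinarySend:with: withMethod: copy.
tester := BitBench new.
{
	[tester plus: 1234 with: 5678] bench.	"#+ via the special selector bytecode"
	[tester plusAsOrdinarySend: 1234 with: 5678] bench.	"#+ via an ordinary send"
}
```

Comparing these two timings isolates the effect of inlining #+ from the cost of the addition itself.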

_,,,^..^,,,_ (phone)


Levente Uzonyi

In reply to this post by Nicolas Cellier

My guess is that it is just the JIT optimizing #- for a constant argument.
There's probably no such optimization for #bitXor:.
There's no difference when the argument is not a constant:

| a b |
a := 1234.
b := 5678.
{
[(a bitOr: b) - b] bench.
[(a bitOr: b) bitXor: b] bench.
}
#('55,200,000 per second. 18.1 nanoseconds per run.'
'55,600,000 per second. 18 nanoseconds per run.')

Levente

Eliot Miranda-2

> On Apr 23, 2016, at 5:43 PM, Levente Uzonyi wrote:
>
> My guess is that it is just the JIT optimizing #- for constant argument. There's probably no such optimization for #bitXor:.

Right.