64 bit specific hash multiply

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

64 bit specific hash multiply

Bert Freudenberg
(moved from vm-dev)

On Tue, Apr 18, 2017 at 10:27 AM, Nicolas Cellier <[hidden email]> wrote:
 
2017-04-18 4:02 GMT+02:00 Andres Valloud <[hidden email]>:

On 4/17/17 16:45 , Eliot Miranda wrote:
FYI, hash multiply is
hashMultiply
| low |
low := self bitAnd: 16383.
^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low)
bitAnd: 16383) * 16384))
bitAnd: 16r0FFFFFFF

[...] that convoluted arithmetic is done this way in Smalltalk only to simulate a 32x32 multiplication bit-anded down to 28 bits without overflowing into large integers (the original code from August 2000 had my initials).

At a sufficiently low level such as C, all that complexity is just an unsigned multiplication by 1664525.  The image code should still have a comment to that effect, did it get lost?

Andres.

More: if it's a 64 bit image, then we have 60 bit long unsigned small int
since 1664525 highBit = 21, and self is a hash result not exceeding 30 bits, we can implement hashMultiply as
    ^self * 1664525 bitAnd: 16r0FFFFFFF

Yes I wondered about that too. How could we get different behavior in the 32 vs 64 bit image, while running the same code?

One way would be to make SmallInteger abstract and have concrete SmallInteger32 and SmallInteger64 subclasses. In a 32 bit image you would only get SmallInteger32 instances, and in a 64 bit image only SmallInteger64 instances. Most methods would remain in SmallInteger, but 32 or 64 bit specific behavior could be implemented in the subclasses. Seems very clean, but sounds like overkill.

A more hackish but less intrusive way would be to have a Monticello package with *overrides which would only be loaded in 64 bit images. Then again, we try to avoid overrides, and we want to use the same config map in both 32 and 64 bit images.

Or how about a little compiler hack? Basically the code could read

hashMultiply
Smalltalk is64Bits
ifTrue: [^self * 1664525 bitAnd: 16r0FFFFFFF]
ifFalse: [
| low |
low := self bitAnd: 16383.
^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384))
bitAnd: 16r0FFFFFFF]

and the compiler could recognize that "Smalltalk is64Bits" is constant and generate only one of the branches ... I actually quite like this idea, I think :)

- Bert - 


Reply | Threaded
Open this post in threaded view
|

Re: 64 bit specific hash multiply

Eliot Miranda-2


On Tue, Apr 18, 2017 at 10:49 AM, Bert Freudenberg <[hidden email]> wrote:
(moved from vm-dev)

On Tue, Apr 18, 2017 at 10:27 AM, Nicolas Cellier <[hidden email]> wrote:
 
2017-04-18 4:02 GMT+02:00 Andres Valloud <[hidden email]>:

On 4/17/17 16:45 , Eliot Miranda wrote:
FYI, hash multiply is
hashMultiply
| low |
low := self bitAnd: 16383.
^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low)
bitAnd: 16383) * 16384))
bitAnd: 16r0FFFFFFF

[...] that convoluted arithmetic is done this way in Smalltalk only to simulate a 32x32 multiplication bit-anded down to 28 bits without overflowing into large integers (the original code from August 2000 had my initials).

At a sufficiently low level such as C, all that complexity is just an unsigned multiplication by 1664525.  The image code should still have a comment to that effect, did it get lost?

Andres.

More: if it's a 64 bit image, then we have 60 bit long unsigned small int
since 1664525 highBit = 21, and self is a hash result not exceeding 30 bits, we can implement hashMultiply as
    ^self * 1664525 bitAnd: 16r0FFFFFFF

Yes I wondered about that too. How could we get different behavior in the 32 vs 64 bit image, while running the same code?

One way would be to make SmallInteger abstract and have concrete SmallInteger32 and SmallInteger64 subclasses. In a 32 bit image you would only get SmallInteger32 instances, and in a 64 bit image only SmallInteger64 instances. Most methods would remain in SmallInteger, but 32 or 64 bit specific behavior could be implemented in the subclasses. Seems very clean, but sounds like overkill.

Cool idea but I agree; it's overkill.

A more hackish but less intrusive way would be to have a Monticello package with *overrides which would only be loaded in 64 bit images. Then again, we try to avoid overrides, and we want to use the same config map in both 32 and 64 bit images.

Or how about a little compiler hack? Basically the code could read

hashMultiply
Smalltalk is64Bits
ifTrue: [^self * 1664525 bitAnd: 16r0FFFFFFF]
ifFalse: [
| low |
low := self bitAnd: 16383.
^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384))
bitAnd: 16r0FFFFFFF]

BTW, I would replace Smalltalk is64Bits by a class variable updated at start up. But...
 

and the compiler could recognize that "Smalltalk is64Bits" is constant and generate only one of the branches ... I actually quite like this idea, I think :)

KISS :-)  I just committed

SmallInteger>>hashMultiply
"This is a multiplication by by 1664525 mod 2^28 written to avoid overflowing into large integers.
The primitive is able to perform the operation with modulo arihmetic."
<primitive: 159>
| low |
low := self bitAnd: 16383.
^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384))
bitAnd: 16r0FFFFFFF
 
and this is on the fast side in the latest Cog VMs :-)

159?  All the multiply primitives end in 9 :-)

- Bert - 

_,,,^..^,,,_
best, Eliot