identityHash or scaledIdentityHash ?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

identityHash or scaledIdentityHash ?

Nicolas Cellier
Maybe Levente will help me clarifying this...
I encountered several different implementations:

MessageTally, Semaphore, FutureMaker, LocatedMethod, TextAnchor
hash
    ^self identityHash
(or ^someInstVar identityHash, which is equivalent)

Object
hash
   ^self scaledIdentityHash

The first does not scale well for large Set/Dictionary, and the second
costs a little overhed on small (Set/Dictionary).
My questions are:
- are these different implementations deliberate ? For example knowing
there won't be any large Set... In which case a comment would help
- or should we always use the scaledIdentityHash to hash ? (and let
identityHash for printing and a few well commented exceptional cases)

Beside, apart printing usage, I note these two usages of IdentityHash as hash:
MethodDictionary scanFor: (I guess we don't expect more than 4096
messages too often).
Player recaptureUniqueCostumes

I also noted that MethodDictionary does not override
scanForEmptySlotFor: which is defined in term either of hash or
scaledIdentityHash, rather than IdentityHash
This is currently OK, because the unique path I found was:

MethodDictionary>>#at:put
-> HashCollection>>atNewIndex:put:
-> MethodDictionary>>#grow
OK, grow overrides super, otherwise:
-> HashCollection>>#grow
-> HashCollection>>#growTo:
-> Dictionary>>#noCheckNoGrowFillFrom:
-> HashCollection>>#scanForEmptySlotFor:
-> Symbol>>#hash

But it's a potential tricky trap for future.

Nicolas

Reply | Threaded
Open this post in threaded view
|

Re: identityHash or scaledIdentityHash ?

Levente Uzonyi-2
On Thu, 24 Dec 2009, Nicolas Cellier wrote:

> Maybe Levente will help me clarifying this...
> I encountered several different implementations:
>
> MessageTally, Semaphore, FutureMaker, LocatedMethod, TextAnchor
> hash
>    ^self identityHash
> (or ^someInstVar identityHash, which is equivalent)
>
> Object
> hash
>   ^self scaledIdentityHash
>
> The first does not scale well for large Set/Dictionary, and the second
> costs a little overhed on small (Set/Dictionary).
> My questions are:
> - are these different implementations deliberate ? For example knowing
> there won't be any large Set... In which case a comment would help

No.

> - or should we always use the scaledIdentityHash to hash ? (and let
> identityHash for printing and a few well commented exceptional cases)

Exactly.

I quickly checked the classes and I think that:
- all of the above should be #scaledIdentityHash
- optionally MessageTally >> #hash should use the process's hash,
   since #= also compares the process variable

>
> Beside, apart printing usage, I note these two usages of IdentityHash as hash:
> MethodDictionary scanFor: (I guess we don't expect more than 4096
> messages too often).

MethodDictionary's array's size is a power of two, and #scaledIdentityHash
is just #identityHash bitShift: 18. Using #scaledIdentityHash would
degrade the performance to the minimum. The use of #hash instead of
#identityHash could help with large MethodDictionaries, since the keys are
Symbols, but the current code assumes that the size will never reach 4096.
And that's acceptable IMO.

> Player recaptureUniqueCostumes
>

That should use #scaledIdentityHash, since the set defined there mimics an
IdentitySet.

> I also noted that MethodDictionary does not override
> scanForEmptySlotFor: which is defined in term either of hash or
> scaledIdentityHash, rather than IdentityHash
> This is currently OK, because the unique path I found was:
>
> MethodDictionary>>#at:put
> -> HashCollection>>atNewIndex:put:
> -> MethodDictionary>>#grow
> OK, grow overrides super, otherwise:
> -> HashCollection>>#grow
> -> HashCollection>>#growTo:
> -> Dictionary>>#noCheckNoGrowFillFrom:
> -> HashCollection>>#scanForEmptySlotFor:
> -> Symbol>>#hash
>
> But it's a potential tricky trap for future.
>

Well, maybe. MethodDictionary has a different structure than other
HashedCollections and because of this it has to reimplement #grow and
#rehash. #scanForEmptySlotFor: is only used by those two methods in
general (WeakKeyDictionary is the only exception).


Levente

> Nicolas
>
>

Reply | Threaded
Open this post in threaded view
|

Re: identityHash or scaledIdentityHash ?

Nicolas Cellier
2009/12/24 Levente Uzonyi <[hidden email]>:

> On Thu, 24 Dec 2009, Nicolas Cellier wrote:
>
>> Maybe Levente will help me clarifying this...
>> I encountered several different implementations:
>>
>> MessageTally, Semaphore, FutureMaker, LocatedMethod, TextAnchor
>> hash
>>   ^self identityHash
>> (or ^someInstVar identityHash, which is equivalent)
>>
>> Object
>> hash
>>  ^self scaledIdentityHash
>>
>> The first does not scale well for large Set/Dictionary, and the second
>> costs a little overhed on small (Set/Dictionary).
>> My questions are:
>> - are these different implementations deliberate ? For example knowing
>> there won't be any large Set... In which case a comment would help
>
> No.
>
>> - or should we always use the scaledIdentityHash to hash ? (and let
>> identityHash for printing and a few well commented exceptional cases)
>
> Exactly.
>
> I quickly checked the classes and I think that:
> - all of the above should be #scaledIdentityHash
> - optionally MessageTally >> #hash should use the process's hash,
>  since #= also compares the process variable
>
>>
>> Beside, apart printing usage, I note these two usages of IdentityHash as
>> hash:
>> MethodDictionary scanFor: (I guess we don't expect more than 4096
>> messages too often).
>
> MethodDictionary's array's size is a power of two, and #scaledIdentityHash
> is just #identityHash bitShift: 18. Using #scaledIdentityHash would degrade
> the performance to the minimum. The use of #hash instead of #identityHash
> could help with large MethodDictionaries, since the keys are Symbols, but
> the current code assumes that the size will never reach 4096. And that's
> acceptable IMO.
>

Indeed, current trunk monster is far:
(ProtoObject withAllSubclasses asArray collect: [:e | e selectors size
-> e]) sort last.
  1167->Morph

We should accept only refactorings that reduce this number :)

>> Player recaptureUniqueCostumes
>>
>
> That should use #scaledIdentityHash, since the set defined there mimics an
> IdentitySet.
>
>> I also noted that MethodDictionary does not override
>> scanForEmptySlotFor: which is defined in term either of hash or
>> scaledIdentityHash, rather than IdentityHash
>> This is currently OK, because the unique path I found was:
>>
>> MethodDictionary>>#at:put
>> -> HashCollection>>atNewIndex:put:
>> -> MethodDictionary>>#grow
>> OK, grow overrides super, otherwise:
>> -> HashCollection>>#grow
>> -> HashCollection>>#growTo:
>> -> Dictionary>>#noCheckNoGrowFillFrom:
>> -> HashCollection>>#scanForEmptySlotFor:
>> -> Symbol>>#hash
>>
>> But it's a potential tricky trap for future.
>>
>
> Well, maybe. MethodDictionary has a different structure than other
> HashedCollections and because of this it has to reimplement #grow and
> #rehash. #scanForEmptySlotFor: is only used by those two methods in general
> (WeakKeyDictionary is the only exception).
>

You have a better view (conceptual) and maybe my fear was exagerated.
But it was not that easy to follow the senders the reverse-engineering way...

>
> Levente
>

Thanks for explanations.

>> Nicolas
>>
>>
>
>