Equals and HashCode Builder

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: Equals and HashCode Builder

Stephane Ducasse-3
Here is what I extracted from older emails

From David Allouche in Pharo-dev to see if
The ==hash== method is used in Pharo to separate objects in
bucket-based data structures.


!! Property

It must have one required property which is required data integrity:

!!!! A.
Objects that are equal according to ==\=== must have equal hashes.


It should have two desirable properties that are required to support
the algorithmic assumptions of bucket-based data structures:

!!!! B.
Objects that are not equal should have different hashes. Ideally, the
probability of hash collision should by 1/N where N is the size of
hash value space.

!!!! C. The hash values should spread as much as possible over their
value space. For example: ProtoObject>>#identityHash.




How these two desirable properties can be implemented depends on the
actual distribution of the values used to compute the hash.

An object can safely use XOR hashing if its instance variables are all
different objects that already provide a hash method with these two
properties.

@@todo check following sentence because the use of singleton is obscure

For example, an object whose behaviour is defined by several of
singleton delegates of different classes that do not override
Object>>#hash.

But if the instance variables are not guaranteed to have those
properties, it is necessary to more thoroughly "mix the bits". For
example the  method ==hash== of ==SequenceableCollection== illustrates
this point.


[[[
SequenceableCollection >> hash
| hash |
hash := self species hash.
1 to: self size do: [:i | hash := (hash + (self at: i) hash) hashMultiply].
^ hash
]]]


The mixing of the bits is done by the combination of addition and
==hashMultiply==. The value of ==species hash== does not need to be
processed by ==hashMultiply==, because it is probably computed by the
method =identityHash== of ==ProtoObject==, and that already provides
property C.

!!! When we should not use XOR

==LayoutFrame== is a particularly good example of a data structure
that should not use XOR hashing. Its instance variables provide none
of the required properties:

- ==SmallInteger>>hash== is just ==^self==.
- In addition, common ==LayoutFrame== instances tend to use pairs of
identical values in their instance variables. Like ==0@0 corner:
1@1==, or ==Margin fromNumber: 10==.

A good hash function for LayoutFrame needs to:

- Get hashes for each instance variable that provides properties A and B.
- Combine them in a way that is order-sensitive (to maintain property
B) and that does some extra mixing (to provide property C).

Now, we can assume that common values for the instance variables will
be instances of ==SmallInteger==, ==Float==, or ==Fraction==.

You can see in ==Fraction>>hash==, that a comment mentions the
assumption that the fraction is already reduced, that is what makes it
acceptable to use bitXor.

The hash of ==SmallInteger== does provide properties A and B, but not
property C. The hash of ==Float== is harder to understand, but tests
show that it provides distinct values for 0.5, 0.25 and 0.125. So it
hopefully provides B for the range of values of interest to
==LayoutFrame==.

Another class that has similar hashing constraints to ==LayoutFrame==
is ==Point==. Its ==hash== method is:

[[[
Point >> hash
"Hash is reimplemented because = is implemented."
^(x hash hashMultiply + y hash) hashMultiply
]]]

It does not include species in the computation, which is less than
ideal because it favours collisions with objects of different species
that have a similar content and a the same hash algorithm.

Finally, it is probably not necessary or useful to use the accessors
to get to the instance variables.

So a good implementation for LayoutFrame is

[[[
LayoutFrame >> hash
| hash |
hash := self species hash
hash := (hash + leftFraction hash) hashMultiply
hash := (hash + leftOffset hash) hashMultiply
hash := (hash + topFraction hash) hashMultiply
hash := (hash + topOffset hash) hashMultiply
hash := (hash + rightFraction hash) hashMultiply
hash := (hash + rightOffset hash) hashMultiply
hash := (hash + bottomFraction hash) hashMultiply
hash := (hash + bottomOffset hash) hashMultiply
^ hash
]]]

On Fri, Oct 6, 2017 at 9:59 AM, Stephane Ducasse
<[hidden email]> wrote:

> Noury also proposed a trait to avoid to systematically have to implement hash.
> Vitor why don't you take it as a small project to propose a solution for Pharo.
>
> On Thu, Oct 5, 2017 at 6:34 PM, Vitor Medina Cruz <[hidden email]> wrote:
>> Yes, canEqual implementation also make #= be commutative.
>>
>> On Tue, Oct 3, 2017 at 11:11 AM, Denis Kudriashov <[hidden email]>
>> wrote:
>>>
>>>
>>> 2017-10-02 17:30 GMT+02:00 Denis Kudriashov <[hidden email]>:
>>>>
>>>>
>>>> 2017-10-02 17:13 GMT+02:00 Vitor Medina Cruz <[hidden email]>:
>>>>>
>>>>> I am sorry, not species, but #isKindOf istead of #= to compare classes.
>>>>
>>>>
>>>> It is bad idea. #= should be transitive.
>>>
>>>
>>> Oh, I used wrong word, shame on me :). I tried to say commutative.
>>>
>>>>
>>>> How you will generate it with isKindOf: logic? You need to know common
>>>> parent.
>>>>
>>>> Also I not remember cases where I was needed two instances of different
>>>> classes to be equal.
>>>> And I can imaging the problems which it will lead to.
>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 2, 2017 at 11:57 AM, Denis Kudriashov <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> 2017-10-02 16:37 GMT+02:00 Sean P. DeNigris <[hidden email]>:
>>>>>>>
>>>>>>>
>>>>>>> Two questions/comments about the generated code:
>>>>>>> 1. #=
>>>>>>>         ...
>>>>>>>         self class = anObject class "should compare #species instead?"
>>>>>>>                 ifFalse: [ ^ false ].
>>>>>>>         ...
>>>>>>> Typically, I've seen #species instead of #class in the guard
>>>>>>> statement.
>>>>>>> Should we change it to that?
>>>>>>
>>>>>>
>>>>>> I doubt that it is important for domain classes. Because I never saw
>>>>>> the user of #species which is not a kind of Collection. And for collections
>>>>>> this refactoring is not valid anyway.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2. #hash
>>>>>>>         ^ var1 hash bitXor: (var2 hash bitXor: var3 hash)
>>>>>>> Is this implementation always safe? It's what I usually hand roll
>>>>>>> based on
>>>>>>> what I've seen, but Andres Valloud wrote a whole (large) book on
>>>>>>> hashing, so
>>>>>>> I've always wondered if I was missing something…
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----
>>>>>>> Cheers,
>>>>>>> Sean
>>>>>>> --
>>>>>>> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: Equals and HashCode Builder

Vitor Medina Cruz
Stephane

"Vitor why don't you take it as a small project to propose a solution for Pharo."

I will try :) I like the idea of using a Trait.


Em 6 de out de 2017 05:01, "Stephane Ducasse" <[hidden email]> escreveu:
Here is what I extracted from older emails

From David Allouche in Pharo-dev to see if
The ==hash== method is used in Pharo to separate objects in
bucket-based data structures.


!! Property

It must have one required property which is required data integrity:

!!!! A.
Objects that are equal according to ==\=== must have equal hashes.


It should have two desirable properties that are required to support
the algorithmic assumptions of bucket-based data structures:

!!!! B.
Objects that are not equal should have different hashes. Ideally, the
probability of hash collision should by 1/N where N is the size of
hash value space.

!!!! C. The hash values should spread as much as possible over their
value space. For example: ProtoObject>>#identityHash.




How these two desirable properties can be implemented depends on the
actual distribution of the values used to compute the hash.

An object can safely use XOR hashing if its instance variables are all
different objects that already provide a hash method with these two
properties.

@@todo check following sentence because the use of singleton is obscure

For example, an object whose behaviour is defined by several of
singleton delegates of different classes that do not override
Object>>#hash.

But if the instance variables are not guaranteed to have those
properties, it is necessary to more thoroughly "mix the bits". For
example the  method ==hash== of ==SequenceableCollection== illustrates
this point.


[[[
SequenceableCollection >> hash
| hash |
hash := self species hash.
1 to: self size do: [:i | hash := (hash + (self at: i) hash) hashMultiply].
^ hash
]]]


The mixing of the bits is done by the combination of addition and
==hashMultiply==. The value of ==species hash== does not need to be
processed by ==hashMultiply==, because it is probably computed by the
method =identityHash== of ==ProtoObject==, and that already provides
property C.

!!! When we should not use XOR

==LayoutFrame== is a particularly good example of a data structure
that should not use XOR hashing. Its instance variables provide none
of the required properties:

- ==SmallInteger>>hash== is just ==^self==.
- In addition, common ==LayoutFrame== instances tend to use pairs of
identical values in their instance variables. Like ==0@0 corner:
1@1==, or ==Margin fromNumber: 10==.

A good hash function for LayoutFrame needs to:

- Get hashes for each instance variable that provides properties A and B.
- Combine them in a way that is order-sensitive (to maintain property
B) and that does some extra mixing (to provide property C).

Now, we can assume that common values for the instance variables will
be instances of ==SmallInteger==, ==Float==, or ==Fraction==.

You can see in ==Fraction>>hash==, that a comment mentions the
assumption that the fraction is already reduced, that is what makes it
acceptable to use bitXor.

The hash of ==SmallInteger== does provide properties A and B, but not
property C. The hash of ==Float== is harder to understand, but tests
show that it provides distinct values for 0.5, 0.25 and 0.125. So it
hopefully provides B for the range of values of interest to
==LayoutFrame==.

Another class that has similar hashing constraints to ==LayoutFrame==
is ==Point==. Its ==hash== method is:

[[[
Point >> hash
"Hash is reimplemented because = is implemented."
^(x hash hashMultiply + y hash) hashMultiply
]]]

It does not include species in the computation, which is less than
ideal because it favours collisions with objects of different species
that have a similar content and a the same hash algorithm.

Finally, it is probably not necessary or useful to use the accessors
to get to the instance variables.

So a good implementation for LayoutFrame is

[[[
LayoutFrame >> hash
| hash |
hash := self species hash
hash := (hash + leftFraction hash) hashMultiply
hash := (hash + leftOffset hash) hashMultiply
hash := (hash + topFraction hash) hashMultiply
hash := (hash + topOffset hash) hashMultiply
hash := (hash + rightFraction hash) hashMultiply
hash := (hash + rightOffset hash) hashMultiply
hash := (hash + bottomFraction hash) hashMultiply
hash := (hash + bottomOffset hash) hashMultiply
^ hash
]]]

On Fri, Oct 6, 2017 at 9:59 AM, Stephane Ducasse
<[hidden email]> wrote:
> Noury also proposed a trait to avoid to systematically have to implement hash.
> Vitor why don't you take it as a small project to propose a solution for Pharo.
>
> On Thu, Oct 5, 2017 at 6:34 PM, Vitor Medina Cruz <[hidden email]> wrote:
>> Yes, canEqual implementation also make #= be commutative.
>>
>> On Tue, Oct 3, 2017 at 11:11 AM, Denis Kudriashov <[hidden email]>
>> wrote:
>>>
>>>
>>> 2017-10-02 17:30 GMT+02:00 Denis Kudriashov <[hidden email]>:
>>>>
>>>>
>>>> 2017-10-02 17:13 GMT+02:00 Vitor Medina Cruz <[hidden email]>:
>>>>>
>>>>> I am sorry, not species, but #isKindOf istead of #= to compare classes.
>>>>
>>>>
>>>> It is bad idea. #= should be transitive.
>>>
>>>
>>> Oh, I used wrong word, shame on me :). I tried to say commutative.
>>>
>>>>
>>>> How you will generate it with isKindOf: logic? You need to know common
>>>> parent.
>>>>
>>>> Also I not remember cases where I was needed two instances of different
>>>> classes to be equal.
>>>> And I can imaging the problems which it will lead to.
>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 2, 2017 at 11:57 AM, Denis Kudriashov <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> 2017-10-02 16:37 GMT+02:00 Sean P. DeNigris <[hidden email]>:
>>>>>>>
>>>>>>>
>>>>>>> Two questions/comments about the generated code:
>>>>>>> 1. #=
>>>>>>>         ...
>>>>>>>         self class = anObject class "should compare #species instead?"
>>>>>>>                 ifFalse: [ ^ false ].
>>>>>>>         ...
>>>>>>> Typically, I've seen #species instead of #class in the guard
>>>>>>> statement.
>>>>>>> Should we change it to that?
>>>>>>
>>>>>>
>>>>>> I doubt that it is important for domain classes. Because I never saw
>>>>>> the user of #species which is not a kind of Collection. And for collections
>>>>>> this refactoring is not valid anyway.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2. #hash
>>>>>>>         ^ var1 hash bitXor: (var2 hash bitXor: var3 hash)
>>>>>>> Is this implementation always safe? It's what I usually hand roll
>>>>>>> based on
>>>>>>> what I've seen, but Andres Valloud wrote a whole (large) book on
>>>>>>> hashing, so
>>>>>>> I've always wondered if I was missing something…
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----
>>>>>>> Cheers,
>>>>>>> Sean
>>>>>>> --
>>>>>>> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>


Reply | Threaded
Open this post in threaded view
|

Re: Equals and HashCode Builder

Stephane Ducasse-3
Contact noury because I do not remember.
But having a little builder is also nice.

On Fri, Oct 6, 2017 at 4:15 PM, Vitor Medina Cruz <[hidden email]> wrote:

> Stephane
>
> "Vitor why don't you take it as a small project to propose a solution for
> Pharo."
>
> I will try :) I like the idea of using a Trait.
>
>
> Em 6 de out de 2017 05:01, "Stephane Ducasse" <[hidden email]>
> escreveu:
>
> Here is what I extracted from older emails
>
> From David Allouche in Pharo-dev to see if
> The ==hash== method is used in Pharo to separate objects in
> bucket-based data structures.
>
>
> !! Property
>
> It must have one required property which is required data integrity:
>
> !!!! A.
> Objects that are equal according to ==\=== must have equal hashes.
>
>
> It should have two desirable properties that are required to support
> the algorithmic assumptions of bucket-based data structures:
>
> !!!! B.
> Objects that are not equal should have different hashes. Ideally, the
> probability of hash collision should by 1/N where N is the size of
> hash value space.
>
> !!!! C. The hash values should spread as much as possible over their
> value space. For example: ProtoObject>>#identityHash.
>
>
>
>
> How these two desirable properties can be implemented depends on the
> actual distribution of the values used to compute the hash.
>
> An object can safely use XOR hashing if its instance variables are all
> different objects that already provide a hash method with these two
> properties.
>
> @@todo check following sentence because the use of singleton is obscure
>
> For example, an object whose behaviour is defined by several of
> singleton delegates of different classes that do not override
> Object>>#hash.
>
> But if the instance variables are not guaranteed to have those
> properties, it is necessary to more thoroughly "mix the bits". For
> example the  method ==hash== of ==SequenceableCollection== illustrates
> this point.
>
>
> [[[
> SequenceableCollection >> hash
> | hash |
> hash := self species hash.
> 1 to: self size do: [:i | hash := (hash + (self at: i) hash) hashMultiply].
> ^ hash
> ]]]
>
>
> The mixing of the bits is done by the combination of addition and
> ==hashMultiply==. The value of ==species hash== does not need to be
> processed by ==hashMultiply==, because it is probably computed by the
> method =identityHash== of ==ProtoObject==, and that already provides
> property C.
>
> !!! When we should not use XOR
>
> ==LayoutFrame== is a particularly good example of a data structure
> that should not use XOR hashing. Its instance variables provide none
> of the required properties:
>
> - ==SmallInteger>>hash== is just ==^self==.
> - In addition, common ==LayoutFrame== instances tend to use pairs of
> identical values in their instance variables. Like ==0@0 corner:
> 1@1==, or ==Margin fromNumber: 10==.
>
> A good hash function for LayoutFrame needs to:
>
> - Get hashes for each instance variable that provides properties A and B.
> - Combine them in a way that is order-sensitive (to maintain property
> B) and that does some extra mixing (to provide property C).
>
> Now, we can assume that common values for the instance variables will
> be instances of ==SmallInteger==, ==Float==, or ==Fraction==.
>
> You can see in ==Fraction>>hash==, that a comment mentions the
> assumption that the fraction is already reduced, that is what makes it
> acceptable to use bitXor.
>
> The hash of ==SmallInteger== does provide properties A and B, but not
> property C. The hash of ==Float== is harder to understand, but tests
> show that it provides distinct values for 0.5, 0.25 and 0.125. So it
> hopefully provides B for the range of values of interest to
> ==LayoutFrame==.
>
> Another class that has similar hashing constraints to ==LayoutFrame==
> is ==Point==. Its ==hash== method is:
>
> [[[
> Point >> hash
> "Hash is reimplemented because = is implemented."
> ^(x hash hashMultiply + y hash) hashMultiply
> ]]]
>
> It does not include species in the computation, which is less than
> ideal because it favours collisions with objects of different species
> that have a similar content and a the same hash algorithm.
>
> Finally, it is probably not necessary or useful to use the accessors
> to get to the instance variables.
>
> So a good implementation for LayoutFrame is
>
> [[[
> LayoutFrame >> hash
> | hash |
> hash := self species hash
> hash := (hash + leftFraction hash) hashMultiply
> hash := (hash + leftOffset hash) hashMultiply
> hash := (hash + topFraction hash) hashMultiply
> hash := (hash + topOffset hash) hashMultiply
> hash := (hash + rightFraction hash) hashMultiply
> hash := (hash + rightOffset hash) hashMultiply
> hash := (hash + bottomFraction hash) hashMultiply
> hash := (hash + bottomOffset hash) hashMultiply
> ^ hash
> ]]]
>
> On Fri, Oct 6, 2017 at 9:59 AM, Stephane Ducasse
> <[hidden email]> wrote:
>> Noury also proposed a trait to avoid to systematically have to implement
>> hash.
>> Vitor why don't you take it as a small project to propose a solution for
>> Pharo.
>>
>> On Thu, Oct 5, 2017 at 6:34 PM, Vitor Medina Cruz <[hidden email]>
>> wrote:
>>> Yes, canEqual implementation also make #= be commutative.
>>>
>>> On Tue, Oct 3, 2017 at 11:11 AM, Denis Kudriashov <[hidden email]>
>>> wrote:
>>>>
>>>>
>>>> 2017-10-02 17:30 GMT+02:00 Denis Kudriashov <[hidden email]>:
>>>>>
>>>>>
>>>>> 2017-10-02 17:13 GMT+02:00 Vitor Medina Cruz <[hidden email]>:
>>>>>>
>>>>>> I am sorry, not species, but #isKindOf istead of #= to compare
>>>>>> classes.
>>>>>
>>>>>
>>>>> It is bad idea. #= should be transitive.
>>>>
>>>>
>>>> Oh, I used wrong word, shame on me :). I tried to say commutative.
>>>>
>>>>>
>>>>> How you will generate it with isKindOf: logic? You need to know common
>>>>> parent.
>>>>>
>>>>> Also I not remember cases where I was needed two instances of different
>>>>> classes to be equal.
>>>>> And I can imaging the problems which it will lead to.
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 2, 2017 at 11:57 AM, Denis Kudriashov
>>>>>> <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> 2017-10-02 16:37 GMT+02:00 Sean P. DeNigris <[hidden email]>:
>>>>>>>>
>>>>>>>>
>>>>>>>> Two questions/comments about the generated code:
>>>>>>>> 1. #=
>>>>>>>>         ...
>>>>>>>>         self class = anObject class "should compare #species
>>>>>>>> instead?"
>>>>>>>>                 ifFalse: [ ^ false ].
>>>>>>>>         ...
>>>>>>>> Typically, I've seen #species instead of #class in the guard
>>>>>>>> statement.
>>>>>>>> Should we change it to that?
>>>>>>>
>>>>>>>
>>>>>>> I doubt that it is important for domain classes. Because I never saw
>>>>>>> the user of #species which is not a kind of Collection. And for
>>>>>>> collections
>>>>>>> this refactoring is not valid anyway.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2. #hash
>>>>>>>>         ^ var1 hash bitXor: (var2 hash bitXor: var3 hash)
>>>>>>>> Is this implementation always safe? It's what I usually hand roll
>>>>>>>> based on
>>>>>>>> what I've seen, but Andres Valloud wrote a whole (large) book on
>>>>>>>> hashing, so
>>>>>>>> I've always wondered if I was missing something…
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----
>>>>>>>> Cheers,
>>>>>>>> Sean
>>>>>>>> --
>>>>>>>> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Equals and HashCode Builder

Noury Bouraqadi-2
Hi,

I have developed few years ago a trait-based solution to make it easy to deal with equality.
By default, instances of the same class with equal IVs are equal + have the same hash code.
You can customize the list of IVs to compare, which is handy in case of IVs that store cache values.
All you have to do is to use the TEquality in your class.

The code + tests are here
http://smalltalkhub.com/#!/~CAR/ReusableBricks/packages/Equals

This is part of the "ReusableBricks" repo that gather packages we're using it in different projects in my team.
And it works fine.
I'd love to have this integrated in Pharo.

Note : I no longer receive mailing-lists emails, because I can't handle all the traffic. We can continue the conversation on Discord


Noury

>
>
> On 06 Oct 2017, at 18:31, Stephane Ducasse <[hidden email]> wrote:
>
> Contact noury because I do not remember.
> But having a little builder is also nice.
>
> On Fri, Oct 6, 2017 at 4:15 PM, Vitor Medina Cruz <[hidden email]> wrote:
>> Stephane
>>
>> "Vitor why don't you take it as a small project to propose a solution for
>> Pharo."
>>
>> I will try :) I like the idea of using a Trait.
>>
>>
>> Em 6 de out de 2017 05:01, "Stephane Ducasse" <[hidden email]>
>> escreveu:
>>
>> Here is what I extracted from older emails
>>
>> From David Allouche in Pharo-dev to see if
>> The ==hash== method is used in Pharo to separate objects in
>> bucket-based data structures.
>>
>>
>> !! Property
>>
>> It must have one required property which is required data integrity:
>>
>> !!!! A.
>> Objects that are equal according to ==\=== must have equal hashes.
>>
>>
>> It should have two desirable properties that are required to support
>> the algorithmic assumptions of bucket-based data structures:
>>
>> !!!! B.
>> Objects that are not equal should have different hashes. Ideally, the
>> probability of hash collision should by 1/N where N is the size of
>> hash value space.
>>
>> !!!! C. The hash values should spread as much as possible over their
>> value space. For example: ProtoObject>>#identityHash.
>>
>>
>>
>>
>> How these two desirable properties can be implemented depends on the
>> actual distribution of the values used to compute the hash.
>>
>> An object can safely use XOR hashing if its instance variables are all
>> different objects that already provide a hash method with these two
>> properties.
>>
>> @@todo check following sentence because the use of singleton is obscure
>>
>> For example, an object whose behaviour is defined by several of
>> singleton delegates of different classes that do not override
>> Object>>#hash.
>>
>> But if the instance variables are not guaranteed to have those
>> properties, it is necessary to more thoroughly "mix the bits". For
>> example the  method ==hash== of ==SequenceableCollection== illustrates
>> this point.
>>
>>
>> [[[
>> SequenceableCollection >> hash
>> | hash |
>> hash := self species hash.
>> 1 to: self size do: [:i | hash := (hash + (self at: i) hash) hashMultiply].
>> ^ hash
>> ]]]
>>
>>
>> The mixing of the bits is done by the combination of addition and
>> ==hashMultiply==. The value of ==species hash== does not need to be
>> processed by ==hashMultiply==, because it is probably computed by the
>> method =identityHash== of ==ProtoObject==, and that already provides
>> property C.
>>
>> !!! When we should not use XOR
>>
>> ==LayoutFrame== is a particularly good example of a data structure
>> that should not use XOR hashing. Its instance variables provide none
>> of the required properties:
>>
>> - ==SmallInteger>>hash== is just ==^self==.
>> - In addition, common ==LayoutFrame== instances tend to use pairs of
>> identical values in their instance variables. Like ==0@0 corner:
>> 1@1==, or ==Margin fromNumber: 10==.
>>
>> A good hash function for LayoutFrame needs to:
>>
>> - Get hashes for each instance variable that provides properties A and B.
>> - Combine them in a way that is order-sensitive (to maintain property
>> B) and that does some extra mixing (to provide property C).
>>
>> Now, we can assume that common values for the instance variables will
>> be instances of ==SmallInteger==, ==Float==, or ==Fraction==.
>>
>> You can see in ==Fraction>>hash==, that a comment mentions the
>> assumption that the fraction is already reduced, that is what makes it
>> acceptable to use bitXor.
>>
>> The hash of ==SmallInteger== does provide properties A and B, but not
>> property C. The hash of ==Float== is harder to understand, but tests
>> show that it provides distinct values for 0.5, 0.25 and 0.125. So it
>> hopefully provides B for the range of values of interest to
>> ==LayoutFrame==.
>>
>> Another class that has similar hashing constraints to ==LayoutFrame==
>> is ==Point==. Its ==hash== method is:
>>
>> [[[
>> Point >> hash
>> "Hash is reimplemented because = is implemented."
>> ^(x hash hashMultiply + y hash) hashMultiply
>> ]]]
>>
>> It does not include species in the computation, which is less than
>> ideal because it favours collisions with objects of different species
>> that have a similar content and a the same hash algorithm.
>>
>> Finally, it is probably not necessary or useful to use the accessors
>> to get to the instance variables.
>>
>> So a good implementation for LayoutFrame is
>>
>> [[[
>> LayoutFrame >> hash
>> | hash |
>> hash := self species hash
>> hash := (hash + leftFraction hash) hashMultiply
>> hash := (hash + leftOffset hash) hashMultiply
>> hash := (hash + topFraction hash) hashMultiply
>> hash := (hash + topOffset hash) hashMultiply
>> hash := (hash + rightFraction hash) hashMultiply
>> hash := (hash + rightOffset hash) hashMultiply
>> hash := (hash + bottomFraction hash) hashMultiply
>> hash := (hash + bottomOffset hash) hashMultiply
>> ^ hash
>> ]]]
>>
>> On Fri, Oct 6, 2017 at 9:59 AM, Stephane Ducasse
>> <[hidden email]> wrote:
>>> Noury also proposed a trait to avoid to systematically have to implement
>>> hash.
>>> Vitor why don't you take it as a small project to propose a solution for
>>> Pharo.
>>>
>>> On Thu, Oct 5, 2017 at 6:34 PM, Vitor Medina Cruz <[hidden email]>
>>> wrote:
>>>> Yes, canEqual implementation also make #= be commutative.
>>>>
>>>> On Tue, Oct 3, 2017 at 11:11 AM, Denis Kudriashov <[hidden email]>
>>>> wrote:
>>>>>
>>>>>
>>>>> 2017-10-02 17:30 GMT+02:00 Denis Kudriashov <[hidden email]>:
>>>>>>
>>>>>>
>>>>>> 2017-10-02 17:13 GMT+02:00 Vitor Medina Cruz <[hidden email]>:
>>>>>>>
>>>>>>> I am sorry, not species, but #isKindOf istead of #= to compare
>>>>>>> classes.
>>>>>>
>>>>>>
>>>>>> It is bad idea. #= should be transitive.
>>>>>
>>>>>
>>>>> Oh, I used wrong word, shame on me :). I tried to say commutative.
>>>>>
>>>>>>
>>>>>> How you will generate it with isKindOf: logic? You need to know common
>>>>>> parent.
>>>>>>
>>>>>> Also I not remember cases where I was needed two instances of different
>>>>>> classes to be equal.
>>>>>> And I can imaging the problems which it will lead to.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 2, 2017 at 11:57 AM, Denis Kudriashov
>>>>>>> <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> 2017-10-02 16:37 GMT+02:00 Sean P. DeNigris <[hidden email]>:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Two questions/comments about the generated code:
>>>>>>>>> 1. #=
>>>>>>>>>        ...
>>>>>>>>>        self class = anObject class "should compare #species
>>>>>>>>> instead?"
>>>>>>>>>                ifFalse: [ ^ false ].
>>>>>>>>>        ...
>>>>>>>>> Typically, I've seen #species instead of #class in the guard
>>>>>>>>> statement.
>>>>>>>>> Should we change it to that?
>>>>>>>>
>>>>>>>>
>>>>>>>> I doubt that it is important for domain classes. Because I never saw
>>>>>>>> the user of #species which is not a kind of Collection. And for
>>>>>>>> collections
>>>>>>>> this refactoring is not valid anyway.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2. #hash
>>>>>>>>>        ^ var1 hash bitXor: (var2 hash bitXor: var3 hash)
>>>>>>>>> Is this implementation always safe? It's what I usually hand roll
>>>>>>>>> based on
>>>>>>>>> what I've seen, but Andres Valloud wrote a whole (large) book on
>>>>>>>>> hashing, so
>>>>>>>>> I've always wondered if I was missing something…
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----
>>>>>>>>> Cheers,
>>>>>>>>> Sean
>>>>>>>>> --
>>>>>>>>> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>


Reply | Threaded
Open this post in threaded view
|

Re: Equals and HashCode Builder

Sean P. DeNigris
Administrator
Noury Bouraqadi-2 wrote
> The code + tests are here
> http://smalltalkhub.com/#!/~CAR/ReusableBricks/packages/Equals

I forked to https://github.com/seandenigris/Pharo-Equals with a baseline to
make it easier to try out/possibly-integrate.



-----
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html

Cheers,
Sean
12