Is it always needed to redefine #hash message when you redefine #= message?

Julien Delplanque
The subject of this mail is exactly my question.

I came to this question while looking at the implementation of Object>>#=.

"""
= anObject
     "Answer whether the receiver and the argument represent the same
     object. If = is redefined in any subclass, consider also redefining
     the message hash."

     ^self == anObject
"""

When do we need to redefine the #hash message?

Is it the right way to implement equality between two objects or is
there another message that I should override?

Regards,
Julien

Re: Is it always needed to redefine #hash message when you redefine #= message?

CyrilFerlicot
I think you redefine #hash so that a Set can tell whether two objects
are the same or not.
So it is not an equality test itself, but a way for a Set to know
whether an equivalent element is already in the Set.
Correct me if I'm wrong.

--
Cheers
Cyril Ferlicot

Re: Is it always needed to redefine #hash message when you redefine #= message?

Esteban A. Maringolo
In reply to this post by Julien Delplanque
2015-05-26 15:45 GMT-03:00 Julien Delplanque <[hidden email]>:

> When do we need to redefine #hash message?

Whenever you redefine #=.

> Is it the right way to implement equality between two objects or is there
> another message that I should override?

#hash, as the comment in #= suggests.

Hashed collections (mainly Set, but there are others) index and look up
objects by their hash value.

Esteban A. Maringolo

PS: A proper implementation of #hash that avoids collisions (two
different objects sharing the same hash) can be quite simple or a
little more complex (mathematically challenging).
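
To illustrate the trade-off between simple and collision-resistant hashes, here is a minimal sketch, assuming a Pharo playground. It combines the hashes of two strings with bitXor:, which is about as simple as a combined hash gets. Because XOR is symmetric, swapping the two strings yields the same value. That is a legal collision, but one a more elaborate combination would avoid. The variable names are made up for the example.

```smalltalk
| h1 h2 |
"Combine the hashes of two fields with XOR, in both orders."
h1 := 'a' hash bitXor: 'b' hash.
h2 := 'b' hash bitXor: 'a' hash.

"XOR is commutative, so both orders collide."
h1 = h2. "true"
```

Collisions like this do not break hashed collections; they only make lookups slower, since #= has to sort out the candidates that land in the same slot.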

Re: Is it always needed to redefine #hash message when you redefine #= message?

Julien Delplanque
Mmmh,

Let's assume I have a Letter object.
This object has a body (a String) and a title (also a String).

I want to define that two Letter objects are equal if and only if
their bodies are equal and their titles are equal.

How do I implement that? I don't understand why I should override #hash in
this case.

Julien

Re: Is it always needed to redefine #hash message when you redefine #= message?

Carlo-2
In reply to this post by Julien Delplanque
Hi

Any two objects that are = must answer the same #hash value.
Every HashedCollection uses the hash to choose a storage slot, and then uses #= to find the exact element in that slot.
If two objects are #= but answer different hash values, the HashedCollection may look in the wrong storage slot, and lookups become unreliable.
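
For the Letter class described earlier in the thread (a title and a body, both Strings), this contract could be sketched as follows. This is only one possible implementation, not the definitive one. The class definition, the category name and the accessors title and body are assumptions made for the example. `Letter >> selector` is the usual notation for "method selector in class Letter"; in practice each method is entered in the browser.

```smalltalk
"Assumed class definition (Pharo-style; the category name is made up)."
Object subclass: #Letter
    instanceVariableNames: 'title body'
    classVariableNames: ''
    category: 'Example-Letters'.

"Hypothetical accessors, needed by #= to read the other letter's values."
Letter >> title
    ^ title

Letter >> body
    ^ body

Letter >> = anObject
    "Equal if and only if the argument is a Letter with an equal title and an equal body."
    self == anObject ifTrue: [ ^ true ].
    self class = anObject class ifFalse: [ ^ false ].
    ^ title = anObject title and: [ body = anObject body ]

Letter >> hash
    "Built only from the values compared in #=, so equal letters answer equal hashes."
    ^ title hash bitXor: body hash
```

If only #= were overridden, two equal letters would still answer the identity-based hash inherited from Object, and a Set holding one of them would most likely not find the other.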

Out of interest, Andres Valloud wrote a whole book on 'Hashing in Smalltalk' (http://www.lulu.com/content/1455536)

Also the Javadoc comments are pretty good in explaining the usage:
> The general contract of hashCode is:
> Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
> If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
> It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
> As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.

Cheers
Carlo

Re: Is it always needed to redefine #hash message when you redefine #= message?

Julien Delplanque
Thanks, I understand now :)

Re: Is it always needed to redefine #hash message when you redefine #= message?

Henrik Sperre Johansen
Also implied, but seldom stated, is that all values used in = and hash must only be set at creation, never after.
The reason is the same: keeping hashed collection usage sane.
p := Point x: 3 y: 2.
s := Set new.
s add: p.
"Imaginary method"
p setX: 2.
s includes: p. "-> false"
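
The same effect can be tried without an imaginary setter. The sketch below is an assumption-laden variant for a Pharo playground: it uses an OrderedCollection as the mutable element, relying on the fact that the hash of a sequenceable collection is computed from its elements. The variable names are made up. The first lookup is expected to fail because the element's hash changed after it was stored, although a lucky collision could still find it. Sending rehash, a HashedCollection method, rebuilds the slots from the current hash values.

```smalltalk
| c s |
c := OrderedCollection with: 1.
s := Set new.
s add: c.

"Mutating c changes its hash while it sits in the set."
c add: 2.
s includes: c. "most likely false: the set probes the slot for the new hash"

"Recompute the slots from the current hashes."
s rehash.
s includes: c. "true again"
```

Rehashing is a repair, not a design: the safer habit is the one stated above, namely not changing values that take part in = and hash while the object is held by a hashed collection.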

Cheers,
Henry

Re: Is it always needed to redefine #hash message when you redefine #= message?

abergel
Wow, compelling example!

Alexandre


--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.