hash and Collections

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

hash and Collections

EstebanLM
Hi,

Can anyone explain me how this is correct?

Collection>>#hash
        "Answer an integer hash value for the receiver such that,
          -- the hash value of an unchanged object is constant over time, and
          -- two equal objects have equal hash values"

        | hash |

        hash := self species hash.
        self size <= 10 ifTrue:
                [self do: [:elem | hash := hash bitXor: elem hash]].
        ^hash bitXor: self size hash


I mean… a hash that changes when I add elements to the collection… how can this work?

Esteban
Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

Levente Uzonyi-2
You simply don't modify _objects_ which are used as keys in hashed
collections. Or if you do so, you'll have to deal with the consequences
yourself.

Levente

On Fri, 9 Oct 2015, Esteban Lorenzano wrote:

> Hi,
>
> Can anyone explain me how this is correct?
>
> Collection>>#hash
> "Answer an integer hash value for the receiver such that,
>  -- the hash value of an unchanged object is constant over time, and
>  -- two equal objects have equal hash values"
>
> | hash |
>
> hash := self species hash.
> self size <= 10 ifTrue:
> [self do: [:elem | hash := hash bitXor: elem hash]].
> ^hash bitXor: self size hash
>
>
> I mean… a hash that changes when I add elements to the collection… how can this work?
>
> Esteban
>
Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

EstebanLM
but that’s an error, I think.
a collection has to have always same hash… no matter its size (because is not an array) and no matter his elements.


> On 09 Oct 2015, at 15:21, Levente Uzonyi <[hidden email]> wrote:
>
> You simply don't modify _objects_ which are used as keys in hashed collections. Or if you do so, you'll have to deal with the consequences yourself.
>
> Levente
>
> On Fri, 9 Oct 2015, Esteban Lorenzano wrote:
>
>> Hi,
>>
>> Can anyone explain me how this is correct?
>>
>> Collection>>#hash
>> "Answer an integer hash value for the receiver such that,
>>  -- the hash value of an unchanged object is constant over time, and
>>  -- two equal objects have equal hash values"
>>
>> | hash |
>>
>> hash := self species hash.
>> self size <= 10 ifTrue:
>> [self do: [:elem | hash := hash bitXor: elem hash]].
>> ^hash bitXor: self size hash
>>
>>
>> I mean… a hash that changes when I add elements to the collection… how can this work?
>>
>> Esteban


Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

Henrik Nergaard
The identityHash should stay the same for any number of changes to the array..

| c1 c2 |

c1 := { #a . #b } asOrderedCollection.
c2 := { #a . #b } asOrderedCollection.

{
        c1 hash.
        c1 identityHash.
        c2 hash.
        c2 identityHash .
        (c2 add: #c; yourself) identityHash.
} "inspect"

Best regards,
Henrik

-----Original Message-----
From: Pharo-dev [mailto:[hidden email]] On Behalf Of Esteban Lorenzano
Sent: Friday, October 9, 2015 4:08 PM
To: Pharo Development List <[hidden email]>
Subject: Re: [Pharo-dev] hash and Collections

but that’s an error, I think.
a collection has to have always same hash… no matter its size (because is not an array) and no matter his elements.


> On 09 Oct 2015, at 15:21, Levente Uzonyi <[hidden email]> wrote:
>
> You simply don't modify _objects_ which are used as keys in hashed collections. Or if you do so, you'll have to deal with the consequences yourself.
>
> Levente
>
> On Fri, 9 Oct 2015, Esteban Lorenzano wrote:
>
>> Hi,
>>
>> Can anyone explain me how this is correct?
>>
>> Collection>>#hash
>> "Answer an integer hash value for the receiver such that,
>>  -- the hash value of an unchanged object is constant over time, and
>>  -- two equal objects have equal hash values"
>>
>> | hash |
>>
>> hash := self species hash.
>> self size <= 10 ifTrue:
>> [self do: [:elem | hash := hash bitXor: elem hash]].
>> ^hash bitXor: self size hash
>>
>>
>> I mean… a hash that changes when I add elements to the collection… how can this work?
>>
>> Esteban


Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

gcotelli
In reply to this post by EstebanLM
If the collection implements = using the objects it holds then you need to consider at least some of them in the hash calculation. I can't conceive a hash calculation for this case independent of the contents (well,  just hardcode a number but this will lead to always collide if used as a key in a hashed collection).

I'm with Levente here, I think the hash implementation is reasonable. And I wouldn't use a mutable object as key in a hashed collection.

On Fri, Oct 9, 2015 at 11:07 AM, Esteban Lorenzano <[hidden email]> wrote:
but that’s an error, I think.
a collection has to have always same hash… no matter its size (because is not an array) and no matter his elements.


> On 09 Oct 2015, at 15:21, Levente Uzonyi <[hidden email]> wrote:
>
> You simply don't modify _objects_ which are used as keys in hashed collections. Or if you do so, you'll have to deal with the consequences yourself.
>
> Levente
>
> On Fri, 9 Oct 2015, Esteban Lorenzano wrote:
>
>> Hi,
>>
>> Can anyone explain me how this is correct?
>>
>> Collection>>#hash
>>      "Answer an integer hash value for the receiver such that,
>>        -- the hash value of an unchanged object is constant over time, and
>>        -- two equal objects have equal hash values"
>>
>>      | hash |
>>
>>      hash := self species hash.
>>      self size <= 10 ifTrue:
>>              [self do: [:elem | hash := hash bitXor: elem hash]].
>>      ^hash bitXor: self size hash
>>
>>
>> I mean… a hash that changes when I add elements to the collection… how can this work?
>>
>> Esteban



Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

EstebanLM

On 09 Oct 2015, at 16:21, Gabriel Cotelli <[hidden email]> wrote:

If the collection implements = using the objects it holds then you need to consider at least some of them in the hash calculation. I can't conceive a hash calculation for this case independent of the contents (well,  just hardcode a number but this will lead to always collide if used as a key in a hashed collection).

it is not :)


I'm with Levente here, I think the hash implementation is reasonable. And I wouldn't use a mutable object as key in a hashed collection.

no idea… I’m not using it, just arrive to that method random and I cannot understand it. 
I still believe is wrong. 

Esteban


On Fri, Oct 9, 2015 at 11:07 AM, Esteban Lorenzano <[hidden email]> wrote:
but that’s an error, I think.
a collection has to have always same hash… no matter its size (because is not an array) and no matter his elements.


> On 09 Oct 2015, at 15:21, Levente Uzonyi <[hidden email]> wrote:
>
> You simply don't modify _objects_ which are used as keys in hashed collections. Or if you do so, you'll have to deal with the consequences yourself.
>
> Levente
>
> On Fri, 9 Oct 2015, Esteban Lorenzano wrote:
>
>> Hi,
>>
>> Can anyone explain me how this is correct?
>>
>> Collection>>#hash
>>      "Answer an integer hash value for the receiver such that,
>>        -- the hash value of an unchanged object is constant over time, and
>>        -- two equal objects have equal hash values"
>>
>>      | hash |
>>
>>      hash := self species hash.
>>      self size <= 10 ifTrue:
>>              [self do: [:elem | hash := hash bitXor: elem hash]].
>>      ^hash bitXor: self size hash
>>
>>
>> I mean… a hash that changes when I add elements to the collection… how can this work?
>>
>> Esteban




Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

Henrik Sperre Johansen

On 09 Oct 2015, at 4:33 , Esteban Lorenzano <[hidden email]> wrote:


On 09 Oct 2015, at 16:21, Gabriel Cotelli <[hidden email]> wrote:

If the collection implements = using the objects it holds then you need to consider at least some of them in the hash calculation. I can't conceive a hash calculation for this case independent of the contents (well,  just hardcode a number but this will lead to always collide if used as a key in a hashed collection).

it is not :)


I'm with Levente here, I think the hash implementation is reasonable. And I wouldn't use a mutable object as key in a hashed collection.

no idea… I’m not using it, just arrive to that method random and I cannot understand it. 
I still believe is wrong. 
It's not.

Either Dictionary >> #= or IdentityDictionary >> #species in Pharo *is* broken though.

dic1 := Dictionary newFrom:{1 -> 4}.
dic2 := IdentityDictionary newFrom: dic1 associations .

dic1 = dic2     true
dic1 hash = dic2 hash false

Cheers,
Henry


signature.asc (859 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

abergel
In reply to this post by EstebanLM
Hi Esteban,

I understand your surprise, but it has to be like that.

Two objects that are equals have to have the same hash value. This is part of the contract. 
And to comply with this rule with collections, the hash value has to reflect the content of the collection.

Java has the same behavior.

Cheers,
Alexandre
-- 
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.



On Oct 9, 2015, at 11:07 AM, Esteban Lorenzano <[hidden email]> wrote:

but that’s an error, I think.
a collection has to have always same hash… no matter its size (because is not an array) and no matter his elements.


On 09 Oct 2015, at 15:21, Levente Uzonyi <[hidden email]> wrote:

You simply don't modify _objects_ which are used as keys in hashed collections. Or if you do so, you'll have to deal with the consequences yourself.

Levente

On Fri, 9 Oct 2015, Esteban Lorenzano wrote:

Hi,

Can anyone explain me how this is correct?

Collection>>#hash
"Answer an integer hash value for the receiver such that,
 -- the hash value of an unchanged object is constant over time, and
 -- two equal objects have equal hash values"

| hash |

hash := self species hash.
self size <= 10 ifTrue:
[self do: [:elem | hash := hash bitXor: elem hash]].
^hash bitXor: self size hash


I mean… a hash that changes when I add elements to the collection… how can this work?

Esteban



Reply | Threaded
Open this post in threaded view
|

Re: hash and collections

Richard Uttner
In reply to this post by EstebanLM
Esteban,
it is defined that #hash and #= always have to be implemented accordingly. What you are expecting seems to me be more appropriate for object identity considerations as already mentioned correctly in this thread.
One of the clearest explanations concerning this topic I know is:
http://www.javaranch.com/journal/2002/10/equalhash.html

In fact, it would not lead to a wrong behaviour if #hash had been implemented as you expected, but to unnecessary compares when searching within a hashed collection (meaning: the implementation would sometimes consider two collections wrongly to be equal due to their identical hash code, but when comparing them with #=, recognize the difference and check the next one with the same hash code).
More important is the opposite relation: Two objects considered as equal when compared with #= always MUST produce the same hash code, otherwise the system would behave badly in an unpredictable way.
Richard

> On 09 Oct 2015, at 16:21, Gabriel Cotelli <[hidden email]> wrote:
>
> If the collection implements = using the objects it holds then you need to consider at least some of them in the hash calculation. I can't conceive a hash calculation for this case independent of the contents (well,  just hardcode a number but this will lead to always collide if used as a key in a hashed collection).

it is not :)

>
> I'm with Levente here, I think the hash implementation is reasonable. And I wouldn't use a mutable object as key in a hashed collection.

no idea? I?m not using it, just arrive to that method random and I cannot understand it.
I still believe is wrong.

Esteban

>
> On Fri, Oct 9, 2015 at 11:07 AM, Esteban Lorenzano <[hidden email] <mailto:[hidden email]>> wrote:
> but that?s an error, I think.
> a collection has to have always same hash? no matter its size (because is not an array) and no matter his elements.
>
>
> > On 09 Oct 2015, at 15:21, Levente Uzonyi <[hidden email] <mailto:[hidden email]>> wrote:
> >
> > You simply don't modify _objects_ which are used as keys in hashed collections. Or if you do so, you'll have to deal with the consequences yourself.
> >
> > Levente
> >
> > On Fri, 9 Oct 2015, Esteban Lorenzano wrote:
> >
> >> Hi,
> >>
> >> Can anyone explain me how this is correct?
> >>
> >> Collection>>#hash
> >>      "Answer an integer hash value for the receiver such that,
> >>        -- the hash value of an unchanged object is constant over time, and
> >>        -- two equal objects have equal hash values"
> >>
> >>      | hash |
> >>
> >>      hash := self species hash.
> >>      self size <= 10 ifTrue:
> >>              [self do: [:elem | hash := hash bitXor: elem hash]].
> >>      ^hash bitXor: self size hash
> >>
> >>
> >> I mean? a hash that changes when I add elements to the collection? how can this work?
> >>
> >> Esteban
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/attachments/20151009/a843583b/attachment.html>

------------------------------

Subject: Digest Footer

_______________________________________________
Pharo-dev mailing list
[hidden email]
http://lists.pharo.org/mailman/listinfo/pharo-dev_lists.pharo.org


------------------------------

End of Pharo-dev Digest, Vol 30, Issue 48
*****************************************

Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

Eliot Miranda-2
In reply to this post by Henrik Sperre Johansen


On Oct 9, 2015, at 7:51 AM, Henrik Johansen <[hidden email]> wrote:


On 09 Oct 2015, at 4:33 , Esteban Lorenzano <[hidden email]> wrote:


On 09 Oct 2015, at 16:21, Gabriel Cotelli <[hidden email]> wrote:

If the collection implements = using the objects it holds then you need to consider at least some of them in the hash calculation. I can't conceive a hash calculation for this case independent of the contents (well,  just hardcode a number but this will lead to always collide if used as a key in a hashed collection).

it is not :)


I'm with Levente here, I think the hash implementation is reasonable. And I wouldn't use a mutable object as key in a hashed collection.

no idea… I’m not using it, just arrive to that method random and I cannot understand it. 
I still believe is wrong. 
It's not.

Either Dictionary >> #= or IdentityDictionary >> #species in Pharo *is* broken though.

dic1 := Dictionary newFrom:{1 -> 4}.
dic2 := IdentityDictionary newFrom: dic1 associations .

dic1 = dic2     true
dic1 hash = dic2 hash false

Cheers,
Henry

+1.  Henry is right.  Esteban, given
    #(1 2 3) = #(1 2 3)
and
    #(1 2 3) ~= #(1 2 3 4)
we require that
    #(1 2 3) hash = #(1 2 3) hash
and would like that
#(1 2 3) hash ~= #(1 2 3 4) hash

Now it's too much for me to type in my phone but exactly the same should be true of dictionaries.  And given that a Dictionary with eg key/value pairs 1->a 2->b 3->c can have 4->d added to it it implies that if the hash function is good it will change when 4->d is added.  And as others have said if you want a hash that doesn't change there is the identityHash and identity collections for that very purpose.
Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

monty-3
In reply to this post by Henrik Sperre Johansen
Never use #newFrom: with associations from another dictonary if you reuse the first after.
 
The bug is actually deeper than you know:
dic1 := Dictionary newFrom:{'test' -> 4}.
dic2 := IdentityDictionary newFrom: {'test' copy -> 4}.
dic1 = dic2.
dic2 = dic1.
 
Depending on the order, they are equal or inequal.
 
The bug is caused by Dictionary>>= using a polymorphic type test #isDictionary. Squeak is unaffected: its #= uses a hardcoded class == test. The least disruptive fix I can think of is adding another type test #isIdentityDictionary in Dictionary that returns false but returns true for every identity dictionary class (like SmalllIdentityDictionary). Then only compare the associations if "self isIdentityDictionary = aDictionary isIdentityDictionary".
 
I will open an issue.
 
Sent: Friday, October 09, 2015 at 10:51 AM
From: "Henrik Johansen" <[hidden email]>
To: "Pharo Development List" <[hidden email]>
Subject: Re: [Pharo-dev] hash and Collections
 
On 09 Oct 2015, at 4:33 , Esteban Lorenzano <[hidden email]> wrote:
 
 
On 09 Oct 2015, at 16:21, Gabriel Cotelli <[hidden email]> wrote:
 
If the collection implements = using the objects it holds then you need to consider at least some of them in the hash calculation. I can't conceive a hash calculation for this case independent of the contents (well,  just hardcode a number but this will lead to always collide if used as a key in a hashed collection).
 
it is not :)
 
 
I'm with Levente here, I think the hash implementation is reasonable. And I wouldn't use a mutable object as key in a hashed collection.
 
no idea… I’m not using it, just arrive to that method random and I cannot understand it. 
I still believe is wrong. 
It's not.
 
Either Dictionary >> #= or IdentityDictionary >> #species in Pharo *is* broken though.
 
dic1 := Dictionary newFrom:{1 -> 4}.
dic2 := IdentityDictionary newFrom: dic1 associations .
 
dic1 = dic2     true
dic1 hash = dic2 hash false
 
Cheers,
Henry
Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

monty-3
Resending because the last was HTML'd:

Never use #newFrom: with associations from another dictonary if you reuse the first after.
 
The bug is actually deeper than you know:
dic1 := Dictionary newFrom:{'test' -> 4}.
dic2 := IdentityDictionary newFrom: {'test' copy -> 4}.
dic1 = dic2.
dic2 = dic1.
 
Depending on the order, they are equal or inequal.
 
The bug is caused by Dictionary>>= using a polymorphic type test #isDictionary. Squeak is unaffected: its #= uses a hardcoded class == test. The least disruptive fix I can think of is adding another type test #isIdentityDictionary in Dictionary that returns false but returns true for every identity dictionary class (like SmalllIdentityDictionary). Then only compare the associations if "self isIdentityDictionary = aDictionary isIdentityDictionary".
 
I will open an issue.

Reply | Threaded
Open this post in threaded view
|

Re: hash and Collections

monty-3
The issue: https://pharo.fogbugz.com/f/cases/16760/Dictionary-breaks-when-comparing-identity-and-non-identity-dictionaries

> Sent: Sunday, October 11, 2015 at 7:23 AM
> From: monty <[hidden email]>
> To: [hidden email]
> Subject: Re: [Pharo-dev] hash and Collections
>
> Resending because the last was HTML'd:
>
> Never use #newFrom: with associations from another dictonary if you reuse the first after.
>  
> The bug is actually deeper than you know:
> dic1 := Dictionary newFrom:{'test' -> 4}.
> dic2 := IdentityDictionary newFrom: {'test' copy -> 4}.
> dic1 = dic2.
> dic2 = dic1.
>  
> Depending on the order, they are equal or inequal.
>  
> The bug is caused by Dictionary>>= using a polymorphic type test #isDictionary. Squeak is unaffected: its #= uses a hardcoded class == test. The least disruptive fix I can think of is adding another type test #isIdentityDictionary in Dictionary that returns false but returns true for every identity dictionary class (like SmalllIdentityDictionary). Then only compare the associations if "self isIdentityDictionary = aDictionary isIdentityDictionary".
>  
> I will open an issue.
>
>