Here is what I extracted from older emails
From David Allouche in Pharo-dev to see if The ==hash== method is used in Pharo to separate objects in bucket-based data structures. !! Property It must have one required property which is required data integrity: !!!! A. Objects that are equal according to ==\=== must have equal hashes. It should have two desirable properties that are required to support the algorithmic assumptions of bucket-based data structures: !!!! B. Objects that are not equal should have different hashes. Ideally, the probability of hash collision should by 1/N where N is the size of hash value space. !!!! C. The hash values should spread as much as possible over their value space. For example: ProtoObject>>#identityHash. How these two desirable properties can be implemented depends on the actual distribution of the values used to compute the hash. An object can safely use XOR hashing if its instance variables are all different objects that already provide a hash method with these two properties. @@todo check following sentence because the use of singleton is obscure For example, an object whose behaviour is defined by several of singleton delegates of different classes that do not override Object>>#hash. But if the instance variables are not guaranteed to have those properties, it is necessary to more thoroughly "mix the bits". For example the method ==hash== of ==SequenceableCollection== illustrates this point. [[[ SequenceableCollection >> hash | hash | hash := self species hash. 1 to: self size do: [:i | hash := (hash + (self at: i) hash) hashMultiply]. ^ hash ]]] The mixing of the bits is done by the combination of addition and ==hashMultiply==. The value of ==species hash== does not need to be processed by ==hashMultiply==, because it is probably computed by the method =identityHash== of ==ProtoObject==, and that already provides property C. !!! When we should not use XOR ==LayoutFrame== is a particularly good example of a data structure that should not use XOR hashing. Its instance variables provide none of the required properties: - ==SmallInteger>>hash== is just ==^self==. - In addition, common ==LayoutFrame== instances tend to use pairs of identical values in their instance variables. Like ==0@0 corner: 1@1==, or ==Margin fromNumber: 10==. A good hash function for LayoutFrame needs to: - Get hashes for each instance variable that provides properties A and B. - Combine them in a way that is order-sensitive (to maintain property B) and that does some extra mixing (to provide property C). Now, we can assume that common values for the instance variables will be instances of ==SmallInteger==, ==Float==, or ==Fraction==. You can see in ==Fraction>>hash==, that a comment mentions the assumption that the fraction is already reduced, that is what makes it acceptable to use bitXor. The hash of ==SmallInteger== does provide properties A and B, but not property C. The hash of ==Float== is harder to understand, but tests show that it provides distinct values for 0.5, 0.25 and 0.125. So it hopefully provides B for the range of values of interest to ==LayoutFrame==. Another class that has similar hashing constraints to ==LayoutFrame== is ==Point==. Its ==hash== method is: [[[ Point >> hash "Hash is reimplemented because = is implemented." ^(x hash hashMultiply + y hash) hashMultiply ]]] It does not include species in the computation, which is less than ideal because it favours collisions with objects of different species that have a similar content and a the same hash algorithm. Finally, it is probably not necessary or useful to use the accessors to get to the instance variables. So a good implementation for LayoutFrame is [[[ LayoutFrame >> hash | hash | hash := self species hash hash := (hash + leftFraction hash) hashMultiply hash := (hash + leftOffset hash) hashMultiply hash := (hash + topFraction hash) hashMultiply hash := (hash + topOffset hash) hashMultiply hash := (hash + rightFraction hash) hashMultiply hash := (hash + rightOffset hash) hashMultiply hash := (hash + bottomFraction hash) hashMultiply hash := (hash + bottomOffset hash) hashMultiply ^ hash ]]] On Fri, Oct 6, 2017 at 9:59 AM, Stephane Ducasse <[hidden email]> wrote: > Noury also proposed a trait to avoid to systematically have to implement hash. > Vitor why don't you take it as a small project to propose a solution for Pharo. > > On Thu, Oct 5, 2017 at 6:34 PM, Vitor Medina Cruz <[hidden email]> wrote: >> Yes, canEqual implementation also make #= be commutative. >> >> On Tue, Oct 3, 2017 at 11:11 AM, Denis Kudriashov <[hidden email]> >> wrote: >>> >>> >>> 2017-10-02 17:30 GMT+02:00 Denis Kudriashov <[hidden email]>: >>>> >>>> >>>> 2017-10-02 17:13 GMT+02:00 Vitor Medina Cruz <[hidden email]>: >>>>> >>>>> I am sorry, not species, but #isKindOf istead of #= to compare classes. >>>> >>>> >>>> It is bad idea. #= should be transitive. >>> >>> >>> Oh, I used wrong word, shame on me :). I tried to say commutative. >>> >>>> >>>> How you will generate it with isKindOf: logic? You need to know common >>>> parent. >>>> >>>> Also I not remember cases where I was needed two instances of different >>>> classes to be equal. >>>> And I can imaging the problems which it will lead to. >>>> >>>>> >>>>> >>>>> On Mon, Oct 2, 2017 at 11:57 AM, Denis Kudriashov <[hidden email]> >>>>> wrote: >>>>>> >>>>>> >>>>>> 2017-10-02 16:37 GMT+02:00 Sean P. DeNigris <[hidden email]>: >>>>>>> >>>>>>> >>>>>>> Two questions/comments about the generated code: >>>>>>> 1. #= >>>>>>> ... >>>>>>> self class = anObject class "should compare #species instead?" >>>>>>> ifFalse: [ ^ false ]. >>>>>>> ... >>>>>>> Typically, I've seen #species instead of #class in the guard >>>>>>> statement. >>>>>>> Should we change it to that? >>>>>> >>>>>> >>>>>> I doubt that it is important for domain classes. Because I never saw >>>>>> the user of #species which is not a kind of Collection. And for collections >>>>>> this refactoring is not valid anyway. >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2. #hash >>>>>>> ^ var1 hash bitXor: (var2 hash bitXor: var3 hash) >>>>>>> Is this implementation always safe? It's what I usually hand roll >>>>>>> based on >>>>>>> what I've seen, but Andres Valloud wrote a whole (large) book on >>>>>>> hashing, so >>>>>>> I've always wondered if I was missing something… >>>>>>> >>>>>>> >>>>>>> >>>>>>> ----- >>>>>>> Cheers, >>>>>>> Sean >>>>>>> -- >>>>>>> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html >>>>>>> >>>>>> >>>>> >>>> >>> >> |
Stephane "Vitor why don't you take it as a small project to propose a solution for Pharo." I will try :) I like the idea of using a Trait. Em 6 de out de 2017 05:01, "Stephane Ducasse" <[hidden email]> escreveu: Here is what I extracted from older emails |
Contact noury because I do not remember.
But having a little builder is also nice. On Fri, Oct 6, 2017 at 4:15 PM, Vitor Medina Cruz <[hidden email]> wrote: > Stephane > > "Vitor why don't you take it as a small project to propose a solution for > Pharo." > > I will try :) I like the idea of using a Trait. > > > Em 6 de out de 2017 05:01, "Stephane Ducasse" <[hidden email]> > escreveu: > > Here is what I extracted from older emails > > From David Allouche in Pharo-dev to see if > The ==hash== method is used in Pharo to separate objects in > bucket-based data structures. > > > !! Property > > It must have one required property which is required data integrity: > > !!!! A. > Objects that are equal according to ==\=== must have equal hashes. > > > It should have two desirable properties that are required to support > the algorithmic assumptions of bucket-based data structures: > > !!!! B. > Objects that are not equal should have different hashes. Ideally, the > probability of hash collision should by 1/N where N is the size of > hash value space. > > !!!! C. The hash values should spread as much as possible over their > value space. For example: ProtoObject>>#identityHash. > > > > > How these two desirable properties can be implemented depends on the > actual distribution of the values used to compute the hash. > > An object can safely use XOR hashing if its instance variables are all > different objects that already provide a hash method with these two > properties. > > @@todo check following sentence because the use of singleton is obscure > > For example, an object whose behaviour is defined by several of > singleton delegates of different classes that do not override > Object>>#hash. > > But if the instance variables are not guaranteed to have those > properties, it is necessary to more thoroughly "mix the bits". For > example the method ==hash== of ==SequenceableCollection== illustrates > this point. > > > [[[ > SequenceableCollection >> hash > | hash | > hash := self species hash. > 1 to: self size do: [:i | hash := (hash + (self at: i) hash) hashMultiply]. > ^ hash > ]]] > > > The mixing of the bits is done by the combination of addition and > ==hashMultiply==. The value of ==species hash== does not need to be > processed by ==hashMultiply==, because it is probably computed by the > method =identityHash== of ==ProtoObject==, and that already provides > property C. > > !!! When we should not use XOR > > ==LayoutFrame== is a particularly good example of a data structure > that should not use XOR hashing. Its instance variables provide none > of the required properties: > > - ==SmallInteger>>hash== is just ==^self==. > - In addition, common ==LayoutFrame== instances tend to use pairs of > identical values in their instance variables. Like ==0@0 corner: > 1@1==, or ==Margin fromNumber: 10==. > > A good hash function for LayoutFrame needs to: > > - Get hashes for each instance variable that provides properties A and B. > - Combine them in a way that is order-sensitive (to maintain property > B) and that does some extra mixing (to provide property C). > > Now, we can assume that common values for the instance variables will > be instances of ==SmallInteger==, ==Float==, or ==Fraction==. > > You can see in ==Fraction>>hash==, that a comment mentions the > assumption that the fraction is already reduced, that is what makes it > acceptable to use bitXor. > > The hash of ==SmallInteger== does provide properties A and B, but not > property C. The hash of ==Float== is harder to understand, but tests > show that it provides distinct values for 0.5, 0.25 and 0.125. So it > hopefully provides B for the range of values of interest to > ==LayoutFrame==. > > Another class that has similar hashing constraints to ==LayoutFrame== > is ==Point==. Its ==hash== method is: > > [[[ > Point >> hash > "Hash is reimplemented because = is implemented." > ^(x hash hashMultiply + y hash) hashMultiply > ]]] > > It does not include species in the computation, which is less than > ideal because it favours collisions with objects of different species > that have a similar content and a the same hash algorithm. > > Finally, it is probably not necessary or useful to use the accessors > to get to the instance variables. > > So a good implementation for LayoutFrame is > > [[[ > LayoutFrame >> hash > | hash | > hash := self species hash > hash := (hash + leftFraction hash) hashMultiply > hash := (hash + leftOffset hash) hashMultiply > hash := (hash + topFraction hash) hashMultiply > hash := (hash + topOffset hash) hashMultiply > hash := (hash + rightFraction hash) hashMultiply > hash := (hash + rightOffset hash) hashMultiply > hash := (hash + bottomFraction hash) hashMultiply > hash := (hash + bottomOffset hash) hashMultiply > ^ hash > ]]] > > On Fri, Oct 6, 2017 at 9:59 AM, Stephane Ducasse > <[hidden email]> wrote: >> Noury also proposed a trait to avoid to systematically have to implement >> hash. >> Vitor why don't you take it as a small project to propose a solution for >> Pharo. >> >> On Thu, Oct 5, 2017 at 6:34 PM, Vitor Medina Cruz <[hidden email]> >> wrote: >>> Yes, canEqual implementation also make #= be commutative. >>> >>> On Tue, Oct 3, 2017 at 11:11 AM, Denis Kudriashov <[hidden email]> >>> wrote: >>>> >>>> >>>> 2017-10-02 17:30 GMT+02:00 Denis Kudriashov <[hidden email]>: >>>>> >>>>> >>>>> 2017-10-02 17:13 GMT+02:00 Vitor Medina Cruz <[hidden email]>: >>>>>> >>>>>> I am sorry, not species, but #isKindOf istead of #= to compare >>>>>> classes. >>>>> >>>>> >>>>> It is bad idea. #= should be transitive. >>>> >>>> >>>> Oh, I used wrong word, shame on me :). I tried to say commutative. >>>> >>>>> >>>>> How you will generate it with isKindOf: logic? You need to know common >>>>> parent. >>>>> >>>>> Also I not remember cases where I was needed two instances of different >>>>> classes to be equal. >>>>> And I can imaging the problems which it will lead to. >>>>> >>>>>> >>>>>> >>>>>> On Mon, Oct 2, 2017 at 11:57 AM, Denis Kudriashov >>>>>> <[hidden email]> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> 2017-10-02 16:37 GMT+02:00 Sean P. DeNigris <[hidden email]>: >>>>>>>> >>>>>>>> >>>>>>>> Two questions/comments about the generated code: >>>>>>>> 1. #= >>>>>>>> ... >>>>>>>> self class = anObject class "should compare #species >>>>>>>> instead?" >>>>>>>> ifFalse: [ ^ false ]. >>>>>>>> ... >>>>>>>> Typically, I've seen #species instead of #class in the guard >>>>>>>> statement. >>>>>>>> Should we change it to that? >>>>>>> >>>>>>> >>>>>>> I doubt that it is important for domain classes. Because I never saw >>>>>>> the user of #species which is not a kind of Collection. And for >>>>>>> collections >>>>>>> this refactoring is not valid anyway. >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2. #hash >>>>>>>> ^ var1 hash bitXor: (var2 hash bitXor: var3 hash) >>>>>>>> Is this implementation always safe? It's what I usually hand roll >>>>>>>> based on >>>>>>>> what I've seen, but Andres Valloud wrote a whole (large) book on >>>>>>>> hashing, so >>>>>>>> I've always wondered if I was missing something… >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ----- >>>>>>>> Cheers, >>>>>>>> Sean >>>>>>>> -- >>>>>>>> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> > > |
Hi,
I have developed few years ago a trait-based solution to make it easy to deal with equality. By default, instances of the same class with equal IVs are equal + have the same hash code. You can customize the list of IVs to compare, which is handy in case of IVs that store cache values. All you have to do is to use the TEquality in your class. The code + tests are here http://smalltalkhub.com/#!/~CAR/ReusableBricks/packages/Equals This is part of the "ReusableBricks" repo that gather packages we're using it in different projects in my team. And it works fine. I'd love to have this integrated in Pharo. Note : I no longer receive mailing-lists emails, because I can't handle all the traffic. We can continue the conversation on Discord Noury > > > On 06 Oct 2017, at 18:31, Stephane Ducasse <[hidden email]> wrote: > > Contact noury because I do not remember. > But having a little builder is also nice. > > On Fri, Oct 6, 2017 at 4:15 PM, Vitor Medina Cruz <[hidden email]> wrote: >> Stephane >> >> "Vitor why don't you take it as a small project to propose a solution for >> Pharo." >> >> I will try :) I like the idea of using a Trait. >> >> >> Em 6 de out de 2017 05:01, "Stephane Ducasse" <[hidden email]> >> escreveu: >> >> Here is what I extracted from older emails >> >> From David Allouche in Pharo-dev to see if >> The ==hash== method is used in Pharo to separate objects in >> bucket-based data structures. >> >> >> !! Property >> >> It must have one required property which is required data integrity: >> >> !!!! A. >> Objects that are equal according to ==\=== must have equal hashes. >> >> >> It should have two desirable properties that are required to support >> the algorithmic assumptions of bucket-based data structures: >> >> !!!! B. >> Objects that are not equal should have different hashes. Ideally, the >> probability of hash collision should by 1/N where N is the size of >> hash value space. >> >> !!!! C. The hash values should spread as much as possible over their >> value space. For example: ProtoObject>>#identityHash. >> >> >> >> >> How these two desirable properties can be implemented depends on the >> actual distribution of the values used to compute the hash. >> >> An object can safely use XOR hashing if its instance variables are all >> different objects that already provide a hash method with these two >> properties. >> >> @@todo check following sentence because the use of singleton is obscure >> >> For example, an object whose behaviour is defined by several of >> singleton delegates of different classes that do not override >> Object>>#hash. >> >> But if the instance variables are not guaranteed to have those >> properties, it is necessary to more thoroughly "mix the bits". For >> example the method ==hash== of ==SequenceableCollection== illustrates >> this point. >> >> >> [[[ >> SequenceableCollection >> hash >> | hash | >> hash := self species hash. >> 1 to: self size do: [:i | hash := (hash + (self at: i) hash) hashMultiply]. >> ^ hash >> ]]] >> >> >> The mixing of the bits is done by the combination of addition and >> ==hashMultiply==. The value of ==species hash== does not need to be >> processed by ==hashMultiply==, because it is probably computed by the >> method =identityHash== of ==ProtoObject==, and that already provides >> property C. >> >> !!! When we should not use XOR >> >> ==LayoutFrame== is a particularly good example of a data structure >> that should not use XOR hashing. Its instance variables provide none >> of the required properties: >> >> - ==SmallInteger>>hash== is just ==^self==. >> - In addition, common ==LayoutFrame== instances tend to use pairs of >> identical values in their instance variables. Like ==0@0 corner: >> 1@1==, or ==Margin fromNumber: 10==. >> >> A good hash function for LayoutFrame needs to: >> >> - Get hashes for each instance variable that provides properties A and B. >> - Combine them in a way that is order-sensitive (to maintain property >> B) and that does some extra mixing (to provide property C). >> >> Now, we can assume that common values for the instance variables will >> be instances of ==SmallInteger==, ==Float==, or ==Fraction==. >> >> You can see in ==Fraction>>hash==, that a comment mentions the >> assumption that the fraction is already reduced, that is what makes it >> acceptable to use bitXor. >> >> The hash of ==SmallInteger== does provide properties A and B, but not >> property C. The hash of ==Float== is harder to understand, but tests >> show that it provides distinct values for 0.5, 0.25 and 0.125. So it >> hopefully provides B for the range of values of interest to >> ==LayoutFrame==. >> >> Another class that has similar hashing constraints to ==LayoutFrame== >> is ==Point==. Its ==hash== method is: >> >> [[[ >> Point >> hash >> "Hash is reimplemented because = is implemented." >> ^(x hash hashMultiply + y hash) hashMultiply >> ]]] >> >> It does not include species in the computation, which is less than >> ideal because it favours collisions with objects of different species >> that have a similar content and a the same hash algorithm. >> >> Finally, it is probably not necessary or useful to use the accessors >> to get to the instance variables. >> >> So a good implementation for LayoutFrame is >> >> [[[ >> LayoutFrame >> hash >> | hash | >> hash := self species hash >> hash := (hash + leftFraction hash) hashMultiply >> hash := (hash + leftOffset hash) hashMultiply >> hash := (hash + topFraction hash) hashMultiply >> hash := (hash + topOffset hash) hashMultiply >> hash := (hash + rightFraction hash) hashMultiply >> hash := (hash + rightOffset hash) hashMultiply >> hash := (hash + bottomFraction hash) hashMultiply >> hash := (hash + bottomOffset hash) hashMultiply >> ^ hash >> ]]] >> >> On Fri, Oct 6, 2017 at 9:59 AM, Stephane Ducasse >> <[hidden email]> wrote: >>> Noury also proposed a trait to avoid to systematically have to implement >>> hash. >>> Vitor why don't you take it as a small project to propose a solution for >>> Pharo. >>> >>> On Thu, Oct 5, 2017 at 6:34 PM, Vitor Medina Cruz <[hidden email]> >>> wrote: >>>> Yes, canEqual implementation also make #= be commutative. >>>> >>>> On Tue, Oct 3, 2017 at 11:11 AM, Denis Kudriashov <[hidden email]> >>>> wrote: >>>>> >>>>> >>>>> 2017-10-02 17:30 GMT+02:00 Denis Kudriashov <[hidden email]>: >>>>>> >>>>>> >>>>>> 2017-10-02 17:13 GMT+02:00 Vitor Medina Cruz <[hidden email]>: >>>>>>> >>>>>>> I am sorry, not species, but #isKindOf istead of #= to compare >>>>>>> classes. >>>>>> >>>>>> >>>>>> It is bad idea. #= should be transitive. >>>>> >>>>> >>>>> Oh, I used wrong word, shame on me :). I tried to say commutative. >>>>> >>>>>> >>>>>> How you will generate it with isKindOf: logic? You need to know common >>>>>> parent. >>>>>> >>>>>> Also I not remember cases where I was needed two instances of different >>>>>> classes to be equal. >>>>>> And I can imaging the problems which it will lead to. >>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, Oct 2, 2017 at 11:57 AM, Denis Kudriashov >>>>>>> <[hidden email]> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> 2017-10-02 16:37 GMT+02:00 Sean P. DeNigris <[hidden email]>: >>>>>>>>> >>>>>>>>> >>>>>>>>> Two questions/comments about the generated code: >>>>>>>>> 1. #= >>>>>>>>> ... >>>>>>>>> self class = anObject class "should compare #species >>>>>>>>> instead?" >>>>>>>>> ifFalse: [ ^ false ]. >>>>>>>>> ... >>>>>>>>> Typically, I've seen #species instead of #class in the guard >>>>>>>>> statement. >>>>>>>>> Should we change it to that? >>>>>>>> >>>>>>>> >>>>>>>> I doubt that it is important for domain classes. Because I never saw >>>>>>>> the user of #species which is not a kind of Collection. And for >>>>>>>> collections >>>>>>>> this refactoring is not valid anyway. >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> 2. #hash >>>>>>>>> ^ var1 hash bitXor: (var2 hash bitXor: var3 hash) >>>>>>>>> Is this implementation always safe? It's what I usually hand roll >>>>>>>>> based on >>>>>>>>> what I've seen, but Andres Valloud wrote a whole (large) book on >>>>>>>>> hashing, so >>>>>>>>> I've always wondered if I was missing something… >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ----- >>>>>>>>> Cheers, >>>>>>>>> Sean >>>>>>>>> -- >>>>>>>>> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >> >> |
Administrator
|
Noury Bouraqadi-2 wrote
> The code + tests are here > http://smalltalkhub.com/#!/~CAR/ReusableBricks/packages/Equals I forked to https://github.com/seandenigris/Pharo-Equals with a baseline to make it easier to try out/possibly-integrate. ----- Cheers, Sean -- Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
Cheers,
Sean |
Free forum by Nabble | Edit this page |