Hi,
i guess i can subsume almost everything i know about hashes in one sentence: it is my understanding that two objects that are equal (obj1=obj2. -->true) have to have the same hash value (which is used for some collection types), whereas objects where obj1=obj2 returns false should have different hash values although it may happen that they have the same one. now here things don't turn out exactly like that: (1 to:4) = #(1 2 3 4). "true" (1 to:4)hash = #(1 2 3 4)hash. "false" well ok, actually these results make - pfffh, in a certain way - sense to me, but i wonder what arguments the people in the know would use to defend that result, if i would have another opinion? werner |
Just a quick guess seeing the implementation
hash for collections is about hash of its elements SequencableCollection>>hasEqualElements: otherCollection "Answer whether the receiver's size is the same as otherCollection's size, and each of the receiver's elements equal the corresponding element of otherCollection. This should probably replace the current definition of #= ." | size | (otherCollection isKindOf: SequenceableCollection) ifFalse: [^ false]. (size := self size) = otherCollection size ifFalse: [^ false]. 1 to: size do: [:index | (self at: index) = (otherCollection at: index) ifFalse: [^ false]]. ^ true Cheers, Cédrick > Le 30 juil. 2018 à 14:07, werner kassens <[hidden email]> a écrit : > > Hi, > i guess i can subsume almost everything i know about hashes in one sentence: > it is my understanding that two objects that are equal (obj1=obj2. > -->true) have to have the same hash value (which is used for some > collection types), whereas objects where obj1=obj2 returns false should > have different hash values although it may happen that they have the > same one. > > now here things don't turn out exactly like that: > (1 to:4) = #(1 2 3 4). "true" > (1 to:4)hash = #(1 2 3 4)hash. "false" > well ok, actually these results make - pfffh, in a certain way - sense > to me, but i wonder what arguments the people in the know would use to > defend that result, if i would have another opinion? > werner > > |
But it raises the question if :
(1 to:4)hash = #(1 2 3 4)hash should return true instead ?? Cedrick > Le 30 juil. 2018 à 14:51, Cédrick Béler <[hidden email]> a écrit : > > Just a quick guess seeing the implementation > > hash for collections is about hash of its elements > > SequencableCollection>>hasEqualElements: otherCollection > "Answer whether the receiver's size is the same as otherCollection's > size, and each of the receiver's elements equal the corresponding > element of otherCollection. > This should probably replace the current definition of #= ." > > | size | > (otherCollection isKindOf: SequenceableCollection) ifFalse: [^ false]. > (size := self size) = otherCollection size ifFalse: [^ false]. > 1 to: size do: > [:index | > (self at: index) = (otherCollection at: index) ifFalse: [^ false]]. > ^ true > > Cheers, > > Cédrick > >> Le 30 juil. 2018 à 14:07, werner kassens <[hidden email]> a écrit : >> >> Hi, >> i guess i can subsume almost everything i know about hashes in one sentence: >> it is my understanding that two objects that are equal (obj1=obj2. >> -->true) have to have the same hash value (which is used for some >> collection types), whereas objects where obj1=obj2 returns false should >> have different hash values although it may happen that they have the >> same one. >> >> now here things don't turn out exactly like that: >> (1 to:4) = #(1 2 3 4). "true" >> (1 to:4)hash = #(1 2 3 4)hash. "false" >> well ok, actually these results make - pfffh, in a certain way - sense >> to me, but i wonder what arguments the people in the know would use to >> defend that result, if i would have another opinion? >> werner >> >> > |
In reply to this post by wernerk
Werner,
I would say that you are right, this is a problem. A (not un-common) source of subtle bugs in Smalltalk is missing this rule that equivalent objects must have the same hash. In GemStone the objects are not equivalent (I’m not arguing that this is right, just that it avoids the problem you identify). I wonder what would happen if the hash comparison were added to the equivalence operator (#’=‘)! James > On Jul 30, 2018, at 5:07 AM, werner kassens <[hidden email]> wrote: > > Hi, > i guess i can subsume almost everything i know about hashes in one sentence: > it is my understanding that two objects that are equal (obj1=obj2. > -->true) have to have the same hash value (which is used for some > collection types), whereas objects where obj1=obj2 returns false should > have different hash values although it may happen that they have the > same one. > > now here things don't turn out exactly like that: > (1 to:4) = #(1 2 3 4). "true" > (1 to:4)hash = #(1 2 3 4)hash. "false" > well ok, actually these results make - pfffh, in a certain way - sense > to me, but i wonder what arguments the people in the know would use to > defend that result, if i would have another opinion? > werner > > > |
jgfoster wrote
> Werner, > > I would say that you are right, this is a problem. A (not un-common) > source of subtle bugs in Smalltalk is missing this rule that equivalent > objects must have the same hash. In GemStone the objects are not > equivalent (I’m not arguing that this is right, just that it avoids the > problem you identify). > > I wonder what would happen if the hash comparison were added to the > equivalence operator (#’=‘)! > > James > >> On Jul 30, 2018, at 5:07 AM, werner kassens < > wkassens@ > > wrote: >> >> Hi, >> i guess i can subsume almost everything i know about hashes in one >> sentence: >> it is my understanding that two objects that are equal (obj1=obj2. >> -->true) have to have the same hash value (which is used for some >> collection types), whereas objects where obj1=obj2 returns false should >> have different hash values although it may happen that they have the >> same one. >> >> now here things don't turn out exactly like that: >> (1 to:4) = #(1 2 3 4). "true" >> (1 to:4)hash = #(1 2 3 4)hash. "false" >> well ok, actually these results make - pfffh, in a certain way - sense >> to me, but i wonder what arguments the people in the know would use to >> defend that result, if i would have another opinion? >> werner >> >> >> +1, this is a bug. Either Interval >> #hash needs to change*, or the equivalency be broken**. Fun fact: #(1 2 3 4) = #[1 2 3 4] false You'd think they might be more similar than an Interval and an Array, or a LinkedList containing ValueLinks and an Interval, but no ;) Cheers, Henry *Interval >> hash "Hash is reimplemented because = is implemented. Since we are equivalent to other collections of our species, we must also hash equivalently" | hash | hash := self species hash. start to: stop by: step do: [:element | hash := (hash + element hash) hashMultiply]. ^hash **both ways, which is challenging, if one neither wants to: - change species of Interval (which has bad consequences) - add hash comparison to sequenceablecollection = (which slows it down even further) -- Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html |
Hi,
thank you all for your answers. Cédrick Béler wrote: > But it raises the question if : > (1 to:4)hash = #(1 2 3 4)hash should return true instead ?? principally yes, but it would slow down the Interval>>hash implementation considerably. id guess one would first make SequentialCollection>>hash a bit faster & shorter or one could very fast get a memory overflow. i mean the trick with Intervals is not so much (1to:4), but (-100000000000 to: 100000000000) which doesnt use any space and time. jgfoster wrote: > In GemStone the objects are not > equivalent (I’m not arguing that this is right, just that it avoids the > problem you identify). i see, yes, it can happen that hash has its own idea what equivalence _really means. > I wonder what would happen if the hash comparison were added to the > equivalence operator (#’=‘)! you probably mean something like this: #= ^(doWhatYouveDoneBefore)and:[self hash = object hash] of course that would eliminate the problem but it would need to be implemented everywhere. if not, #= could become not symmetric which produces the same problems (actually i stumbled upon this question because i implemented #= for a subobject of Array and gave it the species Array, problematic because with my implementation it opened the possibility of myObject~=anArray but anArray=myObject). actually #= in pharo is symmetric (and transitive, which also is necessary for hash to work) because it is more or less carefully designed , the symmetry is not somehow automatically incorporated in the language, and as i noticed its not too difficult to break that symmetry with ones own objects <stupid grin> On 07/30/2018 05:27 PM, Henrik Sperre Johansen wrote: > +1, this is a bug. > Either Interval >> #hash needs to change*, or the equivalency be broken**. > > Fun fact: > #(1 2 3 4) = #[1 2 3 4] false > You'd think they might be more similar than an Interval and an Array, or a > LinkedList containing ValueLinks and an Interval, but no ;) <g> i guess it is difficult to juggle & balance everything between all the different kinds of collections. thanks again folks werner |
In reply to this post by Henrik Sperre Johansen
The interval
1 to: (10 raisedTo: 100) can be created just fine, yet hashing its elements won't compute. A generous interpretation of the intent of #=, where any wisp of equivalence is promoted to full fledged equality, is problematic in the long run. Here's another one: 17/20 = 0.85, therefore (17/20) hash = 0.85 hash Never mind there's no floating point value that is equal to 17/20 in the first place. It just snowballs from there. For instance, should collections like these be equal? (0.0 to: 1.0 by: 0.3) = #(0.0 0.3 0.6 0.9) Sometimes it's just better if different things stay different. Andres. On 7/30/18 8:27 , Henrik Sperre Johansen wrote: > jgfoster wrote >> Werner, >> >> I would say that you are right, this is a problem. A (not un-common) >> source of subtle bugs in Smalltalk is missing this rule that equivalent >> objects must have the same hash. In GemStone the objects are not >> equivalent (I’m not arguing that this is right, just that it avoids the >> problem you identify). >> >> I wonder what would happen if the hash comparison were added to the >> equivalence operator (#’=‘)! >> >> James >> >>> On Jul 30, 2018, at 5:07 AM, werner kassens < > >> wkassens@ > >> > wrote: >>> >>> Hi, >>> i guess i can subsume almost everything i know about hashes in one >>> sentence: >>> it is my understanding that two objects that are equal (obj1=obj2. >>> -->true) have to have the same hash value (which is used for some >>> collection types), whereas objects where obj1=obj2 returns false should >>> have different hash values although it may happen that they have the >>> same one. >>> >>> now here things don't turn out exactly like that: >>> (1 to:4) = #(1 2 3 4). "true" >>> (1 to:4)hash = #(1 2 3 4)hash. "false" >>> well ok, actually these results make - pfffh, in a certain way - sense >>> to me, but i wonder what arguments the people in the know would use to >>> defend that result, if i would have another opinion? >>> werner >>> >>> >>> > > +1, this is a bug. > Either Interval >> #hash needs to change*, or the equivalency be broken**. > > Fun fact: > #(1 2 3 4) = #[1 2 3 4] false > You'd think they might be more similar than an Interval and an Array, or a > LinkedList containing ValueLinks and an Interval, but no ;) > > Cheers, > Henry > > *Interval >> hash > "Hash is reimplemented because = is implemented. > Since we are equivalent to other collections of our species, we must also > hash equivalently" > | hash | > > hash := self species hash. > start to: stop by: step do: [:element | hash := (hash + element hash) > hashMultiply]. > ^hash > > **both ways, which is challenging, if one neither wants to: > - change species of Interval (which has bad consequences) > - add hash comparison to sequenceablecollection = (which slows it down even > further) > > > > > -- > Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html > > . > |
Hi Andres,
<g> that is the kind of argument i was looking for, as i thought i would have a similar situation as in my small example and wondered whether i could keep my slightly incongruent definition of #= and #hash, but in a way your examples show that my problem is different, iow i should change my implementation & i have changed it. werner On 07/31/2018 12:34 AM, Andres Valloud wrote: > The interval > > 1 to: (10 raisedTo: 100) > > can be created just fine, yet hashing its elements won't compute. > > A generous interpretation of the intent of #=, where any wisp of > equivalence is promoted to full fledged equality, is problematic in > the long run. Here's another one: > > 17/20 = 0.85, therefore (17/20) hash = 0.85 hash > > Never mind there's no floating point value that is equal to 17/20 in > the first place. It just snowballs from there. For instance, should > collections like these be equal? > > (0.0 to: 1.0 by: 0.3) = #(0.0 0.3 0.6 0.9) > > Sometimes it's just better if different things stay different. > > Andres. |
In reply to this post by wernerk
I do not think that (1 to: 4) and #(1 2 3 4) should be equal. Let me put it a little more strongly: it's a bug. Taking a := 1 to: 4. b := Array withAll: a. c := OrderedCollection withAll: b. in the two other Smalltalk systems I just tried, no two of these are equal. This is what the ANSI Smalltalk standard requires. Ceteris paribus, two sequences are equivalent if and only if 1. they are instance of the same class 2. they have the same size 3. corresponding elements are equivalent. It is fairly common for Smalltalk systems to distinguish between "these sequences are equivalent" and "these sequences have the same elements in the same order", with no consensus on the name of the second method. One calls it #sameContentsAs:, Squeak #hasEqualElements:. On 31 July 2018 at 00:07, werner kassens <[hidden email]> wrote: Hi, |
In reply to this post by Andres Valloud-4
+1 to what Andreas Valloud (Mr "how to hash in Smalltalk") said. On 31 July 2018 at 10:34, Andres Valloud <[hidden email]> wrote: The interval |
In reply to this post by Richard O'Keefe
This inconsistent naming convention has me wishing for an updated ANSI Standard. There has been much change, wasn’t that standard from the 90s?
/————————————————————/ For encrypted mail use [hidden email] Get a free account at ProtonMail.com Web: www.objectnets.net and www.objectnets.org
|
In reply to this post by Richard O'Keefe
To what extent is it required by ANSI that objects be of the same class? Does Pharo treat a String and a Symbol as equivalent if they have the same characters?
James
|
@James Foster: 100%. There are no exceptions. @John Pferisch: the GNU Smalltalk web site hosted an attempt to update the ANSI standard for a while. I was on the mailing list. I had to be away from it for a bit and when I came back wanting to upload my contribution it had all gone. There *is* a lot that is already shared or could be shared at low cost, but trawling through Smalltalks is pretty labour- intensive so such a project needs money. We are still seeing crazy date & time classes and even Complex classes purporting to inherit Magnitude. On 1 August 2018 at 02:05, James Foster <[hidden email]> wrote:
|
@Richard O’Keefe: I’m trying to find the place in the ANSI Standard you are citing. All I’ve found so far is at 5.3.1.1:
Can you provide a reference for your assertion that objects must be of the same class in order to be equivalent? How about other comparisons between objects of different classes? Is 1.9 < 2? Is 2.1 > 2? How about 2.0 = 2? Is #a < ‘b’? is #b > ‘a’? How about #a = ‘a’? It seems strange to suggest that an object can be less than or greater than, but not equal to! James
|
Can I give a reference? Well, I did: the ANSI Smalltalk standard. Specifically, section 5.7.8.2, the refinement for <sequenceReadableCollection> Numbers are governed by section 5.6.7.1, which requires numbers that are compared for equality to be converted to a common representation, which in this case requires (sigh) integers to be converted to floats. With that definition it is possible to find integers x z and a float y such that x = y and y = z are true but x = z is false, which is a very bad thing. My Smalltalk tries to be ANSI conformant, but some things are just too broken to live with, and that's one of them. (The rule should always be to convert in a way that loses no information.) It may well seem strange to you, but "equivalence" (#=) in Smalltalk is not coupled to "ordering" (the Magnitude methods) and never has been. For example, #= compares the characters of strings, which are equal iff they have the same codepoint. But #< and the other magnitude methods use an "implementation-defined" collation order, which could, for example, ignore alphabetic case. So it would be entirely consistent with the standard and historic practice to have 'A' <= 'a' and 'a' <= 'A' but 'A' ~= 'a'. This is not merely a theoretical possibility; I just tried it in GNU Smalltalk. There was a serious historic bug in many Smalltalks where 'a' = #a was true (using string comparison) but #a = 'a' was false (using identity). This may well be why the standard insists on the classes (not the species) being the same, to ensure that <string> = <symbol> is consistent with <symbol> = <string>. On 1 August 2018 at 16:47, James Foster <[hidden email]> wrote:
|
Thanks. That is helpful.
James
|
In reply to this post by Richard O'Keefe
Richard O'Keefe wrote
> @James Foster: 100%. There are no exceptions. > > > On 1 August 2018 at 02:05, James Foster < > Smalltalk@ > > wrote: > >> To what extent is it required by ANSI that objects be of the same class? >> Does Pharo treat a String and a Symbol as equivalent if they have the >> same >> characters? >> >> James >> Let's not throw the baby out with the bathwater... Changing SequenceableCollection to do class comparison is a change people can get behind without much effort; especially when there's a bug that has to be resolved somehow, and it's the better overall change anyways. Forcing Strings != Symbols is a harder sell (although personally, I'd love to see it). Forcing WideStrings != ByteStrings is never going to happen; but perhaps that can be said to be compliant due to the phrasing "Unless specifically refined", and neither being defined in the standard. Interval/Array (and String/Symbol) don't have the same excuse. Cheers, Henry -- Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html |
Free forum by Nabble | Edit this page |