a basic hash question

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

a basic hash question

wernerk
Hi,
i guess i can subsume almost everything i know about hashes in one sentence:
it is my understanding that two objects that are equal (obj1=obj2.
-->true) have to have the same hash value (which is used for some
collection types), whereas objects where obj1=obj2 returns false should
have different hash values although it may happen that they have the
same one.

now here things don't turn out exactly like that:
(1 to:4) = #(1 2 3 4). "true"
(1 to:4)hash = #(1 2 3 4)hash. "false"
well ok, actually these results make - pfffh, in a certain way - sense
to me, but i wonder what arguments the people in the know would use to
defend that result, if i would have another opinion?
werner


Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

cedreek
Just a quick guess seeing the implementation

hash for collections is about hash of its elements

SequencableCollection>>hasEqualElements: otherCollection
        "Answer whether the receiver's size is the same as otherCollection's
        size, and each of the receiver's elements equal the corresponding
        element of otherCollection.
        This should probably replace the current definition of #= ."

        | size |
        (otherCollection isKindOf: SequenceableCollection) ifFalse: [^ false].
        (size := self size) = otherCollection size ifFalse: [^ false].
        1 to: size do:
                [:index |
                (self at: index) = (otherCollection at: index) ifFalse: [^ false]].
        ^ true

Cheers,

Cédrick

> Le 30 juil. 2018 à 14:07, werner kassens <[hidden email]> a écrit :
>
> Hi,
> i guess i can subsume almost everything i know about hashes in one sentence:
> it is my understanding that two objects that are equal (obj1=obj2.
> -->true) have to have the same hash value (which is used for some
> collection types), whereas objects where obj1=obj2 returns false should
> have different hash values although it may happen that they have the
> same one.
>
> now here things don't turn out exactly like that:
> (1 to:4) = #(1 2 3 4). "true"
> (1 to:4)hash = #(1 2 3 4)hash. "false"
> well ok, actually these results make - pfffh, in a certain way - sense
> to me, but i wonder what arguments the people in the know would use to
> defend that result, if i would have another opinion?
> werner
>
>


Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

cedreek
But it raises the question if :
(1 to:4)hash = #(1 2 3 4)hash should return true instead ??

Cedrick


> Le 30 juil. 2018 à 14:51, Cédrick Béler <[hidden email]> a écrit :
>
> Just a quick guess seeing the implementation
>
> hash for collections is about hash of its elements
>
> SequencableCollection>>hasEqualElements: otherCollection
> "Answer whether the receiver's size is the same as otherCollection's
> size, and each of the receiver's elements equal the corresponding
> element of otherCollection.
> This should probably replace the current definition of #= ."
>
> | size |
> (otherCollection isKindOf: SequenceableCollection) ifFalse: [^ false].
> (size := self size) = otherCollection size ifFalse: [^ false].
> 1 to: size do:
> [:index |
> (self at: index) = (otherCollection at: index) ifFalse: [^ false]].
> ^ true
>
> Cheers,
>
> Cédrick
>
>> Le 30 juil. 2018 à 14:07, werner kassens <[hidden email]> a écrit :
>>
>> Hi,
>> i guess i can subsume almost everything i know about hashes in one sentence:
>> it is my understanding that two objects that are equal (obj1=obj2.
>> -->true) have to have the same hash value (which is used for some
>> collection types), whereas objects where obj1=obj2 returns false should
>> have different hash values although it may happen that they have the
>> same one.
>>
>> now here things don't turn out exactly like that:
>> (1 to:4) = #(1 2 3 4). "true"
>> (1 to:4)hash = #(1 2 3 4)hash. "false"
>> well ok, actually these results make - pfffh, in a certain way - sense
>> to me, but i wonder what arguments the people in the know would use to
>> defend that result, if i would have another opinion?
>> werner
>>
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

jgfoster
In reply to this post by wernerk
Werner,

I would say that you are right, this is a problem. A (not un-common) source of subtle bugs in Smalltalk is missing this rule that equivalent objects must have the same hash. In GemStone the objects are not equivalent (I’m not arguing that this is right, just that it avoids the problem you identify).

I wonder what would happen if the hash comparison were added to the equivalence operator (#’=‘)!

James

> On Jul 30, 2018, at 5:07 AM, werner kassens <[hidden email]> wrote:
>
> Hi,
> i guess i can subsume almost everything i know about hashes in one sentence:
> it is my understanding that two objects that are equal (obj1=obj2.
> -->true) have to have the same hash value (which is used for some
> collection types), whereas objects where obj1=obj2 returns false should
> have different hash values although it may happen that they have the
> same one.
>
> now here things don't turn out exactly like that:
> (1 to:4) = #(1 2 3 4). "true"
> (1 to:4)hash = #(1 2 3 4)hash. "false"
> well ok, actually these results make - pfffh, in a certain way - sense
> to me, but i wonder what arguments the people in the know would use to
> defend that result, if i would have another opinion?
> werner
>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

Henrik Sperre Johansen
jgfoster wrote

> Werner,
>
> I would say that you are right, this is a problem. A (not un-common)
> source of subtle bugs in Smalltalk is missing this rule that equivalent
> objects must have the same hash. In GemStone the objects are not
> equivalent (I’m not arguing that this is right, just that it avoids the
> problem you identify).
>
> I wonder what would happen if the hash comparison were added to the
> equivalence operator (#’=‘)!
>
> James
>
>> On Jul 30, 2018, at 5:07 AM, werner kassens &lt;

> wkassens@

> &gt; wrote:
>>
>> Hi,
>> i guess i can subsume almost everything i know about hashes in one
>> sentence:
>> it is my understanding that two objects that are equal (obj1=obj2.
>> -->true) have to have the same hash value (which is used for some
>> collection types), whereas objects where obj1=obj2 returns false should
>> have different hash values although it may happen that they have the
>> same one.
>>
>> now here things don't turn out exactly like that:
>> (1 to:4) = #(1 2 3 4). "true"
>> (1 to:4)hash = #(1 2 3 4)hash. "false"
>> well ok, actually these results make - pfffh, in a certain way - sense
>> to me, but i wonder what arguments the people in the know would use to
>> defend that result, if i would have another opinion?
>> werner
>>
>>
>>

+1, this is a bug.
Either Interval >> #hash needs to change*, or the equivalency be broken**.

Fun fact:
#(1 2 3 4) = #[1 2 3 4]   false
You'd think they might be more similar than an Interval and an Array, or a
LinkedList containing ValueLinks and an Interval, but no ;)

Cheers,
Henry

*Interval >> hash
        "Hash is reimplemented because = is implemented.
        Since we are equivalent to other collections of our species, we must also
hash equivalently"
        | hash |

        hash := self species hash.
        start to: stop by: step do: [:element | hash := (hash + element hash)
hashMultiply].
        ^hash

**both ways, which is challenging, if one neither wants to:
- change species of Interval (which has bad consequences)
- add hash comparison to sequenceablecollection = (which slows it down even
further)




--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html

Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

wernerk
Hi,
thank you all for your answers.

Cédrick Béler wrote:

> But it raises the question if :
> (1 to:4)hash = #(1 2 3 4)hash should return true instead ??
principally yes, but it would slow down the Interval>>hash
implementation considerably. id guess one would first make
SequentialCollection>>hash a bit faster & shorter or one could very fast
get a memory overflow. i mean the trick with Intervals is not so much
(1to:4), but (-100000000000 to: 100000000000) which doesnt use any space
and time.

jgfoster wrote:

> In GemStone the objects are not
> equivalent (I’m not arguing that this is right, just that it avoids the
> problem you identify).
i see, yes, it can happen that hash has its own idea what equivalence
_really means.

> I wonder what would happen if the hash comparison were added to the
> equivalence operator (#’=‘)!
you probably mean something like this:
#=
^(doWhatYouveDoneBefore)and:[self hash = object hash]
of course that would eliminate the problem but it would need to be
implemented everywhere. if not, #= could become not symmetric which
produces the same problems (actually i stumbled upon this question
because i implemented #= for a subobject of Array and gave it the
species Array, problematic because with my implementation it opened the
possibility of myObject~=anArray but anArray=myObject). actually #= in
pharo is symmetric (and transitive, which also is necessary for hash to
work) because it is more or less carefully designed , the symmetry is
not somehow automatically incorporated in the language, and as i noticed
its not too difficult to break that symmetry with ones own objects
<stupid grin>

On 07/30/2018 05:27 PM, Henrik Sperre Johansen wrote:
> +1, this is a bug.
> Either Interval >> #hash needs to change*, or the equivalency be broken**.
>
> Fun fact:
> #(1 2 3 4) = #[1 2 3 4]   false
> You'd think they might be more similar than an Interval and an Array, or a
> LinkedList containing ValueLinks and an Interval, but no ;)
<g> i guess it is difficult to juggle & balance everything between all
the different kinds of collections.

thanks again folks
werner

Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

Andres Valloud-4
In reply to this post by Henrik Sperre Johansen
The interval

        1 to: (10 raisedTo: 100)

can be created just fine, yet hashing its elements won't compute.

A generous interpretation of the intent of #=, where any wisp of
equivalence is promoted to full fledged equality, is problematic in the
long run.  Here's another one:

        17/20 = 0.85, therefore (17/20) hash = 0.85 hash

Never mind there's no floating point value that is equal to 17/20 in the
first place.  It just snowballs from there.  For instance, should
collections like these be equal?

        (0.0 to: 1.0 by: 0.3) = #(0.0 0.3 0.6 0.9)

Sometimes it's just better if different things stay different.

Andres.

On 7/30/18 8:27 , Henrik Sperre Johansen wrote:

> jgfoster wrote
>> Werner,
>>
>> I would say that you are right, this is a problem. A (not un-common)
>> source of subtle bugs in Smalltalk is missing this rule that equivalent
>> objects must have the same hash. In GemStone the objects are not
>> equivalent (I’m not arguing that this is right, just that it avoids the
>> problem you identify).
>>
>> I wonder what would happen if the hash comparison were added to the
>> equivalence operator (#’=‘)!
>>
>> James
>>
>>> On Jul 30, 2018, at 5:07 AM, werner kassens &lt;
>
>> wkassens@
>
>> &gt; wrote:
>>>
>>> Hi,
>>> i guess i can subsume almost everything i know about hashes in one
>>> sentence:
>>> it is my understanding that two objects that are equal (obj1=obj2.
>>> -->true) have to have the same hash value (which is used for some
>>> collection types), whereas objects where obj1=obj2 returns false should
>>> have different hash values although it may happen that they have the
>>> same one.
>>>
>>> now here things don't turn out exactly like that:
>>> (1 to:4) = #(1 2 3 4). "true"
>>> (1 to:4)hash = #(1 2 3 4)hash. "false"
>>> well ok, actually these results make - pfffh, in a certain way - sense
>>> to me, but i wonder what arguments the people in the know would use to
>>> defend that result, if i would have another opinion?
>>> werner
>>>
>>>
>>>
>
> +1, this is a bug.
> Either Interval >> #hash needs to change*, or the equivalency be broken**.
>
> Fun fact:
> #(1 2 3 4) = #[1 2 3 4]   false
> You'd think they might be more similar than an Interval and an Array, or a
> LinkedList containing ValueLinks and an Interval, but no ;)
>
> Cheers,
> Henry
>
> *Interval >> hash
> "Hash is reimplemented because = is implemented.
> Since we are equivalent to other collections of our species, we must also
> hash equivalently"
> | hash |
>
> hash := self species hash.
> start to: stop by: step do: [:element | hash := (hash + element hash)
> hashMultiply].
> ^hash
>
> **both ways, which is challenging, if one neither wants to:
> - change species of Interval (which has bad consequences)
> - add hash comparison to sequenceablecollection = (which slows it down even
> further)
>
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>
> .
>

Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

wernerk
Hi Andres,
<g> that is the kind of argument i was looking for, as i thought i would
have a similar situation as in my small example and wondered whether i
could keep my slightly incongruent definition of #= and #hash, but in a
way your examples show that my problem is different, iow i should change
my implementation & i have changed it.
werner

On 07/31/2018 12:34 AM, Andres Valloud wrote:

> The interval
>
>     1 to: (10 raisedTo: 100)
>
> can be created just fine, yet hashing its elements won't compute.
>
> A generous interpretation of the intent of #=, where any wisp of
> equivalence is promoted to full fledged equality, is problematic in
> the long run.  Here's another one:
>
>     17/20 = 0.85, therefore (17/20) hash = 0.85 hash
>
> Never mind there's no floating point value that is equal to 17/20 in
> the first place.  It just snowballs from there.  For instance, should
> collections like these be equal?
>
>     (0.0 to: 1.0 by: 0.3) = #(0.0 0.3 0.6 0.9)
>
> Sometimes it's just better if different things stay different.
>
> Andres.



Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

Richard O'Keefe
In reply to this post by wernerk
I do not think that (1 to: 4) and #(1 2 3 4) should be equal.
Let me put it a little more strongly:  it's a bug.
Taking
  a := 1 to: 4.
  b := Array withAll: a.
  c := OrderedCollection withAll: b.
in the two other Smalltalk systems I just tried,
no two of these are equal.  This is what the ANSI
Smalltalk standard requires.  Ceteris paribus,
two sequences are equivalent if and only if
1. they are instance of the same class
2. they have the same size
3. corresponding elements are equivalent.
It is fairly common for Smalltalk systems to distinguish
between "these sequences are equivalent" and "these
sequences have the same elements in the same order",
with no consensus on the name of the second method.
One calls it #sameContentsAs:, Squeak #hasEqualElements:.


On 31 July 2018 at 00:07, werner kassens <[hidden email]> wrote:
Hi,
i guess i can subsume almost everything i know about hashes in one sentence:
it is my understanding that two objects that are equal (obj1=obj2.
-->true) have to have the same hash value (which is used for some
collection types), whereas objects where obj1=obj2 returns false should
have different hash values although it may happen that they have the
same one.

now here things don't turn out exactly like that:
(1 to:4) = #(1 2 3 4). "true"
(1 to:4)hash = #(1 2 3 4)hash. "false"
well ok, actually these results make - pfffh, in a certain way - sense
to me, but i wonder what arguments the people in the know would use to
defend that result, if i would have another opinion?
werner



Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

Richard O'Keefe
In reply to this post by Andres Valloud-4
+1 to what Andreas Valloud (Mr "how to hash in Smalltalk") said.

On 31 July 2018 at 10:34, Andres Valloud <[hidden email]> wrote:
The interval

        1 to: (10 raisedTo: 100)

can be created just fine, yet hashing its elements won't compute.

A generous interpretation of the intent of #=, where any wisp of equivalence is promoted to full fledged equality, is problematic in the long run.  Here's another one:

        17/20 = 0.85, therefore (17/20) hash = 0.85 hash

Never mind there's no floating point value that is equal to 17/20 in the first place.  It just snowballs from there.  For instance, should collections like these be equal?

        (0.0 to: 1.0 by: 0.3) = #(0.0 0.3 0.6 0.9)

Sometimes it's just better if different things stay different.

Andres.


On 7/30/18 8:27 , Henrik Sperre Johansen wrote:
jgfoster wrote
Werner,

I would say that you are right, this is a problem. A (not un-common)
source of subtle bugs in Smalltalk is missing this rule that equivalent
objects must have the same hash. In GemStone the objects are not
equivalent (I’m not arguing that this is right, just that it avoids the
problem you identify).

I wonder what would happen if the hash comparison were added to the
equivalence operator (#’=‘)!

James

On Jul 30, 2018, at 5:07 AM, werner kassens &lt;

wkassens@

&gt; wrote:

Hi,
i guess i can subsume almost everything i know about hashes in one
sentence:
it is my understanding that two objects that are equal (obj1=obj2.
-->true) have to have the same hash value (which is used for some
collection types), whereas objects where obj1=obj2 returns false should
have different hash values although it may happen that they have the
same one.

now here things don't turn out exactly like that:
(1 to:4) = #(1 2 3 4). "true"
(1 to:4)hash = #(1 2 3 4)hash. "false"
well ok, actually these results make - pfffh, in a certain way - sense
to me, but i wonder what arguments the people in the know would use to
defend that result, if i would have another opinion?
werner




+1, this is a bug.
Either Interval >> #hash needs to change*, or the equivalency be broken**.

Fun fact:
#(1 2 3 4) = #[1 2 3 4]   false
You'd think they might be more similar than an Interval and an Array, or a
LinkedList containing ValueLinks and an Interval, but no ;)

Cheers,
Henry

*Interval >> hash
        "Hash is reimplemented because = is implemented.
        Since we are equivalent to other collections of our species, we must also
hash equivalently"
        | hash |

        hash := self species hash.
        start to: stop by: step do: [:element | hash := (hash + element hash)
hashMultiply].
        ^hash

**both ways, which is challenging, if one neither wants to:
- change species of Interval (which has bad consequences)
- add hash comparison to sequenceablecollection = (which slows it down even
further)




--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html

.



Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

John Pfersich
In reply to this post by Richard O'Keefe
This inconsistent naming convention has me wishing for an updated ANSI Standard. There has been much change, wasn’t that standard from the 90s?

/————————————————————/
For encrypted mail use [hidden email]
Get a free account at ProtonMail.com

On Jul 30, 2018, at 22:59, Richard O'Keefe <[hidden email]> wrote:

I do not think that (1 to: 4) and #(1 2 3 4) should be equal.
Let me put it a little more strongly:  it's a bug.
Taking
  a := 1 to: 4.
  b := Array withAll: a.
  c := OrderedCollection withAll: b.
in the two other Smalltalk systems I just tried,
no two of these are equal.  This is what the ANSI
Smalltalk standard requires.  Ceteris paribus,
two sequences are equivalent if and only if
1. they are instance of the same class
2. they have the same size
3. corresponding elements are equivalent.
It is fairly common for Smalltalk systems to distinguish
between "these sequences are equivalent" and "these
sequences have the same elements in the same order",
with no consensus on the name of the second method.
One calls it #sameContentsAs:, Squeak #hasEqualElements:.


On 31 July 2018 at 00:07, werner kassens <[hidden email]> wrote:
Hi,
i guess i can subsume almost everything i know about hashes in one sentence:
it is my understanding that two objects that are equal (obj1=obj2.
-->true) have to have the same hash value (which is used for some
collection types), whereas objects where obj1=obj2 returns false should
have different hash values although it may happen that they have the
same one.

now here things don't turn out exactly like that:
(1 to:4) = #(1 2 3 4). "true"
(1 to:4)hash = #(1 2 3 4)hash. "false"
well ok, actually these results make - pfffh, in a certain way - sense
to me, but i wonder what arguments the people in the know would use to
defend that result, if i would have another opinion?
werner



Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

jgfoster
In reply to this post by Richard O'Keefe
To what extent is it required by ANSI that objects be of the same class? Does Pharo treat a String and a Symbol as equivalent if they have the same characters?

James

On Jul 30, 2018, at 10:59 PM, Richard O'Keefe <[hidden email]> wrote:

I do not think that (1 to: 4) and #(1 2 3 4) should be equal.
Let me put it a little more strongly:  it's a bug.
Taking
  a := 1 to: 4.
  b := Array withAll: a.
  c := OrderedCollection withAll: b.
in the two other Smalltalk systems I just tried,
no two of these are equal.  This is what the ANSI
Smalltalk standard requires.  Ceteris paribus,
two sequences are equivalent if and only if
1. they are instance of the same class
2. they have the same size
3. corresponding elements are equivalent.
It is fairly common for Smalltalk systems to distinguish
between "these sequences are equivalent" and "these
sequences have the same elements in the same order",
with no consensus on the name of the second method.
One calls it #sameContentsAs:, Squeak #hasEqualElements:.


On 31 July 2018 at 00:07, werner kassens <[hidden email]> wrote:
Hi,
i guess i can subsume almost everything i know about hashes in one sentence:
it is my understanding that two objects that are equal (obj1=obj2.
-->true) have to have the same hash value (which is used for some
collection types), whereas objects where obj1=obj2 returns false should
have different hash values although it may happen that they have the
same one.

now here things don't turn out exactly like that:
(1 to:4) = #(1 2 3 4). "true"
(1 to:4)hash = #(1 2 3 4)hash. "false"
well ok, actually these results make - pfffh, in a certain way - sense
to me, but i wonder what arguments the people in the know would use to
defend that result, if i would have another opinion?
werner




Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

Richard O'Keefe
@James Foster:  100%.  There are no exceptions.

@John Pferisch:  the GNU Smalltalk web site hosted an attempt to update the
ANSI standard for a while.  I was on the mailing list.  I had to be away
from it for a bit and when I came back wanting to upload my contribution
it had all gone.  There *is* a lot that is already shared or could be
shared at low cost, but trawling through Smalltalks is pretty labour-
intensive so such a project needs money.  We are still seeing crazy date
& time classes and even Complex classes purporting to inherit Magnitude.


On 1 August 2018 at 02:05, James Foster <[hidden email]> wrote:
To what extent is it required by ANSI that objects be of the same class? Does Pharo treat a String and a Symbol as equivalent if they have the same characters?

James


On Jul 30, 2018, at 10:59 PM, Richard O'Keefe <[hidden email]> wrote:

I do not think that (1 to: 4) and #(1 2 3 4) should be equal.
Let me put it a little more strongly:  it's a bug.
Taking
  a := 1 to: 4.
  b := Array withAll: a.
  c := OrderedCollection withAll: b.
in the two other Smalltalk systems I just tried,
no two of these are equal.  This is what the ANSI
Smalltalk standard requires.  Ceteris paribus,
two sequences are equivalent if and only if
1. they are instance of the same class
2. they have the same size
3. corresponding elements are equivalent.
It is fairly common for Smalltalk systems to distinguish
between "these sequences are equivalent" and "these
sequences have the same elements in the same order",
with no consensus on the name of the second method.
One calls it #sameContentsAs:, Squeak #hasEqualElements:.


On 31 July 2018 at 00:07, werner kassens <[hidden email]> wrote:
Hi,
i guess i can subsume almost everything i know about hashes in one sentence:
it is my understanding that two objects that are equal (obj1=obj2.
-->true) have to have the same hash value (which is used for some
collection types), whereas objects where obj1=obj2 returns false should
have different hash values although it may happen that they have the
same one.

now here things don't turn out exactly like that:
(1 to:4) = #(1 2 3 4). "true"
(1 to:4)hash = #(1 2 3 4)hash. "false"
well ok, actually these results make - pfffh, in a certain way - sense
to me, but i wonder what arguments the people in the know would use to
defend that result, if i would have another opinion?
werner





Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

jgfoster
@Richard O’Keefe: I’m trying to find the place in the ANSI Standard you are citing. All I’ve found so far is at 5.3.1.1:

The meaning of "equivalent" cannot be precisely defined but the intent is that two objects are
considered equivalent if they can be used interchangeably. Conforming protocols may choose to
more precisely define the meaning of "equivalent”.

Can you provide a reference for your assertion that objects must be of the same class in order to be equivalent?

How about other comparisons between objects of different classes? Is 1.9 < 2? Is 2.1 > 2? How about 2.0 = 2? Is #a < ‘b’? is #b > ‘a’? How about #a = ‘a’? It seems strange to suggest that an object can be less than or greater than, but not equal to!

James

On Jul 31, 2018, at 9:24 PM, Richard O'Keefe <[hidden email]> wrote:
@James Foster:  100%.  There are no exceptions

On 1 August 2018 at 02:05, James Foster <[hidden email]> wrote:
To what extent is it required by ANSI that objects be of the same class? 
Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

Richard O'Keefe
Can I give a reference?  Well, I did: the ANSI Smalltalk standard.
Specifically, section 5.7.8.2, the refinement for <sequenceReadableCollection>

Numbers are governed by section 5.6.7.1, which requires numbers that are
compared for equality to be converted to a common representation, which
in this case requires (sigh) integers to be converted to floats.  With
that definition it is possible to find integers x z and a float y such
that x = y and y = z are true but x = z is false, which is a very bad
thing.  My Smalltalk tries to be ANSI conformant, but some things are
just too broken to live with, and that's one of them.  (The rule should
always be to convert in a way that loses no information.)

It may well seem strange to you, but "equivalence" (#=) in Smalltalk
is not coupled to "ordering" (the Magnitude methods) and never has been.
For example, #= compares the characters of strings, which are equal iff
they have the same codepoint.  But #< and the other magnitude methods
use an "implementation-defined" collation order, which could, for example,
ignore alphabetic case.  So it would be entirely consistent with the
standard and historic practice to have 'A' <= 'a' and 'a' <= 'A' but
'A' ~= 'a'.  This is not merely a theoretical possibility; I just tried
it in GNU Smalltalk.

There was a serious historic bug in many Smalltalks where
'a' = #a was true (using string comparison) but
#a = 'a' was false (using identity).  This may well be
why the standard insists on the classes (not the species)
being the same, to ensure that <string> = <symbol> is
consistent with <symbol> = <string>.

On 1 August 2018 at 16:47, James Foster <[hidden email]> wrote:
@Richard O’Keefe: I’m trying to find the place in the ANSI Standard you are citing. All I’ve found so far is at 5.3.1.1:

The meaning of "equivalent" cannot be precisely defined but the intent is that two objects are
considered equivalent if they can be used interchangeably. Conforming protocols may choose to
more precisely define the meaning of "equivalent”.

Can you provide a reference for your assertion that objects must be of the same class in order to be equivalent?

How about other comparisons between objects of different classes? Is 1.9 < 2? Is 2.1 > 2? How about 2.0 = 2? Is #a < ‘b’? is #b > ‘a’? How about #a = ‘a’? It seems strange to suggest that an object can be less than or greater than, but not equal to!

James

On Jul 31, 2018, at 9:24 PM, Richard O'Keefe <[hidden email]> wrote:
@James Foster:  100%.  There are no exceptions

On 1 August 2018 at 02:05, James Foster <[hidden email]> wrote:
To what extent is it required by ANSI that objects be of the same class? 

Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

jgfoster
Thanks. That is helpful.

James

On Jul 31, 2018, at 10:28 PM, Richard O'Keefe <[hidden email]> wrote:

Can I give a reference?  Well, I did: the ANSI Smalltalk standard.
Specifically, section 5.7.8.2, the refinement for <sequenceReadableCollection>

Numbers are governed by section 5.6.7.1, which requires numbers that are
compared for equality to be converted to a common representation, which
in this case requires (sigh) integers to be converted to floats.  With
that definition it is possible to find integers x z and a float y such
that x = y and y = z are true but x = z is false, which is a very bad
thing.  My Smalltalk tries to be ANSI conformant, but some things are
just too broken to live with, and that's one of them.  (The rule should
always be to convert in a way that loses no information.)

It may well seem strange to you, but "equivalence" (#=) in Smalltalk
is not coupled to "ordering" (the Magnitude methods) and never has been.
For example, #= compares the characters of strings, which are equal iff
they have the same codepoint.  But #< and the other magnitude methods
use an "implementation-defined" collation order, which could, for example,
ignore alphabetic case.  So it would be entirely consistent with the
standard and historic practice to have 'A' <= 'a' and 'a' <= 'A' but
'A' ~= 'a'.  This is not merely a theoretical possibility; I just tried
it in GNU Smalltalk.

There was a serious historic bug in many Smalltalks where
'a' = #a was true (using string comparison) but
#a = 'a' was false (using identity).  This may well be
why the standard insists on the classes (not the species)
being the same, to ensure that <string> = <symbol> is
consistent with <symbol> = <string>.

On 1 August 2018 at 16:47, James Foster <[hidden email]> wrote:
@Richard O’Keefe: I’m trying to find the place in the ANSI Standard you are citing. All I’ve found so far is at 5.3.1.1:

The meaning of "equivalent" cannot be precisely defined but the intent is that two objects are
considered equivalent if they can be used interchangeably. Conforming protocols may choose to
more precisely define the meaning of "equivalent”.

Can you provide a reference for your assertion that objects must be of the same class in order to be equivalent?

How about other comparisons between objects of different classes? Is 1.9 < 2? Is 2.1 > 2? How about 2.0 = 2? Is #a < ‘b’? is #b > ‘a’? How about #a = ‘a’? It seems strange to suggest that an object can be less than or greater than, but not equal to!

James

On Jul 31, 2018, at 9:24 PM, Richard O'Keefe <[hidden email]> wrote:
@James Foster:  100%.  There are no exceptions

On 1 August 2018 at 02:05, James Foster <[hidden email]> wrote:
To what extent is it required by ANSI that objects be of the same class? 


Reply | Threaded
Open this post in threaded view
|

Re: a basic hash question

Henrik Sperre Johansen
In reply to this post by Richard O'Keefe
Richard O'Keefe wrote
> @James Foster:  100%.  There are no exceptions.
>
>
> On 1 August 2018 at 02:05, James Foster &lt;

> Smalltalk@

> &gt; wrote:
>
>> To what extent is it required by ANSI that objects be of the same class?
>> Does Pharo treat a String and a Symbol as equivalent if they have the
>> same
>> characters?
>>
>> James
>>

Let's not throw the baby out with the bathwater...
Changing SequenceableCollection to do class comparison is a change people
can get behind without much effort; especially when there's a bug that has
to be resolved somehow, and it's the better overall change anyways.
Forcing Strings != Symbols is a harder sell (although personally, I'd love
to see it).
Forcing WideStrings != ByteStrings is never going to happen; but perhaps
that can be said to be compliant due to the phrasing "Unless specifically
refined", and neither being defined in the standard.
Interval/Array (and String/Symbol) don't have the same excuse.

Cheers,
Henry




--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html