CompiledMethod>>hash can produce clashes

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

CompiledMethod>>hash can produce clashes

Max Leske
We found an issue where CompiledMethods can produce the same hash event within the same MethodDictionary. Reported here:

https://pharo.fogbugz.com/f/cases/14246/CompiledMethod-hash-can-produce-clashes

Cheers,
Max
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Eliot Miranda-2
I responded...

On Wed, Oct 15, 2014 at 7:11 AM, Max Leske <[hidden email]> wrote:
We found an issue where CompiledMethods can produce the same hash event within the same MethodDictionary. Reported here:

https://pharo.fogbugz.com/f/cases/14246/CompiledMethod-hash-can-produce-clashes

Cheers,
Max



--
best,
Eliot
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Richard Sargent
Administrator
Eliot Miranda-2 wrote
I responded...
I have to disagree with your recommendation. You say that you intend #= to mean "has the same effect as" rather than "is the same as".

One of the best things about Smalltalk is how easily we can say what we mean. I think you would be better off creating a method named something like #hasSameEffectAs: to answer what you are presently using #= to do, and change #= to answer the, in my opinion, more sensible "is the same as" that we conventionally think of #= meaning.


$0.02 worth and worth everything you paid for it. :-)
Richard
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Eliot Miranda-2
Hi Richard,

On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <[hidden email]> wrote:
Eliot Miranda-2 wrote
> I responded...

I have to disagree with your recommendation. You say that you intend #= to
mean "has the same effect as" rather than "is the same as".

One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as" that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #== isn't useful, so one has to choose some value-based semantic for CompiledMethod>>#= and the one that's there is useful.  Defining what #= means for some value type is far easier than defining what it might mean for something as complex as a CompiledMethod.  The definition in Squeak/Pharo has been useful to me in implementing a closure-based system, so I'm unapologetic about the current definition.  It is a good one but it doesn't preclude defining others.

$0.02 worth and worth everything you paid for it. :-)
Richard




--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784771.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.




--
best,
Eliot
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Richard Sargent
Administrator
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
[hidden email]> wrote:
> One of the best things about Smalltalk is how easily we can say what we
> mean. I think you would be better off creating a method named something
> like
> #hasSameEffectAs: to answer what you are presently using #= to do, and
> change #= to answer the, in my opinion, more sensible "is the same as" that
> we conventionally think of #= meaning.
>

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs: provides greater clarity and add an argument against something I didn't say.

I also don't think defining equality for a CompiledMethod is particularly difficult. If I were to recompile a method's source code, I would get a new instance of a CompiledMethod that would, in my opinion, be equal to the one already installed in the class (and perhaps cached in the VM's optimizations). So one would be able to say that we would not replace an existing CompiledMethod with an equal one. The current implementation of #= has no such characteristic, since it proclaims a CompiledMethod named #a to be equal to one named #z.

The blue book say #= means "Answer whether the receiver and the argument represent the same component." The current implementation does so only for some, in my opinion, counter-intuitive definition of "same component".
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Eliot Miranda-2


On Wed, Oct 15, 2014 at 11:53 AM, Richard Sargent <[hidden email]> wrote:
Eliot Miranda-2 wrote
> On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <

> richard.sargent@

>> wrote:
>> One of the best things about Smalltalk is how easily we can say what we
>> mean. I think you would be better off creating a method named something
>> like
>> #hasSameEffectAs: to answer what you are presently using #= to do, and
>> change #= to answer the, in my opinion, more sensible "is the same as"
>> that
>> we conventionally think of #= meaning.
>>
>
> But that's the point.  #= has to mean something and having it mean #==
> isn't useful, so one has to choose some value-based semantic for
> CompiledMethod>>#= and the one that's there is useful.  Defining what #=
> means for some value type is far easier than defining what it might mean
> for something as complex as a CompiledMethod.  The definition in
> Squeak/Pharo has been useful to me in implementing a closure-based system,
> so I'm unapologetic about the current definition.  It is a good one but it
> doesn't preclude defining others.

An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.

It's a given.  But the selector still isn't perfectly informative, see below.  And what's the effect?  The effect when executed, so why not hasSameEffectWhenExecuted:? It's a mouthful.

I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

And for some uses that is correct, one names a /is/ equal to one named z, even though their selectors differ.  For example if you do the following:

     aClass compile: 'a ^1'; compile: 'b ^ 1'.
     aClass compiledMethodAt: #a put: aClass >> #b

then "aClass new a" still answers 1.  It's only if one introspects (thisContext method selector, opens the debugger, etc) that one sees that the selector doesn't match.  If you go back to Smalltalk-80 you'll see that compiled methods didn't store their selector and to find out their selector one searched the method dictionary of the method's method class.  As far as what most methods do (excluding introspecting code) the selector is merely a cache of the key in the relevant method dictionary.

So #hasSameEffect: *doesn't* mean what one might think it means for some uses (it doesn't only depend on the method in question, but on usage, whether the method is used in an introspective context, etc).  So naming a more explanatory selector is more difficult than you might think.

The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".

Well Smalltalk-80 says
Answer true if the receiver and the argument represent the same object 
and false otherwise. If = is redefined in any subclass, consider also
redefining the message hash.

and that's just as vague (AFAIA the blue book doesn't define what a component is), because whether objects are equal or not depends on usage.  In the end the system is full of definitions of #= which are more or less generally useful in various contexts.  Things are easy for the arithmetic types, but for more complex objects there are always caveats

| s1 s2 |
s1 := Set new.
s1 add: s1.
s2 := Set new.
s2 add: s2.
s1 = s2

doesn't terminate.  So should Set's #= be called isEqualIfNonRecursive: ?  No.  Its limited #= is fine in practice and much better than Smalltalk-80's original fall back to #==.  But it is not a perfect equality.  Neither is CompiledMethod's.  But it being imperfect is not an argument for changing it to another, inevitably also flawed definition without good reason.

--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.

--
best,
Eliot
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Ben Coman
In reply to this post by Richard Sargent
Richard Sargent wrote:

> Eliot Miranda-2 wrote
>> On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
>
>> richard.sargent@
>
>>> wrote:
>>> One of the best things about Smalltalk is how easily we can say what we
>>> mean. I think you would be better off creating a method named something
>>> like
>>> #hasSameEffectAs: to answer what you are presently using #= to do, and
>>> change #= to answer the, in my opinion, more sensible "is the same as"
>>> that
>>> we conventionally think of #= meaning.
>>>
>> But that's the point.  #= has to mean something and having it mean #==
>> isn't useful, so one has to choose some value-based semantic for
>> CompiledMethod>>#= and the one that's there is useful.  Defining what #=
>> means for some value type is far easier than defining what it might mean
>> for something as complex as a CompiledMethod.  The definition in
>> Squeak/Pharo has been useful to me in implementing a closure-based system,
>> so I'm unapologetic about the current definition.  It is a good one but it
>> doesn't preclude defining others.
>
> An interesting response. You ignored the point that e.g. #hasSameEffectAs:
> provides greater clarity and add an argument against something I didn't say.
>
> I also don't think defining equality for a CompiledMethod is particularly
> difficult. If I were to recompile a method's source code, I would get a new
> instance of a CompiledMethod that would, in my opinion, be equal to the one
> already installed in the class (and perhaps cached in the VM's
> optimizations). So one would be able to say that we would not replace an
> existing CompiledMethod with an equal one. The current implementation of #=
> has no such characteristic, since it proclaims a CompiledMethod named #a to
> be equal to one named #z.
>

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
        ^1

[2] SomeClass>>a "recompiled instance"
        ^1

[3] SomeClass>>z
        ^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?


cheers -ben


> The blue book say #= means "Answer whether the receiver and the argument
> represent the same component." The current implementation does so only for
> some, in my opinion, counter-intuitive definition of "same component".
>
>
>
>
> --
> View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
>
>


Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Max Leske

On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Tudor Girba-2
But why is Set being affected by hash? Hash is never guaranteed to be unique. Set should be affected by equality.

Doru

On Fri, Oct 17, 2014 at 9:34 AM, Max Leske <[hidden email]> wrote:

On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.




--

"Every thing has its own flow"
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Max Leske

On 17.10.2014, at 09:37, Tudor Girba <[hidden email]> wrote:

But why is Set being affected by hash? Hash is never guaranteed to be unique. Set should be affected by equality.

Well, actually it’s both #hash and #=. First the set tries to find a suitable place for the element using the elements hash. If that place is already taken it then checks equality. Since the equality definition is mostly the same (same literals, same byte codes etc.), the second element is rejected because it’s already in the set.


Doru

On Fri, Oct 17, 2014 at 9:34 AM, Max Leske <[hidden email]> wrote:

On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.




--

"Every thing has its own flow"

Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Tudor Girba-2
Exactly. So, the problem with Set is not in hash at all, but in equality. Of course, we can still enhance hash, but we should first focus on equality.

And I am also of the opinion that equality should take the name of the selector and even the name of the class into account.

Doru

On Fri, Oct 17, 2014 at 9:52 AM, Max Leske <[hidden email]> wrote:

On 17.10.2014, at 09:37, Tudor Girba <[hidden email]> wrote:

But why is Set being affected by hash? Hash is never guaranteed to be unique. Set should be affected by equality.

Well, actually it’s both #hash and #=. First the set tries to find a suitable place for the element using the elements hash. If that place is already taken it then checks equality. Since the equality definition is mostly the same (same literals, same byte codes etc.), the second element is rejected because it’s already in the set.


Doru

On Fri, Oct 17, 2014 at 9:34 AM, Max Leske <[hidden email]> wrote:

On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.




--

"Every thing has its own flow"




--

"Every thing has its own flow"
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Sven Van Caekenberghe-2

On 17 Oct 2014, at 11:12, Tudor Girba <[hidden email]> wrote:

> Exactly. So, the problem with Set is not in hash at all, but in equality. Of course, we can still enhance hash, but we should first focus on equality.
>
> And I am also of the opinion that equality should take the name of the selector and even the name of the class into account.

From a modelling standpoint it sounds as if one object (CompiledMethod) is used for two different things which results in the conflicting ideas about implementing equality and hashing. A CompiledMethod should hold a CompiledCode object while adding the selector and class. The CompiledCode object could then be equivalent or even be optionally shared among similar methods (like all those implementing ^self).

Just an external observation / idea.

> Doru
>
> On Fri, Oct 17, 2014 at 9:52 AM, Max Leske <[hidden email]> wrote:
>
>> On 17.10.2014, at 09:37, Tudor Girba <[hidden email]> wrote:
>>
>> But why is Set being affected by hash? Hash is never guaranteed to be unique. Set should be affected by equality.
>
> Well, actually it’s both #hash and #=. First the set tries to find a suitable place for the element using the elements hash. If that place is already taken it then checks equality. Since the equality definition is mostly the same (same literals, same byte codes etc.), the second element is rejected because it’s already in the set.
>
>>
>> Doru
>>
>> On Fri, Oct 17, 2014 at 9:34 AM, Max Leske <[hidden email]> wrote:
>>
>>> On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:
>>>
>>> Richard Sargent wrote:
>>>> Eliot Miranda-2 wrote
>>>>> On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
>>>>> richard.sargent@
>>>>>> wrote:
>>>>>> One of the best things about Smalltalk is how easily we can say what we
>>>>>> mean. I think you would be better off creating a method named something
>>>>>> like
>>>>>> #hasSameEffectAs: to answer what you are presently using #= to do, and
>>>>>> change #= to answer the, in my opinion, more sensible "is the same as"
>>>>>> that
>>>>>> we conventionally think of #= meaning.
>>>>>>
>>>>> But that's the point.  #= has to mean something and having it mean #==
>>>>> isn't useful, so one has to choose some value-based semantic for
>>>>> CompiledMethod>>#= and the one that's there is useful.  Defining what #=
>>>>> means for some value type is far easier than defining what it might mean
>>>>> for something as complex as a CompiledMethod.  The definition in
>>>>> Squeak/Pharo has been useful to me in implementing a closure-based system,
>>>>> so I'm unapologetic about the current definition.  It is a good one but it
>>>>> doesn't preclude defining others.
>>>> An interesting response. You ignored the point that e.g. #hasSameEffectAs:
>>>> provides greater clarity and add an argument against something I didn't say.
>>>> I also don't think defining equality for a CompiledMethod is particularly
>>>> difficult. If I were to recompile a method's source code, I would get a new
>>>> instance of a CompiledMethod that would, in my opinion, be equal to the one
>>>> already installed in the class (and perhaps cached in the VM's
>>>> optimizations). So one would be able to say that we would not replace an
>>>> existing CompiledMethod with an equal one. The current implementation of #=
>>>> has no such characteristic, since it proclaims a CompiledMethod named #a to
>>>> be equal to one named #z.
>>>
>>> @Richard
>>>
>>> That doesn't seem to be a good example for what your trying to say.
>>> Given...
>>>
>>> [1] SomeClass>>a "original instance"
>>> ^1
>>>
>>> [2] SomeClass>>a "recompiled instance"
>>> ^1
>>>
>>> [3] SomeClass>>z
>>> ^1
>>>
>>> ...you seem to be saying that its useful to know if [1]=[2],
>>> but imply that is invalidated by [2]=[3] ?
>>>
>>> But [1]=[2] remains true, and just as useful for your example.
>>>
>>>
>>> @Max
>>>
>>> I guess to call it a bug, you bumped into a different use case
>>> where [2]=[3] is problematic. Can you describe that?
>>
>> Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.
>>
>> The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:
>>
>> Collection methods select: #isAbstract.
>>
>> All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).
>>
>> So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.
>>
>> In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).
>>
>> Cheers,
>> Max
>>
>>>
>>>
>>> cheers -ben
>>>
>>>
>>>> The blue book say #= means "Answer whether the receiver and the argument
>>>> represent the same component." The current implementation does so only for
>>>> some, in my opinion, counter-intuitive definition of "same component".
>>>> --
>>>> View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
>>>> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
>>
>>
>>
>>
>> --
>> www.tudorgirba.com
>>
>> "Every thing has its own flow"
>
>
>
>
> --
> www.tudorgirba.com
>
> "Every thing has its own flow"


Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Max Leske
In reply to this post by Tudor Girba-2

On 17.10.2014, at 11:12, Tudor Girba <[hidden email]> wrote:

Exactly. So, the problem with Set is not in hash at all, but in equality. Of course, we can still enhance hash, but we should first focus on equality.

And I am also of the opinion that equality should take the name of the selector and even the name of the class into account.

Did you read Eliot’s argument? He needs the equality definition to find duplicates.

I don’t agree with you (anymore). The selector and the class are simply associated with a given CompiledMethod. But the CompiledMethod is still one without a name and without a class it is installed in. From that point of view, neither the class nor the selector should be included in the definition of equality.

I do agree however, that it kind of goes against the way programmers tend to think of methods, thus my idea (which I did not think through at all) to have something like CompiledMethodWrapper, that lets CompiledMethod be what it is and abstracts the object for use with class and selector (see my answer to Ben’s e-mail).

Cheers,
Max


Doru

On Fri, Oct 17, 2014 at 9:52 AM, Max Leske <[hidden email]> wrote:

On 17.10.2014, at 09:37, Tudor Girba <[hidden email]> wrote:

But why is Set being affected by hash? Hash is never guaranteed to be unique. Set should be affected by equality.

Well, actually it’s both #hash and #=. First the set tries to find a suitable place for the element using the elements hash. If that place is already taken it then checks equality. Since the equality definition is mostly the same (same literals, same byte codes etc.), the second element is rejected because it’s already in the set.


Doru

On Fri, Oct 17, 2014 at 9:34 AM, Max Leske <[hidden email]> wrote:

On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.




--

"Every thing has its own flow"




--

"Every thing has its own flow"

Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Max Leske
In reply to this post by Sven Van Caekenberghe-2

> On 17.10.2014, at 11:39, Sven Van Caekenberghe <[hidden email]> wrote:
>
>
> On 17 Oct 2014, at 11:12, Tudor Girba <[hidden email]> wrote:
>
>> Exactly. So, the problem with Set is not in hash at all, but in equality. Of course, we can still enhance hash, but we should first focus on equality.
>>
>> And I am also of the opinion that equality should take the name of the selector and even the name of the class into account.
>
> From a modelling standpoint it sounds as if one object (CompiledMethod) is used for two different things which results in the conflicting ideas about implementing equality and hashing. A CompiledMethod should hold a CompiledCode object while adding the selector and class. The CompiledCode object could then be equivalent or even be optionally shared among similar methods (like all those implementing ^self).
>
> Just an external observation / idea.

Yes, pretty much what I was thinking. I don’t know what the consequences would be for the VM though...

>
>> Doru
>>
>> On Fri, Oct 17, 2014 at 9:52 AM, Max Leske <[hidden email]> wrote:
>>
>>> On 17.10.2014, at 09:37, Tudor Girba <[hidden email]> wrote:
>>>
>>> But why is Set being affected by hash? Hash is never guaranteed to be unique. Set should be affected by equality.
>>
>> Well, actually it’s both #hash and #=. First the set tries to find a suitable place for the element using the elements hash. If that place is already taken it then checks equality. Since the equality definition is mostly the same (same literals, same byte codes etc.), the second element is rejected because it’s already in the set.
>>
>>>
>>> Doru
>>>
>>> On Fri, Oct 17, 2014 at 9:34 AM, Max Leske <[hidden email]> wrote:
>>>
>>>> On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:
>>>>
>>>> Richard Sargent wrote:
>>>>> Eliot Miranda-2 wrote
>>>>>> On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
>>>>>> richard.sargent@
>>>>>>> wrote:
>>>>>>> One of the best things about Smalltalk is how easily we can say what we
>>>>>>> mean. I think you would be better off creating a method named something
>>>>>>> like
>>>>>>> #hasSameEffectAs: to answer what you are presently using #= to do, and
>>>>>>> change #= to answer the, in my opinion, more sensible "is the same as"
>>>>>>> that
>>>>>>> we conventionally think of #= meaning.
>>>>>>>
>>>>>> But that's the point.  #= has to mean something and having it mean #==
>>>>>> isn't useful, so one has to choose some value-based semantic for
>>>>>> CompiledMethod>>#= and the one that's there is useful.  Defining what #=
>>>>>> means for some value type is far easier than defining what it might mean
>>>>>> for something as complex as a CompiledMethod.  The definition in
>>>>>> Squeak/Pharo has been useful to me in implementing a closure-based system,
>>>>>> so I'm unapologetic about the current definition.  It is a good one but it
>>>>>> doesn't preclude defining others.
>>>>> An interesting response. You ignored the point that e.g. #hasSameEffectAs:
>>>>> provides greater clarity and add an argument against something I didn't say.
>>>>> I also don't think defining equality for a CompiledMethod is particularly
>>>>> difficult. If I were to recompile a method's source code, I would get a new
>>>>> instance of a CompiledMethod that would, in my opinion, be equal to the one
>>>>> already installed in the class (and perhaps cached in the VM's
>>>>> optimizations). So one would be able to say that we would not replace an
>>>>> existing CompiledMethod with an equal one. The current implementation of #=
>>>>> has no such characteristic, since it proclaims a CompiledMethod named #a to
>>>>> be equal to one named #z.
>>>>
>>>> @Richard
>>>>
>>>> That doesn't seem to be a good example for what your trying to say.
>>>> Given...
>>>>
>>>> [1] SomeClass>>a "original instance"
>>>> ^1
>>>>
>>>> [2] SomeClass>>a "recompiled instance"
>>>> ^1
>>>>
>>>> [3] SomeClass>>z
>>>> ^1
>>>>
>>>> ...you seem to be saying that its useful to know if [1]=[2],
>>>> but imply that is invalidated by [2]=[3] ?
>>>>
>>>> But [1]=[2] remains true, and just as useful for your example.
>>>>
>>>>
>>>> @Max
>>>>
>>>> I guess to call it a bug, you bumped into a different use case
>>>> where [2]=[3] is problematic. Can you describe that?
>>>
>>> Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.
>>>
>>> The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:
>>>
>>> Collection methods select: #isAbstract.
>>>
>>> All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).
>>>
>>> So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.
>>>
>>> In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).
>>>
>>> Cheers,
>>> Max
>>>
>>>>
>>>>
>>>> cheers -ben
>>>>
>>>>
>>>>> The blue book say #= means "Answer whether the receiver and the argument
>>>>> represent the same component." The current implementation does so only for
>>>>> some, in my opinion, counter-intuitive definition of "same component".
>>>>> --
>>>>> View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
>>>>> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
>>>
>>>
>>>
>>>
>>> --
>>> www.tudorgirba.com
>>>
>>> "Every thing has its own flow"
>>
>>
>>
>>
>> --
>> www.tudorgirba.com
>>
>> "Every thing has its own flow"
>
>


Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Marcus Denker-4

-- 
Marcus Denker
Sent with Airmail

On 17 Oct 2014 at 11:45:23, Max Leske ([hidden email]) wrote:


> On 17.10.2014, at 11:39, Sven Van Caekenberghe <[hidden email]> wrote:
>
>
> On 17 Oct 2014, at 11:12, Tudor Girba <[hidden email]> wrote:
>
>> Exactly. So, the problem with Set is not in hash at all, but in equality. Of course, we can still enhance hash, but we should first focus on equality.
>>
>> And I am also of the opinion that equality should take the name of the selector and even the name of the class into account.
>
> From a modelling standpoint it sounds as if one object (CompiledMethod) is used for two different things which results in the conflicting ideas about implementing equality and hashing. A CompiledMethod should hold a CompiledCode object while adding the selector and class. The CompiledCode object could then be equivalent or even be optionally shared among similar methods (like all those implementing ^self).
>
> Just an external observation / idea.

Yes, pretty much what I was thinking. I don’t know what the consequences would be for the VM though...

Yes, I think, too, that many of the problems with CompiledMethod come from the fact that we “abuse” a low level object in a context where people
want a higher level concept.

Marcus
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Eliot Miranda-2
In reply to this post by Max Leske
Hi Max,

On Oct 17, 2014, at 12:34 AM, Max Leske <[hidden email]> wrote:


On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

Surely the issue is that "aClass methods" should answer an IdentitySet right?


So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.

Eliot (phone)
Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Max Leske

On 17.10.2014, at 15:25, Eliot Miranda <[hidden email]> wrote:

Hi Max,

On Oct 17, 2014, at 12:34 AM, Max Leske <[hidden email]> wrote:


On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

Surely the issue is that "aClass methods" should answer an IdentitySet right?

Well, in the case of my example that wouldn’t change anything. As soon as you put the methods into a set you’ll end up with less entries than before again. Selectors are unique per class anyway so I don’t quite see the benefit of using an IdentitySet over an OrderedCollection (or a Bag for that matter), except for making it more obvious that the result of the message #methods doesn’t need to be filtered further.

I should add that the code, the student who discovered the clash wrote, looked more like this:

result := Set new.
Collection methods do: [ :m |
m isAbstract ifTrue: [ result add: m ] ].

Max



So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.

Eliot (phone)

Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Nicolas Cellier


2014-10-17 15:49 GMT+02:00 Max Leske <[hidden email]>:

On 17.10.2014, at 15:25, Eliot Miranda <[hidden email]> wrote:

Hi Max,

On Oct 17, 2014, at 12:34 AM, Max Leske <[hidden email]> wrote:


On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

Surely the issue is that "aClass methods" should answer an IdentitySet right?

Well, in the case of my example that wouldn’t change anything. As soon as you put the methods into a set you’ll end up with less entries than before again. Selectors are unique per class anyway so I don’t quite see the benefit of using an IdentitySet over an OrderedCollection (or a Bag for that matter), except for making it more obvious that the result of the message #methods doesn’t need to be filtered further.

I should add that the code, the student who discovered the clash wrote, looked more like this:

result := Set new.
Collection methods do: [ :m |
m isAbstract ifTrue: [ result add: m ] ].

Max


Why not just keep a dictionary, like methodDictionary select: #isAbstract or something like that...
 


So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.

Eliot (phone)


Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Max Leske

On 17.10.2014, at 15:52, Nicolas Cellier <[hidden email]> wrote:



2014-10-17 15:49 GMT+02:00 Max Leske <[hidden email]>:

On 17.10.2014, at 15:25, Eliot Miranda <[hidden email]> wrote:

Hi Max,

On Oct 17, 2014, at 12:34 AM, Max Leske <[hidden email]> wrote:


On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

Surely the issue is that "aClass methods" should answer an IdentitySet right?

Well, in the case of my example that wouldn’t change anything. As soon as you put the methods into a set you’ll end up with less entries than before again. Selectors are unique per class anyway so I don’t quite see the benefit of using an IdentitySet over an OrderedCollection (or a Bag for that matter), except for making it more obvious that the result of the message #methods doesn’t need to be filtered further.

I should add that the code, the student who discovered the clash wrote, looked more like this:

result := Set new.
Collection methods do: [ :m |
m isAbstract ifTrue: [ result add: m ] ].

Max


Why not just keep a dictionary, like methodDictionary select: #isAbstract or something like that…

You’re right of course, there’s no need to put methods into a set. The code is from an exercise and the students aren’t used to Smalltalk, so they end up with all sorts of code.

 


So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.

Eliot (phone)

Reply | Threaded
Open this post in threaded view
|

Re: CompiledMethod>>hash can produce clashes

Eliot Miranda-2
In reply to this post by Max Leske
Hi Max,


On Oct 17, 2014, at 6:49 AM, Max Leske <[hidden email]> wrote:


On 17.10.2014, at 15:25, Eliot Miranda <[hidden email]> wrote:

Hi Max,

On Oct 17, 2014, at 12:34 AM, Max Leske <[hidden email]> wrote:


On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote:

Richard Sargent wrote:
Eliot Miranda-2 wrote
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <
richard.sargent@
wrote:
One of the best things about Smalltalk is how easily we can say what we
mean. I think you would be better off creating a method named something
like
#hasSameEffectAs: to answer what you are presently using #= to do, and
change #= to answer the, in my opinion, more sensible "is the same as"
that
we conventionally think of #= meaning.

But that's the point.  #= has to mean something and having it mean #==
isn't useful, so one has to choose some value-based semantic for
CompiledMethod>>#= and the one that's there is useful.  Defining what #=
means for some value type is far easier than defining what it might mean
for something as complex as a CompiledMethod.  The definition in
Squeak/Pharo has been useful to me in implementing a closure-based system,
so I'm unapologetic about the current definition.  It is a good one but it
doesn't preclude defining others.
An interesting response. You ignored the point that e.g. #hasSameEffectAs:
provides greater clarity and add an argument against something I didn't say.
I also don't think defining equality for a CompiledMethod is particularly
difficult. If I were to recompile a method's source code, I would get a new
instance of a CompiledMethod that would, in my opinion, be equal to the one
already installed in the class (and perhaps cached in the VM's
optimizations). So one would be able to say that we would not replace an
existing CompiledMethod with an equal one. The current implementation of #=
has no such characteristic, since it proclaims a CompiledMethod named #a to
be equal to one named #z.

@Richard

That doesn't seem to be a good example for what your trying to say.
Given...

[1] SomeClass>>a "original instance"
^1

[2] SomeClass>>a "recompiled instance"
^1

[3] SomeClass>>z
^1

...you seem to be saying that its useful to know if [1]=[2],
but imply that is invalidated by [2]=[3] ?

But [1]=[2] remains true, and just as useful for your example.


@Max

I guess to call it a bug, you bumped into a different use case
where [2]=[3] is problematic. Can you describe that?

Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash.

The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:

Collection methods select: #isAbstract.

All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is).

Surely the issue is that "aClass methods" should answer an IdentitySet right?

Well, in the case of my example that wouldn’t change anything. As soon as you put the methods into a set you’ll end up with less entries than before again.

No.  An IdentitySet uses #== and identityHash so that won't be the case.  Also IIRC. anIdentitySet collect: answers another IdentitySet.


Selectors are unique per class anyway so I don’t quite see the benefit of using an IdentitySet over an OrderedCollection (or a Bag for that matter), except for making it more obvious that the result of the message #methods doesn’t need to be filtered further.

Well then (IdentitySet withAll: aClass methods) collect: will not lose elements.  You'd have exactly the same issue if you tried to collect the set of all literals in the system.  Unless you used an IdentitySet you'd collect only the different values, not all literals.  Or any set of all objects that represent values.


I should add that the code, the student who discovered the clash wrote, looked more like this:

result := Set new.
Collection methods do: [ :m |
m isAbstract ifTrue: [ result add: m ] ].

So change it to IdentitySet new.  This is good for the student to encounter.  It is one of the essential differences between OO and functional languages. We have state and identity.

Max



So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod.

In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing).

Cheers,
Max



cheers -ben


The blue book say #= means "Answer whether the receiver and the argument
represent the same component." The current implementation does so only for
some, in my opinion, counter-intuitive definition of "same component".
--
View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.

Eliot (phone)

Cheers,
Eliot
12