We found an issue where CompiledMethods can produce the same hash event within the same MethodDictionary. Reported here: Cheers, Max
|
I responded... On Wed, Oct 15, 2014 at 7:11 AM, Max Leske <[hidden email]> wrote:
best, Eliot
|
Administrator
|
I have to disagree with your recommendation. You say that you intend #= to mean "has the same effect as" rather than "is the same as". One of the best things about Smalltalk is how easily we can say what we mean. I think you would be better off creating a method named something like #hasSameEffectAs: to answer what you are presently using #= to do, and change #= to answer the, in my opinion, more sensible "is the same as" that we conventionally think of #= meaning. $0.02 worth and worth everything you paid for it. :-) Richard |
Hi Richard,
On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent <[hidden email]> wrote: Eliot Miranda-2 wrote But that's the point. #= has to mean something and having it mean #== isn't useful, so one has to choose some value-based semantic for CompiledMethod>>#= and the one that's there is useful. Defining what #= means for some value type is far easier than defining what it might mean for something as complex as a CompiledMethod. The definition in Squeak/Pharo has been useful to me in implementing a closure-based system, so I'm unapologetic about the current definition. It is a good one but it doesn't preclude defining others. $0.02 worth and worth everything you paid for it. :-) best, Eliot
|
Administrator
|
An interesting response. You ignored the point that e.g. #hasSameEffectAs: provides greater clarity and add an argument against something I didn't say. I also don't think defining equality for a CompiledMethod is particularly difficult. If I were to recompile a method's source code, I would get a new instance of a CompiledMethod that would, in my opinion, be equal to the one already installed in the class (and perhaps cached in the VM's optimizations). So one would be able to say that we would not replace an existing CompiledMethod with an equal one. The current implementation of #= has no such characteristic, since it proclaims a CompiledMethod named #a to be equal to one named #z. The blue book say #= means "Answer whether the receiver and the argument represent the same component." The current implementation does so only for some, in my opinion, counter-intuitive definition of "same component". |
On Wed, Oct 15, 2014 at 11:53 AM, Richard Sargent <[hidden email]> wrote: Eliot Miranda-2 wrote It's a given. But the selector still isn't perfectly informative, see below. And what's the effect? The effect when executed, so why not hasSameEffectWhenExecuted:? It's a mouthful. I also don't think defining equality for a CompiledMethod is particularly And for some uses that is correct, one names a /is/ equal to one named z, even though their selectors differ. For example if you do the following: aClass compile: 'a ^1'; compile: 'b ^ 1'. aClass compiledMethodAt: #a put: aClass >> #b then "aClass new a" still answers 1. It's only if one introspects (thisContext method selector, opens the debugger, etc) that one sees that the selector doesn't match. If you go back to Smalltalk-80 you'll see that compiled methods didn't store their selector and to find out their selector one searched the method dictionary of the method's method class. As far as what most methods do (excluding introspecting code) the selector is merely a cache of the key in the relevant method dictionary. So #hasSameEffect: *doesn't* mean what one might think it means for some uses (it doesn't only depend on the method in question, but on usage, whether the method is used in an introspective context, etc). So naming a more explanatory selector is more difficult than you might think. The blue book say #= means "Answer whether the receiver and the argument Well Smalltalk-80 says Answer true if the receiver and the argument represent the same object and false otherwise. If = is redefined in any subclass, consider also redefining the message hash. and that's just as vague (AFAIA the blue book doesn't define what a component is), because whether objects are equal or not depends on usage. In the end the system is full of definitions of #= which are more or less generally useful in various contexts. Things are easy for the arithmetic types, but for more complex objects there are always caveats | s1 s2 | s1 := Set new. s1 add: s1. s2 := Set new. s2 add: s2. s1 = s2 doesn't terminate. So should Set's #= be called isEqualIfNonRecursive: ? No. Its limited #= is fine in practice and much better than Smalltalk-80's original fall back to #==. But it is not a perfect equality. Neither is CompiledMethod's. But it being imperfect is not an argument for changing it to another, inevitably also flawed definition without good reason. -- best, Eliot
|
In reply to this post by Richard Sargent
Richard Sargent wrote:
> Eliot Miranda-2 wrote >> On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent < > >> richard.sargent@ > >>> wrote: >>> One of the best things about Smalltalk is how easily we can say what we >>> mean. I think you would be better off creating a method named something >>> like >>> #hasSameEffectAs: to answer what you are presently using #= to do, and >>> change #= to answer the, in my opinion, more sensible "is the same as" >>> that >>> we conventionally think of #= meaning. >>> >> But that's the point. #= has to mean something and having it mean #== >> isn't useful, so one has to choose some value-based semantic for >> CompiledMethod>>#= and the one that's there is useful. Defining what #= >> means for some value type is far easier than defining what it might mean >> for something as complex as a CompiledMethod. The definition in >> Squeak/Pharo has been useful to me in implementing a closure-based system, >> so I'm unapologetic about the current definition. It is a good one but it >> doesn't preclude defining others. > > An interesting response. You ignored the point that e.g. #hasSameEffectAs: > provides greater clarity and add an argument against something I didn't say. > > I also don't think defining equality for a CompiledMethod is particularly > difficult. If I were to recompile a method's source code, I would get a new > instance of a CompiledMethod that would, in my opinion, be equal to the one > already installed in the class (and perhaps cached in the VM's > optimizations). So one would be able to say that we would not replace an > existing CompiledMethod with an equal one. The current implementation of #= > has no such characteristic, since it proclaims a CompiledMethod named #a to > be equal to one named #z. > @Richard That doesn't seem to be a good example for what your trying to say. Given... [1] SomeClass>>a "original instance" ^1 [2] SomeClass>>a "recompiled instance" ^1 [3] SomeClass>>z ^1 ...you seem to be saying that its useful to know if [1]=[2], but imply that is invalidated by [2]=[3] ? But [1]=[2] remains true, and just as useful for your example. @Max I guess to call it a bug, you bumped into a different use case where [2]=[3] is problematic. Can you describe that? cheers -ben > The blue book say #= means "Answer whether the receiver and the argument > represent the same component." The current implementation does so only for > some, in my opinion, counter-intuitive definition of "same component". > > > > > -- > View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html > Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com. > > |
Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash. The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent:
So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod. In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing). Cheers, Max
|
But why is Set being affected by hash? Hash is never guaranteed to be unique. Set should be affected by equality. Doru On Fri, Oct 17, 2014 at 9:34 AM, Max Leske <[hidden email]> wrote:
"Every thing has its own flow"
|
Well, actually it’s both #hash and #=. First the set tries to find a suitable place for the element using the elements hash. If that place is already taken it then checks equality. Since the equality definition is mostly the same (same literals, same byte codes etc.), the second element is rejected because it’s already in the set.
|
Exactly. So, the problem with Set is not in hash at all, but in equality. Of course, we can still enhance hash, but we should first focus on equality. And I am also of the opinion that equality should take the name of the selector and even the name of the class into account. Doru On Fri, Oct 17, 2014 at 9:52 AM, Max Leske <[hidden email]> wrote:
"Every thing has its own flow"
|
On 17 Oct 2014, at 11:12, Tudor Girba <[hidden email]> wrote: > Exactly. So, the problem with Set is not in hash at all, but in equality. Of course, we can still enhance hash, but we should first focus on equality. > > And I am also of the opinion that equality should take the name of the selector and even the name of the class into account. From a modelling standpoint it sounds as if one object (CompiledMethod) is used for two different things which results in the conflicting ideas about implementing equality and hashing. A CompiledMethod should hold a CompiledCode object while adding the selector and class. The CompiledCode object could then be equivalent or even be optionally shared among similar methods (like all those implementing ^self). Just an external observation / idea. > Doru > > On Fri, Oct 17, 2014 at 9:52 AM, Max Leske <[hidden email]> wrote: > >> On 17.10.2014, at 09:37, Tudor Girba <[hidden email]> wrote: >> >> But why is Set being affected by hash? Hash is never guaranteed to be unique. Set should be affected by equality. > > Well, actually it’s both #hash and #=. First the set tries to find a suitable place for the element using the elements hash. If that place is already taken it then checks equality. Since the equality definition is mostly the same (same literals, same byte codes etc.), the second element is rejected because it’s already in the set. > >> >> Doru >> >> On Fri, Oct 17, 2014 at 9:34 AM, Max Leske <[hidden email]> wrote: >> >>> On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote: >>> >>> Richard Sargent wrote: >>>> Eliot Miranda-2 wrote >>>>> On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent < >>>>> richard.sargent@ >>>>>> wrote: >>>>>> One of the best things about Smalltalk is how easily we can say what we >>>>>> mean. I think you would be better off creating a method named something >>>>>> like >>>>>> #hasSameEffectAs: to answer what you are presently using #= to do, and >>>>>> change #= to answer the, in my opinion, more sensible "is the same as" >>>>>> that >>>>>> we conventionally think of #= meaning. >>>>>> >>>>> But that's the point. #= has to mean something and having it mean #== >>>>> isn't useful, so one has to choose some value-based semantic for >>>>> CompiledMethod>>#= and the one that's there is useful. Defining what #= >>>>> means for some value type is far easier than defining what it might mean >>>>> for something as complex as a CompiledMethod. The definition in >>>>> Squeak/Pharo has been useful to me in implementing a closure-based system, >>>>> so I'm unapologetic about the current definition. It is a good one but it >>>>> doesn't preclude defining others. >>>> An interesting response. You ignored the point that e.g. #hasSameEffectAs: >>>> provides greater clarity and add an argument against something I didn't say. >>>> I also don't think defining equality for a CompiledMethod is particularly >>>> difficult. If I were to recompile a method's source code, I would get a new >>>> instance of a CompiledMethod that would, in my opinion, be equal to the one >>>> already installed in the class (and perhaps cached in the VM's >>>> optimizations). So one would be able to say that we would not replace an >>>> existing CompiledMethod with an equal one. The current implementation of #= >>>> has no such characteristic, since it proclaims a CompiledMethod named #a to >>>> be equal to one named #z. >>> >>> @Richard >>> >>> That doesn't seem to be a good example for what your trying to say. >>> Given... >>> >>> [1] SomeClass>>a "original instance" >>> ^1 >>> >>> [2] SomeClass>>a "recompiled instance" >>> ^1 >>> >>> [3] SomeClass>>z >>> ^1 >>> >>> ...you seem to be saying that its useful to know if [1]=[2], >>> but imply that is invalidated by [2]=[3] ? >>> >>> But [1]=[2] remains true, and just as useful for your example. >>> >>> >>> @Max >>> >>> I guess to call it a bug, you bumped into a different use case >>> where [2]=[3] is problematic. Can you describe that? >> >> Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash. >> >> The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent: >> >> Collection methods select: #isAbstract. >> >> All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is). >> >> So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod. >> >> In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing). >> >> Cheers, >> Max >> >>> >>> >>> cheers -ben >>> >>> >>>> The blue book say #= means "Answer whether the receiver and the argument >>>> represent the same component." The current implementation does so only for >>>> some, in my opinion, counter-intuitive definition of "same component". >>>> -- >>>> View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html >>>> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com. >> >> >> >> >> -- >> www.tudorgirba.com >> >> "Every thing has its own flow" > > > > > -- > www.tudorgirba.com > > "Every thing has its own flow" |
In reply to this post by Tudor Girba-2
Did you read Eliot’s argument? He needs the equality definition to find duplicates. I don’t agree with you (anymore). The selector and the class are simply associated with a given CompiledMethod. But the CompiledMethod is still one without a name and without a class it is installed in. From that point of view, neither the class nor the selector should be included in the definition of equality. I do agree however, that it kind of goes against the way programmers tend to think of methods, thus my idea (which I did not think through at all) to have something like CompiledMethodWrapper, that lets CompiledMethod be what it is and abstracts the object for use with class and selector (see my answer to Ben’s e-mail). Cheers, Max
|
In reply to this post by Sven Van Caekenberghe-2
> On 17.10.2014, at 11:39, Sven Van Caekenberghe <[hidden email]> wrote: > > > On 17 Oct 2014, at 11:12, Tudor Girba <[hidden email]> wrote: > >> Exactly. So, the problem with Set is not in hash at all, but in equality. Of course, we can still enhance hash, but we should first focus on equality. >> >> And I am also of the opinion that equality should take the name of the selector and even the name of the class into account. > > From a modelling standpoint it sounds as if one object (CompiledMethod) is used for two different things which results in the conflicting ideas about implementing equality and hashing. A CompiledMethod should hold a CompiledCode object while adding the selector and class. The CompiledCode object could then be equivalent or even be optionally shared among similar methods (like all those implementing ^self). > > Just an external observation / idea. Yes, pretty much what I was thinking. I don’t know what the consequences would be for the VM though... > >> Doru >> >> On Fri, Oct 17, 2014 at 9:52 AM, Max Leske <[hidden email]> wrote: >> >>> On 17.10.2014, at 09:37, Tudor Girba <[hidden email]> wrote: >>> >>> But why is Set being affected by hash? Hash is never guaranteed to be unique. Set should be affected by equality. >> >> Well, actually it’s both #hash and #=. First the set tries to find a suitable place for the element using the elements hash. If that place is already taken it then checks equality. Since the equality definition is mostly the same (same literals, same byte codes etc.), the second element is rejected because it’s already in the set. >> >>> >>> Doru >>> >>> On Fri, Oct 17, 2014 at 9:34 AM, Max Leske <[hidden email]> wrote: >>> >>>> On 17.10.2014, at 02:46, Ben Coman <[hidden email]> wrote: >>>> >>>> Richard Sargent wrote: >>>>> Eliot Miranda-2 wrote >>>>>> On Wed, Oct 15, 2014 at 10:50 AM, Richard Sargent < >>>>>> richard.sargent@ >>>>>>> wrote: >>>>>>> One of the best things about Smalltalk is how easily we can say what we >>>>>>> mean. I think you would be better off creating a method named something >>>>>>> like >>>>>>> #hasSameEffectAs: to answer what you are presently using #= to do, and >>>>>>> change #= to answer the, in my opinion, more sensible "is the same as" >>>>>>> that >>>>>>> we conventionally think of #= meaning. >>>>>>> >>>>>> But that's the point. #= has to mean something and having it mean #== >>>>>> isn't useful, so one has to choose some value-based semantic for >>>>>> CompiledMethod>>#= and the one that's there is useful. Defining what #= >>>>>> means for some value type is far easier than defining what it might mean >>>>>> for something as complex as a CompiledMethod. The definition in >>>>>> Squeak/Pharo has been useful to me in implementing a closure-based system, >>>>>> so I'm unapologetic about the current definition. It is a good one but it >>>>>> doesn't preclude defining others. >>>>> An interesting response. You ignored the point that e.g. #hasSameEffectAs: >>>>> provides greater clarity and add an argument against something I didn't say. >>>>> I also don't think defining equality for a CompiledMethod is particularly >>>>> difficult. If I were to recompile a method's source code, I would get a new >>>>> instance of a CompiledMethod that would, in my opinion, be equal to the one >>>>> already installed in the class (and perhaps cached in the VM's >>>>> optimizations). So one would be able to say that we would not replace an >>>>> existing CompiledMethod with an equal one. The current implementation of #= >>>>> has no such characteristic, since it proclaims a CompiledMethod named #a to >>>>> be equal to one named #z. >>>> >>>> @Richard >>>> >>>> That doesn't seem to be a good example for what your trying to say. >>>> Given... >>>> >>>> [1] SomeClass>>a "original instance" >>>> ^1 >>>> >>>> [2] SomeClass>>a "recompiled instance" >>>> ^1 >>>> >>>> [3] SomeClass>>z >>>> ^1 >>>> >>>> ...you seem to be saying that its useful to know if [1]=[2], >>>> but imply that is invalidated by [2]=[3] ? >>>> >>>> But [1]=[2] remains true, and just as useful for your example. >>>> >>>> >>>> @Max >>>> >>>> I guess to call it a bug, you bumped into a different use case >>>> where [2]=[3] is problematic. Can you describe that? >>> >>> Well, not problematic. Once you accept that neither selector nor class are part of a CompiledMethod it is obvious that two instances with the same byte codes produce the same hash. >>> >>> The actual problem is more one of understanding and use. The following code answers a collection with the CompiledMethods Collection>>add:, Collection>>do: and Collection>>remove:ifAbsent: >>> >>> Collection methods select: #isAbstract. >>> >>> All three CompiledMethods are implemented as ‘^ self subclassResponsibility’, so they have the same byte codes. Now, if you take that collection and make a set out of it you’ll lose Collection>>do: since #do: and #add: produce the same hash, but #remove:ifAbsent: doesn’t because the number of arguments is calculated into the hash (actually the CompiledMethod header is). >>> >>> So, as long as you think of CompiledMethods as objects that have a name, it looks like a bug and in my opinion this behaviour is something that messes with the mind of newcomers. Just a (silly) idea: something like a CompiledMethodWrapper might solve the problem (at least from the user perspective; everything is slightly different from the VM perspective :) ), as it could hold on to the class and the selector independently of the actual CompiledMethod. >>> >>> In the end however, one doesn’t work with compiled methods a lot and the hash situation is unlikely to cause a lot of problems (people working with CompiledMethod usually know what they are doing). >>> >>> Cheers, >>> Max >>> >>>> >>>> >>>> cheers -ben >>>> >>>> >>>>> The blue book say #= means "Answer whether the receiver and the argument >>>>> represent the same component." The current implementation does so only for >>>>> some, in my opinion, counter-intuitive definition of "same component". >>>>> -- >>>>> View this message in context: http://forum.world.st/CompiledMethod-hash-can-produce-clashes-tp4784722p4784779.html >>>>> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com. >>> >>> >>> >>> >>> -- >>> www.tudorgirba.com >>> >>> "Every thing has its own flow" >> >> >> >> >> -- >> www.tudorgirba.com >> >> "Every thing has its own flow" > > |
-- Marcus Denker Sent with Airmail On 17 Oct 2014 at 11:45:23, Max Leske ([hidden email]) wrote: Yes, I think, too, that many of the problems with CompiledMethod come from the fact that we “abuse” a low level object in a context where people want a higher level concept. Marcus
|
In reply to this post by Max Leske
Hi Max,
Eliot (phone) |
Well, in the case of my example that wouldn’t change anything. As soon as you put the methods into a set you’ll end up with less entries than before again. Selectors are unique per class anyway so I don’t quite see the benefit of using an IdentitySet over an OrderedCollection (or a Bag for that matter), except for making it more obvious that the result of the message #methods doesn’t need to be filtered further. I should add that the code, the student who discovered the clash wrote, looked more like this: result := Set new. Collection methods do: [ :m | m isAbstract ifTrue: [ result add: m ] ]. Max
|
2014-10-17 15:49 GMT+02:00 Max Leske <[hidden email]>:
Why not just keep a dictionary, like methodDictionary select: #isAbstract or something like that...
|
You’re right of course, there’s no need to put methods into a set. The code is from an exercise and the students aren’t used to Smalltalk, so they end up with all sorts of code.
|
In reply to this post by Max Leske
Hi Max,
Eliot
|
Free forum by Nabble | Edit this page |