On 05/03/2011 07:06 AM, [hidden email] wrote:
> Using #deepCopy, I have trouble understanding why the
> code fails.

Thank you very much for the solution - I need to define '#=' and '#hash' in MyContact.

Without examining the entire inheritance chain, how do we determine whether the inherited methods suffice?
How do we determine when a test-first approach would be necessary?

Why does Object decide that identity equals is the appropriate implementation?

Dense :(

Thanks for help

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Object's implementation of #= is the best available default, and it is immediately reasonable. The reason for this is the basic object-oriented concept of identity. What can we say about an object when we ask "are you equal to this?" We *can* say that an object is equal to another if it is identical. This is the obvious truth for everything, even in the real world. So it's definitely correct to implement #= with "^self == anObject".

Everything else is a matter of modeling. It's the designer's task to determine which instance variable values make two objects equal. Think about twins: When are twins equal? That depends on how you see them. They are not equal per se. But if your point of view (the model aspects that you are interested in) is simply height or face similarity or clothes (you know all twins wear equal clothes all the time :-) ) or last name, then the twins might be equal. This is what your model must express. If not, then the twins are never equal, although they have so many things in common.

The same holds for your deep-copied clones. At a certain level there is a difference. Again the example with the twins: Faces might look very similar, but on close inspection there are tiny differences. If your point of view ignores certain small differences, then you will get equality. If not, you can't get equality.

In general it is not necessary to check the entire inheritance chain for equality definitions. We live well with the predefined definition without caring too much. At each level of our domain we add refinements for the object under consideration. And this is the point where I check the inheritance: when I add an implementation of #=.

I fear that your troubles with equality come from the use of cloning. Normally, we avoid this pattern, and - as Alan pointed out - you won't find #deepCopy in the base (with good reason). Usually, a simple #copy, maybe refined by reimplementing #postCopy, will do a good job. I won't discuss whether cloning is really necessary or not. But your need to differentiate clones from originals hints at a possibly inappropriate mixture. If you clone, then you should keep the copies in "quarantine", i.e. in use cases where you might need an editing or backup copy. These use cases should work with clones exclusively and not with originals, and they should avoid "infecting" the domain with clones.

Regards
Holger Guhl
--
Senior Consultant * Certified Scrum Master * [hidden email]
Tel: +49 231 9 75 99 21 * Fax: +49 231 9 75 99 20
Georg Heeg eK Dortmund
Handelsregister: Amtsgericht Dortmund A 12812
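The "#copy refined by #postCopy" approach Holger mentions could look roughly like the sketch below. MyContact comes from this thread; the instance variable phoneNumbers, its accessor, and the assumption that it is initialized to an OrderedCollection are made up for illustration. The sketch relies on the usual VisualWorks arrangement in which #copy makes a shallow copy and then sends it #postCopy.

```smalltalk
MyContact>>postCopy
    "Runs on the fresh shallow copy. Hypothetical inst var: phoneNumbers
     (an OrderedCollection). Give the copy its own collection so that
     editing the copy does not change the original."
    super postCopy.
    phoneNumbers := phoneNumbers copy

"Workspace check. Assumes a hypothetical accessor #phoneNumbers and that
 MyContact new initializes phoneNumbers to an empty OrderedCollection."
| original editingCopy |
original := MyContact new.
editingCopy := original copy.
editingCopy phoneNumbers add: '555-0100'.
original phoneNumbers includes: '555-0100'
```

Under those assumptions the last expression should answer false, because the editing copy holds its own collection. Only the variables that must not be shared need this treatment; everything else stays shared, which is what keeps #copy cheaper than #deepCopy.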
Hello.

My response is inline.

On 4.5.2011 02:38, intrader wrote:
> Without examining the entire inheritance chain, how do we determine
> whether the inherited methods suffice?
> How do we determine when test first approach would be necessary?

You asked a general question, so the following is only my subjective opinion about inheritance and its overall usefulness, based on my own experience with OO in Smalltalk (and OO in general).

I think that inheritance is overrated. It gives you short-term benefits (quick prototyping because of code reuse), but it can bite you just as easily if you do not watch your back. An alternative to inheritance is composition and delegation, which is much easier to maintain over time, because it is easy to plug a different delegate into the object chain.

I think that you should always examine each and every inherited method you're about to use on your class. Only when you know what it does can you decide responsibly whether it is the right fit for your _current_ needs. And you document your decision by using it in a working test case of your class. Later, when the current needs are altogether different and the inherited implementation no longer suffices, the failing test should inform you about this fact.

By this I don't mean you should write a separate test case for every method of a class, only that you should write a test case where the particular method is called (along with others) to accomplish a portion of your business logic AND where the current implementation of the method is important for it, i.e. has some impact that can be asserted in the test case.

This is not as hard as you might think, because after a while you will gain confidence about what most of the inherited methods do. And Smalltalk is really a wonderful environment which encourages you to try - you can play with objects and their methods in the workspace or directly in the debugger all day long.

> Why does Object decide that identity equals is the appropriate
> implementation?

It's not so much an appropriate implementation as simply a default one which works most of the time.

Just my 2c anyway,

Ladislav Lenart
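A test that documents reliance on an inherited method, in the sense Ladislav describes, might be sketched like this. MyContactTest is a hypothetical SUnit TestCase subclass, and the sketch assumes MyContact still inherits Object>>#= (identity) at the time the test is written.

```smalltalk
"Hypothetical test class: MyContactTest, a subclass of TestCase (SUnit)."
MyContactTest>>testEqualityIsIdentity
    "Pins down that MyContact relies on the inherited Object>>#=.
     Starts failing once #= is overridden with value-based equality,
     which is the signal to revisit the code that depended on identity."
    | contact |
    contact := MyContact new.
    self assert: contact = contact.
    self deny: contact = contact copy
```

Once MyContact defines its own #= and #hash, this test is expected to fail and would be replaced by one that asserts the new, value-based rule.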
On 11-May-3, at 8:38 PM, intrader wrote:
> Without examining the entire inheritance chain, how do we determine
> whether the inherited methods suffice?

As a general rule, there will be one implementation of #= and #hash in Object, and then one more in a subclass if the default behaviour is insufficient. You would only see more implementations than that in deep and somewhat complex hierarchies.

> How do we determine when test first approach would be necessary?

I'm not sure what you're asking. Can you clarify please?

> Why does Object decide that identity equals is the appropriate
> implementation?

The reason is that instances of Object have no state (i.e. no instance variables). Each instance of Object (or a subclass that adds no instance variables) is indistinguishable from another. The only way to tell instances apart is to ask if you are talking about identically the same objects or not. Likewise for #hash -- since there is no state, the only thing you can base a hash on is object identity.

As soon as you add state (instance variables) to a subclass, you have an opportunity to override #= and #hash. The key to doing this correctly is to determine which instance variables determine the equality of two objects, and use only those. So for example, I could implement DriversLicense that has instance variables name, address, licenseNumber. To my mind, the implementation of #= should be

    = anObject
        anObject class = DriversLicense ifFalse: [^false].
        ^licenseNumber = anObject licenseNumber

and the implementation of #hash should be

    hash
        ^licenseNumber hash

The reason for choosing only licenseNumber for #= is that it is the only bit of state that is unique to each object. Two DriversLicenses can have the same name, or the same address, and still (fairly) be considered not equal, but once the licenseNumber is the same, we are talking about the same license.

Regarding #hash, there is more latitude. The rule is that objects that are equal must have the same hash. If just hashing the license number was producing hashes that were causing too many collisions, you could mix in other variables with it, for example ^licenseNumber hash bitAnd: name hash. By and large, though, this will not be required unless you are storing many, many instances in sets or as dictionary keys.

Hope this helps,

-Anthony
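A workspace sketch of why #= and #hash have to agree, using the licenseNumber-only versions of both methods from the message above. The setters #licenseNumber: and #name: are hypothetical and assumed to simply store their argument.

```smalltalk
"Assumes DriversLicense implements #= and #hash on licenseNumber only,
 plus hypothetical setters #licenseNumber: and #name:."
| a b licenses |
a := DriversLicense new licenseNumber: 'D1234567'; name: 'Ann'; yourself.
b := DriversLicense new licenseNumber: 'D1234567'; name: 'Ann B.'; yourself.

a = b.             "true: same licenseNumber"
a hash = b hash.   "true: both hash the same licenseNumber"

licenses := Set new.
licenses add: a; add: b.
licenses size      "1: the Set treats b as already present"
```

A Set first uses #hash to pick where to look and only then sends #=, so two objects that are equal but hash differently may both end up in the same Set.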
Hello,

On 4.5.2011 14:15, Anthony Lander wrote:
[snip]
> Regarding #hash, there is more latitude. The rule is that objects that
> are equal must have the same hash. If just hashing the license number
> was producing hashes that were causing too many collisions, you could
> mix in other variables with it, for example ^licenseNumber hash
> bitAnd: name hash. By and large, though, this will not be required
> unless you are storing many, many instances in sets or as dictionary
> keys.

Doesn't this violate the rule that equal objects must have the same hash? Let's suppose there are two DriversLicense instances that have the same licenseNumber (thus are equal by definition of #=) but each has a different name. The modified hash method:

    DriversLicense>>hash
        ^licenseNumber hash bitAnd: name hash

will in general produce different hash values for these two instances, thus breaking the rule, no? Please correct me if I am wrong.

(On a side note: It seems to me that almost all existing #hash implementations use #bitXor: as the preferred merging operation; I have never seen one use #bitAnd:.)

Ladislav Lenart
Apologies, you are correct. I was typing before the first coffee, which is always a bad idea.

Indeed, the hash has to be more general than the equals comparison: it may use only state that #= also compares. Regarding bitXor: vs bitAnd:, to my mind the correct choice is the one which results in the best hashed collection performance, which should be determined on a case-by-case basis.

-Anthony
Thank you for your explanation,

Ladislav Lenart
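The objection Ladislav raised, and Anthony conceded, can be checked in a workspace with plain bit patterns. The three values below are hand-picked stand-ins for hash values, not the output of any real #hash method.

```smalltalk
| licenseHash nameHashA nameHashB |
"Stand-in hash values, chosen by hand for illustration."
licenseHash := 2r1100.   "same licenseNumber in both objects, so a = b"
nameHashA := 2r1010.     "name of the first object"
nameHashB := 2r0110.     "different name in the second object"

licenseHash bitAnd: nameHashA.   "8"
licenseHash bitAnd: nameHashB.   "4, so two equal objects hash differently"
licenseHash bitXor: nameHashA.   "6"
licenseHash bitXor: nameHashB.   "10, so #bitXor: does not rescue it either"
```

The problem is not the choice of combining operation but the fact that name takes part in #hash while #= ignores it.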
For strings, #bitAnd: can only ever increase the collisions of hashed values: you'd be better off using just "licenseNumber hash" rather than "licenseNumber hash bitAnd: name hash". This is true for any decent subobject hashes, which spread their values over most of the positive SmallInteger range. For these, simply use #bitXor:.

If you have two subobjects whose hash range is poor (less than 14 bits), you can usefully shift the bits of one left before combining. E.g. if you have integer screen co-ordinates both in the range 0..4095, their hashes are the same as the actual values, so you could use "(x hash bitShift: 12) bitXor: y hash".

Steve
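A quick workspace check of the shift-then-combine idea for co-ordinates in 0..4095. The integers stand in for their own hash values, as Steve describes, so the arithmetic does not depend on any particular #hash implementation. The combining step has to be #bitXor: (or #bitOr:): after a 12-bit shift the two values occupy disjoint bit ranges, so #bitAnd: would always answer 0.

```smalltalk
| x y |
x := 3.
y := 5.

(x bitShift: 12) bitXor: y.   "12293"
(y bitShift: 12) bitXor: x.   "20483, so (3, 5) and (5, 3) get different hashes"

x bitXor: y.                  "6"
y bitXor: x.                  "6, plain xor cannot tell the two points apart"

(x bitShift: 12) bitAnd: y.   "0, the shifted and unshifted bits never overlap"
```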
Anthony Lander wrote:
> Regarding bitXor: vs bitAnd:, to my mind the correct choice is the one
> which results in the best hashed collection performance, which should
> be determined on a case-by-case basis.

For good hashed collection performance, you will want each hash value to have a similar probability. bitAnd: will give higher probability to hash values with many zeros in their bit pattern, while bitOr: will give higher probability to hash values with many ones in their bit pattern. Only bitXor: will result in a uniform distribution (assuming that both values being xor'ed have a uniform distribution). Therefore bitXor: is always better than bitAnd: and bitOr:.

Execute this workspace code:

    and := Bag new.
    or := Bag new.
    xor := Bag new.
    0 to: 7 do: [:i |
        0 to: 7 do: [:j |
            and add: (i bitAnd: j).
            or add: (i bitOr: j).
            xor add: (i bitXor: j)]].

When you inspect the three bags afterwards, you will notice that in the xor-bag all values appear 8 times, while in the and-bag 0 appears 27 times and 7 appears only once. So with bitAnd:, the hash values 0, 1, 2, and 4 would appear very frequently, leading to more hash collisions.

Ralf
--
Ralf Propach, [hidden email]
Tel: +49 231 975 99 38
Fax: +49 231 975 99 20
Georg Heeg eK (Dortmund)
Handelsregister: Amtsgericht Dortmund A 12812
On 5/4/2011 3:41 PM, Steven Kelly wrote:
> If you have two subobjects whose hash range is poor (less than 14 bits),
> you can usefully shift the bits of one left before combining. E.g. if
> you have integer screen co-ordinates both in the range 0..4095, their
> hashes are the same as the actual values, so you could use "(x hash
> bitShift: 12) bitXor: y hash".

But this still requires a lot of attention to get the bit usage 'right' for your current SmallInteger range.

I suggest using #hashMultiply in an asymmetric way to avoid reinventing the wheel again and again. See for example (in 7.7.1)

    Point>>hash
        ^x hash hashMultiply bitXor: y hash

R
-
Reinout Heeck wrote:
> On 5/4/2011 3:41 PM, Steven Kelly wrote:
> > If you have two subobjects whose hash range is poor (<14 bits),
> > you can usefully shift the bits of one left before combining.
>
> I suggest using #hashMultiply in an asymmetric way to avoid reinventing
> the wheel again and again.
>
> Point>>hash
>     ^x hash hashMultiply bitXor: y hash

Nice - I hadn't looked into all the details of the 7.6 hash changes yet. Don E. Knuth = done enough.

To quote the 7.6 release notes: For interest['s] sake, a tool, called Hash Analysis Tool, is available from the public Store repository. To use it, load the bundle called Hash Analysis Tool and then evaluate:

    HashAnalysisToolUI open

Steve
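The asymmetric #hashMultiply pattern from the Point>>hash example quoted in this thread could be carried over to a class whose equality depends on more than two fields. PostalAddress, its instance variables street, city and zip, and the accessors of the same names are hypothetical. #hashMultiply is assumed to be available on the integers that #hash answers, as the thread reports for VisualWorks 7.6 and later.

```smalltalk
"Hypothetical class PostalAddress with inst vars street, city, zip
 and accessors of the same names."
PostalAddress>>= anObject
    "Equal only to another PostalAddress with the same three values."
    ^self class == anObject class
        and: [street = anObject street
        and: [city = anObject city
        and: [zip = anObject zip]]]

PostalAddress>>hash
    "Uses exactly the state that #= compares. #hashMultiply scrambles the
     running value before each #bitXor:, so exchanging the values of two
     fields usually changes the result."
    ^(street hash hashMultiply bitXor: city hash) hashMultiply bitXor: zip hash
```

Because #hash draws only on variables that #= also compares, equal addresses always hash alike, which is the rule discussed earlier in the thread.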