Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.


Intrader
On 05/03/2011 07:06 AM, [hidden email] wrote:
> Using #deepCopy, I have trouble understanding why it the
>        code fails.

Thank you very much for the solution - I need to define '#=' and '#hash'
in MyContact.

Without examining the entire inheritance chain, how do we determine
whether the inherited methods suffice?
How do we determine when a test-first approach is necessary?

Why does Object decide that identity equality is the appropriate
implementation?

Dense :(

Thanks for the help

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Holger Guhl
Object's implementation of #= is the best default and is immediately reasonable. The reason
for this is the basic object-oriented concept of identity.
What can an object tell us when we ask "are you equal to this?" We *can* say that an object is
equal to another if the two are identical. This is true of everything, even in the real world.
So it is definitely correct to implement #= as "^self == anObject".
Everything else is a matter of modeling. It is the designer's task to determine which instance
variable values make two objects equal. Think about twins: when are twins equal? That depends on how
you look at them. They are not equal per se. But if your point of view (the aspects of the model that
you are interested in) is simply height, facial similarity, clothes (you know all twins wear the same
clothes all the time :-)  ) or last name, then the twins might be equal. This is what your model must
express. If it does not, then the twins are never equal, although they have so much in common.
The same holds for your deep-copied clones. At a certain level there is a difference. Back to the
twins: their faces might look very similar, but a close look reveals tiny differences. If your
point of view ignores certain small differences, then you get equality. If not, you can't get
equality.
In general it is not necessary to check the entire inheritance chain for equality definitions. We
live well with the predefined definition without caring too much. At each level of our domain we
add refinements for the object under consideration. The point at which I do check the inheritance
chain is when I add an implementation of #=.
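
The difference between the identity default and a modeled equality can be tried in a workspace. This is only a minimal sketch: it uses plain Object instances and Point (a class that defines its own #=) as stand-ins, since MyContact itself is not shown in this thread.

```smalltalk
| a b p q |
a := Object new.
b := Object new.
Transcript show: (a = a) printString; cr.    "true: an object is identical to itself"
Transcript show: (a = b) printString; cr.    "false: Object>>= falls back to identity"

p := 3 @ 4.
q := 3 @ 4.
Transcript show: (p == q) printString; cr.   "false: two distinct Point instances"
Transcript show: (p = q) printString; cr.    "true: Point defines #= in terms of x and y"
```

Point plays the role of the twins seen from a chosen point of view: its designer decided that x and y are the aspects that make two instances equal.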

I fear that your troubles with equality come from the use of cloning. Normally we avoid this
pattern, and - as Alan pointed out - you won't find #deepCopy in the base (with good reason).
Usually a simple #copy, perhaps refined by reimplementing #postCopy, does a good job. I
won't discuss whether cloning is really necessary or not. But your need to distinguish clones
from originals hints at a possibly inappropriate mixture. If you clone, then you should keep the
copies in "quarantine", i.e. in use cases where you might need an editing or backup copy. These
use cases should work with clones exclusively and not with originals, and they should prevent
clones from "infecting" the domain.

Regards

Holger Guhl
--
Senior Consultant * Certified Scrum Master * [hidden email]
Tel: +49 231 9 75 99 21 * Fax: +49 231 9 75 99 20
Georg Heeg eK Dortmund
Handelsregister: Amtsgericht Dortmund  A 12812



Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Ladislav Lenart
In reply to this post by Intrader
Hello.

My response is inline.


On 4.5.2011 02:38, intrader wrote:

> On 05/03/2011 07:06 AM, [hidden email] wrote:
>> Using #deepCopy, I have trouble understanding why it the
>>         code fails.
>
> Thank you very much for the solution - I need to define '#=' and '#hash'
> in MyContact.
>
> Without examining the entire inheritance chain, how do we determine
> whether the inherited methods suffice?
> How do we determine when test first approach would be necessary?

You asked a general question, so the following is only my subjective
opinion about inheritance and its overall usefulness, based on my
own experience with OO in Smalltalk (and OO in general). I think
that inheritance is overrated. It gives you short-term benefits
(quick prototyping because of code reuse), but it can bite you just
as easily if you do not watch your back. An alternative to
inheritance is composition & delegation, which is much easier to
maintain over time because it is easy to plug a different delegate
into the object chain.

I think that you should always examine each and every inherited
method you're about to use on your class. Only when you know what
it does can you decide responsibly whether it is the right fit for
your _current_ needs. And you document your decision by using it
in a working test case of your class. Later, when your needs
have changed and the inherited implementation no longer
suffices, the failing test should inform you of this fact.

By this I don't mean you should write a separate test case for
every method of a class, only that you should write a test case
where the particular method is called (along with others) to
accomplish a portion of your business logic AND that the current
implementation of the method is important for it, i.e. has some
impact that can be asserted in the test case.

This is not as hard as you might think because in a while you
will gain confidence about what most of the inherited methods
do. And Smalltalk is really a wonderful environment which
encourages you to try - you can play with objects and their
methods in the workspace or directly in the debugger all day
long.


> Why does Object decide that identity equals is the appropriate
> implementation?

It's not an appropriate implementation, it is simply a default
one which works most of the time.


Just my 2c anyway,

Ladislav Lenart


Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Anthony Lander-2
In reply to this post by Intrader

On 11-May-3, at 8:38 PM, intrader wrote:

> On 05/03/2011 07:06 AM, [hidden email] wrote:
>> Using #deepCopy, I have trouble understanding why it the
>>       code fails.
>
> Thank you very much for the solution - I need to define '#=' and  
> '#hash'
> in MyContact.
>
> Without examining the entire inheritance chain, how do we determine
> whether the inherited methods suffice?

As a general rule, there will be one implementation of #= and #hash in  
Object, and then one more in a subclass if the default behaviour is  
insufficient. You would only see more implementations than that in  
deep and somewhat complex hierarchies.

> How do we determine when test first approach would be necessary?

I'm not sure what you're asking. Can you clarify please?

> Why does Object decide that identity equals is the appropriate
> implementation?

The reason is that instances of Object have no state (ie no instance  
variables). Each instance of Object (or a subclass that adds no  
instance variables) is indistinguishable from another. The only way to  
tell instances apart is to ask if you are talking about identically  
the same objects or not. Likewise for #hash -- since there is no  
state, the only thing you can base a hash on is object identity.

As soon as you add state (instance variables) to a subclass, you have  
an opportunity to override #= and #hash. The key to doing this  
correctly is to determine which instance variables determine the  
equality of two objects, and use only those. So for example, I could  
implement DriversLicense that has instance variables name, address,  
licenseNumber. To my mind, the implementation of #= should be

= anObject
        anObject class = DriversLicense ifFalse: [^false].
        ^licenseNumber = anObject licenseNumber

and the implementation of #hash should be

hash
        ^licenseNumber hash

The reason for choosing only licenseNumber for #= is that it is the  
only bit of state that is unique to each object. Two DriversLicenses  
can have the same name, or the same address and still be (fairly) not  
equal, but once the licenseNumber is the same, then we are talking  
about the same license.

Regarding #hash, there is more latitude. The rule is that objects that  
are equal must have the same hash. If just hashing the license number  
was producing hashes that were causing too many collisions, you could  
mix in other variables with it, for example ^licenseNumber hash  
bitAnd: name hash. By and large, though, this will not be required  
unless you are storing many, many instances in sets or as dictionary  
keys.
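
Why #= and #hash matter for sets and dictionary keys can be seen in a workspace. This is a small sketch, not part of the original message: it uses Object (identity-based #= and #hash) and Point (which overrides both) as stand-ins, because DriversLicense is only a hypothetical class here.

```smalltalk
| objects points |
objects := Set new.
objects add: Object new; add: Object new.
Transcript show: objects size printString; cr.   "2: distinct identities, so two elements"

points := Set new.
points add: 3 @ 4; add: 3 @ 4.
Transcript show: points size printString; cr.    "1: equal points with equal hashes collapse"
```

A DriversLicense with the #= and #hash shown above would behave like Point in this respect: two instances with the same licenseNumber would occupy a single slot in a Set.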

Hope this helps,

  -Anthony



Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Ladislav Lenart
Hello,


On 4.5.2011 14:15, Anthony Lander wrote:

> Regarding #hash, there is more latitude. The rule is that objects that
> are equal must have the same hash. If just hashing the license number
> was producing hashes that were causing too many collisions, you could
> mix in other variables with it, for example ^licenseNumber hash
> bitAnd: name hash. By and large, though, this will not be required
> unless you are storing many, many instances in sets or as dictionary
> keys.

Doesn't this violate the rule that equal objects must have the same
hash? Let's suppose there are two DriversLicense instances that have
the same licenseNumber (thus are equal by definition of #=) but each
has a different name. The modified hash method:

DriversLicense>>hash
     ^licenseNumber hash bitAnd: name hash

will produce different hash values for these two instances and thus
breaking the rule, no? Please correct me if I am wrong.

(On a side note: It seems to me that almost all existing #hash
implementations use #bitXor: as the preferred merging operation,
but I have never seen them to use #bitAnd:.)
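
The objection can be checked in a workspace without defining the class. In this sketch the strings are hypothetical sample values standing in for the licenseNumber and name instance variables of two licenses that compare equal.

```smalltalk
| licenseNumber hashWithAlice hashWithBob |
"Hypothetical sample values standing in for the instance variables."
licenseNumber := 'D1234567'.
hashWithAlice := licenseNumber hash bitAnd: 'Alice Smith' hash.
hashWithBob := licenseNumber hash bitAnd: 'Bob Jones' hash.

"Same licenseNumber, so the licenses are equal, yet this is expected to print false."
Transcript show: (hashWithAlice = hashWithBob) printString; cr.

"Hashing only what #= compares always agrees: this prints true."
Transcript show: (licenseNumber hash = licenseNumber copy hash) printString; cr.
```

The general rule this illustrates: #hash may use fewer variables than #= compares, but never a variable that #= ignores.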


Ladislav Lenart


Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Anthony Lander-2
Apologies, you are correct. I was typing before the first coffee,  
which is always a bad idea.

Indeed the hash has to be more general than the equals comparison.  
Regarding bitXor: vs bitAnd:, to my mind the correct choice is the one  
which results in the best hashed collection performance, which should  
be determined on a case-by-case basis.

   -Anthony


Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Ladislav Lenart
Thank you for your explanation,

Ladislav Lenart


On 4.5.2011 15:20, Anthony Lander wrote:
> Apologies, you are correct. I was typing before the first coffee, which is always a bad idea.
>
> Indeed the hash has to be more general than the equals comparison. Regarding bitXor: vs bitAnd:, to my mind the correct choice is the one which results in the best hashed collection performance, which
> should be determined on a case-by-case basis.


Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Steven Kelly
In reply to this post by Anthony Lander-2
For strings, #bitAnd: can only ever increase the collisions of hashed
values: you'd be better off using just "licenseNumber hash" rather than
"licenseNumber hash bitAnd: name hash". This is true for any decent
subobject hashes, which spread their values over most of the positive
SmallInteger range. For these, simply use #bitXor:.

If you have two subobjects whose hash range is poor (less than 14 bits),
you can usefully shift the bits of one left before combining. E.g. if
you have integer screen co-ordinates both in the range 0..4095, their
hashes are the same as the actual values, so you could use "(x hash
bitShift: 12) bitXor: y hash".
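
A worked example of the shifting idea, as a sketch: the coordinate values 3 and 5 are arbitrary, and the integers are used directly because, as noted above, their hashes equal their values. After the shift the two values occupy disjoint bit ranges, so #bitXor: (or #bitOr:) keeps both of them, whereas #bitAnd: over disjoint bits is always 0.

```smalltalk
| x y |
x := 3.
y := 5.
"3 shifted left by 12 bits is 12288; y fits entirely in the low 12 bits."
Transcript show: ((x bitShift: 12) bitXor: y) printString; cr.   "12293"
Transcript show: ((x bitShift: 12) bitAnd: y) printString; cr.   "0"
```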

Steve


Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Ralf Propach
In reply to this post by Anthony Lander-2
Anthony Lander wrote:

> Apologies, you are correct. I was typing before the first coffee,  
> which is always a bad idea.
>
> Indeed the hash has to be more general than the equals comparison.  
> Regarding bitXor: vs bitAnd:, to my mind the correct choice is the one  
> which results in the best hashed collection performance, which should  
> be determined on a case-by-case basis.
>
>    -Anthony
>

For good hashed collection performance, you will want each hash value to
have a similar probability. bitAnd: will give higher probability to hash values
with many zeros in their bit pattern, while bitOr: will give higher probability
to hash values with many ones in their bit pattern. Only bitXor: will result
in a uniform distribution (assuming that both values being xor'ed have a uniform distribution).
Therefore bitXor: is always better than bitAnd: and bitOr:.

Execute this workspace code:
and := Bag new.
or := Bag new.
xor := Bag new.

0 to: 7 do: [:i |
        0 to: 7 do: [:j |
                and add: (i bitAnd: j).
                or add: (i bitOr: j).
                xor add: (i bitXor: j)]].

When you inspect the three bags afterwards, you will notice that in the xor-bag all values
appear 8 times, in the and-bag 0 appears 27 times while 7 appears only once.
So the hash values 0,1,2, and 4 would appear very frequently, leading to more hash collisions.
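
If you prefer not to open inspectors, the same counts can be written to the Transcript. This variant is only a sketch of the experiment above; it declares its own temporaries so that it runs as a single doit.

```smalltalk
| and xor |
and := Bag new.
xor := Bag new.
0 to: 7 do: [:i |
    0 to: 7 do: [:j |
        and add: (i bitAnd: j).
        xor add: (i bitXor: j)]].

"Occurrences of the results 0 through 7, in that order."
Transcript show: ((0 to: 7) collect: [:v | and occurrencesOf: v]) printString; cr.
Transcript show: ((0 to: 7) collect: [:v | xor occurrencesOf: v]) printString; cr.
```

The counts for bitAnd: come out as 27, 9, 9, 3, 9, 3, 3, 1, while bitXor: gives 8 for every value: each result bit of bitAnd: is zero for three of the four possible input bit pairs.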

Ralf


--
Ralf Propach, [hidden email]
Tel: +49 231 975 99 38   Fax: +49 231 975 99 20
Georg Heeg eK (Dortmund)
Handelsregister: Amtsgericht Dortmund  A 12812

Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Reinout Heeck-2
In reply to this post by Steven Kelly
On 5/4/2011 3:41 PM, Steven Kelly wrote:
> If you have two subobjects whose hash range is poor (less than 14 bits),
> you can usefully shift the bits of one left before combining. E.g. if
> you have integer screen co-ordinates both in the range 0..4095, their
> hashes are the same as the actual values, so you could use "(x hash
> bitShift: 12) bitXor: y hash".

But this still requires a lot of attention to get the bit usage 'right'
for your current SmallInteger range.

I suggest using #hashMultiply in an asymmetric way to avoid reinventing
the wheel again and again.
See for example (in 7.7.1)

Point>>hash

     ^x hash hashMultiply bitXor: y hash
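
Why the asymmetry matters can be sketched in a workspace, assuming an image that provides #hashMultiply (7.6 or later, going by this thread). The two points are arbitrary examples with swapped coordinates.

```smalltalk
| a b |
a := 3 @ 5.
b := 5 @ 3.

"Plain XOR is symmetric, so swapped coordinates always collide: prints true."
Transcript show: ((a x hash bitXor: a y hash) = (b x hash bitXor: b y hash)) printString; cr.

"Scrambling one side first breaks the symmetry; the two results are expected to differ."
Transcript show: ((a x hash hashMultiply bitXor: a y hash)
    = (b x hash hashMultiply bitXor: b y hash)) printString; cr.
```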




R
-


Re: Issues 6 - Using #deepCopy, I have trouble understanding why it the, code fails.

Steven Kelly
Reinout Heeck wrote:
> On 5/4/2011 3:41 PM, Steven Kelly wrote:
> > If you have two subobjects whose hash range is poor (<14 bits),
> > you can usefully shift the bits of one left before combining.
>
> I suggest using #hashMultiply in an asymmetric way to avoid reinventing
> the wheel again and again.
>
> Point>>hash
>      ^x hash hashMultiply bitXor: y hash

Nice - I hadn't looked into all the details of the 7.6 hash changes yet.
Don E. Knuth = done enough. To quote the 7.6 release notes:

For interest['s] sake, a tool, called Hash Analysis Tool, is available
from the public Store repository. To use it, load the bundle called Hash
Analysis Tool and then evaluate:
HashAnalysisToolUI open

Steve
