Branch: refs/heads/5.0
Home: https://github.com/pharo-project/pharo-core Commit: 974630dc178b6b59156dac23623fe2dc9f98446c https://github.com/pharo-project/pharo-core/commit/974630dc178b6b59156dac23623fe2dc9f98446c Author: Jenkins Build Server <[hidden email]> Date: 2016-01-10 (Sun, 10 Jan 2016) Changed paths: A Morphic-Base.package/LayoutFrame.class/class/costly instance creation/fractions_.st A Morphic-Base.package/LayoutFrame.class/class/costly instance creation/fractions_offsets_.st A Morphic-Base.package/LayoutFrame.class/class/costly instance creation/offsets_.st R Morphic-Base.package/LayoutFrame.class/class/instance creation/fractions_.st R Morphic-Base.package/LayoutFrame.class/class/instance creation/fractions_offsets_.st R Morphic-Base.package/LayoutFrame.class/class/instance creation/offsets_.st M Morphic-Base.package/LayoutFrame.class/instance/accessing/bottomFraction.st A Morphic-Base.package/LayoutFrame.class/instance/comparing/=.st A Morphic-Base.package/LayoutFrame.class/instance/comparing/hash.st A Morphic-Tests.package/LayoutFrameTest.class/instance/tests - conversion/testIdentity.st M Morphic-Tests.package/PaginatedMorphTreeMorphTests.class/instance/tests/testPager.st M Morphic-Widgets-Tree.package/PaginatedMorphTreeModel.class/class/examples/example.st R ScriptLoader50.package/ScriptLoader.class/instance/pharo - scripts/script50519.st A ScriptLoader50.package/ScriptLoader.class/instance/pharo - scripts/script50520.st R ScriptLoader50.package/ScriptLoader.class/instance/pharo - updates/update50519.st A ScriptLoader50.package/ScriptLoader.class/instance/pharo - updates/update50520.st M ScriptLoader50.package/ScriptLoader.class/instance/public/commentForCurrentUpdate.st Log Message: ----------- 50520 17361 Better LayoutFrame class comments and class method comments are needed https://pharo.fogbugz.com/f/cases/17361 17363 (LayoutFrame fractions: (0 @ 0 corner: 1 @ 1)) = LayoutFrame identity should be true and it is not. https://pharo.fogbugz.com/f/cases/17363 17364 PaginateMorph uses Wrong LayoutFrame API https://pharo.fogbugz.com/f/cases/17364 http://files.pharo.org/image/50/50520.zip |
I happened to look at the new code for LayoutFrame>>#hash
hash ^self species hash bitXor: (self leftFraction bitXor: (self leftOffset bitXor: (self topFraction bitXor: (self topOffset bitXor: (self rightFraction bitXor: (self rightOffset bitXor: (self bottomFraction bitXor: self bottomOffset))))))) This is a terrible, terrible way to compute a hash. 0 xor 0 = 1 xor 1 = 2 xor 2 = etc. NEVER compute hashes with xor, that's wrong, and it causes horrible, hard to debug, performance bugs down the road. SequenceableCollection>>#hash looks more like a correct way of doing it. I do not know what is the correct, efficient way to reuse this logic in LayoutFrame>>#hash. By the way, how do you guys do code reviews for code integrated into the core? |
Since it's not obvious why xor-ing is a bad way to produce hash, here is a link that gives some background
https://en.wikipedia.org/wiki/Hash_function#Uniformity > On 11 Jan 2016, at 10:01, David Allouche <[hidden email]> wrote: > > I happened to look at the new code for LayoutFrame>>#hash > > hash > ^self species hash bitXor: > (self leftFraction bitXor: > (self leftOffset bitXor: > (self topFraction bitXor: > (self topOffset bitXor: > (self rightFraction bitXor: > (self rightOffset bitXor: > (self bottomFraction bitXor: > self bottomOffset))))))) > > This is a terrible, terrible way to compute a hash. > > 0 xor 0 = 1 xor 1 = 2 xor 2 = etc. > > NEVER compute hashes with xor, that's wrong, and it causes horrible, hard to debug, performance bugs down the road. > > SequenceableCollection>>#hash looks more like a correct way of doing it. I do not know what is the correct, efficient way to reuse this logic in LayoutFrame>>#hash. > > By the way, how do you guys do code reviews for code integrated into the core? > |
In reply to this post by David Allouche
David,
> On 11 Jan 2016, at 10:01, David Allouche <[hidden email]> wrote: > > I happened to look at the new code for LayoutFrame>>#hash > > hash > ^self species hash bitXor: > (self leftFraction bitXor: > (self leftOffset bitXor: > (self topFraction bitXor: > (self topOffset bitXor: > (self rightFraction bitXor: > (self rightOffset bitXor: > (self bottomFraction bitXor: > self bottomOffset))))))) > > This is a terrible, terrible way to compute a hash. > > 0 xor 0 = 1 xor 1 = 2 xor 2 = etc. > > NEVER compute hashes with xor, that's wrong, and it causes horrible, hard to debug, performance bugs down the road. Yeah, that doesn't look very good ;-) > SequenceableCollection>>#hash looks more like a correct way of doing it. I do not know what is the correct, efficient way to reuse this logic in LayoutFrame>>#hash. There is actually a 2 part book just about hashing in Smalltalk: http://www.lulu.com/shop/andres-valloud/hashing-in-smalltalk-theory-and-practice/paperback/product-3788892.html It is said to be very good, I sadly have not yet read it ;-) > By the way, how do you guys do code reviews for code integrated into the core? 'Pharo is yours' which means: if you encounter something odd, you are encouraged to improve it. Maybe first discuss it on the ML. If necessary create an issue. Better provide a fix. Sven |
Yep, a must read if you want to improve the hash methods. 2016-01-11 10:09 GMT+01:00 Sven Van Caekenberghe <[hidden email]>: David, |
In reply to this post by David Allouche
You see K. Beck in his book smalltalk best pratices promotes this form.
So can you propose a better solution so that we learn? Stef Le 11/1/16 10:09, David Allouche a écrit : > Since it's not obvious why xor-ing is a bad way to produce hash, here is a link that gives some background > > https://en.wikipedia.org/wiki/Hash_function#Uniformity > >> On 11 Jan 2016, at 10:01, David Allouche <[hidden email]> wrote: >> >> I happened to look at the new code for LayoutFrame>>#hash >> >> hash >> ^self species hash bitXor: >> (self leftFraction bitXor: >> (self leftOffset bitXor: >> (self topFraction bitXor: >> (self topOffset bitXor: >> (self rightFraction bitXor: >> (self rightOffset bitXor: >> (self bottomFraction bitXor: >> self bottomOffset))))))) >> >> This is a terrible, terrible way to compute a hash. >> >> 0 xor 0 = 1 xor 1 = 2 xor 2 = etc. >> >> NEVER compute hashes with xor, that's wrong, and it causes horrible, hard to debug, performance bugs down the road. >> >> SequenceableCollection>>#hash looks more like a correct way of doing it. I do not know what is the correct, efficient way to reuse this logic in LayoutFrame>>#hash. >> >> By the way, how do you guys do code reviews for code integrated into the core? >> > > |
In reply to this post by Nicolas Cellier
Ok but be concrete. What will be the solution for this particular
case?
Stef Le 11/1/16 22:30, Nicolas Cellier a
écrit :
|
In reply to this post by David Allouche
Hi david
I rad the wikipedia page you mention and this looks a bit obvious to me. I knew this. now it does not really help fixing the problem you mention. Stef Le 11/1/16 10:09, David Allouche a écrit : > Since it's not obvious why xor-ing is a bad way to produce hash, here is a link that gives some background > > https://en.wikipedia.org/wiki/Hash_function#Uniformity > >> On 11 Jan 2016, at 10:01, David Allouche <[hidden email]> wrote: >> >> I happened to look at the new code for LayoutFrame>>#hash >> >> hash >> ^self species hash bitXor: >> (self leftFraction bitXor: >> (self leftOffset bitXor: >> (self topFraction bitXor: >> (self topOffset bitXor: >> (self rightFraction bitXor: >> (self rightOffset bitXor: >> (self bottomFraction bitXor: >> self bottomOffset))))))) >> >> This is a terrible, terrible way to compute a hash. >> >> 0 xor 0 = 1 xor 1 = 2 xor 2 = etc. >> >> NEVER compute hashes with xor, that's wrong, and it causes horrible, hard to debug, performance bugs down the road. >> >> SequenceableCollection>>#hash looks more like a correct way of doing it. I do not know what is the correct, efficient way to reuse this logic in LayoutFrame>>#hash. >> >> By the way, how do you guys do code reviews for code integrated into the core? >> > > |
On 12 Jan 2016, at 20:59, stepharo <[hidden email]> wrote: Since you asked for it… The #hash method is used in Pharo to separate objects in bucket-based data structures. It must have one required property which is required data integrity:
It should have two desirable properties that are required to support the algorithmic assumptions of bucket-based data structures:
How these two desirable properties can be implemented depend on the actual distribution of the value used to compute the hash. An object can safely use XOR hashing if its instance variables are all different objects that already provide a hash method with these two properties. For example, an object whose behaviour is defined by several of singleton delegates of different classes that do not override Object>>#hash. But if the instance variables are not guaranteed to have those properties, it is necessary to more thoroughly "mix the bits". For example: SequenceableCollection>>#hash. hash | hash | hash := self species hash. 1 to: self size do: [:i | hash := (hash + (self at: i) hash) hashMultiply]. ^hash The mixing of the bits is done by the combination of addition and hashMultiply. The value of "species hash" does not need to be processed by hashMultiply, because it is probably computed by ProtoObject>>identityHash, and that already provides C. LayoutFrame is a particularly good example of a data structure that should not use XOR hashing. Its instance variables provide none of the required properties: SmallInteger>>#hash is just ^self. In addition, common LayoutFrame instances tend to use pairs of identical values in their instance variables. Like 0@0 corner: 1@1, or Margin fromNumber: 10. A good hash function for LayoutFrame needs to:
Now, we can assume that common values for the instance variables will be instances of SmallInteger, Float, or Fraction. By the way, you can see in Fraction>>#hash, that a comment mentions the assumption that the fraction is already reduced, that is what makes it acceptable to use bitXor. The hash of SmallInteger does provide A and B, but not C. The hash of Float is harder to understand, but tests show that it provide distinct values for 0.5, 0.25 and 0.125. So it hopefully provides B for the range of values of interest to LayoutFrame. Another class that has similar hashing constraints to LayoutFrame is Point. Its hash method is: hash "Hash is reimplemented because = is implemented." ^(x hash hashMultiply + y hash) hashMultiply It does not include species in the computation, which is less than ideal because it favours collisions with objects of different species that have a similar content and a the same #hash algorithm. Finally, it is probably not necessary or useful to use the accessors to get to the instance variables. So a good implementation would be this (untested, there might be a typo or two) hash | hash | hash := self species hash hash := (hash + leftFraction hash) hashMultiply hash := (hash + leftOffset hash) hashMultiply hash := (hash + topFraction hash) hashMultiply hash := (hash + topOffset hash) hashMultiply hash := (hash + rightFraction hash) hashMultiply hash := (hash + rightOffset hash) hashMultiply hash := (hash + bottomFraction hash) hashMultiply hash := (hash + bottomOffset hash) hashMultiply ^ hash I am sure you could have figured that by yourself, by thinking about how hash values are used and by looking at the code of kernel classes, instead of getting offended and turning all defensive. I hope we can both learn from this episode. I do not enjoy antagonising people, but I will not write lengthy messages like this one every time someone questions my thinking. Have a nice day. Le 11/1/16 10:09, David Allouche a écrit :Since it's not obvious why xor-ing is a bad way to produce hash, here is a link that gives some background |
Name:
SLICE-Issue-17363-LayoutFrame-fractions-0--0-corner-1--1--LayoutFrame-identity-should-be-true-and-it-is-not-StephanEggermont.2 Author: StephanEggermont Time: 13 January 2016, 6:06:01.571074 pm UUID: 9a5e9ae3-1357-4bfb-921a-3762d57402fd Ancestors: Dependencies: Morphic-Base-StephanEggermont.529 Better hash both in distribution and in working with Float coordinates hash ^(((((((( self species hash + leftFraction hash) hashMultiply + leftOffset hash) hashMultiply + topFraction hash) hashMultiply + topOffset hash) hashMultiply + rightFraction hash) hashMultiply + rightOffset hash) hashMultiply + bottomFraction hash) hashMultiply + bottomOffset hash) hashMultiply |
> On 13 Jan 2016, at 18:09, Stephan Eggermont <[hidden email]> wrote: > > Name: SLICE-Issue-17363-LayoutFrame-fractions-0--0-corner-1--1--LayoutFrame-identity-should-be-true-and-it-is-not-StephanEggermont.2 > Author: StephanEggermont > Time: 13 January 2016, 6:06:01.571074 pm > UUID: 9a5e9ae3-1357-4bfb-921a-3762d57402fd > Ancestors: > Dependencies: Morphic-Base-StephanEggermont.529 > > Better hash both in distribution and in working with Float coordinates > > > > hash > ^(((((((( self species hash > + leftFraction hash) hashMultiply > + leftOffset hash) hashMultiply > + topFraction hash) hashMultiply > + topOffset hash) hashMultiply > + rightFraction hash) hashMultiply > + rightOffset hash) hashMultiply > + bottomFraction hash) hashMultiply > + bottomOffset hash) hashMultiply Thank you for integrating the fix. I just realised I forgot to put in "." expression separators in my code. Why did you use a single expression instead of using a temporary variable? I find it harder to understand. |
In reply to this post by David Allouche
2016-01-13 17:22 GMT+01:00 David Allouche <[hidden email]>:
Note that Squeak has reduced the number of collisions for floats by not erasing the most significant bits of each word. I've tested the squeak implementation with Andres Valloud tool in VW, it does not perform so bad for the sets I tested in term of collisions... Float>>hash "Hash is reimplemented because = is implemented. Both words of the float are used. (The bitShift:'s ensure that the intermediate results do not become a large integer.) Care is taken to answer same hash as an equal Integer." (self isFinite and: [self fractionPart = 0.0]) ifTrue: [^self truncated hash]. ^ ((self basicAt: 1) bitShift: -4) + ((self basicAt: 2) bitShift: -4) I also changed Fraction to use a hashMultiply and made a more explicit isAnExactFloat test. Fraction>>hash "Hash is reimplemented because = is implemented." "Care is taken that a Fraction equal to a Float also has an equal hash" self isAnExactFloat ifTrue: [^self asExactFloat hash]. "Else, I cannot be exactly equal to a Float, use own hash algorithm." ^numerator hash hashMultiply bitXor: denominator hash Fraction>>isAnExactFloat "Answer true if this Fraction can be converted exactly to a Float" ^ denominator isPowerOfTwo and: ["I have a reasonable significand: not too big" numerator highBitOfMagnitude <= Float precision and: ["I have a reasonable exponent: not too small" Float emin + denominator highBitOfMagnitude <= Float precision]] Fraction>>asExactFloat "When we know that this Fraction is an exact Float, this conversion is much faster than asFloat." ^numerator asFloat timesTwoPower: 1 - denominator highBit Not that Fraction should allways be reduced (this is an invariant of the class, I hope well defined in class comment). Integer hash is not that good because the risk of occupying consecutive buckets is high... I wonder if returning ^self hashMultiply would not arrange things here and reduce the requirements for spreading hashMultiply everywhere else? Nicolas
|
The risk does not really lie in occupying consecutive buckets. The risk is rather that integers used as keys could have a pattern that cause bucket collisions. For example, if all key are multiple of 100, and the bucket count is 2, 4, 5, 10, 20, 25 or 50. It would make more sense to solve this problem by using multiplyHash in the bucket data structure classes, instead of having to pay this computation cost in SmallInteger. Also, using it in SmallInteger would not dispense from using it in the #hash method of classes that use SmallInteger, because it matters that each component of the hash is processed a different number of times by multiplyHash. |
In reply to this post by David Allouche
Thanks for your post it was really nice.
If I asked this is that I could not. And I read the book on hash numbers. I do not know where you see me offended and defensive. I asked because the wikipedia page was giving no information that I could use. So please do not read between the lines else tell me I can also ignore your emails. Stef
|
In reply to this post by David Allouche
Hi david
I would like to turn your post into a chapter for a forthcoming book so that people can read and get the idea. I did not think about the same object. Is it ok for you? Stef PS: May be this was obviosu for you that we should all get this out of your previous mail but it was not my case and not after 8 hours on intense work. Le 13/1/16 17:22, David Allouche a
écrit :
|
In reply to this post by David Allouche
On 13-01-16 18:38, David Allouche wrote:
> Why did you use a single expression instead of using a temporary variable? I find it harder to understand. I did find it very slightly easier to understand. It is not a strong opinion however. The ideomatic smalltalk would be ^#(leftFraction leftOffset topFraction topOffset rightFraction rightOffset bottomFraction bottomOffset) inject: self species hash into: [:hashValue :selector | (hashValue + self perform: selector) hashMultiply ] but that would be slower. I would be tempted to add a binary message Integer>>+** aNumber ^(aNumber +self hash) hashMultiply (I'd prefer +#*, but that doesn't parse) hash ^self species hash +** leftFraction +** leftOffset +** topFraction +** topOffset +** rightFraction +** rightOffset +** bottomFraction +** bottomOffset Stephan |
In reply to this post by Nicolas Cellier
Nicolas
two questions: - should we integrate your float changes in Pharo? - I would like to get a chapter explaining the points raised by david, would you like to review it? Stef Le 13/1/16 19:31, Nicolas Cellier a
écrit :
|
In reply to this post by David Allouche
2016-01-13 19:55 GMT+01:00 David Allouche <[hidden email]>:
Ah yes, I thought the same.
Agree again, bad suggestion from my side, because using + or bitXor: without breaking symmetry with hashMultiply would lead to bad distribution of hashes... Consider I posted too fast :) BTW a 64 bits spur could have a larger hash pattern (small integers are 61 bits wide sign included, and identityHash is longer too, so there's no reason to stick to 28 bits). |
In reply to this post by stepharo
2016-01-13 20:34 GMT+01:00 stepharo <[hidden email]>:
1) yes, that's low hanging fruits if you have a good soul to review it. while at reviewing the other hashes... 2) maybe
|
In reply to this post by stepharo
> On 13 Jan 2016, at 20:32, stepharo <[hidden email]> wrote: > > Hi david > > I would like to turn your post into a chapter for a forthcoming book so that people can read and get the idea. Yes I would be glad :-) Please just remember to credit the author. Is there a recommended license for Pharo documentation? If not, I am happy to release that text under the MIT License. I would also like to review your derived work. But that is not a requirement. > I did not think about the same object. Is it ok for you? > > Stef > > PS: May be this was obviosu for you that we should all get this out of your previous mail but it was not my case > and not after 8 hours on intense work. Sorry, I do not understand what you are saying here. I do not want to read between the lines, because I would probably misunderstand your intention. Regards. |
Free forum by Nabble | Edit this page |