IdentitySet but using #hash rather than #identityHash ?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
62 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Stéphane Ducasse
I will   :)
Thanks mariano

On Dec 22, 2011, at 9:11 PM, Mariano Martinez Peck wrote:

> So...the fix for  #pointsTo: and #instVarsInclude:  is in inbox:
>
> http://code.google.com/p/pharo/issues/detail?id=5118
>
> If anyone wants to take a look...
>
> Cheers
>
> On Tue, Dec 20, 2011 at 11:01 PM, Mariano Martinez Peck <[hidden email]> wrote:
>
>
> 2011/12/20 Levente Uzonyi <[hidden email]>
> On Tue, 20 Dec 2011, Mariano Martinez Peck wrote:
>
>
>  Thanks Levente. Well, now all our tests and benchmarks pass like a charm
> :)  We will soon include it in the "trunk" of Fuel. Will them be available
> in a repository for squeak? because if so, I can adapt ConfiguratioOfFuel
> so that in Squeak it loads another collections.
>
>
> The best would be to add the patch to Pharo, which fixes #pointsTo: and
> adds #instVarsIncludes:. Then change the code to use #instVarsInclude: in
> these collections. If that's all done, then this code should work in both
> Pharo and Squeak.
>
>
> Yes, I agree. But anyway, Fuel will load in previous versions of Pharo,
> where such fix won't be integrated.  So we will need to handle this.
>
> This can easily solved with a separate Metacello configuration.
>
>
> Yes, I will do that.
>
>  
>
>
>
>
>
>
> So...from the benchmarks I can say that in large graphs the improve in
> performance was 30% in serialization, and only use in one place the set
> and
> in another one the dictionary.
>
>
> Great, hopefully the new primitive will increase this even more. :)
>
>
>
> Hopefully. Did you answer Eliot?  he was asking what you needed...
>
> I did answer his private email about the primitive.
>
>
> Ahh ok, excellent :)
>  
>
>
> Ahh BTW, I found another small bug.. LargeIdentityDictionary sends
> #keyNotFound but it is not implemented. Just adding:
>
> errorKeyNotFound: aKey
>
>   KeyNotFound signalFor: aKey
>
> is enough.
>
> Go ahead. :)
>
>
> Done
>  
>
> Levente
>
>
> Cheers
>
>
>
> Levente
>
>
> Thanks!!!
>
>
>
>
> Levente
>
>
>  Cheers
>
>
>
>  Levente
>
>
>  On Sun, Dec 18, 2011 at 7:17 PM, Mariano Martinez Peck <
>
> [hidden email]> wrote:
>
>
>
>  Now I noticed that SmallIntegers storing is really slow
>
>
>
>  This is not entirely true. Every hash table can be made slow if
> you
>
> know
> how it's hash function works. Since in your example all numbers are
> congruent to 1 modulo 4096 and the size of the array is also 4096,
> therefore they will be mapped to the same slot (1). Try using
> random
> (or at
> least more realistic) numbers in your test. If this is a real
> problem,
> then
> it may be necessary to implement a new hash method, which uses the
> primitive for general objects and does something else with
> SmallIntegers
> and that method shoul be used from these large collections.
>
>
>  I uploaded a new mcz to the FuelExperiments repository, which is
>
> unaffected by the identity hash differences between Pharo and Squeak
> and
> works better with SmallIntegers. It includes both the dictionary and
> the
> set and (in theory) has slightly better performance for
> non-SmallInteger
> objects.
>
>
>  Hi Levente. First let me say I really appreaciate your effort in
> make
>
>  it
> work. It makes definitvly Fuel faster and it would make sense to
> integrate
> them also in Squeak and Pharo.
> Now, I found another bug. To reproduce it:
>
> | dict |
> dict := LargeIdentityDictionary new.
> dict at: 16941057 put: 1035.
>
> Now, it seems 16941057 largeIdentityHash answers 4096 and the comment
> of
> #largeIdentityHash  says ""Return an integer between 0 and 4095 based
> on
> all of my bits. Make sure that for nearby receivers the result will
> not
> be
> nearby with high probability.""
>
> The thing is that LargeIdentityHashedCollection permuteHash: hash + 1
> answers 4096, and hence, when in #at:put: you do:
>
> at: key put: value
>
>  | hash |
>  (keys at: (hash := key largeIdentityHash + 1))
>
> then you get the out of bounds.
>
> Modyfing
>
> initialize
>
>  | rng |
>  rng := Random seed: 664399324.
>  PermutationMap := (1 to: 4096) asArray shuffleBy: rng
>
>  to 4095 make some tests to pass, but there are some other failing.
>
> I will try to continue to see if I find the error.
>
> Thanks!
>
>
>  Levente
>
>
>
>
>
>  Levente
>
>
>
>  numbers :=  OrderedCollection new.
>
>  numbers add: 1.
>  numbers addAll: (1 to: 1 << 29 by: 1 << 14) asArray.
>  numbers addAll: ((1 to: 1 << 29 by: 1 << 14) asArray collect:
> [:each
> |
> each negated] ).
>
>  dict := LargeIdentityDictionary new.
>  [numbers do: [:each |
>    dict at: each put: dict size + 1.
>   ].
>  ] timeToRun
>  -> 12657
>
>
> whereas with IdentityDictionary it is 37.  I guess it could be
> related
> to
> #identityHash or #basicIdentityHash.  I tried in Squeak 4.3 and it
> is
> also
> also there.
> I will try to take a look into it during the next week.
>
> Thanks in advance
>
>
>
>  Nicolas
>
>
>
>  Thanks for any idea.
>
>
>
>
> On Sat, Dec 17, 2011 at 1:50 PM, Stéphane Ducasse
> <[hidden email]> wrote:
>
>
>
>  On Dec 16, 2011, at 3:28 PM, Levente Uzonyi wrote:
>
>  Cool. One more thing: in Squeak the method using primitive
> 132
>
>
>  directly
>
>
>
>  was renamed to #instVarsInclude:, so now #pointsTo: works as
>
>
>
>
>  expected. If
>
>
>
>
>  this was also added to Pharo, then the #pointsTo: sends
> should
>
> be
>
>
>
>  changed to
>
>
>
>
>  #instVarsInclude:, otherwise Array can be reported as
> included
>
> even
>
>
>
>  if it
>
>
>
>
>  wasn't added.
>
>
>
>  I'll upload my LargeIdentityDictionary implementation to the
> same
>
>
>
>  place
>
>
>
>  this evening, since it's still 2-3 factor faster than other
>
>
>
>
>  solutionts and
>
>
>
>
>  there seem to be demand for it.
>
>
>
>
>
>  Levente
>
>
>  "in Squeak the method using primitive 132 directly was
> renamed
> to
>
>  #instVarsInclude:, so now #pointsTo: works as expected."
>
>
>  I do not get the following. Indeed pointTo: looks like
>
>  instVarsInclude:
> now I do not understand the rest of your paragraph.
>
> Stef
>
>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.********com<http://marianopeck.**
> wordpress.com <http://marianopeck.wordpress.****com<
> http://marianopeck.**wordpress.com<http://marianopeck.wordpress.com>
>
>
>
>
>
>
>
>
>  --
>
> Mariano
> http://marianopeck.wordpress.********com <
> http://marianopeck.wordpress.******
> com <http://marianopeck.wordpress.****com<http://marianopeck.**
> wordpress.com <http://marianopeck.wordpress.com>>
>
>
>
>
>
>
>  --
> Mariano
> http://marianopeck.wordpress.******com <
> http://marianopeck.wordpress.****
> com <http://marianopeck.wordpress.**com<http://marianopeck.wordpress.com>
>
>
>
>
>
>  --
> Mariano
> http://marianopeck.wordpress.******com <http://marianopeck.wordpress.
>
> ****
> com <http://marianopeck.wordpress.**com<http://marianopeck.wordpress.com>
>
>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.****com <http://marianopeck.wordpress.**
> com <http://marianopeck.wordpress.com>>
>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.**com <http://marianopeck.wordpress.com>
>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>


Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Andres Valloud-4
In reply to this post by Mariano Martinez Peck
I betcha the performance problem is because the collision "trains" end
up hitting each other.  In this case you likely won't be able to avoid
collisions.  So, instead of the regular hashed collections that use open
addressing linear probing, use a hashed collection that stores objects
in hash buckets.

It would be great if hashed collections did not inline a specific
storage strategy, but alas... :(.

On 12/12/11 0:58 , Mariano Martinez Peck wrote:

> Hi guys. I hope this is not a very stupid question. Background: in Fuel
> we have a IdentityDictionary where we put each of the objects we find
> while traversing the graph. We need to use IdentitySet because we cannot
> have repetitions (and to avoid loops) so we NEED to use #==. In such
> dictionary we put ALL objects of the graph, so it can be very big. Since
> IdentitySet uses #identityHash, it means it will be using those ONLY 12
> bits in the object header. It means that we have 2^12 = 4096  different
> values.
>
> Question:  having explained the previous, I wanted to be able to use
> #hash rather than #identityHash since several classes implement #hash
> and hence I thought that using #hash I could have less colisions and
> hence a better performance. I tried to make a subclass of IdentitySet
> that uses #hash rather than #identityHash but my image freezes. I also
> tried something like:
>
> set := PluggableSet new.
>      set hashBlock: [ :elem | elem hash ].
>      set equalBlock: [ :a :b | a == b ].
>
> But it doesn't work either. I works with simple tests in a workspace but
> when I run the full tests of Fuel, it enters in a loop in the method
> #hash of Collection..
>
> Anyway, my question is, should that work? if not, what is the exact reason?
>
> Thanks in advance,
>
> --
> Mariano
> http://marianopeck.wordpress.com
>

1234