HashTable vs Dictionary


Re: Perfect hash [was: Re: HashTable vs Dictionary]

Andres Valloud-4
Igor et al,

> Let me simplify the goal of perfect hashing:
> - suppose you have N unique integers, without a strictly specified
> range (any number could lie in the range 0..infinity);
> - now we need to find a function f such that, for each element x in
> the above finite set, the following conditions are met:
>
> 0 <= f(x) < k*N
>
> and, for each pair of distinct elements x, y in the original set,
>   f(x) ~= f(y)
>
> and k < infinity.
>
> As you can see, this function maps arbitrary integers without a
> specific range to integers in the range [0 .. k*N), with k >= 1.
> Given that we only deal with a limited number of elements for each
> particular set of numbers, such a function can be found.
> We just need to search for an optimal (minimal) k, which is < 2 and
> in the perfect case == 1.

I thought I'd point out a subtle potential problem.  Assume N = 1000 and
f(x_j) = j - 1 for 1 <= j <= 1000, so k = 1, since 0 <= f(x_j) < N.  Now
store x_{7i} in a Set, for all i such that 1 <= 7i <= 1000.
Clearly, f(x) is a perfect hash function as defined above.  However, if
the size of the table used by the Set is a multiple of 7, performance
will be terrible.
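
To see it concretely, here is a quick workspace sketch (the stored
elements are x_7, x_14, ..., x_994, whose hash values are 7i - 1):

  "All 142 perfectly hashed values land on the same residue of a
  size-7 table, since (7i - 1) \\ 7 = 6 for every i."
  ((1 to: 142) collect: [ :i | (7 * i - 1) \\ 7 ]) asSet size
      "=> 1"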

By the way, you may want to check out the Hash Analysis Tool I wrote for
VisualWorks.  It's in the public Store repository (bundle name "Hash
Analysis Tool", there's also an extensions bundle with more hash
functions and data sets).  My blog has a (somewhat passable excuse for
a) manual here:
http://blogten.blogspot.com/2008/02/hash-analysis-tool-manual.html.

Andres.


Re: Perfect hash [was: Re: HashTable vs Dictionary]

Igor Stasenko
2009/10/22 Andres Valloud <[hidden email]>:

> Igor et al,
>
> [...]
>
> I thought I'd point out a subtle potential problem.  Assume N = 1000 and
> f(x_j) = j - 1 for 1 <= j <= 1000, so k = 1, since 0 <= f(x_j) < N.  Now
> store x_{7i} in a Set, for all i such that 1 <= 7i <= 1000.
> Clearly, f(x) is a perfect hash function as defined above.  However, if
> the size of the table used by the Set is a multiple of 7, performance
> will be terrible.
>

Err... I'm not sure I understood. If you have 1000 objects, each
answering a unique hash value of the form

 element hash == 7i

and, as you state, 1 <= 7i <= 1000, then there can't be 1000 unique
hash values in that range for 1000 objects. Obviously you can have only
1000/7 unique hash values meeting such criteria.
But if you read carefully, I'm not addressing the problem of duplicate
hash values in the input set; I presume they are all unique.
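
A quick workspace check of that 1000/7 figure:

  "Values of the form 7i that fit in 1..1000."
  ((1 to: 1000) select: [ :n | n \\ 7 = 0 ]) size
      "=> 142"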

I'm not addressing the quality of an object's hash function; I'm just
illustrating a hash function which could be employed by a set, and
which could work perfectly once all objects have unique hash values.

Surely, no matter how good the hash function used by a set, if the
hashes of its elements clash, you will see performance degradation.




--
Best regards,
Igor Stasenko AKA sig.


Re: HashTable vs Dictionary

Stéphane Ducasse
In reply to this post by Lukas Renggli
Yes, I know.
Now, in Pharo, if I do not change the hash of my objects and I get a
lot of them in a dictionary, I do get bad hashing; at least this is
what I understood from the limited number of hash bits.
So this means that by default we have bad performance, no?
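
(In a workspace the limit is easy to see; that the default identity
hash is 12 bits wide is an assumption about this particular VM:)

  "Collect many default hashes; the set saturates near 4096 because
  the default identityHash is only 12 bits wide here (assumption)."
  ((1 to: 10000) collect: [ :i | Object new hash ]) asSet size
      "=> at most 4096"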

Stef


>> yyyyyyyyeeeeeeeeeeesssssssssssssssssssssssss
>> and get linearrrrrrrrrrrrrrrrrrrrrly boring and slow.
>
> Yes, but only if your objects don't implement #hash properly. HashMap
> trades space and time to get O(1) behavior IF you have lots of
> elements with bad hash values. If the hash values are good (which is
> the case most of the time) or if you have not that many elements
> (which is the case most of the time), it is a waste of time and space
> though.
>
> Lukas
>
> --
> Lukas Renggli
> http://www.lukas-renggli.ch
>

Re: HashTable vs Dictionary

Lukas Renggli
> So this means that by default we have bad performance. no?

No.

In a fresh Pharo Web image, less than 6% of the keys in Dictionaries
and less than 10% of the values in Sets have weak hashes. Moreover,
the largest set with weak-hash values has 516 elements (on average
only 1.8 elements), and the largest dictionary with weak-hash keys has
1002 elements (on average only 4.3 elements). Using HashTable in such
a situation would introduce a major speed penalty and waste a lot of
memory.
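
For the curious, a rough sketch of how such numbers can be gathered.
Here "weak" is approximated as "inherits the default Object>>#hash",
which is not necessarily the exact criterion behind the figures above:

  "Fraction of dictionary keys relying on the default identity-based
  Object>>#hash. Sketch only."
  | weak total |
  weak := 0.
  total := 0.
  Dictionary allInstancesDo: [ :dict |
      dict keysDo: [ :key |
          total := total + 1.
          (key class whichClassIncludesSelector: #hash) == Object
              ifTrue: [ weak := weak + 1 ] ] ].
  (weak / total) asFloat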

It would be cool if Set and Dictionary chose their implementation
strategy automatically depending on the use case. In a standard Pharo
image, however, that would just be the current implementation. There
are simply no instances in the image where it would be worthwhile (a
large amount of data with bad hashes) to use a HashMap.

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch


Re: HashTable vs Dictionary

johnmci
In the far past, VisualAge would change the implementation logic for
Sets depending on how big the set was; for example, with 10 elements
it would be a linear list. Of course, it's been 14 years; perhaps I'm
remembering it wrong.

On 2009-10-22, at 6:46 AM, Lukas Renggli wrote:

>> So this means that by default we have bad performance. no?
>
> No.
>
> In a fresh Pharo Web image, less than 6% of the keys in Dictionaries
> and less than 10% of the values in Sets have weak hashes. [...]

--
John M. McIntosh <[hidden email]>   Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com






Re: Perfect hash [was: Re: HashTable vs Dictionary]

Martin McClure-2
In reply to this post by Andres Valloud-4
Andres Valloud wrote:
>
> I thought I'd point out a subtle potential problem.  Assume N = 1000 and
> f(x_j) = j - 1 for 1 <= j <= 1000, so k = 1, since 0 <= f(x_j) < N.  Now
> store x_{7i} in a Set, for all i such that 1 <= 7i <= 1000.
> Clearly, f(x) is a perfect hash function as defined above.  However, if
> the size of the table used by the Set is a multiple of 7, performance
> will be terrible.
>

Huh? If your hash function produces integers between 0 and 999, your
table size must be at least 1000; otherwise you store past the end of
your table.

Regards,

-Martin


Re: HashTable vs Dictionary

Andres Valloud-4
In reply to this post by johnmci
The nice way of doing this would be to have a layer of indirection, so
that a storage strategy decides how to use the hashed storage.
However, compared to the current Set/Dictionary implementation, that
may be a bit slower.  It would not be a terrific idea to have multiple
subclasses of Set/Dictionary depending on the storage strategy, as
that would result in an explosion of classes.

John M McIntosh wrote:

> In the far past, VisualAge would change the implementation logic for
> Sets depending on how big the set was; for example, with 10 elements
> it would be a linear list. Of course, it's been 14 years; perhaps I'm
> remembering it wrong.
> [...]


Re: Perfect hash [was: Re: HashTable vs Dictionary]

Andres Valloud-4
In reply to this post by Martin McClure-2
My point would be that either:

* the hashed storage is always of size 1000 (or greater), so if you
store only a few objects in the hashed collection, it wastes a lot of
space; or

* the hashed storage changes size to match the objects stored in it,
but looking at f(x) mod p risks reintroducing the collisions you
wanted to avoid in the first place (see the sketch below).
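
For instance, with p = 31 picked arbitrarily:

  "A hash that is perfect over 0..999 is no longer collision-free
  once reduced modulo a smaller table size p."
  | p residues |
  p := 31.
  residues := ((1 to: 1000) collect: [ :j | (j - 1) \\ p ]) asBag.
  residues occurrencesOf: 0
      "=> 33: thirty-three formerly distinct keys now share one slot"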

So, to avoid wasting space in this particular scenario, either N
should be small or essentially all N objects should be stored in the
hashed collection.  If N is small, then why not make f(x) produce
values between 1 and N, and then use an array?  And if N is large and
the utilization of the hashed collection is close to 100% of N, then
why not use an array in that case too?  In the end, really nice
perfect hash functions point the finger at alternatives that may
obviate using hashed collections.  Of course, your mileage may vary...

Andres.


Martin McClure wrote:

> [...]
>
> Huh? If your hash function produces integers between 0 and 999, your
> table size must be at least 1000; otherwise you store past the end of
> your table.


Re: HashTable vs Dictionary

Igor Stasenko
In reply to this post by Andres Valloud-4
2009/10/22 Andres Valloud <[hidden email]>:
> The nice way of doing this would be to have a layer of indirection, so
> that a storage strategy decides how to use the hashed storage.
> However, compared to the current Set/Dictionary implementation, that
> may be a bit slower.  It would not be a terrific idea to have multiple
> subclasses of Set/Dictionary depending on the storage strategy, as
> that would result in an explosion of classes.
>
It doesn't necessarily have to lead to an explosion.

We already have storage in Set: the array slot. So it's easy to
imagine that this slot may change the class which implements the
storage, while from the outside you still see a Set.
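
Something like this sketch, where LinearStorage and HashedStorage are
hypothetical classes, both answering the protocol the Set expects of
its 'array' slot:

  checkStorage
      "Swap the class of the storage slot to suit the current size;
      the Set itself keeps its identity."
      self size < 10
          ifTrue: [ array := LinearStorage withAll: array ]
          ifFalse: [ array := HashedStorage withAll: array ]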



--
Best regards,
Igor Stasenko AKA sig.


Re: Perfect hash [was: Re: HashTable vs Dictionary]

Igor Stasenko
In reply to this post by Andres Valloud-4
2009/10/22 Andres Valloud <[hidden email]>:
> My point would be that either:
>
> * the hashed storage is always of size 1000 (or greater), so if you
> store only a few objects in the hashed collection, it wastes a lot of
> space; or
>
> * the hashed storage changes size to match the objects stored in it,
> but looking at f(x) mod p risks reintroducing the collisions you
> wanted to avoid in the first place.
>

A different hash function is picked if the size of the input set
changes. So there is no chance of having storage of size 1000 holding
only a few elements (unless your implementation really sucks :).
That's why you can always be sure that your storage is good enough to
store all of the elements, with the minimal possible chance of
collisions.
The only problem here, as I noted, is that you have to rehash the
storage each time you modify the input set, because you change the
hash function that was 'optimal' for the old input set.
And that's why this approach is bad for sets which change often.
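
In code the cost looks roughly like this (a sketch; the selectors
computePerfectHashFor: and rehashStorageWith: are hypothetical):

  add: anObject
      "Every modification invalidates the perfect hash function, so
      the whole storage must be rehashed; fine for read-mostly sets."
      elements add: anObject.
      hashFunction := self computePerfectHashFor: elements.
      self rehashStorageWith: hashFunction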


> [...] In the end, really nice perfect hash functions point the finger
> at alternatives that may obviate using hashed collections.  Of course,
> your mileage may vary...
>
A perfect hash is just about that: finding a function which has
minimal (or no) hash collisions, with a storage size very close to the
number of elements in the input set.




--
Best regards,
Igor Stasenko AKA sig.


Re: HashTable vs Dictionary

Stéphane Ducasse
In reply to this post by Lukas Renggli
Ok, I see. I would be curious to see whether SystemDictionary does not
degrade once we load Moose, Mondrian, Glamour...

>> So this means that by default we have bad performance. no?
>
> No.
>
> In a fresh Pharo Web image, less than 6% of the keys in Dictionaries
> and less than 10% of the values in Sets have weak hashes. [...]



Re: HashTable vs Dictionary

Stéphane Ducasse
In reply to this post by johnmci
I was told that they had that, and I guess they removed it in VA 5.

On Oct 22, 2009, at 8:00 PM, John M McIntosh wrote:

> In the far past, VisualAge would change the implementation logic for
> Sets depending on how big the set was; for example, with 10 elements
> it would be a linear list. Of course, it's been 14 years; perhaps I'm
> remembering it wrong.
> [...]



Re: HashTable vs Dictionary

Lukas Renggli
In reply to this post by Stéphane Ducasse
Yeah, that could be a problem. Strangely, SystemDictionary is a
subclass of IdentityDictionary.

It can easily be fixed; the rehash is needed because Dictionary looks
keys up with #hash, while IdentityDictionary uses the identity hash. I
don't know whether this has much impact on performance when the
dictionary is smaller:

  SystemDictionary superclass: Dictionary.
  SystemDictionary allInstances do: [ :each | each rehash ]

Lukas

2009/10/25 Stéphane Ducasse <[hidden email]>:

> Ok, I see. I would be curious to see whether SystemDictionary does not
> degrade once we load Moose, Mondrian, Glamour...
> [...]



--
Lukas Renggli
http://www.lukas-renggli.ch


Re: HashTable vs Dictionary

Martin McClure-2
In reply to this post by Stéphane Ducasse
Stéphane Ducasse wrote:
> Ok, I see. I would be curious to see whether SystemDictionary does not
> degrade once we load Moose, Mondrian, Glamour...

SystemDictionary is a subclass of IdentityDictionary, which does have
some performance problems right around 2K elements, and more severe
performance problems at sizes around 4K elements, but it's not nearly as
bad as Set and Dictionary.

But the hashed-collection hash-spreading and prime-table-size code I
sent yesterday gets rid of those bumps, so it's fast from very small
sizes up through sizes > 10000, with very gradual degradation after
that.
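
The table-size part of the idea looks roughly like this (a sketch, not
the exact code from yesterday's changeset):

  nextPrimeTableSize: n
      "Answer the smallest prime >= n, so table sizes avoid the small
      common factors that make weak hashes cluster into few slots."
      | candidate |
      candidate := n max: 2.
      [ candidate isPrime ] whileFalse: [ candidate := candidate + 1 ].
      ^ candidate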

Regards,

-Martin


Re: HashTable vs Dictionary

Andres Valloud-4
Martin et al,

Here is a series of checkpointed changesets that mutate the latest RC
Pharo image.  Load the changesets in strict alphabetical order.  I
made a few small changes compared to Martin's; hopefully I didn't miss
anything.

Note that changesets S1 through S3 modify SystemDictionary to be a
hash-based collection.  For the SystemDictionary sizes I have in my
image, this makes things slower.  However, I'd expect that, as the
system dictionary grows, using #hash will become faster than using
#scaledIdentityHash.  If you want to load these changes later, make
sure to load changeset A, then S1 through S3, and finally Z (I am
hoping B's hacks, which are undone by W, are not needed).

Enjoy!... I hope :).  Let me know what you think.

Andres.


Martin McClure wrote:

> [...] But the hashed-collection hash-spreading and prime-table-size
> code I sent yesterday gets rid of those bumps, so it's fast from very
> small sizes up through sizes > 10000, with very gradual degradation
> after that.

[Attachment: HashChanges.zip (30K)]

Re: HashTable vs Dictionary

Andres Valloud-4
In reply to this post by Lukas Renggli
Quick tests show that using #hash instead of #scaledIdentityHash makes
the system dictionary need 30% more time to retrieve all its keys.  I
didn't try adding.  The system dictionary size was about 3500.
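
The test was along these lines (a sketch, not the exact benchmark):

  "Time retrieving every key; run once with the identity-hashed
  SystemDictionary and once after switching its superclass."
  [ 10 timesRepeat: [
      Smalltalk keysDo: [ :key | Smalltalk at: key ] ] ] timeToRun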

Lukas Renggli wrote:

> Yeah, that could be a problem. Strangely, SystemDictionary is a
> subclass of IdentityDictionary. [...]
>
>   SystemDictionary superclass: Dictionary.
>   SystemDictionary allInstances do: [ :each | each rehash ]
> [...]
