Large dictionaries are slow

Carl Gundel-2
I've discovered that dictionaries with many thousands of keys can be very slow. For example, the following code executes in less than a second on my desktop PC.
 
| d |
d := Dictionary new.
Time millisecondsToRun: [
    1 to: 10000 do: [ :i |
        d at: i asString put: i asString
    ]
]
 
But the following runs in 15 seconds.
 
| d |
d := Dictionary new.
Time millisecondsToRun: [
    1 to: 50000 do: [ :i |
        d at: i asString put: i asString
    ]
]
 
My application has some very large dictionaries.  Any tips?
 
Thanks,
 
-Carl Gundel

Re: Large dictionaries are slow

Marten Feldtmann-2
This is because VA Smalltalk uses only 16-bit hash values, so with a large number of entries (around 32,000!) you can run into performance problems.
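
If the hash really is capped at 16 bits, the 50,000 string keys can share at most 65,536 distinct hash values, so collisions pile up quickly as the table grows. A quick workspace check of the collision pressure (a sketch; it assumes only the standard Bag, Set, and String>>#hash protocol):

| hashes |
hashes := Bag new.
1 to: 50000 do: [ :i | hashes add: i asString hash ].
"Every entry beyond one per distinct hash value is a collision."
hashes size - hashes asSet size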

Re: Large dictionaries are slow

Louis LaBrunda
In reply to this post by Carl Gundel-2
Hi Carl,

There is a large-hash-key dictionary or lookup table, but I forget its class name at the moment and have to run. If you can't find it I will post it later.

Lou

Re: Large dictionaries are slow

Wayne Johnston
AbtHighCapacityDictionary etc.
Although for smaller sets of data it's slower than a regular Dictionary.
I made a class extension on Dictionary: a method taking an anticipated size as an argument, which sends #new: to either Dictionary or AbtHighCapacityDictionary depending on the parameter (see the sketch below). At the moment the threshold I am using is 15,000. I am not sure if that is the best number, or whether it may depend on the 'hashy-ness' of the data.
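
A minimal sketch of such an extension (the selector #newWithAnticipatedSize: is invented here; Wayne's actual method name is not given in the thread):

Dictionary class >> newWithAnticipatedSize: anInteger
    "Answer a new dictionary sized for anInteger entries, choosing
     the class by expected size. 15,000 is the threshold Wayne
     mentions, not a tuned constant."
    ^anInteger > 15000
        ifTrue: [ AbtHighCapacityDictionary new: anInteger ]
        ifFalse: [ Dictionary new: anInteger ]

A caller would then write, for example, d := Dictionary newWithAnticipatedSize: 50000.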

Re: Large dictionaries are slow

Carl Gundel-2
Thanks guys, that helped!

Re: Large dictionaries are slow

John O'Keefe-3
In reply to this post by Wayne Johnston
Wayne -
 
We have a long-standing work item for VA Smalltalk to implement what I'll call 'hashing policies': a set of hashing algorithms to choose from, plus criteria for making the choice. When this gets implemented (in the fullness of time), many of the Dictionary clones (AbtHighCapacityDictionary, etc.) could become nothing but 'policy choosers'; a rough sketch of that shape follows.
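
Purely to illustrate the idea (nothing like this exists in the product; HashPolicy and #hashPolicy: are invented for this sketch):

AbtHighCapacityDictionary class >> new: anInteger
    "As a 'policy chooser', the clone would only select a hashing
     policy and delegate to Dictionary, rather than duplicate its
     implementation. HashPolicy and #hashPolicy: are hypothetical."
    ^(Dictionary new: anInteger)
        hashPolicy: HashPolicy wideHash;
        yourself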
 
John
Re: Large dictionaries are slow

bgridley-2
In reply to this post by Carl Gundel-2
I remember running into this ~15 years ago in VW. At that time I solved it by creating a ConcatenatedDictionary: it exposed the same external interface as Dictionary but internally held a dictionary of dictionaries, each of which stored a subset of the entries (see the sketch below). There might still be a copy of the implementation on a server that Ralph Johnson put up...
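
A minimal sketch in that spirit (not the original code; the split by hash remainder is an assumption, and class-definition syntax varies by dialect):

Object subclass: #ConcatenatedDictionary
    instanceVariableNames: 'buckets'
    classVariableNames: ''
    poolDictionaries: ''

ConcatenatedDictionary class >> new: anInteger
    ^super new initialize: anInteger

ConcatenatedDictionary >> initialize: anInteger
    "Hold the subsets in an outer dictionary keyed by hash remainder."
    buckets := Dictionary new.
    0 to: anInteger - 1 do: [ :i | buckets at: i put: Dictionary new ]

ConcatenatedDictionary >> bucketFor: aKey
    "Route each key to one inner dictionary, so no single
     dictionary grows large enough to hit the collision wall."
    ^buckets at: aKey hash \\ buckets size

ConcatenatedDictionary >> at: aKey put: aValue
    ^(self bucketFor: aKey) at: aKey put: aValue

ConcatenatedDictionary >> at: aKey ifAbsent: aBlock
    ^(self bucketFor: aKey) at: aKey ifAbsent: aBlock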
