Hi Paul/Martin,
I've made an attempt to write a better general purpose WeakValueDictionary than OtWeakValueDictionary, and I think I've fairly well succeeded (see the benchmarks at the end of the mail). In general, this is what I obtained in the experiments below:

- #at:put: is about 10x faster
- #includesKey: (when the key is in the dictionary) is about 19x faster
- #includesKey: (when the key is NOT in the dictionary) is about 7x faster

OtWeakValueDictionary implements #hash as ^self on LPI. My dictionary uses buckets and has arrays of keys, values, a hashtable (containing the hash buckets), and a linked list encoded in an array.

I've also used a different implementation of #hash on LPI. Instead of returning ^self, I create SmallIntegers from chunks of three bytes and bitXor: them together. Even though calculating this hash (obviously) takes more time than just doing ^self, it performs better when a \\ is later sent to the result.

I've done other tests with it (1M random LPI numbers in a random order). In those cases the results were less impressive than the ones shown above, but they were still in the range of 3-15x faster depending on the functionality being tested.

These are some good things about the implementation:

- It is general purpose: no restriction is put on the kind of keys/values.
- In the experiments I did it always performs well (#at:put:, #at:, #includesKey:, #removeKey:). The performance is relatively independent of the keys that are put into it; much depends, of course, on the hash function producing good values.
- It goes relatively easy on memory usage. For a dictionary containing 1M elements, 3.5M array slots are needed (for comparison: OtWeakValueDictionary uses 4M array slots).
- It can grow fairly quickly.
- It allows concurrent access.

Paul/Martin, it would be good to know how the GemStone dictionaries perform on the three experiments below. The first one is the #at:put: we already tested; the other two test the performance of #includesKey:.
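Mark's chunked hash can be sketched as follows (a minimal illustration in Python rather than Smalltalk; the 3-byte chunking and bitXor: folding match his description, but the function name and exact chunk order are assumptions):

```python
def lpi_hash(n: int) -> int:
    """Fold a large positive integer into a small hash by XORing
    3-byte (24-bit) chunks together, as described for LPI #hash."""
    h = 0
    while n > 0:
        h ^= n & 0xFFFFFF   # take the low 3 bytes as a SmallInteger-sized chunk
        n >>= 24            # move on to the next 3-byte chunk
    return h
```

Unlike `^self`, the folded value mixes high-order bytes into the number that later receives `\\` (modulo), so bucket indices spread better for keys that differ only in their upper bytes.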
If you could find the time to try these out and mail the results, I would appreciate it.

Thanks,
Mark

Here are the benchmarks. WeakValueDictionary is my implementation; OtWeakValueDictionary is the one from OpenTalk using ^self on LPI for its #hash method. The results for both are provided so that you can compare them with the ones you get.

######### at:put: #################

| items d |
d := WeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal].
Core.Time microsecondsToRun: [items do: [:i | d at: i put: 'test']].

304271 296721 307280

| items d |
d := OtWeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal].
Core.Time microsecondsToRun: [items do: [:i | d at: i put: 'test']].

3055544 3047453 3049904

######## includesKey: when the key is in the dictionary ##############

| items d |
d := WeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal].
items do: [:i | d at: i put: 'test'].
Core.Time microsecondsToRun: [items do: [:i | d includesKey: i]].

292774 293001 303625

| items d |
d := OtWeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal].
items do: [:i | d at: i put: 'test'].
Core.Time microsecondsToRun: [items do: [:i | d includesKey: i]].

5640235 5647289 5655267

####### includesKey: when the key is not in the dictionary #######

| items d |
d := WeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal].
items do: [:i | d at: i put: 'test'].
items keysAndValuesDo: [:idx :i | items at: idx put: i negated].
Core.Time microsecondsToRun: [items do: [:i | d includesKey: i]].

303553 301590 302882

| items d |
d := OtWeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal].
items do: [:i | d at: i put: 'test'].
items keysAndValuesDo: [:idx :i | items at: idx put: i negated].
Core.Time microsecondsToRun: [items do: [:i | d includesKey: i]].

2098209 2122677 2073916
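The layout Mark describes (parallel arrays of keys and values, a hashtable of bucket heads, and a linked list encoded in an array) can be sketched like this. This is an illustrative strong-reference Python version, not his actual WeakValueDictionary; the class and method names are invented:

```python
class BucketDict:
    """Separate chaining with parallel arrays: keys/values hold the
    entries, heads[h] is the index of the first entry of bucket h,
    and nxt[i] links entries of the same bucket (-1 ends a chain)."""

    def __init__(self, capacity=8):
        self.heads = [-1] * capacity
        self.keys, self.values, self.nxt = [], [], []

    def _bucket(self, key):
        return hash(key) % len(self.heads)

    def at_put(self, key, value):
        b = self._bucket(key)
        i = self.heads[b]
        while i != -1:                  # walk the array-encoded linked list
            if self.keys[i] == key:
                self.values[i] = value  # key already present: overwrite
                return
            i = self.nxt[i]
        # prepend a new entry to bucket b
        self.keys.append(key)
        self.values.append(value)
        self.nxt.append(self.heads[b])
        self.heads[b] = len(self.keys) - 1

    def includes_key(self, key):
        i = self.heads[self._bucket(key)]
        while i != -1:
            if self.keys[i] == key:
                return True
            i = self.nxt[i]
        return False
```

With N entries this costs roughly three array slots per entry (keys, values, nxt) plus the heads table, which is consistent with the ~3.5 slots per element Mark reports for 1M entries.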
Random keys aren't necessarily a thorough test. A large
number of consecutive integer keys in a weak dictionary is an interesting
case that is pathological in the current implementation.
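One way an identity-style `^self` hash can degenerate (a hypothetical Python illustration, not necessarily the exact pathology Alan has in mind): keys laid out with a stride that the table size divides all collapse into a single bucket under `key \\ tableSize`, while a chunk-folding hash like Mark's spreads them out.

```python
TABLE_SIZE = 1024

def raw_hash(n):        # the `^self` approach: the key is its own hash
    return n

def mixed_hash(n):      # fold 3-byte chunks, as in Mark's LPI hash
    h = 0
    while n > 0:
        h ^= n & 0xFFFFFF
        n >>= 24
    return h

keys = [i << 24 for i in range(1000)]  # stride 2^24, a multiple of 1024

buckets_raw = {raw_hash(k) % TABLE_SIZE for k in keys}
print(len(buckets_raw))      # 1: every key lands in bucket 0

buckets_mixed = {mixed_hash(k) % TABLE_SIZE for k in keys}
print(len(buckets_mixed))    # 1000: each key gets its own bucket
```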
At 05:05 AM 11/13/2007, Mark Plas wrote: Hi Paul/Martin,

--
Alan Knight, Cincom Smalltalk Development
In reply to this post by Mark Plas
Alan, I was wondering what kind of dictionaries
you use in Glorp since I imagine you also require a dictionary of some sort to
guarantee uniqueness of objects when retrieving rows from a database query. Do
you know how it deals with a large number of objects?

Mark
In reply to this post by Mark Plas
I would recommend that the issue of better hash functions is separated
from the issue of better hashed collections. For the issue of better hash functions, there is the Hash Analysis Tool available in the public Store repository.

Thanks,
Andres.

-----Original Message-----
From: Mark Plas [mailto:[hidden email]]
Sent: Tuesday, November 13, 2007 2:06 AM
To: Paul Baumann; Martin McClure
Cc: [hidden email]
Subject: High performance dictionary (2)
In reply to this post by Mark Plas
I use a subclass of the EphemeronDictionary in VisualWorks
when I need weak references; otherwise I just use a regular dictionary. So
basically I'm not doing anything special. Other than in a few pathological
cases (e.g. the sequential integers), I haven't found dictionary
performance to be a big issue.
At 07:55 AM 11/13/2007, Mark Plas wrote: Alan,

--
Alan Knight, Cincom Smalltalk Development
In reply to this post by Andres Valloud-6
+1
At 01:16 PM 11/13/2007, Valloud, Andres wrote: I would recommend that the issue of better hash functions is separated

--
Alan Knight, Cincom Smalltalk Development