High performance dictionary (2)


High performance dictionary (2)

Mark Plas
Hi Paul/Martin,

I've made an attempt to write a better general-purpose
WeakValueDictionary than OtWeakValueDictionary, and I think I've largely
succeeded (see the benchmarks at the end of the mail).

In general, this is what I obtained in the experiments below:
- #at:put: is about 10x faster
- #includesKey: (when the key is in the dictionary) is about 19x faster
- #includesKey: (when the key is NOT in the dictionary) is about 7x faster

OtWeakValueDictionary's #hash on LargePositiveInteger (LPI) simply answers ^self.

My dictionary uses buckets: it keeps parallel arrays of keys and values,
a hash table holding the heads of the buckets, and a linked list encoded
in an array that chains the entries within each bucket (sketched below).
I've also used a different implementation of #hash on LPI: instead of
answering ^self, I build SmallIntegers from chunks of three bytes and
bitXor: them together. Even though calculating this hash obviously takes
more time than just answering self, it performs better once \\ is later
sent to the result.
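
To make that layout concrete, here is a minimal sketch of a lookup over such parallel arrays. This is only my reading of the description above, not Mark's actual code: the instance variable names (keys, values, table, links) are assumptions, and weak-reference handling is omitted.

at: key ifAbsent: aBlock
        "table maps (key hash \\ table size) + 1 to the index of the first
         entry of that bucket (0 = empty bucket); links at: i answers the
         index of the next entry in the same bucket (0 ends the chain), i.e.
         a linked list encoded in an array. keys and values hold the entries."
        | index |
        index := table at: (key hash \\ table size) + 1.
        [index = 0] whileFalse: [
                (keys at: index) = key ifTrue: [^values at: index].
                index := links at: index].
        ^aBlock value

And a sketch of the three-byte-chunk hash for LargePositiveInteger, again as an illustration of the idea rather than the posted code:

hash
        "Combine the receiver's bytes three at a time with bitXor: so the
         result stays a SmallInteger and the later \\ remains cheap."
        | result chunk |
        result := 0.
        1 to: self digitLength by: 3 do: [:index |
                chunk := 0.
                index to: (index + 2 min: self digitLength) do: [:byteIndex |
                        chunk := (chunk bitShift: 8) + (self digitAt: byteIndex)].
                result := result bitXor: chunk].
        ^result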

I've done other tests with it as well (1M random LPI numbers inserted in
random order). In those cases the results were less impressive than the
ones shown above, but still in the range of 3-15x faster, depending on
the functionality being tested.

These are some good things about the implementation:
- It is general purpose: no restriction is placed on the kind of keys or
values.
- In the experiments I did, it consistently performs well (#at:put:, #at:,
#includesKey:, #removeKey:), and the performance is relatively independent
of the keys that are put into it. Much depends, of course, on the hash
function producing well-distributed values.
- It goes relatively easy on memory: for a dictionary containing 1M
elements, 3.5M array slots are needed (for comparison, OtWeakValueDictionary
uses 4M array slots).
- It can grow fairly quickly.
- It allows concurrent access.

Paul/Martin, it would be good to know how the GemStone dictionaries
perform on the three experiments below. The first is the #at:put: test we
already ran; the other two test the performance of #includesKey:. If you
could find the time to try these out and mail the results, I would
appreciate it.

Thanks,
Mark

Here are the benchmarks. WeakValueDictionary is my implementation;
OtWeakValueDictionary is the one from OpenTalk, using ^self on LPI for
its #hash method. Each snippet was run three times, and the numbers below
it are the elapsed microseconds reported by Core.Time microsecondsToRun:.
The results for both are provided so that you can compare them with the
ones you get.

######### at:put: #################
| items d |
d := WeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal ].
Core.Time microsecondsToRun: [
        items do: [:i | d at: i  put: 'test' ]
].
304271
296721
307280


| items d |
d := OtWeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal].
Core.Time microsecondsToRun: [items do: [:i | d at: i put: 'test']]  
3055544
3047453
3049904


######## includesKey: the key is in the dictionary ##############
| items d |
d := WeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal ].
items do: [:i | d at: i  put: 'test' ].
Core.Time microsecondsToRun: [
        items do: [:i | d includesKey: i]
].  
292774
293001
303625
 
| items d |
d := OtWeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal ].
items do: [:i | d at: i  put: 'test' ].
Core.Time microsecondsToRun: [
        items do: [:i | d includesKey: i]
].  
5640235
5647289
5655267


####### includesKey: when the key is not in the dictionary #######
| items d |
d := WeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal ].
items do: [:i | d at: i  put: 'test' ].
items keysAndValuesDo: [:idx :i | items at: idx put: i negated].
Core.Time microsecondsToRun: [
        items do: [:i | d includesKey: i  ]
].  
303553
301590
302882
 
| items d |
d := OtWeakValueDictionary new: 1000000.
items := (1 to: 1000000) collect: [:i | i + Core.SmallInteger maxVal ].
items do: [:i | d at: i  put: 'test' ].
items keysAndValuesDo: [:idx :i | items at: idx put: i negated].
Core.Time microsecondsToRun: [
        items do: [:i | d includesKey: i  ]
].  
2098209
2122677
2073916


Re: High performance dictionary (2)

Alan Knight-2
Random keys aren't necessarily a thorough test. A large number of consecutive integer keys in a weak dictionary is an interesting case that is pathological in the current implementation.
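
A quick way to probe that case, in the same style as the benchmarks above (a sketch only, no timings claimed; swap in OtWeakValueDictionary or any other implementation to compare):

| d |
d := WeakValueDictionary new: 1000000.
Core.Time microsecondsToRun: [
        1 to: 1000000 do: [:i | d at: i put: 'test']]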

--
Alan Knight [|], Cincom Smalltalk Development

RE: High performance dictionary (2)

Mark Plas
In reply to this post by Mark Plas

Alan,

I was wondering what kind of dictionaries you use in Glorp, since I imagine you also need a dictionary of some sort to guarantee the uniqueness of objects when retrieving rows from a database query. Do you know how it deals with large numbers of objects?

Mark


RE: High performance dictionary (2)

Andres Valloud-6
In reply to this post by Mark Plas
I would recommend separating the issue of better hash functions from the
issue of better hashed collections. For better hash functions, there is
the Hash Analysis Tool available in the public Store repository.

Thanks,
Andres.


RE: High performance dictionary (2)

Alan Knight-2
In reply to this post by Mark Plas
In Glorp I use a subclass of EphemeronDictionary in VisualWorks when weak references are needed; otherwise I just use a regular Dictionary. So basically I'm not doing anything special. Other than a few pathological cases (e.g. the sequential integer keys), I haven't found dictionary performance to be a big issue.


--
Alan Knight [|], Cincom Smalltalk Development

RE: High performance dictionary (2)

Alan Knight-2
In reply to this post by Andres Valloud-6
+1

--
Alan Knight [|], Cincom Smalltalk Development