Hi,
I was messing around with SmallDictionary when I suddenly realised that I can't find a single reason to use it over a normal Dictionary. While its name and class comment imply that it is somehow an optimised Dictionary, I don't see any measurement where that actually holds up. The following was run in a Pharo 7 image on a recent VM (see below):

| d |
d := SmallDictionary new.
d sizeInMemory. "24"
[100000 timesRepeat: [
    1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun. "0:00:00:05.226"

[100000 timesRepeat: [
    d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"

| d |
d := Dictionary new.
d sizeInMemory. "16"
[100000 timesRepeat: [
    1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun. "0:00:00:00.385"

[100000 timesRepeat: [
    d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.006"

As you can see, SmallDictionary is 8 bytes larger per instance and significantly slower while reading and writing (I know that this isn't a good benchmark, but it suffices to make my point).

Is anyone aware of a reason for hanging on to SmallDictionary? I'm also curious to know how SmallDictionary came to be. There must have been some advantage over Dictionary at some point in the past.

Cheers,
Max


Image version: Pharo 7.0
Build information: Pharo-7.0+alpha.build.961.sha.a69e72a97136bc3f93831584b6efa2b1703deb84 (32 Bit)

VM version: CoInterpreter VMMaker.oscog-nice.2281 uuid: 4beeaee7-567e-1a4b-b0fb-bd95ce302516 Nov 27 2017
StackToRegisterMappingCogit VMMaker.oscog-nice.2283 uuid: 2d20324d-a2ab-48d6-b0f6-9fc3d66899da Nov 27 2017
VM: 201711262336 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Date: Mon Nov 27 00:36:29 2017 +0100 $ Plugins: 201711262336 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $

OS: macOS 10.13.5
Machine: MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports)
Hi,
it was supposed to be optimised for small sizes, so I guessed it was about your choice of 100000 elements, but a fast benchmark with small numbers didn't offer better results. So the question is appropriate. Why do we have it? Or better: why is SmallDictionary not faster? Even better: is this test good? :)

cheers,
Esteban

> On 8 Jun 2018, at 08:46, Max Leske <[hidden email]> wrote:
>
> I was messing around with SmallDictionary when I suddenly realised that I
> can't find a single reason to use it over a normal Dictionary. [...]
In reply to this post by Max Leske
Well, I can answer my own question: SmallDictionary is a lot more space
efficient. I think the class comment should clarify the use case for SmallDictionary and mention the performance trade-off.

Max

On 8 Jun 2018, at 8:46, Max Leske wrote:

> I was messing around with SmallDictionary when I suddenly realised
> that I can't find a single reason to use it over a normal Dictionary. [...]
In reply to this post by Max Leske
> On 8 Jun 2018, at 08:46, Max Leske <[hidden email]> wrote:
>
> [...]
>
> Is anyone aware of a reason for hanging on to SmallDictionary? I'm also
> curious to know how SmallDictionary came to be. There must have been
> some advantage over Dictionary at some point in the past.

It came from RB… the idea was that there are (in the Refactoring engine) a lot of very small dictionaries with <10 elements. The idea is that for such dictionaries, the overhead of hashing was higher than just linear search.

I am sure I did a benchmark back then, but of course this could have changed in the meantime. It would be nice to redo the benchmark. Can you try with <10 elements?

For size: does #sizeInMemory take the internal data into account? E.g. for Dictionary there is an Array that contains Associations; you need to count those, too.

In the end this is a hack anyway: if needed, Dictionary could do things like this transparently… without the user having to decide. And the question is if this is not just a bad side effect of the particular implementation.

Marcus
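Marcus's #sizeInMemory caveat can be checked directly. A rough sketch of such a measurement (it assumes HashedCollection's private #array accessor, which may vary between versions):

"Total footprint of a Dictionary: the outer object, its internal Array,
and one Association per entry; #sizeInMemory alone reports only the first."
| d |
d := Dictionary new.
1 to: 100 do: [ :i | d at: i put: i ].
d sizeInMemory
    + d array sizeInMemory
    + (d array inject: 0 into: [ :sum :each |
        sum + (each ifNil: [ 0 ] ifNotNil: [ :assoc | assoc sizeInMemory ]) ])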
In reply to this post by Marcus Denker-4
(Sorry, hit send accidentally).
On 8 Jun 2018, at 9:02, Marcus Denker wrote:

> Can you try with <10 elements?

The results are in!
Note that I had to increase the repetitions by a factor of 100.

| d |
d := SmallDictionary new.
[10000000 timesRepeat: [
    1 to: 5 do: [ :i | d at: i put: i ] ] ] timeToRun. "0:00:00:02.532"

[10000000 timesRepeat: [
    d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.686"

| d |
d := Dictionary new.
[10000000 timesRepeat: [
    1 to: 5 do: [ :i | d at: i put: i ] ] ] timeToRun. "0:00:00:02.102"

[10000000 timesRepeat: [
    d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.567"

So SmallDictionary is indeed very fast for only a few elements. Once the SmallDictionary has to grow, its performance decreases rapidly, or at least the grow operation takes a long time. Dictionary still outperforms SmallDictionary though, so I guess the only reason left is space efficiency.

> For size: does #sizeInMemory take the internal data into account? E.g.
> for Dictionary there is an Array that contains Associations; you need
> to count those, too.

No, I didn't check for that initially. When I did, it became apparent that the larger the dictionary, the more space you save with SmallDictionary.

> In the end this is a hack anyway: if needed, Dictionary could do things
> like this transparently… without the user having to decide. And the
> question is if this is not just a bad side effect of the particular
> implementation.

Good point. Sounds like what Stefan Marr wrote in the paper where he performed a field study of collection libraries ;)

Max
On 8 Jun 2018, at 9:30, Max Leske wrote:
>> It would be nice to redo the benchmark.

That would be very nice, as I just realised that the type of object used as key has an effect on the hashing time... There are a lot of variables here, unfortunately.

Max
In reply to this post by Max Leske
> I was messing around with SmallDictionary when I suddenly realised that
> I can't find a single reason to use it over a normal Dictionary. While
> its name and class comment imply that it is somehow an optimised
> Dictionary, I don't see any measurement where that actually holds up.

FWIW it seems there is no SmallDictionary in Squeak.

Stef
In reply to this post by Max Leske
Hi Max,

Theoretically, for a small number of elements, usually somewhere between 3 and 30 depending on the implementation, a linear search is faster than a hash search, especially in the Pharo dictionary hash search implementation.

Efficient dictionary implementations are usually bucket-based. The dictionary holds a certain number of buckets, and based on the key hash, the bucket where the key's value is present is determined. Small buckets are linear (arrays or linked lists). Large buckets are typically balanced binary trees (red-black trees, typically). Under a certain number of elements there is a single bucket, which means a linear search is performed, as for the SmallDictionary. When it grows, the dictionary search becomes a combination of a hash search and a linear or tree search.

Pharo dictionary search is first hash-based; then, since all the buckets are represented next to each other in the same array, a linear search is performed there, leading to many collisions and slower search time (especially when the element is not found). Sometimes the code searches over multiple buckets because the dictionary is too full or there are too many near-collisions. The memory consumption is competitive with the advanced implementations, though (precise measurements would need to be made).

Method dictionaries are represented differently, to optimize the look-up logic.

If you want to improve things and have one dictionary implementation instead of two, implement or look for a bucket-based dictionary and put it in the base image instead of Dictionary. This is quite some work, since there are many APIs to port. You can look at the Pinocchio implementation; it's quite good, but they've not implemented large buckets.

On Fri, Jun 8, 2018 at 8:46 AM, Max Leske <[hidden email]> wrote:
> [...]
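To make the linear-search case concrete, here is a minimal sketch of a SmallDictionary-style lookup. It is not the actual source; it assumes parallel keys/values Arrays and a size instance variable:

"Linear search over parallel arrays: no hash is ever computed, so for a
handful of entries this beats hashing, while for many entries it is O(n)."
at: key ifAbsent: aBlock
    1 to: size do: [ :index |
        (keys at: index) = key
            ifTrue: [ ^ values at: index ] ].
    ^ aBlock value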
On Fri, 8 Jun 2018, Clément Bera wrote:
> Efficient dictionary implementations are usually bucket-based. [...]
> Small buckets are linear (arrays or linked lists). Large buckets are
> typically balanced binary trees (red-black trees, typically). [...]

The bucket-based list you described is called separate chaining collision resolution, and it won't have any better performance in Squeak than the currently used open addressing.
However, it is preferable when you can't trust your hash function. When you do "crazy" things, like using a tree as a bucket, you just exchange the possible O(n) worst-case time for O(log(n)).

> Pharo dictionary search is first hash-based; then, since all the buckets
> are represented next to each other in the same array, a linear search is
> performed there, leading to many collisions and slower search time
> (especially when the element is not found) [...]

This is called open addressing, and if your hash function is good enough, the lists are short, so those "many collisions and slower search time" just don't happen. And no, separate chaining needs a lot more memory than open addressing, unless you manage to get rid of Associations, but that's a lot harder than it sounds.

> Method dictionaries are represented differently, to optimize the look-up
> logic.

MethodDictionary is optimized to let the VM do lookups more easily. Memory consumption is worse when the dictionary is sparse, but gets better as it gets filled up, compared to the current Dictionary implementation (assuming Pharo still uses the same representation as Squeak does).

> If you want to improve things and have one dictionary implementation
> instead of two, implement or look for a bucket-based dictionary and put
> it in the base image instead of Dictionary. [...]

Well, as an experiment, I'm sure it'll be fun. But don't expect better performance nor better memory usage.

Levente
In addition, open addressing with linear probing has superior cache line read behavior (no indirection / random traversal, and if the first probe misses, the second one was likely cached by the first one).

On 6/8/18 1:37, Levente Uzonyi wrote:
> [...]
In reply to this post by Max Leske
On Fri, 8 Jun 2018, Max Leske wrote:
> Is anyone aware of a reason for hanging on to SmallDictionary? I'm also
> curious to know how SmallDictionary came to be. There must have been
> some advantage over Dictionary at some point in the past.

IIRC it came from Seaside, where it was used to minimize the memory footprint of sessions.

Levente
In reply to this post by Clément Béra
On 8 Jun 2018, at 09:48, Stéphane Rollandin wrote:

> FWIW it seems there is no SmallDictionary in Squeak.

Oh... Thanks Stéf, I wasn't aware of that.


On 8 June 2018, at 15:13, John Brant wrote:

>>> Is anyone aware of a reason for hanging on to SmallDictionary? I'm
>>> also curious to know how SmallDictionary came to be. There must have
>>> been some advantage over Dictionary at some point in the past.
>>
>> It came from RB… the idea was that there are (in the Refactoring
>> engine) a lot of very small dictionaries with <10 elements.
>> The idea is that for such dictionaries, the overhead of hashing was
>> higher than just linear search.
>
> I created its ancestor in VW some 20+ years ago (as RBSmallDictionary).
> It was used when doing pattern matching. When it performs a pattern
> match against an AST, it puts the potential value of the pattern
> variable in the dictionary. If the value is used later in the pattern,
> then we can get the previous value and make sure that we have an
> equivalent AST. This allows you to write patterns like:
>
> `@a = `@a
>
> to find where someone has the same expression on both sides of the #=
> message. Since most patterns have very few pattern variables, these
> dictionaries won't hold many entries. Furthermore, we are likely to
> abort the match when we have 0 entries.
>
> The original RBSmallDictionary had an #empty method that "emptied" the
> dictionary without actually removing any elements -- it just set the
> size to 0. In a general dictionary this would lead to memory leaks,
> since the previous values are still held by the dictionary. However,
> these dictionaries were only used during the matching process and went
> away after the process completed.
>
> Anyway, at the time when we converted our pattern matching code from
> using the VW parser with our pattern matching extensions to using the
> new RB parser with pattern matching, the time to run SmallLint on the
> image was cut in half, even though our parser was quite a bit slower
> than the VW parser. I don't remember everything that was done, but I
> think that most of the speedup came from having special pattern AST
> nodes and the small dictionary.
>
> John Brant

Very interesting! Thanks John!

As Marcus has mentioned before in this thread, it would make a lot of sense to run benchmarks again. Actually, I think it would be nice to have a benchmark suite for these cases, one that would let us monitor performance and ensure that changes to the codebase don't have a detrimental effect. I'm not saying that it would be easy to make this happen; writing proper benchmarks is hard (for me especially, as it seems, given my utter failure to think of the edge cases before starting this thread). Such a suite might also prevent these sorts of questions on the mailing list in the future, or at least might make it easier to answer them.


On 8 June 2018, at 13:01, Andres Valloud wrote:

> In addition, open addressing with linear probing has superior cache
> line read behavior (no indirection / random traversal, and if the
> first probe misses, the second one was likely cached by the first one).

Ah, nice catch! Although that would require frequent access to the dictionary / repeated access to the same items to have an effect, wouldn't it?


On 8 Jun 2018, at 10:01, Clément Bera wrote:

> Theoretically, for a small number of elements, usually somewhere
> between 3 and 30 depending on the implementation, a linear search is
> faster than a hash search, especially in the Pharo dictionary hash
> search implementation.
> [...]

Thanks for the detailed explanations Clément and Levente. I'll probably not add a new dictionary implementation ;)

Max
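John's `@a = `@a pattern can be tried from a playground. Here is a rough usage sketch with the RB engine; the selectors shown (#matches:do:, #executeTree:initialAnswer:) follow the classic RB API, but treat the exact details as an assumption, since they vary across versions:

"Search one class's methods for expressions with the same AST on both
sides of #=; each match binds `@a in a small pattern dictionary."
| searcher found |
searcher := RBParseTreeSearcher new.
searcher matches: '`@a = `@a' do: [ :node :answer | answer copyWith: node ].
found := OrderedCollection new.
Integer methodDict valuesDo: [ :method |
    found addAll: (searcher executeTree: method ast initialAnswer: #()) ].
found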
Hum, about dictionary implementations, I've just found this [1]:

OPEN ADDRESSING VS. CHAINING

                               Chaining                           Open addressing
Collision resolution           Using external data structure      Using hash table itself
Memory waste                   Pointer-size overhead per entry    No overhead (*)
                               (storing list heads in the table)
Performance dependence on      Directly proportional              Proportional to
table's load factor                                               loadFactor / (1 - loadFactor)
Allows storing more items      Yes                                No. Moreover, it's recommended
than the hash table size                                          to keep the load factor below 0.7
Hash function requirements     Uniform distribution               Uniform distribution, should
                                                                  avoid clustering
Handling removals              Removals are ok                    Removals clog the hash table
                                                                  with "DELETED" entries
Implementation                 Simple                             A correct implementation is
                                                                  quite tricky (*)

(*) Hash tables with chaining can work efficiently even with a load factor greater than 1. At the same time, tables based on the open addressing scheme require the load factor not to exceed 0.7 to be efficient. Hence, 30% of slots remain empty, which leads to obvious memory waste.

Do you guys agree with this?

Their open addressing implementation is slightly different from the Smalltalk one (they have deleted entries upon removal, while we fix collisions, at least in Pharo 6). Their chaining implementation does not use a balanced binary tree when a bucket becomes large, like Java's implementation does.

I don't really understand why an open addressing implementation is trickier to implement. In high-level languages such as Smalltalk I may agree that it might be easier to implement chaining, but in a low-level language, such as in the VM, IMO open addressing is easier, since you've got only one large chunk of memory to manage. Maybe it's because of hash collisions; having a good hash and a good hash collection size (see HashTableSizes) is not that easy.

I'm curious because I do more and more bare-metal programming and I end up having to implement this kind of thing myself. I've always implemented naive open addressing up until now, without really understanding the details.

[1] http://www.algolist.net/Data_structures/Hash_table/Open_addressing

On Sat, Jun 16, 2018 at 8:49 PM, Max Leske <[hidden email]> wrote:
> [...]
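Since the thread keeps contrasting the two schemes, here is a minimal sketch of the separate-chaining lookup the table describes. The class is hypothetical (a buckets Array of linked key/value nodes), not an existing Pharo or Squeak collection:

"Separate chaining: each slot holds a (possibly nil) linked list of
key/value nodes; a collision only lengthens one list, at the cost of
one extra node object per entry."
at: key ifAbsent: aBlock
    | node |
    node := buckets at: key hash \\ buckets size + 1.
    [ node isNil ] whileFalse: [
        node key = key ifTrue: [ ^ node value ].
        node := node next ].
    ^ aBlock value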
In an environment with garbage collection, it's worth considering the overhead of one's representation on the memory allocator and the garbage collector. Chaining adds objects - the exact number of these varies - and hence adds GC overhead (= paying extra cycles for the life of the dictionary, even if it never changes). Each object also has some bytes of overhead for headers, so chaining becomes less efficient as collisions occur, due to the extra object allocations.

Also, modern processors and memory systems are very good at reading forward in memory, as there's often a wide memory bus (and sometimes other optimisations). This means it's often quick to read the next entry, as it's already been pulled in due to adjacency. By contrast, chaining puts collisions all over memory, reducing adjacency and increasing the bandwidth required to memory.

I'm unconvinced that chaining dictionaries are sensible in modern languages with garbage collection on modern architectures, where bandwidth to memory and cache space are both scarce resources. The days of memory being truly random-access are long gone, and too few programmers realise the impact that can have.

Cheers,

- Peter

On 29 June 2018 at 08:27, Clément Bera <[hidden email]> wrote:
> [...]
In reply to this post by Clément Béra
On Fri, 29 Jun 2018, Clément Bera wrote:
> Do you guys agree with this?

Mostly. I don't think open addressing's performance is proportional to loadFactor / (1 - loadFactor). That formula would be correct if you had an array filled up to loadFactor randomly, but linear probing creates longer chains due to clustering, so it's worse than that in practice. Here's a small table based on my measurements:

lf     formula   measurement
0.5    1.0       ~1.5
0.55   1.222     ~1.96
0.6    1.5       ~2.62
0.65   1.857     ~3.56
0.7    2.333     ~5.06
0.75   3.0       ~7.5
0.8    4.0       ~12.0

In Squeak the load factor normally varies between 0.5 and 0.75, but it can be anything between 0.0 and 0.8 (only compaction can push the load factor above 0.75), so performance can differ quite a bit. Based on these numbers, 0.7 sounds like a reasonable upper limit - something we might want to consider.
You might think that linear probing's clustering effect costs too much, but because of cache locality it is probably still faster than linked lists or double hashing.
It is also irrelevant whether you can store more elements than the size of the hash table, because that breaks the precondition of the use of hash tables, and therefore the contract too. So the promised O(1) runtime of operations is also gone in that case.
Finally, removals do not clog hash tables using open addressing unless you use lazy deletion[1], which is something you should avoid.

> Their open addressing implementation is slightly different from the
> Smalltalk one (they have deleted entries upon removal, while we fix
> collisions, at least in Pharo 6).

Proactively removing the garbage - aka #fixCollisionsFrom: - takes O(chain length) time, which is amortized O(1) if your hash function is good enough.
Lazy deletion degrades lookup performance over time, because lookups have to treat deleted entries as existing entries. It makes deletion somewhat quicker, because you can stop the lookup procedure as soon as you find your element, but what you save is amortized O(1).

> Their chaining implementation does not use a balanced binary tree when
> a bucket becomes large, like Java's implementation does.

Because hash tables in general don't do such a thing. During my time at the university, it was one of the tricky questions why hash tables use linked lists instead of balanced binary trees for collision resolution. The answer is that you don't need them, because the contract of hash tables is that you'll (practically) not have long chains if your hash function is good enough.
Using binary trees for collision resolution mitigates situations when your hash function is not good enough or when you don't have full control over your hash values. But those are fairly rare in practice, and you still lose the O(1) operation runtime cost. Binary trees also require your elements to have a total order defined on them.

> I don't really understand why an open addressing implementation is
> trickier to implement. [...]

It's easy to make indexing mistakes, especially when wrap-around is involved. Also, deletion is tricky and is often left as an exercise for the reader. Even the Wikipedia article[2] has quite complex pseudocode for it.
The version you find in Squeak has a few hidden tricks too, but those are there for performance and the code would still work without them.

Levente

[1] https://en.wikipedia.org/wiki/Lazy_deletion
[2] https://en.wikipedia.org/wiki/Open_addressing
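The proactive removal Levente mentions looks roughly like this, modeled on Squeak's Dictionary>>#fixCollisionsFrom: (a simplified sketch; the real method differs in details):

"After a removal empties a slot, rescan the run of elements that follows:
any element whose probe chain passed through the freed slot is swapped
back into place, so no DELETED tombstones are ever needed."
fixCollisionsFrom: start
    | element index |
    index := start.
    [ (element := array at: (index := index \\ array size + 1)) == nil ]
        whileFalse: [
            | newIndex |
            (newIndex := self scanFor: element key) = index
                ifFalse: [ array swap: index with: newIndex ] ]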
Thanks for the detailed answer. I now understand better why Smalltalk dictionaries are this way.

Best,

On Sat, Jun 30, 2018 at 2:05 AM, Levente Uzonyi <[hidden email]> wrote:
> [...]
In reply to this post by Peter Crowther-2
This discussion has limited its context to comparing *in-memory* Dictionary implementations, where open addressing has an advantage.

Using Magma reveals another context where chained Dictionaries can have some advantages: when they're large, persistent, and shared in a multi-user system -- e.g., when grow rehashing is not an option. Not only do chained Dictionaries make that scenario possible, they improve concurrency -- different users can add/change/remove from the Dictionary, as long as the changes occur at different keys. With regular Dictionaries, concurrency guarantees a commit conflict.

- Chris

On Fri, Jun 29, 2018 at 2:56 PM, Peter Crowther <[hidden email]> wrote:
> [...]
On Sat, 30 Jun 2018, Chris Muller wrote: > This discussion has limited its context to comparing *in-memory* Dictionary implementations, where open-addressing has an advantage. > Using Magma reveals another context where chained Dictionary's can have some advantages: When they're large, persistent, and shared in a multi-user system -- e.g., when grow rehashing is not an option. B-trees were designed for that use case. If growing is not an option, then your dictionary's performance will degrade as its size increases, won't it? > > Not only do chained Dictionary's make that scenario possible, they improve the concurrency -- different users can add/change/remove from the Dictionary as long as they occur at different keys. With regular > Dictionary's, concurrency guarantees a commit conflict. If you created a new, larger dictionary on grow and lazily copied the entries from the smaller to the larger upon access, then I think you could use open addressing hash tables. Also, I don't see the concurrency issue with open addressing, because as long as different users modify different chains, the case of concurrency is the same as what you described, isn't it? Levente > > - Chris > > On Fri, Jun 29, 2018 at 2:56 PM, Peter Crowther <[hidden email]> wrote: > In an environment with garbage collection, it's worth considering the overhead of one's representation on the memory allocator and the garbage collector. Open addressing adds objects - the exact > number of these varies - and hence adds GC overhead (= paying extra cycles for the life of the dictionary even if it never changes). Each object also has some bytes of overhead for headers, so > chaining becomes less efficient as collisions occur due to extra object allocations. > > Also, modern processors and memory systems are very good at reading forward in memory, as there's often a wide memory bus (and sometimes other optimisations). This means it's often quick to read the > next entry as it's already been pulled in due to adjacency. By contrast, chaining puts collisions all over memory, reducing adjacency and increasing the bandwidth required to memory. > > I'm unconvinced that chaining dictionaries are sensible in modern languages with garbage collection on modern architectures where bandwidth to memory, and cache space, are both scarce resources. The > days of memory being truly random-access are long gone, and too few programmers realise the impact that can have. > > Cheers, > > - Peter > > On 29 June 2018 at 08:27, Clément Bera <[hidden email]> wrote: > Hum, about dictionary implementations, I've just found this [1]: > > OPEN ADDRESSING VS. CHAINING > > > Chaining > Open addressing > Collision resolution > Using external data structure > Using hash table itself > Memory waste > Pointer size overhead per entry (storing list heads in the table) > No overhead 1 > Performance dependence on table's load factor > Directly proportional > Proportional to (loadFactor) / (1 - loadFactor) > Allow to store more items, than hash table size > Yes > No. Moreover, it's recommended to keep table's load factor below 0.7 > Hash function requirements > Uniform distribution > Uniform distribution, should avoid clustering > Handle removals > Removals are ok > Removals clog the hash table with "DELETED" entries > Implementation > Simple > Correct implementation of open addressing based hash table is quite tricky1 Hash tables with chaining can work efficiently even with load factor more than 1. 
At the same time, tables based > on open addressing scheme require load factor not to exceed 0.7 to be efficient. Hence, 30% of slots remain empty, which leads to obvious memory waste. > > Do you guys agree with this? > > Their open addressing implementation is slightly different from the Smalltalk one (they have deleted entries upon removal while we fix collisions, at least in Pharo 6). > Their chaining implementation does not use balanced binary tree when a bucket becomes large like Java's implementation. > > I don't really understand why an open addressing implementation is trickier to implement, in high level languages such as Smalltalk I may agree that it might be easier to implement chaining, but > in low level language such as in the VM, IMO open addressing is easier since you've got only one large chunk of memory to manage. Maybe it's because of hash collisions, having good hash and good > hash collection size (See HashTableSizes) is not that easy. > > I'm curious because I do more and more bare metal programming and I end up having to implement this kind of things myself, I've always implemented naive open addressing up until now without really > understanding details. > > [1] http://www.algolist.net/Data_structures/Hash_table/Open_addressing > > On Sat, Jun 16, 2018 at 8:49 PM, Max Leske <[hidden email]> wrote: > > > On 8 Jun 2018, at 09:48, Stéphane Rollandin wrote: > > FWIW it seems there is no SmallDictionary in Squeak. > > > Oh... Thanks Stéf, I wasn't aware of that. > > > > On 8 June 2018, at 15:13, John Brant wrote: > > Is anyone aware of a reason for hanging on to SmallDictionary? I'm also curious to know how SmallDictionary came to be. There must have been some > advantage over Dictionary at some point in the past. > > > It came from RB… the idea was that there are (in the Refactoring engine) a lot of very small dictionaries with <10 elements. > The idea is that for such dictionaries, the overhead of hashing was higher than just linear search. > > > I created its ancestor in VW some 20+ years ago (as RBSmallDictionary). > It was used when doing pattern matching. When it performs a pattern > match against an AST, it puts the potential value of the pattern > variable in the dictionary. If the value is used later in the pattern, > then we can get the previous value and make sure that we have an > equivalent AST. This allows you to write patterns like: > > `@a = `@a > > to find where someone has the same expression on both sides of the #= > message. Since most patterns have very few pattern variables, these > dictionaries won't hold many entries. Furthermore, we are likely to > abort the match when we have 0 entries. > > The original RBSmallDictionary had an #empty method that "emptied" the > dictionary without actually removing any elements -- it just set the > size to 0. In a general dictionary this would lead to memory leaks since > the previous values are still held by the dictionary. However, these > dictionaries were only used during the matching process and went away > after the process completed. > > Anyway, at the time when we converted our pattern matching code from > using the VW parser with our pattern matching extensions to use the new > RB parser with pattern matching, the time to run Smalllint on the image > was cut in half even though our parser was quite a bit slower than the > VW parser. I don't remember everything that was done, but I think that > most of the speedup came from having special pattern AST nodes and the > small dictionary. 
> On Sat, Jun 16, 2018 at 8:49 PM, Max Leske <[hidden email]> wrote:
>
> > On 8 Jun 2018, at 09:48, Stéphane Rollandin wrote:
> >
> > FWIW it seems there is no SmallDictionary in Squeak.
>
> Oh... Thanks Stéf, I wasn't aware of that.
>
> > On 8 June 2018, at 15:13, John Brant wrote:
> >
> > > Is anyone aware of a reason for hanging on to SmallDictionary? I'm
> > > also curious to know how SmallDictionary came to be. There must have
> > > been some advantage over Dictionary at some point in the past.
> > >
> > > It came from RB… the idea was that there are (in the Refactoring
> > > engine) a lot of very small dictionaries with < 10 elements. The
> > > idea is that for such dictionaries, the overhead of hashing is
> > > higher than that of a simple linear search.
> >
> > I created its ancestor in VW some 20+ years ago (as RBSmallDictionary).
> > It was used when doing pattern matching. When it performs a pattern
> > match against an AST, it puts the potential value of the pattern
> > variable in the dictionary. If the value is used later in the pattern,
> > then we can get the previous value and make sure that we have an
> > equivalent AST. This allows you to write patterns like:
> >
> > `@a = `@a
> >
> > to find where someone has the same expression on both sides of the #=
> > message. Since most patterns have very few pattern variables, these
> > dictionaries won't hold many entries. Furthermore, we are likely to
> > abort the match when we have 0 entries.
> >
> > The original RBSmallDictionary had an #empty method that "emptied" the
> > dictionary without actually removing any elements -- it just set the
> > size to 0. In a general dictionary this would lead to memory leaks,
> > since the previous values would still be held by the dictionary.
> > However, these dictionaries were only used during the matching process
> > and went away after the process completed.
> >
> > Anyway, at the time when we converted our pattern matching code from
> > using the VW parser with our pattern matching extensions to the new RB
> > parser with pattern matching, the time to run SmallLint on the image
> > was cut in half, even though our parser was quite a bit slower than
> > the VW parser. I don't remember everything that was done, but I think
> > that most of the speedup came from having special pattern AST nodes
> > and the small dictionary.
> >
> > John Brant
>
> Very interesting! Thanks John!
>
> As Marcus has mentioned before in this thread, it would make a lot of
> sense to run the benchmarks again. Actually, I think it would be nice to
> have a benchmark suite for these cases that would let us monitor
> performance and ensure that changes to the codebase don't degrade it.
> I'm not saying that it would be easy to make this happen -- writing
> proper benchmarks is hard (for me especially, as it seems, given my
> utter failure to think of the edge cases before starting this thread).
> Such a suite might also prevent these sorts of questions on the mailing
> list in the future, or at least make them easier to answer.
>
> > On 8 June 2018, at 13:01, Andres Valloud wrote:
> >
> > In addition, open addressing with linear probing has superior cache
> > line read behavior: no indirection / random traversal, and if the
> > first probe misses, the second one was likely cached by the first one.
>
> Ah, nice catch! Although that would require frequent access to the
> dictionary / repeated access to the same items to have an effect,
> wouldn't it?
>
> > On 8 Jun 2018, at 10:01, Clément Bera wrote:
> >
> > Hi Max,
> >
> > Theoretically, for a small number of elements -- usually somewhere
> > between 3 and 30, depending on the implementation -- a linear search
> > is faster than a hash search, especially in the Pharo dictionary hash
> > search implementation.
> >
> > Efficient dictionary implementations are usually bucket-based. The
> > dictionary holds a certain number of buckets, and based on the key's
> > hash, the bucket where the key-value pair is stored is determined.
> > Small buckets are linear (arrays or linked lists); large buckets are
> > typically balanced binary trees (red-black trees, typically). Under a
> > certain number of elements there is a single bucket, which means a
> > linear search is performed, as in SmallDictionary. When the dictionary
> > grows, the search becomes a combination of a hash search and a linear
> > or tree search.
> >
> > Pharo's dictionary search is first hash-based, but all the buckets are
> > represented next to each other in the same array, and a linear search
> > is performed there, leading to many collisions and slower search times
> > (especially when the element is not found); sometimes the code
> > searches across multiple buckets because the dictionary is too full or
> > there are too many near-collisions. The memory consumption is
> > competitive with the more advanced implementations, though (precise
> > measurements would need to be made).
> >
> > Method dictionaries are represented differently, to optimise the
> > look-up logic.
> >
> > If you want to improve things and have one dictionary implementation
> > instead of two, implement or look for a bucket-based dictionary and
> > put it in the base image instead of Dictionary. This is quite some
> > work, since there are many APIs to port. You can look at the Pinocchio
> > implementation; it's quite good, but they've not implemented large
> > buckets.
>
> Thanks for the detailed explanations Clément and Levente. I'll probably
> not add a new dictionary implementation ;)
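A toy version of the bucket-based scheme Clément describes might look as
follows. The class is hypothetical and keeps only small linear buckets; a
real implementation would also grow the bucket array and switch oversized
buckets to balanced trees:

    Object subclass: #BucketDictionary
        instanceVariableNames: 'buckets'
        classVariableNames: ''
        package: 'DictionarySketches'

    BucketDictionary >> initialize
        super initialize.
        buckets := (1 to: 16) collect: [ :i | OrderedCollection new ]

    BucketDictionary >> bucketFor: key
        "Hash step: pick the bucket from the key's hash."
        ^ buckets at: key hash \\ buckets size + 1

    BucketDictionary >> at: key put: value
        "Linear search within the (small) bucket."
        | bucket |
        bucket := self bucketFor: key.
        bucket
            detect: [ :each | each key = key ]
            ifFound: [ :each | each value: value ]
            ifNone: [ bucket add: key -> value ].
        ^ value

    BucketDictionary >> at: key ifAbsent: aBlock
        ^ (self bucketFor: key)
            detect: [ :each | each key = key ]
            ifFound: [ :each | each value ]
            ifNone: aBlock

Lookup is a hash step to choose the bucket followed by a linear search
inside it, which is exactly the combination search described above.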
> > [snip: Max's original benchmark message of Fri, Jun 8, quoted in full
> > at the top of this thread]
> >
> > --
> > Clément Béra
> > https://clementbera.github.io/
> > https://clementbera.wordpress.com/
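For illustration, the #empty trick John Brant mentions earlier in the
thread could look roughly like this in a SmallDictionary-style class with
parallel 'keys' and 'values' arrays and a 'size' fill pointer (a
reconstruction with hypothetical names, not the original RB source):

    RBSmallDictionaryLike >> empty
        "'Empty' the dictionary in O(1) by resetting the fill pointer.
        The old keys and values are still referenced by the arrays, so
        this is only safe for short-lived dictionaries, like the ones
        used during a single pattern match; otherwise it leaks."
        size := 0

    RBSmallDictionaryLike >> at: key ifAbsent: aBlock
        "Plain linear search: for fewer than ~10 entries this can beat
        hashing, which was the point of the class."
        1 to: size do: [ :index |
            (keys at: index) = key ifTrue: [ ^ values at: index ] ].
        ^ aBlock value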
>> This discussion has limited its context to comparing *in-memory*
>> Dictionary implementations, where open addressing has an advantage.
>> Using Magma reveals another context where chained Dictionaries can have
>> some advantages: when they're large, persistent, and shared in a
>> multi-user system -- e.g., when grow rehashing is not an option.
>
> B-trees were designed for that use case. If growing is not an option,
> then your dictionary's performance will degrade as its size increases,
> won't it?

Yes, but only logarithmically. But I was not making a performance-based
comparison, only a possibility-based one.

>> Not only do chained Dictionaries make that scenario possible, they
>> improve concurrency -- different users can add/change/remove from the
>> Dictionary as long as they do so at different keys. With regular
>> Dictionaries, concurrent updates guarantee a commit conflict.
>
> If you created a new, larger dictionary on grow and lazily copied the
> entries from the smaller one to the larger one upon access, then I think
> you could use open addressing hash tables.

That's an interesting idea, but what if it grows beyond "the larger" again
while many entries are still in the smaller one? Add a third table,
doubled in size again? That sounds like it starts to intrude on the
efficiency of open addressing (in both computation and space), but the
idea still sparks some interesting design thoughts in my head...

> Also, I don't see the concurrency issue with open addressing: as long as
> different users modify different slots, the situation is the same as the
> one you described, isn't it?

Magma has a class called MagmaPreallocatedDictionary which does exactly
this. It's like a regular Dictionary, except that for its internal 'array'
it uses a MagmaArray, which is just like an Array except that it is large,
persistent, and can grow in place (and is thus able to maintain its
identity). However, this alone was not enough to implement your proposal,
because of the 'tally': clients would each be trying to propose their own
specific value for that variable, which means a conflict. Thankfully,
Magma has yet another special class, MagmaCounter, which fits that role
beautifully. Designed for multi-user systems, a MagmaCounter provides a
concurrent counter. Instances start at a #value of 0, after which any
number of sessions may simultaneously increment or decrement it (even by
more than 1), but no one can actually hard-set its value.

Best,
  Chris
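The tally problem Chris describes generalises beyond Magma. One way to
picture it: a shared absolute counter makes every pair of writers
conflict, whereas recording increments as independent deltas lets them
commute. A generic, hypothetical sketch of that idea -- not Magma's actual
MagmaCounter implementation, and it assumes the underlying store can merge
concurrent appends:

    Object subclass: #DeltaCounter
        instanceVariableNames: 'deltas'
        classVariableNames: ''
        package: 'DictionarySketches'

    DeltaCounter >> initialize
        super initialize.
        deltas := OrderedCollection new

    DeltaCounter >> incrementBy: anInteger
        "Each session appends its own delta instead of overwriting a
        shared total, so concurrent increments commute instead of
        conflicting."
        deltas add: anInteger

    DeltaCounter >> value
        "The current value is the sum of all recorded deltas."
        ^ deltas inject: 0 into: [ :sum :each | sum + each ]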