Hi,
I was messing around with SmallDictionary when I suddenly realised that I can't find a single reason to use it over a normal Dictionary. While its name and class comment imply that it is somehow an optimised Dictionary, I don't see any measurement where that actually holds up. The following was run in a Pharo 7 image on a recent VM (see below):

| d |
d := SmallDictionary new.
d sizeInMemory. "24"
[100000 timesRepeat: [
    1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun. "0:00:00:05.226"

[100000 timesRepeat: [
    d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"

| d |
d := Dictionary new.
d sizeInMemory. "16"
[100000 timesRepeat: [
    1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun. "0:00:00:00.385"

[100000 timesRepeat: [
    d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.006"

As you can see, SmallDictionary is 8 bytes larger per instance and significantly slower while reading and writing (I know that this isn't a good benchmark, but it suffices to make my point).

Is anyone aware of a reason for hanging on to SmallDictionary? I'm also curious to know how SmallDictionary came to be. There must have been some advantage over Dictionary at some point in the past.

Cheers,
Max


Image version: Pharo 7.0
Build information: Pharo-7.0+alpha.build.961.sha.a69e72a97136bc3f93831584b6efa2b1703deb84 (32 Bit)

VM version: CoInterpreter VMMaker.oscog-nice.2281 uuid: 4beeaee7-567e-1a4b-b0fb-bd95ce302516 Nov 27 2017
StackToRegisterMappingCogit VMMaker.oscog-nice.2283 uuid: 2d20324d-a2ab-48d6-b0f6-9fc3d66899da Nov 27 2017
VM: 201711262336 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Date: Mon Nov 27 00:36:29 2017 +0100 $ Plugins: 201711262336 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $

OS: macOS 10.13.5
Machine: MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports)
Hi,
it was supposed to be optimised for small sizes, so I guessed it was about your choice of 100000 elements, but a fast benchmark with small numbers didn't offer better results. So the question is appropriate. Why do we have it? Or better: why is SmallDictionary not faster? Even better: is this test good? :)

cheers,
Esteban

> On 8 Jun 2018, at 08:46, Max Leske <[hidden email]> wrote:
>
> I was messing around with SmallDictionary when I suddenly realised that I
> can't find a single reason to use it over a normal Dictionary. [...]
In reply to this post by Max Leske
Well, I can answer my own question: SmallDictionary is a lot more space
efficient. I think the class comment should clarify the use case for SmallDictionary and mention the performance trade-off.

Max

On 8 Jun 2018, at 8:46, Max Leske wrote:

> I was messing around with SmallDictionary when I suddenly realised
> that I can't find a single reason to use it over a normal Dictionary. [...]
In reply to this post by Max Leske
> On 8 Jun 2018, at 08:46, Max Leske <[hidden email]> wrote:
>
> [...]
>
> Is anyone aware of a reason for hanging on to SmallDictionary? I'm also
> curious to know how SmallDictionary came to be. There must have been
> some advantage over Dictionary at some point in the past.

It came from RB… the idea was that there are (in the Refactoring engine) a lot of very small dictionaries with <10 elements. The idea is that for such dictionaries, the overhead of hashing was higher than just linear search.

I am sure I did a benchmark back then, but of course this could have changed in the meantime. It would be nice to redo the benchmark. Can you try with <10 elements?

For size: does #sizeInMemory take the internal data into account? E.g. for Dictionary there is an Array that contains Associations; you need to count those, too.

In the end this is a hack anyway: if needed, Dictionary could do things like this transparently… without the user having to decide. And the question is if this is not just a bad side effect of the particular implementation.

Marcus
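Marcus's #sizeInMemory caveat can be checked directly. A rough sketch of such a measurement (it assumes HashedCollection's private #array accessor, which may vary between versions):

"Total footprint of a Dictionary: the outer object, its internal Array,
and one Association per entry; #sizeInMemory alone reports only the first."
| d |
d := Dictionary new.
1 to: 100 do: [ :i | d at: i put: i ].
d sizeInMemory
    + d array sizeInMemory
    + (d array inject: 0 into: [ :sum :each |
        sum + (each ifNil: [ 0 ] ifNotNil: [ :assoc | assoc sizeInMemory ]) ])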
In reply to this post by Marcus Denker-4
(Sorry, hit send accidentally).
On 8 Jun 2018, at 9:02, Marcus Denker wrote:

> Can you try with <10 elements?

The results are in!
Note that I had to increase the repetitions by a factor of 100.

| d |
d := SmallDictionary new.
[10000000 timesRepeat: [
    1 to: 5 do: [ :i | d at: i put: i ] ] ] timeToRun. "0:00:00:02.532"

[10000000 timesRepeat: [
    d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.686"

| d |
d := Dictionary new.
[10000000 timesRepeat: [
    1 to: 5 do: [ :i | d at: i put: i ] ] ] timeToRun. "0:00:00:02.102"

[10000000 timesRepeat: [
    d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.567"

So SmallDictionary is indeed very fast for only a few elements. Once the SmallDictionary has to grow, its performance decreases rapidly, or at least the grow operation takes a long time. Dictionary still outperforms SmallDictionary though, so I guess the only reason left is space efficiency.

> For size: does #sizeInMemory take the internal data into account? E.g.
> for Dictionary there is an Array that contains Associations; you need
> to count those, too.

No, I didn't check for that initially. When I did, it became apparent that the larger the dictionary, the more space you save with SmallDictionary.

> In the end this is a hack anyway: if needed, Dictionary could do things
> like this transparently… without the user having to decide. And the
> question is if this is not just a bad side effect of the particular
> implementation.

Good point. Sounds like what Stefan Marr wrote in the paper where he performed a field study of collection libraries ;)

Max
On 8 Jun 2018, at 9:30, Max Leske wrote:
>> It would be nice to redo the benchmark.

That would be very nice, as I just realised that the type of object used as key has an effect on the hashing time... There are a lot of variables here, unfortunately.

Max
In reply to this post by Max Leske
> I was messing around with SmallDictionary when I suddenly realised that
> I can't find a single reason to use it over a normal Dictionary. While
> its name and class comment imply that it is somehow an optimised
> Dictionary, I don't see any measurement where that actually holds up.

FWIW it seems there is no SmallDictionary in Squeak.

Stef
In reply to this post by Max Leske
Hi Max,

Theoretically, for a small number of elements, usually somewhere between 3 and 30 depending on the implementation, a linear search is faster than a hash search, especially in the Pharo dictionary hash search implementation.

Efficient dictionary implementations are usually bucket-based. The dictionary holds a certain number of buckets, and based on the key hash, the bucket where the key's value is present is determined. Small buckets are linear (arrays or linked lists). Large buckets are typically balanced binary trees (red-black trees, typically). Under a certain number of elements there is a single bucket, which means a linear search is performed, as for the SmallDictionary. When it grows, the dictionary search becomes a combination of a hash search and a linear or tree search.

Pharo dictionary search is first hash-based; then, since all the buckets are represented next to each other in the same array, a linear search is performed there, leading to many collisions and slower search time (especially when the element is not found). Sometimes the code searches over multiple buckets because the dictionary is too full or there are too many near-collisions. The memory consumption is competitive with the advanced implementations, though (precise measurements would need to be made).

Method dictionaries are represented differently, to optimize the look-up logic.

If you want to improve things and have one dictionary implementation instead of two, implement or look for a bucket-based dictionary and put it in the base image instead of Dictionary. This is quite some work, since there are many APIs to port. You can look at the Pinocchio implementation; it's quite good, but they've not implemented large buckets.

On Fri, Jun 8, 2018 at 8:46 AM, Max Leske <[hidden email]> wrote:
> [...]
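To make the linear-search case concrete, here is a minimal sketch of a SmallDictionary-style lookup. It is not the actual source; it assumes parallel keys/values Arrays and a size instance variable:

"Linear search over parallel arrays: no hash is ever computed, so for a
handful of entries this beats hashing, while for many entries it is O(n)."
at: key ifAbsent: aBlock
    1 to: size do: [ :index |
        (keys at: index) = key
            ifTrue: [ ^ values at: index ] ].
    ^ aBlock value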
On Fri, 8 Jun 2018, Clément Bera wrote:
> Efficient dictionary implementations are usually bucket-based. [...]
> Small buckets are linear (arrays or linked lists). Large buckets are
> typically balanced binary trees (red-black trees, typically). [...]

The bucket-based list you described is called separate chaining collision resolution, and it won't have any better performance in Squeak than the currently used open addressing.
However, it is preferable when you can't trust your hash function. When you do "crazy" things, like using a tree as a bucket, you just exchange the possible O(n) worst-case time for O(log(n)).

> Pharo dictionary search is first hash-based; then, since all the buckets
> are represented next to each other in the same array, a linear search is
> performed there, leading to many collisions and slower search time
> (especially when the element is not found) [...]

This is called open addressing, and if your hash function is good enough, the lists are short, so those "many collisions and slower search time" just don't happen. And no, separate chaining needs a lot more memory than open addressing, unless you manage to get rid of Associations, but that's a lot harder than it sounds.

> Method dictionaries are represented differently, to optimize the look-up
> logic.

MethodDictionary is optimized to let the VM do lookups more easily. Memory consumption is worse when the dictionary is sparse, but gets better as it gets filled up, compared to the current Dictionary implementation (assuming Pharo still uses the same representation as Squeak does).

> If you want to improve things and have one dictionary implementation
> instead of two, implement or look for a bucket-based dictionary and put
> it in the base image instead of Dictionary. [...]

Well, as an experiment, I'm sure it'll be fun. But don't expect better performance nor better memory usage.

Levente
In addition, open addressing with linear probing has superior cache line read behavior (no indirection / random traversal, and if the first probe misses, the second one was likely cached by the first one).

On 6/8/18 1:37, Levente Uzonyi wrote:
> [...]
In reply to this post by Max Leske
On Fri, 8 Jun 2018, Max Leske wrote:
> Is anyone aware of a reason for hanging on to SmallDictionary? I'm also
> curious to know how SmallDictionary came to be. There must have been
> some advantage over Dictionary at some point in the past.

IIRC it came from Seaside, where it was used to minimize the memory footprint of sessions.

Levente
In reply to this post by Clément Béra
On 8 Jun 2018, at 09:48, Stéphane Rollandin wrote:

> FWIW it seems there is no SmallDictionary in Squeak.

Oh... Thanks Stéf, I wasn't aware of that.


On 8 June 2018, at 15:13, John Brant wrote:

>>> Is anyone aware of a reason for hanging on to SmallDictionary? I'm
>>> also curious to know how SmallDictionary came to be. There must have
>>> been some advantage over Dictionary at some point in the past.
>>
>> It came from RB… the idea was that there are (in the Refactoring
>> engine) a lot of very small dictionaries with <10 elements.
>> The idea is that for such dictionaries, the overhead of hashing was
>> higher than just linear search.
>
> I created its ancestor in VW some 20+ years ago (as RBSmallDictionary).
> It was used when doing pattern matching. When it performs a pattern
> match against an AST, it puts the potential value of the pattern
> variable in the dictionary. If the value is used later in the pattern,
> then we can get the previous value and make sure that we have an
> equivalent AST. This allows you to write patterns like:
>
> `@a = `@a
>
> to find where someone has the same expression on both sides of the #=
> message. Since most patterns have very few pattern variables, these
> dictionaries won't hold many entries. Furthermore, we are likely to
> abort the match when we have 0 entries.
>
> The original RBSmallDictionary had an #empty method that "emptied" the
> dictionary without actually removing any elements -- it just set the
> size to 0. In a general dictionary this would lead to memory leaks,
> since the previous values are still held by the dictionary. However,
> these dictionaries were only used during the matching process and went
> away after the process completed.
>
> Anyway, at the time when we converted our pattern matching code from
> using the VW parser with our pattern matching extensions to using the
> new RB parser with pattern matching, the time to run SmallLint on the
> image was cut in half, even though our parser was quite a bit slower
> than the VW parser. I don't remember everything that was done, but I
> think that most of the speedup came from having special pattern AST
> nodes and the small dictionary.
>
> John Brant

Very interesting! Thanks John!

As Marcus has mentioned before in this thread, it would make a lot of sense to run benchmarks again. Actually, I think it would be nice to have a benchmark suite for these cases, one that would let us monitor performance and ensure that changes to the codebase don't have a detrimental effect. I'm not saying that it would be easy to make this happen; writing proper benchmarks is hard (for me especially, as it seems, given my utter failure to think of the edge cases before starting this thread). Such a suite might also prevent these sorts of questions on the mailing list in the future, or at least might make it easier to answer them.


On 8 June 2018, at 13:01, Andres Valloud wrote:

> In addition, open addressing with linear probing has superior cache
> line read behavior (no indirection / random traversal, and if the
> first probe misses, the second one was likely cached by the first one).

Ah, nice catch! Although that would require frequent access to the dictionary / repeated access to the same items to have an effect, wouldn't it?


On 8 Jun 2018, at 10:01, Clément Bera wrote:

> Theoretically, for a small number of elements, usually somewhere
> between 3 and 30 depending on the implementation, a linear search is
> faster than a hash search, especially in the Pharo dictionary hash
> search implementation.
> [...]

Thanks for the detailed explanations Clément and Levente. I'll probably not add a new dictionary implementation ;)

Max
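John's `@a = `@a pattern can be tried from a playground. Here is a rough usage sketch with the RB engine; the selectors shown (#matches:do:, #executeTree:initialAnswer:) follow the classic RB API, but treat the exact details as an assumption, since they vary across versions:

"Search one class's methods for expressions with the same AST on both
sides of #=; each match binds `@a in a small pattern dictionary."
| searcher found |
searcher := RBParseTreeSearcher new.
searcher matches: '`@a = `@a' do: [ :node :answer | answer copyWith: node ].
found := OrderedCollection new.
Integer methodDict valuesDo: [ :method |
    found addAll: (searcher executeTree: method ast initialAnswer: #()) ].
found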
Hum, about dictionary implementations, I've just found this [1]:

OPEN ADDRESSING VS. CHAINING

                               Chaining                           Open addressing
Collision resolution           Using external data structure      Using hash table itself
Memory waste                   Pointer-size overhead per entry    No overhead (*)
                               (storing list heads in the table)
Performance dependence on      Directly proportional              Proportional to
table's load factor                                               loadFactor / (1 - loadFactor)
Allows storing more items      Yes                                No. Moreover, it's recommended
than the hash table size                                          to keep the load factor below 0.7
Hash function requirements     Uniform distribution               Uniform distribution, should
                                                                  avoid clustering
Handling removals              Removals are ok                    Removals clog the hash table
                                                                  with "DELETED" entries
Implementation                 Simple                             A correct implementation is
                                                                  quite tricky (*)

(*) Hash tables with chaining can work efficiently even with a load factor greater than 1. At the same time, tables based on the open addressing scheme require the load factor not to exceed 0.7 to be efficient. Hence, 30% of slots remain empty, which leads to obvious memory waste.

Do you guys agree with this?

Their open addressing implementation is slightly different from the Smalltalk one (they have deleted entries upon removal, while we fix collisions, at least in Pharo 6). Their chaining implementation does not use a balanced binary tree when a bucket becomes large, like Java's implementation does.

I don't really understand why an open addressing implementation is trickier to implement. In high-level languages such as Smalltalk I may agree that it might be easier to implement chaining, but in a low-level language, such as in the VM, IMO open addressing is easier, since you've got only one large chunk of memory to manage. Maybe it's because of hash collisions; having a good hash and a good hash collection size (see HashTableSizes) is not that easy.

I'm curious because I do more and more bare-metal programming and I end up having to implement this kind of thing myself. I've always implemented naive open addressing up until now, without really understanding the details.

[1] http://www.algolist.net/Data_structures/Hash_table/Open_addressing

On Sat, Jun 16, 2018 at 8:49 PM, Max Leske <[hidden email]> wrote:
> [...]
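Since the thread keeps contrasting the two schemes, here is a minimal sketch of the separate-chaining lookup the table describes. The class is hypothetical (a buckets Array of linked key/value nodes), not an existing Pharo or Squeak collection:

"Separate chaining: each slot holds a (possibly nil) linked list of
key/value nodes; a collision only lengthens one list, at the cost of
one extra node object per entry."
at: key ifAbsent: aBlock
    | node |
    node := buckets at: key hash \\ buckets size + 1.
    [ node isNil ] whileFalse: [
        node key = key ifTrue: [ ^ node value ].
        node := node next ].
    ^ aBlock value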
In an environment with garbage collection, it's worth considering the overhead of one's representation on the memory allocator and the garbage collector. Chaining adds objects - the exact number of these varies - and hence adds GC overhead (= paying extra cycles for the life of the dictionary, even if it never changes). Each object also has some bytes of overhead for headers, so chaining becomes less efficient as collisions occur, due to the extra object allocations.

Also, modern processors and memory systems are very good at reading forward in memory, as there's often a wide memory bus (and sometimes other optimisations). This means it's often quick to read the next entry, as it's already been pulled in due to adjacency. By contrast, chaining puts collisions all over memory, reducing adjacency and increasing the bandwidth required to memory.

I'm unconvinced that chaining dictionaries are sensible in modern languages with garbage collection on modern architectures, where bandwidth to memory and cache space are both scarce resources. The days of memory being truly random-access are long gone, and too few programmers realise the impact that can have.

Cheers,

- Peter

On 29 June 2018 at 08:27, Clément Bera <[hidden email]> wrote:
> [...]
In reply to this post by Clément Béra
On Fri, 29 Jun 2018, Clément Bera wrote:
> Do you guys agree with this?

Mostly. I don't think open addressing's performance is proportional to loadFactor / (1 - loadFactor). That formula would be correct if you had an array filled up to loadFactor randomly, but linear probing creates longer chains due to clustering, so it's worse than that in practice. Here's a small table based on my measurements:

lf     formula   measurement
0.5    1.0       ~1.5
0.55   1.222     ~1.96
0.6    1.5       ~2.62
0.65   1.857     ~3.56
0.7    2.333     ~5.06
0.75   3.0       ~7.5
0.8    4.0       ~12.0

In Squeak the load factor normally varies between 0.5 and 0.75, but it can be anything between 0.0 and 0.8 (only compaction can push the load factor above 0.75), so performance can differ quite a bit. Based on these numbers, 0.7 sounds like a reasonable upper limit - something we might want to consider.
You might think that linear probing's clustering effect costs too much, but because of cache locality it is probably still faster than linked lists or double hashing.
It is also irrelevant whether you can store more elements than the size of the hash table, because that breaks the precondition of the use of hash tables, and therefore the contract too. So the promised O(1) runtime of operations is also gone in that case.
Finally, removals do not clog hash tables using open addressing unless you use lazy deletion[1], which is something you should avoid.

> Their open addressing implementation is slightly different from the
> Smalltalk one (they have deleted entries upon removal, while we fix
> collisions, at least in Pharo 6).

Proactively removing the garbage - aka #fixCollisionsFrom: - takes O(chain length) time, which is amortized O(1) if your hash function is good enough.
Lazy deletion degrades lookup performance over time, because lookups have to treat deleted entries as existing entries. It makes deletion somewhat quicker, because you can stop the lookup procedure as soon as you find your element, but what you save is amortized O(1).

> Their chaining implementation does not use a balanced binary tree when
> a bucket becomes large, like Java's implementation does.

Because hash tables in general don't do such a thing. During my time at the university, it was one of the tricky questions why hash tables use linked lists instead of balanced binary trees for collision resolution. The answer is that you don't need them, because the contract of hash tables is that you'll (practically) not have long chains if your hash function is good enough.
Using binary trees for collision resolution mitigates situations when your hash function is not good enough or when you don't have full control over your hash values. But those are fairly rare in practice, and you still lose the O(1) operation runtime cost. Binary trees also require your elements to have a total order defined on them.

> I don't really understand why an open addressing implementation is
> trickier to implement. [...]

It's easy to make indexing mistakes, especially when wrap-around is involved. Also, deletion is tricky and is often left as an exercise for the reader. Even the Wikipedia article[2] has quite complex pseudocode for it.
The version you find in Squeak has a few hidden tricks too, but those are there for performance and the code would still work without them.

Levente

[1] https://en.wikipedia.org/wiki/Lazy_deletion
[2] https://en.wikipedia.org/wiki/Open_addressing
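The proactive removal Levente mentions looks roughly like this, modeled on Squeak's Dictionary>>#fixCollisionsFrom: (a simplified sketch; the real method differs in details):

"After a removal empties a slot, rescan the run of elements that follows:
any element whose probe chain passed through the freed slot is swapped
back into place, so no DELETED tombstones are ever needed."
fixCollisionsFrom: start
    | element index |
    index := start.
    [ (element := array at: (index := index \\ array size + 1)) == nil ]
        whileFalse: [
            | newIndex |
            (newIndex := self scanFor: element key) = index
                ifFalse: [ array swap: index with: newIndex ] ]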
Thanks for the detailed answer. I now understand better why Smalltalk dictionaries are this way.

Best,

On Sat, Jun 30, 2018 at 2:05 AM, Levente Uzonyi <[hidden email]> wrote:
> [...]
In reply to this post by Peter Crowther-2
This discussion has limited its context to comparing *in-memory* Dictionary implementations, where open addressing has an advantage.

Using Magma reveals another context where chained Dictionaries can have some advantages: when they're large, persistent, and shared in a multi-user system -- e.g., when grow rehashing is not an option. Not only do chained Dictionaries make that scenario possible, they improve concurrency -- different users can add/change/remove from the Dictionary, as long as the changes occur at different keys. With regular Dictionaries, concurrency guarantees a commit conflict.

- Chris

On Fri, Jun 29, 2018 at 2:56 PM, Peter Crowther <[hidden email]> wrote:
> [...]
On Sat, 30 Jun 2018, Chris Muller wrote: > This discussion has limited its context to comparing *in-memory* Dictionary implementations, where open-addressing has an advantage. > Using Magma reveals another context where chained Dictionary's can have some advantages: When they're large, persistent, and shared in a multi-user system -- e.g., when grow rehashing is not an option. B-trees were designed for that use case. If growing is not an option, then your dictionary's performance will degrade as its size increases, won't it? > > Not only do chained Dictionary's make that scenario possible, they improve the concurrency -- different users can add/change/remove from the Dictionary as long as they occur at different keys. With regular > Dictionary's, concurrency guarantees a commit conflict. If you created a new, larger dictionary on grow and lazily copied the entries from the smaller to the larger upon access, then I think you could use open addressing hash tables. Also, I don't see the concurrency issue with open addressing, because as long as different users modify different chains, the case of concurrency is the same as what you described, isn't it? Levente > > - Chris > > On Fri, Jun 29, 2018 at 2:56 PM, Peter Crowther <[hidden email]> wrote: > In an environment with garbage collection, it's worth considering the overhead of one's representation on the memory allocator and the garbage collector. Open addressing adds objects - the exact > number of these varies - and hence adds GC overhead (= paying extra cycles for the life of the dictionary even if it never changes). Each object also has some bytes of overhead for headers, so > chaining becomes less efficient as collisions occur due to extra object allocations. > > Also, modern processors and memory systems are very good at reading forward in memory, as there's often a wide memory bus (and sometimes other optimisations). This means it's often quick to read the > next entry as it's already been pulled in due to adjacency. By contrast, chaining puts collisions all over memory, reducing adjacency and increasing the bandwidth required to memory. > > I'm unconvinced that chaining dictionaries are sensible in modern languages with garbage collection on modern architectures where bandwidth to memory, and cache space, are both scarce resources. The > days of memory being truly random-access are long gone, and too few programmers realise the impact that can have. > > Cheers, > > - Peter > > On 29 June 2018 at 08:27, Clément Bera <[hidden email]> wrote: > Hum, about dictionary implementations, I've just found this [1]: > > OPEN ADDRESSING VS. CHAINING > > > Chaining > Open addressing > Collision resolution > Using external data structure > Using hash table itself > Memory waste > Pointer size overhead per entry (storing list heads in the table) > No overhead 1 > Performance dependence on table's load factor > Directly proportional > Proportional to (loadFactor) / (1 - loadFactor) > Allow to store more items, than hash table size > Yes > No. Moreover, it's recommended to keep table's load factor below 0.7 > Hash function requirements > Uniform distribution > Uniform distribution, should avoid clustering > Handle removals > Removals are ok > Removals clog the hash table with "DELETED" entries > Implementation > Simple > Correct implementation of open addressing based hash table is quite tricky1 Hash tables with chaining can work efficiently even with load factor more than 1. 
At the same time, tables based > on open addressing scheme require load factor not to exceed 0.7 to be efficient. Hence, 30% of slots remain empty, which leads to obvious memory waste. > > Do you guys agree with this? > > Their open addressing implementation is slightly different from the Smalltalk one (they have deleted entries upon removal while we fix collisions, at least in Pharo 6). > Their chaining implementation does not use balanced binary tree when a bucket becomes large like Java's implementation. > > I don't really understand why an open addressing implementation is trickier to implement, in high level languages such as Smalltalk I may agree that it might be easier to implement chaining, but > in low level language such as in the VM, IMO open addressing is easier since you've got only one large chunk of memory to manage. Maybe it's because of hash collisions, having good hash and good > hash collection size (See HashTableSizes) is not that easy. > > I'm curious because I do more and more bare metal programming and I end up having to implement this kind of things myself, I've always implemented naive open addressing up until now without really > understanding details. > > [1] http://www.algolist.net/Data_structures/Hash_table/Open_addressing > > On Sat, Jun 16, 2018 at 8:49 PM, Max Leske <[hidden email]> wrote: > > > On 8 Jun 2018, at 09:48, Stéphane Rollandin wrote: > > FWIW it seems there is no SmallDictionary in Squeak. > > > Oh... Thanks Stéf, I wasn't aware of that. > > > > On 8 June 2018, at 15:13, John Brant wrote: > > Is anyone aware of a reason for hanging on to SmallDictionary? I'm also curious to know how SmallDictionary came to be. There must have been some > advantage over Dictionary at some point in the past. > > > It came from RB… the idea was that there are (in the Refactoring engine) a lot of very small dictionaries with <10 elements. > The idea is that for such dictionaries, the overhead of hashing was higher than just linear search. > > > I created its ancestor in VW some 20+ years ago (as RBSmallDictionary). > It was used when doing pattern matching. When it performs a pattern > match against an AST, it puts the potential value of the pattern > variable in the dictionary. If the value is used later in the pattern, > then we can get the previous value and make sure that we have an > equivalent AST. This allows you to write patterns like: > > `@a = `@a > > to find where someone has the same expression on both sides of the #= > message. Since most patterns have very few pattern variables, these > dictionaries won't hold many entries. Furthermore, we are likely to > abort the match when we have 0 entries. > > The original RBSmallDictionary had an #empty method that "emptied" the > dictionary without actually removing any elements -- it just set the > size to 0. In a general dictionary this would lead to memory leaks since > the previous values are still held by the dictionary. However, these > dictionaries were only used during the matching process and went away > after the process completed. > > Anyway, at the time when we converted our pattern matching code from > using the VW parser with our pattern matching extensions to use the new > RB parser with pattern matching, the time to run Smalllint on the image > was cut in half even though our parser was quite a bit slower than the > VW parser. I don't remember everything that was done, but I think that > most of the speedup came from having special pattern AST nodes and the > small dictionary. 
> On Sat, Jun 16, 2018 at 8:49 PM, Max Leske <[hidden email]> wrote:
>
> > On 8 Jun 2018, at 09:48, Stéphane Rollandin wrote:
> >
> > FWIW it seems there is no SmallDictionary in Squeak.
>
> Oh... Thanks Stéf, I wasn't aware of that.
>
> > On 8 June 2018, at 15:13, John Brant wrote:
> >
> > > Is anyone aware of a reason for hanging on to SmallDictionary? I'm
> > > also curious to know how SmallDictionary came to be. There must have
> > > been some advantage over Dictionary at some point in the past.
> > >
> > > It came from RB… the idea was that there are (in the Refactoring
> > > engine) a lot of very small dictionaries with < 10 elements. The
> > > idea is that for such dictionaries, the overhead of hashing is
> > > higher than that of a simple linear search.
> >
> > I created its ancestor in VW some 20+ years ago (as RBSmallDictionary).
> > It was used when doing pattern matching. When it performs a pattern
> > match against an AST, it puts the potential value of the pattern
> > variable in the dictionary. If the value is used later in the pattern,
> > then we can get the previous value and make sure that we have an
> > equivalent AST. This allows you to write patterns like:
> >
> > `@a = `@a
> >
> > to find where someone has the same expression on both sides of the #=
> > message. Since most patterns have very few pattern variables, these
> > dictionaries won't hold many entries. Furthermore, we are likely to
> > abort the match when we have 0 entries.
> >
> > The original RBSmallDictionary had an #empty method that "emptied" the
> > dictionary without actually removing any elements -- it just set the
> > size to 0. In a general dictionary this would lead to memory leaks,
> > since the previous values would still be held by the dictionary.
> > However, these dictionaries were only used during the matching process
> > and went away after the process completed.
> >
> > Anyway, at the time when we converted our pattern matching code from
> > using the VW parser with our pattern matching extensions to the new RB
> > parser with pattern matching, the time to run SmallLint on the image
> > was cut in half, even though our parser was quite a bit slower than
> > the VW parser. I don't remember everything that was done, but I think
> > that most of the speedup came from having special pattern AST nodes
> > and the small dictionary.
> >
> > John Brant
>
> Very interesting! Thanks John!
>
> As Marcus has mentioned before in this thread, it would make a lot of
> sense to run the benchmarks again. Actually, I think it would be nice to
> have a benchmark suite for these cases that would let us monitor
> performance and ensure that changes to the codebase don't degrade it.
> I'm not saying that it would be easy to make this happen -- writing
> proper benchmarks is hard (for me especially, as it seems, given my
> utter failure to think of the edge cases before starting this thread).
> Such a suite might also prevent these sorts of questions on the mailing
> list in the future, or at least make them easier to answer.
>
> > On 8 June 2018, at 13:01, Andres Valloud wrote:
> >
> > In addition, open addressing with linear probing has superior cache
> > line read behavior: no indirection / random traversal, and if the
> > first probe misses, the second one was likely cached by the first one.
>
> Ah, nice catch! Although that would require frequent access to the
> dictionary / repeated access to the same items to have an effect,
> wouldn't it?
>
> > On 8 Jun 2018, at 10:01, Clément Bera wrote:
> >
> > Hi Max,
> >
> > Theoretically, for a small number of elements -- usually somewhere
> > between 3 and 30, depending on the implementation -- a linear search
> > is faster than a hash search, especially in the Pharo dictionary hash
> > search implementation.
> >
> > Efficient dictionary implementations are usually bucket-based. The
> > dictionary holds a certain number of buckets, and based on the key's
> > hash, the bucket where the key-value pair is stored is determined.
> > Small buckets are linear (arrays or linked lists); large buckets are
> > typically balanced binary trees (red-black trees, typically). Under a
> > certain number of elements there is a single bucket, which means a
> > linear search is performed, as in SmallDictionary. When the dictionary
> > grows, the search becomes a combination of a hash search and a linear
> > or tree search.
> >
> > Pharo's dictionary search is first hash-based, but all the buckets are
> > represented next to each other in the same array, and a linear search
> > is performed there, leading to many collisions and slower search times
> > (especially when the element is not found); sometimes the code
> > searches across multiple buckets because the dictionary is too full or
> > there are too many near-collisions. The memory consumption is
> > competitive with the more advanced implementations, though (precise
> > measurements would need to be made).
> >
> > Method dictionaries are represented differently, to optimise the
> > look-up logic.
> >
> > If you want to improve things and have one dictionary implementation
> > instead of two, implement or look for a bucket-based dictionary and
> > put it in the base image instead of Dictionary. This is quite some
> > work, since there are many APIs to port. You can look at the Pinocchio
> > implementation; it's quite good, but they've not implemented large
> > buckets.
>
> Thanks for the detailed explanations Clément and Levente. I'll probably
> not add a new dictionary implementation ;)
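A toy version of the bucket-based scheme Clément describes might look as
follows. The class is hypothetical and keeps only small linear buckets; a
real implementation would also grow the bucket array and switch oversized
buckets to balanced trees:

    Object subclass: #BucketDictionary
        instanceVariableNames: 'buckets'
        classVariableNames: ''
        package: 'DictionarySketches'

    BucketDictionary >> initialize
        super initialize.
        buckets := (1 to: 16) collect: [ :i | OrderedCollection new ]

    BucketDictionary >> bucketFor: key
        "Hash step: pick the bucket from the key's hash."
        ^ buckets at: key hash \\ buckets size + 1

    BucketDictionary >> at: key put: value
        "Linear search within the (small) bucket."
        | bucket |
        bucket := self bucketFor: key.
        bucket
            detect: [ :each | each key = key ]
            ifFound: [ :each | each value: value ]
            ifNone: [ bucket add: key -> value ].
        ^ value

    BucketDictionary >> at: key ifAbsent: aBlock
        ^ (self bucketFor: key)
            detect: [ :each | each key = key ]
            ifFound: [ :each | each value ]
            ifNone: aBlock

Lookup is a hash step to choose the bucket followed by a linear search
inside it, which is exactly the combination search described above.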
> > [snip: Max's original benchmark message of Fri, Jun 8, quoted in full
> > at the top of this thread]
> >
> > --
> > Clément Béra
> > https://clementbera.github.io/
> > https://clementbera.wordpress.com/
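For illustration, the #empty trick John Brant mentions earlier in the
thread could look roughly like this in a SmallDictionary-style class with
parallel 'keys' and 'values' arrays and a 'size' fill pointer (a
reconstruction with hypothetical names, not the original RB source):

    RBSmallDictionaryLike >> empty
        "'Empty' the dictionary in O(1) by resetting the fill pointer.
        The old keys and values are still referenced by the arrays, so
        this is only safe for short-lived dictionaries, like the ones
        used during a single pattern match; otherwise it leaks."
        size := 0

    RBSmallDictionaryLike >> at: key ifAbsent: aBlock
        "Plain linear search: for fewer than ~10 entries this can beat
        hashing, which was the point of the class."
        1 to: size do: [ :index |
            (keys at: index) = key ifTrue: [ ^ values at: index ] ].
        ^ aBlock value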
>> This discussion has limited its context to comparing *in-memory*
>> Dictionary implementations, where open addressing has an advantage.
>> Using Magma reveals another context where chained Dictionaries can have
>> some advantages: when they're large, persistent, and shared in a
>> multi-user system -- e.g., when grow rehashing is not an option.
>
> B-trees were designed for that use case. If growing is not an option,
> then your dictionary's performance will degrade as its size increases,
> won't it?

Yes, but only logarithmically. But I was not making a performance-based
comparison, only a possibility-based one.

>> Not only do chained Dictionaries make that scenario possible, they
>> improve concurrency -- different users can add/change/remove from the
>> Dictionary as long as they do so at different keys. With regular
>> Dictionaries, concurrent updates guarantee a commit conflict.
>
> If you created a new, larger dictionary on grow and lazily copied the
> entries from the smaller one to the larger one upon access, then I think
> you could use open addressing hash tables.

That's an interesting idea, but what if it grows beyond "the larger" again
while many entries are still in the smaller one? Add a third table,
doubled in size again? That sounds like it starts to intrude on the
efficiency of open addressing (in both computation and space), but the
idea still sparks some interesting design thoughts in my head...

> Also, I don't see the concurrency issue with open addressing: as long as
> different users modify different slots, the situation is the same as the
> one you described, isn't it?

Magma has a class called MagmaPreallocatedDictionary which does exactly
this. It's like a regular Dictionary, except that for its internal 'array'
it uses a MagmaArray, which is just like an Array except that it is large,
persistent, and can grow in place (and is thus able to maintain its
identity). However, this alone was not enough to implement your proposal,
because of the 'tally': clients would each be trying to propose their own
specific value for that variable, which means a conflict. Thankfully,
Magma has yet another special class, MagmaCounter, which fits that role
beautifully. Designed for multi-user systems, a MagmaCounter provides a
concurrent counter. Instances start at a #value of 0, after which any
number of sessions may simultaneously increment or decrement it (even by
more than 1), but no one can actually hard-set its value.

Best,
  Chris
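The tally problem Chris describes generalises beyond Magma. One way to
picture it: a shared absolute counter makes every pair of writers
conflict, whereas recording increments as independent deltas lets them
commute. A generic, hypothetical sketch of that idea -- not Magma's actual
MagmaCounter implementation, and it assumes the underlying store can merge
concurrent appends:

    Object subclass: #DeltaCounter
        instanceVariableNames: 'deltas'
        classVariableNames: ''
        package: 'DictionarySketches'

    DeltaCounter >> initialize
        super initialize.
        deltas := OrderedCollection new

    DeltaCounter >> incrementBy: anInteger
        "Each session appends its own delta instead of overwriting a
        shared total, so concurrent increments commute instead of
        conflicting."
        deltas add: anInteger

    DeltaCounter >> value
        "The current value is the sum of all recorded deltas."
        ^ deltas inject: 0 into: [ :sum :each | sum + each ]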