Dictionary removeKey: very low


Dictionary removeKey: very low

LABORDE Pierre

Hi all,

 

I have a problem with Dictionaries:

 

dic := Dictionary new: 10000.

1 to: 10000 do: [ :i |
    dic at: i put: i printString ].

1 to: 10000 do: [ :i |
    dic removeKey: i ]

 

Removing each key is very slow; the execution-time difference between adding and removing is crazy.

Any idea how to fix that? Can I use another Dictionary implementation or different settings?

 

Thanks.

 

Pierre

 


Re: Dictionary removeKey: very low

Peter Kenny

Pierre

 

It’s all to do with rehashing. A dictionary is stored as an Array of Associations. After a key and its corresponding association are removed, the remaining entries are rehashed from the removal point to the end of the array. By doing the removals in natural order, you rehash the whole of the remaining array after each removal. Try doing the removals in reverse order; on my machine it reduces the time from 6 seconds to 6 milliseconds.
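To see why the removal order matters so much, here is a toy Python model of the behaviour described above: on each removal, everything from the removal point to the end of the table is rehashed. This is an illustration of the cost pattern only, not the actual Pharo code; the function name and the keys-hash-to-themselves simplification are mine.

```python
# Toy cost model (not Pharo code): removing an entry rehashes every
# remaining entry from the removal point to the end of the table.

def removal_cost(n, order):
    """Total entries rehashed when removing keys 1..n in `order`.
    Toy assumption: key i sits at position i, in slot order."""
    live = list(range(1, n + 1))   # entries in table order
    work = 0
    for key in order:
        idx = live.index(key)          # removal point
        work += len(live) - idx - 1    # rehash everything after it
        live.pop(idx)
    return work

n = 1000
forward = removal_cost(n, range(1, n + 1))    # natural order
backward = removal_cost(n, range(n, 0, -1))   # reverse order
print(forward, backward)  # → 499500 0
```

Removing in natural order always removes from the front of the remaining run, so the cost is (n-1) + (n-2) + … + 0 ≈ n²/2; removing in reverse order always removes the last entry, so nothing is ever rehashed. That matches the seconds-versus-milliseconds difference observed above.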

 

HTH

 

Peter Kenny

 

From: Pharo-users <[hidden email]> On Behalf Of LABORDE Pierre
Sent: 03 February 2020 10:18
To: [hidden email]
Subject: [Pharo-users] Dictionary removeKey: very low

 


Re: Dictionary removeKey: very low

LABORDE Pierre

Thanks Peter.

 

Here is some information about execution times:

 

dic := Dictionary new: 10000.

Time microsecondsToRun: [
    1 to: 10000 do: [ :i |
        dic at: i put: i printString ] ]. "6000 µs"

Time microsecondsToRun: [
    1 to: 10000 do: [ :i |
        dic removeKey: i ] ]. "2683000 µs, about 447x slower"

 

Regarding your comment, the example I wrote was too simple.

Here is a new test that better represents my real needs and data (randomized order):

 

dic := Dictionary new: 10000.

pool := SortedCollection new: 10000.
random := Random new.
pool sort: [ :a :b | random next < 0.5 ].
1 to: 10000 do: [ :i | pool add: i ].

Time microsecondsToRun: [
    pool do: [ :i |
        dic at: i put: i printString ] ]. "6000 µs => same time, good point"

Time microsecondsToRun: [
    pool do: [ :i |
        dic removeKey: i ] ]. "17000 µs, about 2.8x slower => a huge change!"

 

That appears much more reasonable!

I need to test it with real data; to be continued.

 

Thanks for all.

 

Pierre

 

From: Pharo-users [mailto:[hidden email]] On Behalf Of PBKResearch
Sent: Monday 3 February 2020 12:25
To: 'Any question about pharo is welcome'
Subject: Re: [Pharo-users] Dictionary removeKey: very low

 


 


Re: Dictionary removeKey: very low

Kasper Osterbye
This is actually a very good issue you have raised. As far as I am able to read the code for Dictionary and HashedCollection, the implementation is not very well suited for deletions at all. 

First, as you have discovered, the remove operation has an implementation that is unsound performance-wise. Second, HashedCollections grow when you add to them, but they do not shrink when you remove from them.

That is, in scenarios where your dictionaries grow huge, and then later shrink, there is a potential “memory leak” in that the array is kept at its largest size. 

Btw, thanks for the shuffle function using sort. I do believe the built-in shuffle in SequenceableCollection is faster though (O(n) rather than the O(n·log n) of sorting).
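For comparison, the classic O(n) shuffle is Fisher–Yates: one swap per element, no sorting. This is a hedged, language-neutral Python sketch of the algorithm, not the SequenceableCollection implementation; the function name is mine.

```python
import random

def fisher_yates(xs, rng=None):
    """In-place Fisher–Yates shuffle on a copy: one random swap per
    element, O(n) overall, and every permutation is equally likely
    (unlike sorting with a random comparator, which is also biased)."""
    rng = rng or random.Random()
    xs = list(xs)
    for i in range(len(xs) - 1, 0, -1):
        j = rng.randrange(i + 1)       # pick a slot in xs[0..i]
        xs[i], xs[j] = xs[j], xs[i]    # swap it into place
    return xs

# The result is always a permutation of the input:
print(sorted(fisher_yates(range(10))) == list(range(10)))  # → True
```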

It seems from the Wikipedia page on hash tables that there are algorithms with efficient removal even for the "open addressing with linear probing" implementation used in HashedCollection. But they only seem to work if the hash table also implements a "shrink" function, which is where the re-hashing is done.

Interesting stuff. It would be a good programming exercise to get that to work.

— Kasper


On 7 February 2020 at 11.17.27, LABORDE Pierre ([hidden email]) wrote:


 


Re: Dictionary removeKey: very low

Richard O'Keefe
In reply to this post by LABORDE Pierre
It's a problem with the traditional Smalltalk-80 implementation of Dictionaries.
Using my compiler and library,
1311 usec to build
 498 usec to delete (1..10000)
1318 usec to build
 472 usec to delete (10000..1)
(The time difference between building and deleting is mostly #printString.)
There is no defect in the Pharo compiler here; it's the hash table design
that is basically flawed if you want to delete entries.
My library uses separate chaining
(https://en.wikipedia.org/wiki/Hash_table#Separate_chaining)
which makes deletion simple and fast and allows 'null' keys.

Pharo uses open addressing, which makes deletion much harder.
It doesn't seem to improve retrieval performance either.
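The separate-chaining idea is small enough to sketch. The following is a hedged Python illustration of the technique (not Richard's library, and not Pharo code): each bucket holds its colliding entries, so deletion touches one bucket only and never rehashes the rest of the table. The class and method names are mine.

```python
class ChainedDict:
    """Minimal separate-chaining hash table: each bucket is a list of
    (key, value) pairs, so removal is local to one bucket."""

    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)   # overwrite existing entry
                return
        b.append((key, value))

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def remove_key(self, key):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                del b[i]   # local operation; no table-wide rehash
                return
        raise KeyError(key)

d = ChainedDict()
for i in range(100):
    d[i] = str(i)
print(d[42])  # → 42
for i in range(100):
    d.remove_key(i)   # each removal is O(bucket length), not O(table)
```

A production version would also grow (and, as discussed above, ideally shrink) the bucket array as the load factor changes; this sketch keeps a fixed bucket count for clarity.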

On Mon, 3 Feb 2020 at 23:18, LABORDE Pierre
<[hidden email]> wrote:



Re: Dictionary removeKey: very low performance

Kasper Osterbye
Hi Richard,

On 9 February 2020 at 17.54.30, Richard O'Keefe ([hidden email]) wrote:
My library uses separate chaining 
(https://en.wikipedia.org/wiki/Hash_table#Separate_chaining) 
which makes deletion simple and fast and allows 'null' keys. 

Does your implementation use a fixed number of buckets, or do they grow (and shrink)?

I had been wondering about doing open addressing with items marked as deleted, to see how it would perform.

Aka "Another technique for removal is simply to mark the slot as deleted. However this eventually requires rebuilding the table simply to remove deleted records." from https://en.wikipedia.org/wiki/Open_addressing.
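The marked-as-deleted ("tombstone") idea from that page can be sketched as follows. This is a hedged Python illustration of the general technique, not Pharo's HashedCollection; the sentinels and names are mine. The key point is that a tombstone keeps later entries in the probe chain reachable, so removal needs no rehashing (at the price of tombstones accumulating until the table is rebuilt).

```python
# Open addressing with linear probing and tombstone deletion (sketch).
EMPTY, DELETED = object(), object()   # distinct sentinel slot states

class TombstoneTable:
    def __init__(self, capacity=128):
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        """Yield slot indices along the linear probe chain for `key`."""
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)

    def insert(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is DELETED:
                self.slots[i] = key    # reuse tombstones on insert
                return
            if self.slots[i] == key:
                return                 # already present

    def contains(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return False           # genuine miss: chain ends here
            if self.slots[i] == key:
                return True
            # DELETED slots keep the probe chain alive: keep scanning

    def remove(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return                 # key was never here
            if self.slots[i] == key:
                self.slots[i] = DELETED   # tombstone: no rehash at all
                return

t = TombstoneTable()
for k in (1, 129, 257):   # all collide: same slot mod 128
    t.insert(k)
t.remove(129)
print(t.contains(257))  # → True: the tombstone preserves the chain
```

Without the DELETED marker, removing 129 would leave an EMPTY hole and the lookup for 257 would stop early and wrongly report a miss; that is exactly the problem the table-wide rehash in the current implementation works around.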

Best,

Kasper




Re: Dictionary removeKey: very low performance

Richard O'Keefe
My Dictionary and Set implementations (neither of which inherits from the other) grow but do not shrink. It's easy enough to do; I just have not bothered, because I have a significant redesign that improves space and time which needs finishing.

I've used the marking-deleted-entries technique in the past. I'd prefer not to do it again.

On Tue, 11 Feb 2020 at 07:31, Kasper Østerbye <[hidden email]> wrote:
