Reimplementing WeakKeyDictionary

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
38 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Reimplementing WeakKeyDictionary

"Martin v. Löwis"
I now have completed a first version of a reimplementation of
WeakKeyDictionary, with the objective of improving performance
(both computation and memory consumption). The entire change
with a discussion can be seen at

http://bugs.squeak.org/view.php?id=6348

In essence, this introduces two changes:
- when values are finalized or keys removed from the dictionary,
   it is not always rehashed anymore. Instead, the dictionary just
   releases the value, and marks the association as expired. Expired
   associations can then be reused if a new value is to be inserted
   under the same has. Rehashing is only done when so many entries
   have expired. Expired entries don't count towards the size of the
   dictionary, and aren't reported when iterating over the dictionary.
- WeakKeyAssociation was rewritten to be a weak class, with a fixed
   slot for the value, and a weak slot for the key (currently, it
   has two fixed slots, with the key slot referring to a WeakArray
   of length 1).

In some measurements involving large (10000 keys) or many (10000)
small weak key dictionaries, this new implementation improves
performance by a factor of 4..5, when measuring finalization time.
The second change saves 8 bytes per association (on a 32-bit
machine).

Regards,
Martin




Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

Philippe Marschall
2007/3/18, "Martin v. Löwis" <[hidden email]>:

> I now have completed a first version of a reimplementation of
> WeakKeyDictionary, with the objective of improving performance
> (both computation and memory consumption). The entire change
> with a discussion can be seen at
>
> http://bugs.squeak.org/view.php?id=6348
>
> In essence, this introduces two changes:
> - when values are finalized or keys removed from the dictionary,
>    it is not always rehashed anymore. Instead, the dictionary just
>    releases the value, and marks the association as expired. Expired
>    associations can then be reused if a new value is to be inserted
>    under the same has. Rehashing is only done when so many entries
>    have expired. Expired entries don't count towards the size of the
>    dictionary, and aren't reported when iterating over the dictionary.
> - WeakKeyAssociation was rewritten to be a weak class, with a fixed
>    slot for the value, and a weak slot for the key (currently, it
>    has two fixed slots, with the key slot referring to a WeakArray
>    of length 1).
>
> In some measurements involving large (10000 keys) or many (10000)
> small weak key dictionaries, this new implementation improves
> performance by a factor of 4..5, when measuring finalization time.
> The second change saves 8 bytes per association (on a 32-bit
> machine).
Could you attach the code for these stress tests to the bug?

Cheers
Philippe

> Regards,
> Martin
>
>
>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

"Martin v. Löwis"
Philippe Marschall schrieb:
> Could you attach the code for these stress tests to the bug?

They are in there (weakdictall.1.cs):
WeakKeyDictionaryTest class>>timingLarge
WeakKeyDictionaryTest class>>timingMany

HTH,
Martin


Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

David T. Lewis
In reply to this post by "Martin v. Löwis"
On Sun, Mar 18, 2007 at 03:51:34PM +0100, "Martin v. L?wis" wrote:
> I now have completed a first version of a reimplementation of
> WeakKeyDictionary, with the objective of improving performance
> (both computation and memory consumption). The entire change
> with a discussion can be seen at
>
> http://bugs.squeak.org/view.php?id=6348

Martin,

This would be a welcome improvement!

Note also http://bugs.squeak.org/view.php?id=2910 which is also
related to this problem. (How do you add a "relationship" in
Mantis?)

Dave


Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

"Martin v. Löwis"
David T. Lewis schrieb:
> This would be a welcome improvement!
>
> Note also http://bugs.squeak.org/view.php?id=2910 which is also
> related to this problem. (How do you add a "relationship" in
> Mantis?)

I had indeed seen this before. On a first quick review, I agreed
with your analysis that the patch is right; on a closer look,
I agree with Andreas: it seems that the patch drops associations
with nil keys on the floor, which it shouldn't do.

In any case, my patch would also address this problem: on removal,
no rehashing is done at all. Instead, it just clears the association
of the to-be-removed key, and then stops; the association will get
only discarded when the entire dictionary gets reorganized.

Regards,
Martin


Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

Andreas.Raab
In reply to this post by "Martin v. Löwis"
Martin v. Löwis wrote:
> In essence, this introduces two changes:
> - when values are finalized or keys removed from the dictionary,
>   it is not always rehashed anymore. Instead, the dictionary just
>   releases the value, and marks the association as expired. Expired
>   associations can then be reused if a new value is to be inserted
>   under the same has. Rehashing is only done when so many entries
>   have expired. Expired entries don't count towards the size of the
>   dictionary, and aren't reported when iterating over the dictionary.

That's an interesting approach. I would expect this to impact the time
it takes to insert/find elements in the dict, no? Like:
     wd := WeakKeyDictionary new.
     1 to: 100 do:[:i| wd at: i put: i].
     1 to: 99  do:[:i| wd removeKey: i].
After this point I'd expect the operations to be slower since we have
about a hundred expired elements in the dict. Nothing that a good
#rehash can't fix of course, but I'm wondering if at some point the WKD
shouldn't recognize the ratio of expired/free/used slots and act
accordingly. In any case, it's definitely an improvement.

> - WeakKeyAssociation was rewritten to be a weak class, with a fixed
>   slot for the value, and a weak slot for the key (currently, it
>   has two fixed slots, with the key slot referring to a WeakArray
>   of length 1).

Hah! Finally  ;-)  When I wrote the weak dicts I was concerned about the
usage of associations for literal bindings in code and whether any such
weak associations might be used for this purpose (the problem is that
the VM accesses those directly and not having the right values in the
right slots will give you "interesting" results). However, that concern
never even remotely materialized so it's basically a waste of perfectly
good resources. Thanks for cleaning that part up.

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

RE: Reimplementing WeakKeyDictionary

Ken Causey-3
In reply to this post by "Martin v. Löwis"
This appears to be something that did not make the transition to the new server, the ability for reporters to add relationships.  I spent a few minutes on this but it looks like this is going to take a bit more time to resolve and I don't have the time today.  I'll try to take care of it tomorrow and let you know.

Ken

> From: "David T. Lewis" <[hidden email]>
> Date: Sun, March 18, 2007 11:36 am
...snip...
> (How do you add a "relationship" in Mantis?)
>
> Dave

Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

"Martin v. Löwis"
In reply to this post by Andreas.Raab
> That's an interesting approach. I would expect this to impact the time
> it takes to insert/find elements in the dict, no? Like:
>     wd := WeakKeyDictionary new.
>     1 to: 100 do:[:i| wd at: i put: i].
>     1 to: 99  do:[:i| wd removeKey: i].
> After this point I'd expect the operations to be slower since we have
> about a hundred expired elements in the dict.

It strongly depends on the operations now. The lookup for the only
remaining key (100) will not be slower, since that key is not colliding.

The next invocation of fullCheck will see that there are way too many
expired entries in the dictionary, and rehash it. So the very next
insertion may be slightly slower, but all subsequent insertions
and other operations will have the same performance - in this specific
use case, the new code just saved a hundred rehash operations.

> Nothing that a good
> #rehash can't fix of course, but I'm wondering if at some point the WKD
> shouldn't recognize the ratio of expired/free/used slots and act
> accordingly.

Ah, it does: if 4*expired > array size, it rehashes (in fullCheck). The
precise percentage may be debatable, and it may be an option to also
consider the relationship of expired and tally, but it will definitely
rehash from time to time (and growing will drop expired entries, too).

It might also be useful to add a fullCheck into removeKey:, so that in
above example it won't end up with with 99 expired associations, but
only one. With rehashing as implemented in the patch, this would cause
the dictionary to shrink (as it uses essentially the implementation
of Dictionary>>rehash); in turn, the loop would do 7 rehashes, at
41, 21, 14, 9, 6, 4, 3 expired elements.

With rehashing similar to the original one (i.e. one that keeps the
size), a fullCheck on remove, in above example, would rehash every
40 removals (given that the dictionary will have 160 slots), so you
would end up with 20 expired elements after the loop.

Regards,
Martin


Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

Andreas.Raab
Martin v. Löwis wrote:
> The next invocation of fullCheck will see that there are way too many
> expired entries in the dictionary, and rehash it.

Oh, it does! I didn't notice that. In this case, my point is of course moot.

>> Nothing that a good #rehash can't fix of course, but I'm wondering if
>> at some point the WKD shouldn't recognize the ratio of
>> expired/free/used slots and act accordingly.
>
> Ah, it does: if 4*expired > array size, it rehashes (in fullCheck). The
> precise percentage may be debatable, and it may be an option to also
> consider the relationship of expired and tally, but it will definitely
> rehash from time to time (and growing will drop expired entries, too).

Right. Wasn't careful enough reading the code. Nevermind ;-)

Brief Q: You didn't explicitly specify what license those changes are
under. Would you be amendable to using MIT-style license for it? (it
would allow me to use the code trivially in Croquet)

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

Bert Freudenberg
On Mar 19, 2007, at 7:50 , Andreas Raab wrote:

> Brief Q: You didn't explicitly specify what license those changes  
> are under. Would you be amendable to using MIT-style license for  
> it? (it would allow me to use the code trivially in Croquet)

In case this hasn't been clear to everybody yet - we require MIT  
licensing for anything that people might want to go into official  
Squeak nowadays.

- Bert -



Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

Blake-5
On Sun, 18 Mar 2007 23:53:21 -0800, Bert Freudenberg  
<[hidden email]> wrote:

> On Mar 19, 2007, at 7:50 , Andreas Raab wrote:
>
>> Brief Q: You didn't explicitly specify what license those changes are  
>> under. Would you be amendable to using MIT-style license for it? (it  
>> would allow me to use the code trivially in Croquet)
>
> In case this hasn't been clear to everybody yet - we require MIT  
> licensing for anything that people might want to go into official Squeak  
> nowadays.

A cursory Google suggests that "MIT licensing" might mean a number of  
things. If it's this:

http://www.opensource.org/licenses/mit-license.php

Then it would seem that MIT licensing means you can do (or not do) just  
about anything with the code, as long as the MIT license accompanies  
whatever you do?

        ===Blake===

Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

"Martin v. Löwis"
In reply to this post by Bert Freudenberg
Bert Freudenberg schrieb:
> On Mar 19, 2007, at 7:50 , Andreas Raab wrote:
>
>> Brief Q: You didn't explicitly specify what license those changes are
>> under. Would you be amendable to using MIT-style license for it? (it
>> would allow me to use the code trivially in Croquet)
>
> In case this hasn't been clear to everybody yet - we require MIT
> licensing for anything that people might want to go into official Squeak
> nowadays.

So how do you deal with the requirement that the license is put in all
copies?

Regards,
Martin

Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

"Martin v. Löwis"
In reply to this post by Andreas.Raab
Andreas Raab schrieb:
> Brief Q: You didn't explicitly specify what license those changes are
> under. Would you be amendable to using MIT-style license for it? (it
> would allow me to use the code trivially in Croquet)

Sure - I just added a license file. See also my other response; I wonder
where all the license files are kept in Squeak.

I would be willing to give the Squeak Foundation (as well as Apple
Computer, Inc., if necessary) the irrevocable and perpetual right to
make and distribute copies of the Software, as well as to create and
distribute collective works and derivative works of the Software, under
any other OSI-approved open source license (thus eliminating my own
MIT-style license from the license stack).

Regards,
Martin



Reply | Threaded
Open this post in threaded view
|

Contribution licensing

Bert Freudenberg
In reply to this post by "Martin v. Löwis"
On Mar 19, 2007, at 10:58 , Martin v. Löwis wrote:

> Bert Freudenberg schrieb:
>> On Mar 19, 2007, at 7:50 , Andreas Raab wrote:
>>> Brief Q: You didn't explicitly specify what license those changes  
>>> are under. Would you be amendable to using MIT-style license for  
>>> it? (it would allow me to use the code trivially in Croquet)
>> In case this hasn't been clear to everybody yet - we require MIT  
>> licensing for anything that people might want to go into official  
>> Squeak nowadays.
>
> So how do you deal with the requirement that the license is put in all
> copies?

We require a written note that allows us to use the contributions  
under MIT. Since this is rather new we need to figure out the exact  
details still, but there is a form you need to print out and sign and  
send in.  I think Craig might know best.

MIT was chosen because it is compatible to basically everything.  
Squeak 1.1 was relicensed by Apple under the Apache License 2.0 just  
last year. With this Apache base and all the MIT contributions we  
have a completely free system. And if there was a replacement for the  
Squeak kernel down the road (Spoon, Pepsi, whatever) the MIT parts  
can be moved over with ease. Also, the major new systems built on  
Squeak (like Croquet, Tweak, Seaside, etc.) are MIT licensed.

- Bert -



Reply | Threaded
Open this post in threaded view
|

Re: Reimplementing WeakKeyDictionary

Bert Freudenberg
In reply to this post by Blake-5
On Mar 19, 2007, at 11:22 , Blake wrote:

> On Sun, 18 Mar 2007 23:53:21 -0800, Bert Freudenberg  
> <[hidden email]> wrote:
>
>> On Mar 19, 2007, at 7:50 , Andreas Raab wrote:
>>
>>> Brief Q: You didn't explicitly specify what license those changes  
>>> are under. Would you be amendable to using MIT-style license for  
>>> it? (it would allow me to use the code trivially in Croquet)
>>
>> In case this hasn't been clear to everybody yet - we require MIT  
>> licensing for anything that people might want to go into official  
>> Squeak nowadays.
>
> A cursory Google suggests that "MIT licensing" might mean a number  
> of things. If it's this:
>
> http://www.opensource.org/licenses/mit-license.php
>
> Then it would seem that MIT licensing means you can do (or not do)  
> just about anything with the code, as long as the MIT license  
> accompanies whatever you do?

As I understand it, yes.

- Bert -



Reply | Threaded
Open this post in threaded view
|

Re: Contribution licensing

"Martin v. Löwis"
In reply to this post by Bert Freudenberg
Bert Freudenberg schrieb:
> MIT was chosen because it is compatible to basically everything. Squeak
> 1.1 was relicensed by Apple under the Apache License 2.0 just last year.
> With this Apache base and all the MIT contributions we have a completely
> free system. And if there was a replacement for the Squeak kernel down
> the road (Spoon, Pepsi, whatever) the MIT parts can be moved over with
> ease. Also, the major new systems built on Squeak (like Croquet, Tweak,
> Seaside, etc.) are MIT licensed.

As a contributor, I don't mind that choice. Still, the MIT license
requires that "The above copyright notice and this permission notice
shall be included in all copies or substantial portions of the
Software.", so I'm curious how this requirement will be executed.

Will the licenses all be collected in a class comment of some class?

Regards,
Martin

Reply | Threaded
Open this post in threaded view
|

Re: Contribution licensing

Karl-19
Martin v. Löwis skrev:

> Bert Freudenberg schrieb:
>> MIT was chosen because it is compatible to basically everything.
>> Squeak 1.1 was relicensed by Apple under the Apache License 2.0 just
>> last year. With this Apache base and all the MIT contributions we
>> have a completely free system. And if there was a replacement for the
>> Squeak kernel down the road (Spoon, Pepsi, whatever) the MIT parts
>> can be moved over with ease. Also, the major new systems built on
>> Squeak (like Croquet, Tweak, Seaside, etc.) are MIT licensed.
>
> As a contributor, I don't mind that choice. Still, the MIT license
> requires that "The above copyright notice and this permission notice
> shall be included in all copies or substantial portions of the
> Software.", so I'm curious how this requirement will be executed.
>
> Will the licenses all be collected in a class comment of some class?
>
> Regards,
> Martin
>
>
Should we not have a class License and method licenseMIT ?
I think we have been sloppy with distributing the license with Squeak in
the past. I can't recall any download wich include the Squeak License or
any other for that sake.

Karl

Reply | Threaded
Open this post in threaded view
|

Re: Contribution licensing

Karl-19
karl skrev:

> Martin v. Löwis skrev:
>> Bert Freudenberg schrieb:
>>> MIT was chosen because it is compatible to basically everything.
>>> Squeak 1.1 was relicensed by Apple under the Apache License 2.0 just
>>> last year. With this Apache base and all the MIT contributions we
>>> have a completely free system. And if there was a replacement for
>>> the Squeak kernel down the road (Spoon, Pepsi, whatever) the MIT
>>> parts can be moved over with ease. Also, the major new systems built
>>> on Squeak (like Croquet, Tweak, Seaside, etc.) are MIT licensed.
>>
>> As a contributor, I don't mind that choice. Still, the MIT license
>> requires that "The above copyright notice and this permission notice
>> shall be included in all copies or substantial portions of the
>> Software.", so I'm curious how this requirement will be executed.
>>
>> Will the licenses all be collected in a class comment of some class?
>>
>> Regards,
>> Martin
>>
>>
> Should we not have a class License and method licenseMIT ?
> I think we have been sloppy with distributing the license with Squeak
> in the past. I can't recall any download wich include the Squeak
> License or any other for that sake.
We could even add a comment tag for the compiler so it will only compile
the methods which refer to the MIT license. I'm not sure how to do that....

Karl

Reply | Threaded
Open this post in threaded view
|

Re: Contribution licensing

"Martin v. Löwis"
In reply to this post by Karl-19
> Should we not have a class License and method licenseMIT ?
> I think we have been sloppy with distributing the license with Squeak in
> the past. I can't recall any download wich include the Squeak License or
> any other for that sake.

Reviewing the Squeak license, it appears that it includes no requirement
to actually include the license in the software, so not including it
wasn't a license violation (although "sloppy" may still apply). This is
different with the MIT license: it explicitly requires inclusion with
the software. Given that the MIT license is a template, so each
contributor will provide her own version of it, this gives a long list
of licenses to include.

Regards,
Martin



Reply | Threaded
Open this post in threaded view
|

Re: Contribution licensing

Karl-19
Martin v. Löwis skrev:

>> Should we not have a class License and method licenseMIT ?
>> I think we have been sloppy with distributing the license with Squeak
>> in the past. I can't recall any download wich include the Squeak
>> License or any other for that sake.
>
> Reviewing the Squeak license, it appears that it includes no requirement
> to actually include the license in the software, so not including it
> wasn't a license violation (although "sloppy" may still apply). This is
> different with the MIT license: it explicitly requires inclusion with
> the software. Given that the MIT license is a template, so each
> contributor will provide her own version of it, this gives a long list
> of licenses to include.
Well it says:
'The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.'

So it should be enough to have one license in the image ?
Maybe all packages/modules should be dependent on a License module ?
Patches and bugfixes, I don't know, depends on how substantial they are ?

Licenses are a source of endless discussions :-)

Karl


12