Set internal array size

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Set internal array size

demarey
Hi,

I just discovered a funny thing.
If you create a Set with Set new, you will get a Set with an internal array of 5 elements (hardcoded in Set class>>#new).
If you ask explicitly a Set for one or no element: Set new: 0 (or a number between 0 and 5), you will get an internal array of 7 elements.
This behavior is a bit strange. Either we always need at least 7 elements and we need to update Set class>>#new, either the method Set class>>#sizeFor: should be updated.

WDYT?

Christophe
Reply | Threaded
Open this post in threaded view
|

Re: Set internal array size

Henrik Nergaard-2

Hashed collections keep a portion of the available memory empty (usually 25%) as a partition mechanism to increase lookup speed.

 

“Normal array”

| col |

 

col := Smalltalk allClassesAndTraits asArray.

 

[ col includes: 8; includes: Morph ] bench. "'366.853 per second'"

 

“Normal set ~75% load”

| set |

 

set := Smalltalk allClassesAndTraits asSet.

 

[ set includes: 8; includes: Morph ] bench. "'4,700,967 per second'"

 

 

“Set forced to have ~100% load”

| array fullLoadSet |

 

array := Smalltalk allClassesAndTraits asSet asArray , #(nil).

fullLoadSet:= Set basicNew.

fullLoadSet         

instVarNamed: #array put: array;

               instVarNamed: #tally put: array size.

              

[ fullLoadSet includes: 8; includes: Morph ] bench. "'215.157 per second'"

 

 

Best regards,

Henrik

 

 

From: Pharo-dev [mailto:[hidden email]] On Behalf Of Christophe Demarey
Sent: Thursday, December 1, 2016 11:49 AM
To: Pharo Development List <[hidden email]>
Subject: [Pharo-dev] Set internal array size

 

Hi,

 

I just discovered a funny thing.

If you create a Set with Set new, you will get a Set with an internal array of 5 elements (hardcoded in Set class>>#new).

If you ask explicitly a Set for one or no element: Set new: 0 (or a number between 0 and 5), you will get an internal array of 7 elements.

This behavior is a bit strange. Either we always need at least 7 elements and we need to update Set class>>#new, either the method Set class>>#sizeFor: should be updated.

 

WDYT?

 

Christophe

Reply | Threaded
Open this post in threaded view
|

Re: Set internal array size

Sven Van Caekenberghe-2
Wow, great explanation/example !

> On 1 Dec 2016, at 13:25, Henrik Nergaard <[hidden email]> wrote:
>
> Hashed collections keep a portion of the available memory empty (usually 25%) as a partition mechanism to increase lookup speed.
>  
> “Normal array”
> | col |
>  
> col := Smalltalk allClassesAndTraits asArray.
>  
> [ col includes: 8; includes: Morph ] bench. "'366.853 per second'"
>  
> “Normal set ~75% load”
> | set |
>  
> set := Smalltalk allClassesAndTraits asSet.
>  
> [ set includes: 8; includes: Morph ] bench. "'4,700,967 per second'"
>  
>  
> “Set forced to have ~100% load”
> | array fullLoadSet |
>  
> array := Smalltalk allClassesAndTraits asSet asArray , #(nil).
> fullLoadSet:= Set basicNew.
> fullLoadSet        
> instVarNamed: #array put: array;
>                instVarNamed: #tally put: array size.
>              
> [ fullLoadSet includes: 8; includes: Morph ] bench. "'215.157 per second'"
>  
>  
> Best regards,
> Henrik
>  
>  
> From: Pharo-dev [mailto:[hidden email]] On Behalf Of Christophe Demarey
> Sent: Thursday, December 1, 2016 11:49 AM
> To: Pharo Development List <[hidden email]>
> Subject: [Pharo-dev] Set internal array size
>  
> Hi,
>  
> I just discovered a funny thing.
> If you create a Set with Set new, you will get a Set with an internal array of 5 elements (hardcoded in Set class>>#new).
> If you ask explicitly a Set for one or no element: Set new: 0 (or a number between 0 and 5), you will get an internal array of7 elements.
> This behavior is a bit strange. Either we always need at least 7 elements and we need to update Set class>>#new, either the method Set class>>#sizeFor: should be updated.
>  
> WDYT?
>  
> Christophe


Reply | Threaded
Open this post in threaded view
|

Re: Set internal array size

demarey
In reply to this post by Henrik Nergaard-2
Thanks for this great explanation Henrik.
Nevertheless, my question is still valid. Why Set new will give me an internal array of 5 elements while Set new: 1 will give me an internal array of 7 elements?
Shouldn’t Set new give an internal of 7 elements as it looks like the minimal array size for a set?
Also I would expect Set new: 0 to give me an empty internal array. It explicitly ask an empty Set. A further call to #add: will increase its size as expected.


Le 1 déc. 2016 à 13:25, Henrik Nergaard <[hidden email]> a écrit :

Hashed collections keep a portion of the available memory empty (usually 25%) as a partition mechanism to increase lookup speed.
 
“Normal array”
| col |
 
col := Smalltalk allClassesAndTraits asArray.
 
[ col includes: 8; includes: Morph ] bench. "'366.853 per second'"
 
“Normal set ~75% load”
| set |
 
set := Smalltalk allClassesAndTraits asSet.
 
[ set includes: 8; includes: Morph ] bench. "'4,700,967 per second'"
 
 
“Set forced to have ~100% load”
| array fullLoadSet |
 
array := Smalltalk allClassesAndTraits asSet asArray , #(nil).
fullLoadSet:= Set basicNew.
fullLoadSet         
instVarNamed: #array put: array;
               instVarNamed: #tally put: array size.
              
[ fullLoadSet includes: 8; includes: Morph ] bench. "'215.157 per second'"
 
 
Best regards,
Henrik
 
 
From: Pharo-dev [[hidden email]] On Behalf Of Christophe Demarey
Sent: Thursday, December 1, 2016 11:49 AM
To: Pharo Development List <[hidden email]>
Subject: [Pharo-dev] Set internal array size
 
Hi,
 
I just discovered a funny thing.
If you create a Set with Set new, you will get a Set with an internal array of 5 elements (hardcoded in Set class>>#new).
If you ask explicitly a Set for one or no element: Set new: 0 (or a number between 0 and 5), you will get an internal array of 7 elements.
This behavior is a bit strange. Either we always need at least 7 elements and we need to update Set class>>#new, either the method Set class>>#sizeFor: should be updated.
 
WDYT?
 
Christophe

Reply | Threaded
Open this post in threaded view
|

Re: Set internal array size

Marcus Denker-4
The idea that a set is always created with a minimal size is because we always add at least some elements
to them. We at some point increased that size (from 3 to 5?).

The difference comes from

Set  sizeFor: 1 ==> 7

HashedCollection>>#sizeFor: has a special case for small (<4) elements, while the one of Set does not.  Which is odd.

Marcus




On 1 Dec 2016, at 14:39, Christophe Demarey <[hidden email]> wrote:

Thanks for this great explanation Henrik.
Nevertheless, my question is still valid. Why Set new will give me an internal array of 5 elements while Set new: 1 will give me an internal array of 7 elements?
Shouldn’t Set new give an internal of 7 elements as it looks like the minimal array size for a set?
Also I would expect Set new: 0 to give me an empty internal array. It explicitly ask an empty Set. A further call to #add: will increase its size as expected.


Le 1 déc. 2016 à 13:25, Henrik Nergaard <[hidden email]> a écrit :

Hashed collections keep a portion of the available memory empty (usually 25%) as a partition mechanism to increase lookup speed.
 
“Normal array”
| col |
 
col := Smalltalk allClassesAndTraits asArray.
 
[ col includes: 8; includes: Morph ] bench. "'366.853 per second'"
 
“Normal set ~75% load”
| set |
 
set := Smalltalk allClassesAndTraits asSet.
 
[ set includes: 8; includes: Morph ] bench. "'4,700,967 per second'"
 
 
“Set forced to have ~100% load”
| array fullLoadSet |
 
array := Smalltalk allClassesAndTraits asSet asArray , #(nil).
fullLoadSet:= Set basicNew.
fullLoadSet         
instVarNamed: #array put: array;
               instVarNamed: #tally put: array size.
              
[ fullLoadSet includes: 8; includes: Morph ] bench. "'215.157 per second'"
 
 
Best regards,
Henrik
 
 
From: Pharo-dev [[hidden email]] On Behalf Of Christophe Demarey
Sent: Thursday, December 1, 2016 11:49 AM
To: Pharo Development List <[hidden email]>
Subject: [Pharo-dev] Set internal array size
 
Hi,
 
I just discovered a funny thing.
If you create a Set with Set new, you will get a Set with an internal array of 5 elements (hardcoded in Set class>>#new).
If you ask explicitly a Set for one or no element: Set new: 0 (or a number between 0 and 5), you will get an internal array of 7 elements.
This behavior is a bit strange. Either we always need at least 7 elements and we need to update Set class>>#new, either the method Set class>>#sizeFor: should be updated.
 
WDYT?
 
Christophe


Reply | Threaded
Open this post in threaded view
|

Re: Set internal array size

demarey

> Le 1 déc. 2016 à 15:04, Marcus Denker <[hidden email]> a écrit :
>
> The idea that a set is always created with a minimal size is because we always add at least some elements

not always. You know in some cases 80% of the time, your instances will have an empty Set and with some values for the remaining 20%.
If you check in Moose, you should find a lot this pattern.
Reply | Threaded
Open this post in threaded view
|

Re: Set internal array size

Henrik Nergaard-2
The Set implementation requires the array to have a size > 0.

(Set basicNew initialize: 0) add: #x "Error Zero Devide"

Best regards,
Henrik

-----Original Message-----
From: Pharo-dev [mailto:[hidden email]] On Behalf Of Christophe Demarey
Sent: Thursday, December 1, 2016 3:42 PM
To: Pharo Development List <[hidden email]>
Subject: Re: [Pharo-dev] Set internal array size


> Le 1 déc. 2016 à 15:04, Marcus Denker <[hidden email]> a écrit :
>
> The idea that a set is always created with a minimal size is because we always add at least some elements

not always. You know in some cases 80% of the time, your instances will have an empty Set and with some values for the remaining 20%.
If you check in Moose, you should find a lot this pattern.
Reply | Threaded
Open this post in threaded view
|

Re: Set internal array size

Marcus Denker-4
In reply to this post by demarey

> On 1 Dec 2016, at 11:41, Christophe Demarey <[hidden email]> wrote:
>
>
>> Le 1 déc. 2016 à 15:04, Marcus Denker <[hidden email]> a écrit :
>>
>> The idea that a set is always created with a minimal size is because we always add at least some elements
>
> not always. You know in some cases 80% of the time, your instances will have an empty Set and with some values for the remaining 20%.
> If you check in Moose, you should find a lot this pattern.

It would be interesting to play around with these things… the problem is that they are very context dependent, so there is lots
of questions what to do (more a research topic than something one can solve now…).

What we should do:

-> the size should be 5, like it is for Dictionaries We need to unify the two version of the method that calculate the initial size.

What we could do:

-> provide a version of “new” that actually makes explicity a small array of size 1. For the case where the user knows that
the Set will be empty most of the time.

        Marcus
Reply | Threaded
Open this post in threaded view
|

Re: Set internal array size

Sven Van Caekenberghe-2

> On 12 Dec 2016, at 18:29, Marcus Denker <[hidden email]> wrote:
>
>
>> On 1 Dec 2016, at 11:41, Christophe Demarey <[hidden email]> wrote:
>>
>>
>>> Le 1 déc. 2016 à 15:04, Marcus Denker <[hidden email]> a écrit :
>>>
>>> The idea that a set is always created with a minimal size is because we always add at least some elements
>>
>> not always. You know in some cases 80% of the time, your instances will have an empty Set and with some values for the remaining 20%.
>> If you check in Moose, you should find a lot this pattern.
>
> It would be interesting to play around with these things… the problem is that they are very context dependent, so there is lots
> of questions what to do (more a research topic than something one can solve now…).
>
> What we should do:
>
> -> the size should be 5, like it is for Dictionaries We need to unify the two version of the method that calculate the initial size.
>
> What we could do:
>
> -> provide a version of “new” that actually makes explicity a small array of size 1. For the case where the user knows that
> the Set will be empty most of the time.

There could be an #empty class method for that.
And one could overwrite the #with: class method to create a singleton.

> Marcus


Reply | Threaded
Open this post in threaded view
|

Re: Set internal array size

Marcus Denker-4

On 12 Dec 2016, at 14:32, Sven Van Caekenberghe <[hidden email]> wrote:


On 12 Dec 2016, at 18:29, Marcus Denker <[hidden email]> wrote:


On 1 Dec 2016, at 11:41, Christophe Demarey <[hidden email]> wrote:


Le 1 déc. 2016 à 15:04, Marcus Denker <[hidden email]> a écrit :

The idea that a set is always created with a minimal size is because we always add at least some elements

not always. You know in some cases 80% of the time, your instances will have an empty Set and with some values for the remaining 20%.
If you check in Moose, you should find a lot this pattern.

It would be interesting to play around with these things… the problem is that they are very context dependent, so there is lots
of questions what to do (more a research topic than something one can solve now…).

What we should do:

-> the size should be 5, like it is for Dictionaries We need to unify the two version of the method that calculate the initial size.


Issue tracker:


What we could do:

-> provide a version of “new” that actually makes explicity a small array of size 1. For the case where the user knows that
the Set will be empty most of the time.

There could be an #empty class method for that.
And one could overwrite the #with: class method to create a singleton.


(just the issues, no code)

Marcus