free-chunk-management-in-the-cog-vm

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

free-chunk-management-in-the-cog-vm

Ben Coman
 
Hi Clement,

Nice article...

A random thought, I wonder if size-1 chunks can be managed packed into size-2 chunks.

Have a variable size1_active_double initially nil.
First size-1 allocation, grab a size-2 chunk into size1_active_double.
First half returned as the size-1-chunk. Second half marked at "empty".

Second size-1 allocation, return second half of size1_active_double
as size-1-chunk.  Set  size1_active_double to nil.

Releasing a size-1-chunk. Dependent on whether memory alignment can inform of which half of size-2-chunk is being released. Check if other half is empty ==> add it to the size-2-chunk-list.

If other half not empty, add it to one of two singly-linked-lists depending on which half is empty, the empty half used for the linked-list.  Next size-1 allocations made from these lists first to fill in the empty half.

Considerations
* Added complexity
* Remaining size-1 singly-linked-list might have minimal impact - added complexity not worth the gain.

____________
Or another random thought *grin* - and this is probably completely off base because of my limited understanding.  
Piggyback the size-1-chunk management on top of the size-3-chunk-list.
Consider one end of the size-3-chunk-list to be "size1-filled-end" with references to free size-1-chunks, and the other end of the size-3-chunk-list to be "size1-empty-end".

To release a size-1-chunk, move a size-3-chunk from the "size1-empty-end" to the "size1-filled-end" and store the size-1-chunk reference there.

To allocate a size-1-chunk, get it from the "size1-filled" end of the size-3-chunk-list and move that size-3-chunk to the "size1-empty-end" of its list.

To compact a size-3-chunk that refers to a size-1-chunk, copy that into another chunk from the size1-empty-end moved to the size1-filled-end.


I wonder if that even makes sense. 
Anyway, it hurts less to let the ideas roam free...

cheers -ben 
Reply | Threaded
Open this post in threaded view
|

Re: free-chunk-management-in-the-cog-vm

Eliot Miranda-2
 
Hi Ben,

On Sun, Jun 17, 2018 at 4:11 PM, Ben Coman <[hidden email]> wrote:
 
Hi Clement,

Nice article...

A random thought, I wonder if size-1 chunks can be managed packed into size-2 chunks.

A key design feature of Spur is that every object has at least one field to function as a forwarding pointer.  Hence even zero-sized objects occupy 128 bits.  Each object has a 64-bit header (large objects have a 128-bit header).  The allocation unit is 64-bits.  All objects must have room for a forwarding pointer.  Hence the minimum sized chunk is 128-bits.

Hence there isn't room for two size 1 chunks inside a size-2 chunk, only inside a size three chunk.
 

Have a variable size1_active_double initially nil.
First size-1 allocation, grab a size-2 chunk into size1_active_double.
First half returned as the size-1-chunk. Second half marked at "empty".

Second size-1 allocation, return second half of size1_active_double
as size-1-chunk.  Set  size1_active_double to nil.

Releasing a size-1-chunk. Dependent on whether memory alignment can inform of which half of size-2-chunk is being released. Check if other half is empty ==> add it to the size-2-chunk-list.

If other half not empty, add it to one of two singly-linked-lists depending on which half is empty, the empty half used for the linked-list.  Next size-1 allocations made from these lists first to fill in the empty half.

Considerations
* Added complexity
* Remaining size-1 singly-linked-list might have minimal impact - added complexity not worth the gain.

____________
Or another random thought *grin* - and this is probably completely off base because of my limited understanding.  
Piggyback the size-1-chunk management on top of the size-3-chunk-list.
Consider one end of the size-3-chunk-list to be "size1-filled-end" with references to free size-1-chunks, and the other end of the size-3-chunk-list to be "size1-empty-end".

To release a size-1-chunk, move a size-3-chunk from the "size1-empty-end" to the "size1-filled-end" and store the size-1-chunk reference there.

To allocate a size-1-chunk, get it from the "size1-filled" end of the size-3-chunk-list and move that size-3-chunk to the "size1-empty-end" of its list.

To compact a size-3-chunk that refers to a size-1-chunk, copy that into another chunk from the size1-empty-end moved to the size1-filled-end.


I wonder if that even makes sense. 
Anyway, it hurts less to let the ideas roam free...

cheers -ben 




--
_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|

Re: free-chunk-management-in-the-cog-vm

Levente Uzonyi
In reply to this post by Ben Coman
 
On Mon, 18 Jun 2018, Ben Coman wrote:

> A random thought, I wonder if size-1 chunks can be managed packed into
size-2 chunks.

Or, just use the XOR trick to store each node's two links in a single pointer
sized field:
https://everything2.com/title/Storing+a+doubly-linked+list+using+just+a+single+pointer+field

Levente

Reply | Threaded
Open this post in threaded view
|

Re: free-chunk-management-in-the-cog-vm

Eliot Miranda-2
 
Hi Levente,


> On Jun 17, 2018, at 5:36 PM, Levente Uzonyi <[hidden email]> wrote:
>
>> On Mon, 18 Jun 2018, Ben Coman wrote:
>>
>> A random thought, I wonder if size-1 chunks can be managed packed into
> size-2 chunks.
>
> Or, just use the XOR trick to store each node's two links in a single pointer sized field:
> https://everything2.com/title/Storing+a+doubly-linked+list+using+just+a+single+pointer+field

Alas, while this lovely trick does indeed encode a doubly-linked list in a single field it only works for full traversals.  The xor is of the two neighbours, so to get to the next one needs the prev, and to get to the prev one needs the next.  So one can start from either end but not in the middle.  Clément’s modification is to allow rapid removal without needing to traverse the entire list, so the xor trick is not fit for purpose here.

BTW the compactor I wrote before the current one (SpurPigCompactor preceded SpurPlanningCompactor) used exactly this trick.  It didn’t work for reasons unrelated to the xor trick.  Just mentioning it as proof that I love the technique.

>
> Levente
>
Reply | Threaded
Open this post in threaded view
|

Re: free-chunk-management-in-the-cog-vm

Levente Uzonyi
 
Hi Eliot,

On Sun, 17 Jun 2018, Eliot Miranda wrote:

>
> Hi Levente,
>
>
>> On Jun 17, 2018, at 5:36 PM, Levente Uzonyi <[hidden email]> wrote:
>>
>>> On Mon, 18 Jun 2018, Ben Coman wrote:
>>>
>>> A random thought, I wonder if size-1 chunks can be managed packed into
>> size-2 chunks.
>>
>> Or, just use the XOR trick to store each node's two links in a single pointer sized field:
>> https://everything2.com/title/Storing+a+doubly-linked+list+using+just+a+single+pointer+field
>
> Alas, while this lovely trick does indeed encode a doubly-linked list in a single field it only works for full traversals.  The xor is of the two neighbours, so to get to the next one needs the prev, and to get to the prev one needs the next.  So one can start from either end but not in the middle.  Clément’s modification is to allow rapid removal without needing to traverse the entire list, so the xor trick is not fit for purpose here.
Well, it wasn't clear if that's a requirement, but I could have figured it
out, because if you always iterate over the list, you can just keep a
pointer to the previous node to delete the current node.

If performance is important here, random deletion can still be done in
O(1) time at the cost of maintaining an external doubly linked list with
each node having a pointer to the chunk and the chunk using its sole slot
to point to its list node.

>
> BTW the compactor I wrote before the current one (SpurPigCompactor preceded SpurPlanningCompactor) used exactly this trick.  It didn’t work for reasons unrelated to the xor trick.  Just mentioning it as proof that I love the technique.

Nice. :)

Levente

>
>>
>> Levente
>>
Reply | Threaded
Open this post in threaded view
|

Re: free-chunk-management-in-the-cog-vm

Clément Béra
 

In the end I decided to rebuild the linked lists while sweeping afterdiscussing with Eliot. It avoids most of the unlinking operations on the linked lists of free chunks.

There are still some unlinking, but it's less common so the single linked list for chunk of size 1 is now affordable (I don't see 75% of time spent in free chunk unlinking like before on pathological cases).

In selective compactor, segments are compacted using unlinking but only segments mostly empty are compacted, so low amount of free chunks of size 1. The problem was in the sweep phase and we fixed it.

On Mon, Jun 18, 2018, 14:31 Levente Uzonyi <[hidden email]> wrote:
 Hi Eliot,

On Sun, 17 Jun 2018, Eliot Miranda wrote:

>
> Hi Levente,
>
>
>> On Jun 17, 2018, at 5:36 PM, Levente Uzonyi <[hidden email]> wrote:
>>
>>> On Mon, 18 Jun 2018, Ben Coman wrote:
>>>
>>> A random thought, I wonder if size-1 chunks can be managed packed into
>> size-2 chunks.
>>
>> Or, just use the XOR trick to store each node's two links in a single pointer sized field:
>> https://everything2.com/title/Storing+a+doubly-linked+list+using+just+a+single+pointer+field
>
> Alas, while this lovely trick does indeed encode a doubly-linked list in a single field it only works for full traversals.  The xor is of the two neighbours, so to get to the next one needs the prev, and to get to the prev one needs the next.  So one can start from either end but not in the middle.  Clément’s modification is to allow rapid removal without needing to traverse the entire list, so the xor trick is not fit for purpose here.

Well, it wasn't clear if that's a requirement, but I could have figured it
out, because if you always iterate over the list, you can just keep a
pointer to the previous node to delete the current node.

If performance is important here, random deletion can still be done in
O(1) time at the cost of maintaining an external doubly linked list with
each node having a pointer to the chunk and the chunk using its sole slot
to point to its list node.

>
> BTW the compactor I wrote before the current one (SpurPigCompactor preceded SpurPlanningCompactor) used exactly this trick.  It didn’t work for reasons unrelated to the xor trick.  Just mentioning it as proof that I love the technique.

Nice. :)

Levente

>
>>
>> Levente
>>
Reply | Threaded
Open this post in threaded view
|

Re: free-chunk-management-in-the-cog-vm

Levente Uzonyi
 
On Mon, 18 Jun 2018, Clément Bera wrote:

> There are still some unlinking, but it's less common so the single
linked list for chunk of size 1 is now affordable (I don't see 75% of time
spent in free chunk unlinking like before on pathological cases).
>
> In selective compactor, segments are compacted using unlinking but only
segments mostly empty are compacted, so low amount of free chunks of size
1. The problem was in the sweep phase and we fixed it.

Great.
Btw, why did you decide to use a tree instead of a hash table?
I had a look at the tree implementation, and it seems to be a simple
binary tree. Is that correct?

Levente
Reply | Threaded
Open this post in threaded view
|

Re: free-chunk-management-in-the-cog-vm

Eliot Miranda-2
 
Hi Levente,

On Tue, Jun 19, 2018 at 4:50 PM, Levente Uzonyi <[hidden email]> wrote:
 
On Mon, 18 Jun 2018, Clément Bera wrote:

There are still some unlinking, but it's less common so the single
linked list for chunk of size 1 is now affordable (I don't see 75% of time spent in free chunk unlinking like before on pathological cases).

In selective compactor, segments are compacted using unlinking but only
segments mostly empty are compacted, so low amount of free chunks of size 1. The problem was in the sweep phase and we fixed it.

Great.
Btw, why did you decide to use a tree instead of a hash table?

To keep it simple.  The scheme really comes from the free lists.  There is either a 32 or a 64 element array with all the free chunks from size 1 to 32 or 1 to 64.  So this gives fast allocation for the common case of a smallish object.  Then what to do with element 0?  A binary tree works well since it is populated only with larger chunks and, given that all chunks of the same size are, effectively, a single node in the tree, the tree typically isn't that large.  That and some very simple balancing hacks keep it from degenerating into a list whenever I've looked at its state (although we might investigate organizing it as an AVL tree; I like AVL trees but they're a lot of work for little gain if in practice a tree doesn't degenerate).

The use of 32 or 64 elements allows the use of a bitmask word to serve as a non-empty list cache, so that when the system looks for chunks bigger than needed (if a smaller list is empty) it is fast to check the list(s) at 2x, 3x, etc...

A hash table needs heap space to organize and unlike the binary tree this space may need to grow if the number of free chunks is very large.
 
I had a look at the tree implementation, and it seems to be a simple binary tree. Is that correct?

Yes.  A binary tree of lists.
 
Levente

and you are very welcome to experiment with alternative representations.

_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|

Re: free-chunk-management-in-the-cog-vm

Levente Uzonyi
 
Hi Eliot,

On Tue, 19 Jun 2018, Eliot Miranda wrote:

> To keep it simple.  The scheme really comes from the free lists.  There
is either a 32 or a 64 element array with all the free chunks from size 1
to 32 or 1 to 64.  So this gives fast allocation for the common case of a
smallish object.  Then what to do with element 0?  A binary tree works
well since it is populated only with larger chunks and, given that all
chunks of the same size are, effectively, a single node in the tree, the
tree typically isn't that large.  That and some very simple balancing
hacks keep it from degenerating into a list whenever I've looked at its
state (although we might investigate organizing it as an AVL tree; I like
AVL trees but they're a lot of work for little gain if in practice a tree
doesn't degenerate).

Well, random binary trees have ~1.39 x log(n) average height, so
performance can be really good if the order the chunk sizes are added and
removed do not unbalance the tree too much.

The reason I asked about this was that I couldn't figure out why the trees
were said to be AVL trees. And I would have used a hash table with open
addressing over an AVL tree, because that's fairly easy to implement and
has good performance, while AVL trees take a lot more effort to implement.

>
> The use of 32 or 64 elements allows the use of a bitmask word to serve
as a non-empty list cache, so that when the system looks for chunks bigger
than needed (if a smaller list is empty) it is fast to check the list(s)
at 2x, 3x, etc...
>
> A hash table needs heap space to organize and unlike the binary tree
this space may need to grow if the number of free chunks is very large.

Based on your answers, I have got the impression that the performance of
these data structures have negligible effect on the overall GC
performance, so it's probably not worth to try to optimize them.

Levente