pinning GC

Eliot Miranda-2
 
Hi All,

    as I've mentioned I've been thinking of implementing a pinning GC for the threaded Cog VM.  Pinning is really important for a threaded FFI since the GC can run in parallel with FFI calls and so move objects while FFI calls are in progress.  The idea then is to arrange that every object passed out through the FFI is pinned.  This is easy to do in a Smalltalk VM that has an object table; an object header bit is used as the "is pinned" flag.  The marshalling code in FFI calls checks the "is pinned" bit and if not set, allocates a clone of the object in a region of the heap that the GC does not compact, with the "is pinned" bit set, and does a become.

In a Smalltalk VM without an object table (Squeak, VisualAge etc) this is more challenging.  But a recent comment on Gilad Bracha's Newspeak blog by "TruePath" (who's he? ed.) points out a technique that applies:
    "Remember smalltalk is a dynamic language so every method call (and that is all there is) requires we check the type pointer and (if the PIC or other caches don't include an appropriate entry) use the type to resolve the method call. One could easily have a special type that escapes into the runtime on any message at which point the runtime replaces the reference the message call was dispatch on with the new location and retries the call. Indeed, any modern GC has to have some way of doing indirection like this so the heap can be compacted."

So the FFI marshalling code checks the "is pinned" bit, and if unset allocates a clone of the object in a region of the heap that the GC does not compact, changes the class/type field of the object to the special "i'm a forwarding corpse" value, and sets a forwarding pointer to the pinned copy.

There are two problems with this.

One, it doesn't work for objects with named inst vars; in Smalltalk an object's named inst vars are accessed directly.  But that's easy; one likely doesn't have any business handing out such objects through the FFI, so we can fail calls that do this, and provide special wrappers to hand out indirect references to objects with named inst vars.  (There was a discussion on this approach last year; Andreas proposed a generic object handle scheme.)

Two, there needs to be room in the corpse for a forwarding pointer to the pinned copy.  There isn't enough space in zero-sized byte data.  But one has no business handing out pointers to empty byte data anyway; attempts to write data into them must overwrite the heap, potentially disastrously.  So the FFI can either pass zero-sized objects as null pointers or fail the call.  So there will always be enough room in byte data to corpse it to a pinned copy because we'll only attempt the operation on non-empty objects (and the underlying heap representation will round up the size of any object to at least a pointer's width).

So all one needs is a special class marker, say 0, for corpses.  The GC must be modified to follow the forwarding pointer through corpses, as must the message lookup machinery.  The heap must have a region that is not compacted (such as old space, which could have a free list and be compacted only on snapshot - i.e. writing a file that has the free space squeezed out, but that /does not/ move objects in the heap, such as VW's image file format, and hence the compaction actually happens on image load when the objects in the snapshot file are relocated and pointers swizzled).
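
A minimal Python sketch of the marshalling step described above, not Cog code. The names `Obj`, `CORPSE_CLASS`, `follow` and `pin_for_ffi` are hypothetical, the non-compacted region is modelled as a plain list, and zero-sized objects are passed as null (the other option mentioned above is to fail the call).

```python
from dataclasses import dataclass
from typing import Optional

CORPSE_CLASS = 0  # hypothetical class marker reserved for forwarding corpses


@dataclass(eq=False)
class Obj:
    cls: object                      # class/type field in the header
    data: bytearray                  # byte data body
    pinned: bool = False             # the "is pinned" header bit
    forward: Optional["Obj"] = None  # forwarding pointer, meaningful only in a corpse


def follow(obj: Obj) -> Obj:
    """Chase forwarding pointers until a live (non-corpse) object is reached."""
    while obj.cls == CORPSE_CLASS:
        obj = obj.forward
    return obj


def pin_for_ffi(obj: Obj, pinned_region: list) -> Optional[Obj]:
    """Return a pinned object that is safe to hand out, or None for a null pointer."""
    obj = follow(obj)
    if len(obj.data) == 0:
        return None                  # zero-sized: pass null rather than corpse it
    if obj.pinned:
        return obj                   # already lives in the non-compacted region
    clone = Obj(obj.cls, bytearray(obj.data), pinned=True)
    pinned_region.append(clone)      # stands in for the region the GC never compacts
    obj.cls = CORPSE_CLASS           # the original becomes a forwarding corpse
    obj.forward = clone
    return clone
```

Calling `pin_for_ffi` a second time on the same (now corpsed) object follows the forwarding pointer and returns the existing pinned copy instead of cloning again.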

This should have been obvious to me, but became so only on reading TruePath's comment (thanks again).

best
Eliot

Re: pinning GC

Igor Stasenko

On 12 January 2011 03:24, Eliot Miranda <[hidden email]> wrote:

> So the FFI marshalling code checks the "is pinned" bit, and if unset allocates a clone of the object in a region of the heap that the GC does not compact, changes the class/type field of the object to the special "i'm a forwarding corpse" value, and sets a forwarding pointer to the pinned copy.


Eliot, one thing about 'forwarded' objects, which you are calling
forwarding corpses, is that they can be used not only for pinning
but also with the #becomeForward primitive, making it work a lot faster,
since it is no longer necessary to scan the whole heap to update
references, and the update can be done during GC.
The only problem, as you pointed out, is the objects which don't have
enough space for a forwarding pointer. But for this case, I think the
primitive can fall back and use the old slow scheme.
There could be an option to replace the object header, which would
carry the forwarding pointer. But I'm not sure (don't remember) if this
option will work, since you have only the two lower bits in the header
to indicate that the object is forwarded.

Another problem is that some primitives could ignore or escape the
introduced semantics, so a careful check is needed.

Anyway, it would be cool to kill two ducks (pinning and become) with
one shot if possible :)
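
A rough Python sketch of the idea above: #becomeForward as a constant-time "corpsing" step, with references repaired later while the collector traces the heap. This is a toy model under stated assumptions, not the Squeak/Cog implementation; `Obj`, `CORPSE`, `become_forward` and `gc_fix_references` are hypothetical names, and the "GC" is only a mark-style traversal from a root list.

```python
CORPSE = object()  # hypothetical sentinel standing in for the corpse class marker


class Obj:
    def __init__(self, cls, slots=()):
        self.cls = cls
        self.slots = list(slots)   # references to other Obj instances
        self.forward = None        # set only when cls is CORPSE


def follow(obj):
    """Chase forwarding pointers until a live object is reached."""
    while obj.cls is CORPSE:
        obj = obj.forward
    return obj


def become_forward(old, new):
    """O(1): no heap scan, just turn `old` into a corpse pointing at `new`."""
    old.cls = CORPSE
    old.slots = []
    old.forward = new


def gc_fix_references(roots):
    """Trace from the roots, replacing corpse references; return the live objects."""
    roots[:] = [follow(r) for r in roots]
    live, stack, seen = [], list(roots), set()
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        live.append(obj)
        obj.slots = [follow(s) for s in obj.slots]  # fix-up piggy-backs on the trace
        stack.extend(obj.slots)
    return live
```

Until the next trace, holders still point at the corpse, which is why sends and primitives that touch such a reference have to be prepared to follow the forwarding pointer.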




--
Best regards,
Igor Stasenko AKA sig.

Re: pinning GC

Josh Gargus


On Jan 11, 2011, at 11:48 PM, Igor Stasenko wrote:

>
> Eliot, one thing about 'forwarded' objects, which you are calling
> forwarding corpses, is that they can be used not only for pinning
> but also with the #becomeForward primitive, making it work a lot faster,
> since it is no longer necessary to scan the whole heap to update
> references, and the update can be done during GC.
> The only problem, as you pointed out, is the objects which don't have
> enough space for a forwarding pointer. But for this case, I think the
> primitive can fall back and use the old slow scheme.

I was thinking the same thing.  But wouldn't there remain the problem that Eliot mentioned about objects with named variables?

Cheers,
Josh



Re: pinning GC

Steve Rees
 
Aren't named ivars only accessed by methods of the receiver? Or does
Squeak use access sends similar to Strongtalk?

In the former case a check of the topmost frame of each Process' stack,
and a check for corpsed objects on each method return should be enough, no?

In the latter case the class in the inline cache will differ, giving an
opportunity in the lookup to fix up the original. One might also check
the corpse bit in ivar accesses to allow an opportunity to replace
corpsed references ahead of a full GC.

The only case where I think this might still be a problem is when Cog
does method inlining and has inlined access sends, though even here it
would presumably have a type test ahead of the inlined code which the
corpse would fail because of the changed class, triggering an uncommon
branch or falling back to a traditional non-inlined send (depending on
the approach Cog uses - I haven't looked at the code). Both of these
give an opportunity to handle the corpsed reference, so maybe there
won't be a problem here either.

I think you can also use this for two-way become by cloning both
objects, marking each as a corpse and having each corpse refer to the
clone of the other.
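
A small Python sketch of the two-way become suggested above, assuming a toy object model in which `Obj`, `CORPSE`, `follow` and `become` are hypothetical names and an object's state is a dict of fields. It ignores the caveats about direct inst var access discussed elsewhere in this thread.

```python
CORPSE = object()  # hypothetical sentinel standing in for the corpse class marker


class Obj:
    def __init__(self, cls, fields):
        self.cls = cls
        self.fields = fields
        self.forward = None        # set only when cls is CORPSE


def follow(obj):
    """Chase forwarding pointers until a live object is reached."""
    while obj.cls is CORPSE:
        obj = obj.forward
    return obj


def become(a, b):
    """Two-way become: clone both, corpse both, cross the forwarding pointers."""
    clone_a = Obj(a.cls, dict(a.fields))
    clone_b = Obj(b.cls, dict(b.fields))
    for corpse, target in ((a, clone_b), (b, clone_a)):
        corpse.cls = CORPSE
        corpse.fields = None       # the corpse no longer carries object state
        corpse.forward = target
```

After `become(a, b)`, every existing reference to `a` resolves (via `follow`) to a copy of `b`'s former state, and vice versa.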

Regards, Steve

--
You can follow me on twitter at http://twitter.com/smalltalkhacker


Re: pinning GC

Igor Stasenko
In reply to this post by Josh Gargus

On 12 January 2011 10:37, Josh Gargus <[hidden email]> wrote:

> I was thinking the same thing.  But wouldn't there remain the problem that Eliot mentioned about objects with named variables?
>

I was thinking about this problem before and was not really happy
with the potential solutions to it.
In fact, it would be really nice to introduce a generic model for
forwarded objects,
which could then be used not just for FFI & #become but could also
serve as transparent proxies.

It would be good to finally get rid of compact classes, so every
object would have a 2-word header,
which is enough for encoding a forwarding pointer in the class slot (or
for using some flag to indicate that the oop is forwarded and encoding
an index into a forwarding table).

But of course, the problem here is that some bytecodes access the
object state directly, like reading/writing ivars,
so code like the following will break the system:

myMethod

  x := 1.
  self becomeForward: y.
  x := 5.

In the above, x is an instance variable of the receiver.
If the become primitive turns the receiver oop into a forwarding corpse,
we now have a problem: after returning to the given method, a bytecode
will attempt to write the new ivar value at the old location
instead of the new one, since the receiver reference is not updated and
still points to the old object's memory location.

One possible solution might be that at the point of context
activation, right before starting to interpret the bytecode,
the interpreter should check whether the receiver is a forwarded oop
and, if so, resolve the address of the forwarded oop and use it as the
receiver instead.
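
A toy Python model of that check, replaying the myMethod example above. It assumes that "activation" also covers the point where a frame regains control after a send returns; `Frame`, `activate`, `store_ivar` and the other names are hypothetical, and ivars are modelled as a dict rather than raw slots.

```python
CORPSE = object()  # hypothetical sentinel standing in for the corpse class marker


class Obj:
    def __init__(self, cls, ivars):
        self.cls = cls
        self.ivars = ivars
        self.forward = None        # set only when cls is CORPSE


def follow(obj):
    """Chase forwarding pointers until a live object is reached."""
    while obj.cls is CORPSE:
        obj = obj.forward
    return obj


def become_forward(old, new):
    old.cls = CORPSE
    old.ivars = {}                 # the corpse body is dead storage
    old.forward = new


class Frame:
    def __init__(self, receiver):
        self.receiver = receiver

    def activate(self):
        """Run whenever the frame (re)gains control, before any bytecode executes."""
        self.receiver = follow(self.receiver)

    def store_ivar(self, name, value):
        self.receiver.ivars[name] = value   # direct access: no send, no class check


def my_method(frame, y):
    frame.activate()
    frame.store_ivar("x", 1)                # x := 1.
    become_forward(frame.receiver, y)       # self becomeForward: y.
    frame.activate()                        # control returns to this frame
    frame.store_ivar("x", 5)                # x := 5. now lands in y
```

Without the second `activate()` call, the final store would write into the corpse's dead storage rather than into `y`.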




--
Best regards,
Igor Stasenko AKA sig.

Re: pinning GC

Eliot Miranda-2
In reply to this post by Steve Rees
 


On Wed, Jan 12, 2011 at 1:59 AM, Steve Rees <[hidden email]> wrote:

> Aren't named ivars only accessed by methods of the receiver?

In classical Smalltalk notionally inst vars are only accessed directly by methods in the receiver's class or superclasses that have an inst size > 0.  However, become: can rebind the receiver so that an illegal direct inst var access can be made.  e.g. create a ByteArray and become it to a Point; sending x to the point accesses a potentially bogus pointer, the raw bits at 0 in the ByteArray.  So if there are extant activations whose receiver is changed through become or changeClass one can observe strange effects and/or crash the system.  The VM may go to some lengths to hide or mitigate these effects.  In my BrouHaHa Smalltalk-80 VM the JIT (bytecode to threaded code) did copy-down so that it could know that the class of self in a threaded code method was constant and so sends to self (~ 40% of all sends) didn't have to be checked.  But a become could change the class of self in an activation and so become also had to scan all activations and rebind the threaded code method in activations on the becommed objects so that self sends remained correct.
 
> Or does Squeak use access sends similar to Strongtalk?

One can of course use accessors as a style, and easily modify the compiler to access them this way, but the system (along with most other Smalltalks) does provide direct access and it is used extensively.
 

> In the former case a check of the topmost frame of each Process' stack, and a check for corpsed objects on each method return should be enough, no?

Checking on return is very expensive.  See my paper on context management in VisualWorks 5i.  Instead one could probably scan as part of become/change class, but scan only activations in the stack zone and defer scanning contexts in the heap until they were faulted into the stack zone.  Faulting in is expensive anyway so adding a test for a corpse won't add much overhead.

> In the latter case the class in the inline cache will differ, giving an opportunity in the lookup to fix up the original. One might also check the corpse bit in ivar accesses to allow an opportunity to replace corpsed references ahead of a full GC.

Again that kind of check isn't cheap.  Remember that the check must be made on inst var reads as well as writes.  When I added immutability to VisualWorks, which tests only writes, the total cost was about 3% to 5%, which was more than acceptable for the benefit, and it was so low precisely because an inst var write required a store check and so part of the immutability test could be folded into the store check, bringing down its overall cost.  But adding a similar check for corpses to inst var reads would probably add costs above 10% and that's getting expensive.  Since inst var access is common and become is relatively rare (we've got to be talking millions to one in normal code, right?) it makes sense to me to put the cost in become and rare operations such as faulting contexts into the stack zone.



> The only case where I think this might still be a problem is when Cog does method inlining and has inlined access sends, though even here it would presumably have a type test ahead of the inlined code which the corpse would fail because of the changed class, triggering an uncommon branch or falling back to a traditional non-inlined send (depending on the approach Cog uses - I haven't looked at the code). Both of these give an opportunity to handle the corpsed reference, so maybe there won't be a problem here either.

Right.  If one is doing adaptive optimization then become/change class has to take care to preserve optimized code invariants and/or dynamically deoptimize when it violates them.  But Cog doesn't do this /yet/ :)

 

> I think you can also use this for two-way become by cloning both objects, marking each as a corpse and having each corpse refer to the clone of the other.

Right, noting the caveats we've discussed here.  It's all just a small matter of programming :)

best
Eliot
 



Re: pinning GC

Eliot Miranda-2
In reply to this post by Josh Gargus
 


On Wed, Jan 12, 2011 at 1:37 AM, Josh Gargus <[hidden email]> wrote:


> On Jan 11, 2011, at 11:48 PM, Igor Stasenko wrote:
>
>> Eliot, one thing about 'forwarded' objects, which you are calling
>> forwarding corpses, is that they can be used not only for pinning
>> but also with the #becomeForward primitive, making it work a lot faster,
>> since it is no longer necessary to scan the whole heap to update
>> references, and the update can be done during GC.
>> The only problem, as you pointed out, is the objects which don't have
>> enough space for a forwarding pointer. But for this case, I think the
>> primitive can fall back and use the old slow scheme.
>
> I was thinking the same thing.  But wouldn't there remain the problem that Eliot mentioned about objects with named variables?

You're both right.  The issue of objects with named inst vars is important (see my reply to Steve's post; performance is one issue).  One wants to do lazy become, but without imposing checking costs on operations with high dynamic frequency.  That's what's so beautiful about changing the class; it piggy-backs on the inline cache check, and moves the actual check and fix-up to a rare and expensive path, that of an inline cache miss, without imposing any extra costs on the common case of a send to a non-corpse object.  The implementation challenge is to come up with a scheme for objects with named inst vars that doesn't add large costs to inst var access.
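
The piggy-backing on the inline cache check can be illustrated with a toy Python model. This is only a sketch under assumptions, not how Cog's machine-code send sites work: `SendSite` is a hypothetical monomorphic cache, classes are modelled as dicts from selector to function, and the reference being dispatched on lives in a list slot so the miss path can overwrite it.

```python
CORPSE = object()  # hypothetical sentinel standing in for the corpse class marker


class Obj:
    def __init__(self, cls, forward=None):
        self.cls = cls             # a dict mapping selectors to functions, or CORPSE
        self.forward = forward


def follow(obj):
    """Chase forwarding pointers until a live object is reached."""
    while obj.cls is CORPSE:
        obj = obj.forward
    return obj


class SendSite:
    """A monomorphic inline cache for one send site."""

    def __init__(self, selector):
        self.selector = selector
        self.cached_cls = None
        self.cached_method = None
        self.misses = 0

    def send(self, holder, index):
        """Send the selector to the object referenced by holder[index]."""
        rcvr = holder[index]
        if rcvr.cls is self.cached_cls:          # the only check on the fast path
            return self.cached_method(rcvr)
        return self._miss(holder, index)

    def _miss(self, holder, index):
        self.misses += 1
        rcvr = holder[index]
        if rcvr.cls is CORPSE:                   # a corpse never matches a cached class
            rcvr = holder[index] = follow(rcvr)  # fix the reference, then retry
        self.cached_cls = rcvr.cls
        self.cached_method = rcvr.cls[self.selector]
        return self.cached_method(rcvr)
```

The fast path performs the same single class comparison whether or not corpses exist; a corpse shows up only as one extra miss, after which the repaired reference hits the cache again.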

Since the current Cog become has to scan activations in the stack zone anyway, using corpses and scanning the stack zone (basically eagerly becomming activations in the stack zone) is already cheaper than the current become, whose only significant optimization is to scan only the remembered table, the stack zone and new space if all objects being becommed are young.


Thanks for this discussion.  This is making increasing sense.  So lazy become and corpses will be used for all non-zero-sized objects.  Cool.

best
Eliot
