empty blocks optimization

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

empty blocks optimization

Florin Mateoc-4
 
Hi,

I was thinking of implementing a relatively simple optimization for empty blocks - there's something in the air about blocks lately :)

The compiler would create them as a new kind of literal (a subclass of BlockClosure), so there would be no block creation at runtime. Even the bytecodes for them would be simplified - it would be just a load literal. As far as I can tell, this would also not require any VM or bytecode changes.
I also don't think they would clash with the coming full block closures.

See attached to see what the new class definition would look like. By employing Eliot's trick of re-using startpc, they can even return a non-nil constant or other literals, and thus can even cover a little more than just purely empty blocks. And we could also reuse these literals, there is no point in creating multiple instances for []. 
Given how often we use "at: aKey ifAbsent: []" or "detect: [:| ...] ifNone: []", I think it would be worth it, but let's see what you guys think.
Worst case, I'll just implement it in GraalSqueak :). Just kidding, I'll do it in GraalSqueak anyway :)

Florin

EmptyBlockClosure.st (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: empty blocks optimization

Clément Béra
 
Hi Florin,

What you're describing is a subset of the block optimizations present in other Smalltalk.

What typically happens in some other Smalltalks:
1 Blocks without any uses of the enclosing environment (variables or non local returns) are called clean block, the BlockClosure instance can be created at compiled time, such blocks do not require any allocation at runtime, this is what you described for empty blocks.
2 Blocks without non local returns are called copying block, the BlockClosure instance has to be created at runtime to hold copied values and the indirection vector if required, but does not need to hold a reference to the outerContext, which in practice allows some optimizations (in Cog it avoids allocation an object to marry the frame, in GraalSqueak it would allow to avoid having a virtual frame for the block outer context in some cases).
3 Blocks with non local returns are full blocks, and everything is required.

By default Squeak only uses 3, the reason is that 1 and 2 decrease debugging possibilities, and 1 also messes up identity. Your optimization has both flaws. Of course, if you do the optimization only for empty blocks, and not for all blocks without access to the enclosing environment, you limit the damage done reducing debugging possibilities (when inspecting the empty block the outer context reference would be nil instead of the outer context, but you cannot step in the debugger into empty blocks, so damage is limited). You still mess up identity. Here are code snippets showing the 2 problems:
Debugging
DoIt>> [ ] outerContext == thisContext.
With your optimization the DoIt now yields a different result (false instead of true)
Identity
A>>foo
  ^ [ ]
DoIt>> A new foo == A new foo.
With your optimization the DoIt now yields a different result (true instead of false)

One idea by moving from BlockClosure to FullBlockClosure (with a dedicated compiled block per closure) was, among other things, to allow block 1 and 2 to be implemented easily. There are various issues with the startpc hack that FullBlockClosure solves in a nice way. For example, if you try to do your optimization in Cog, Cog's JIT may get confused on where to find the bytecode of the BlockClosure stored in the literal and the optimization may not work (Cog does not JIT unreachable bytecode). The outerContext reference can also be nil instead of a fake context with full blocks, there's a bit in the full block closure instruction for this case.

It would be also interesting to implement Block 2 in GraalSqueak and see how much it reduces deoptimization metadata. I believe Tim did some similar experiment in RSqueak.

Good luck implementing your optimization. Have fun :-)

On Sat, Jan 11, 2020 at 7:42 AM Florin Mateoc <[hidden email]> wrote:
 
Hi,

I was thinking of implementing a relatively simple optimization for empty blocks - there's something in the air about blocks lately :)

The compiler would create them as a new kind of literal (a subclass of BlockClosure), so there would be no block creation at runtime. Even the bytecodes for them would be simplified - it would be just a load literal. As far as I can tell, this would also not require any VM or bytecode changes.
I also don't think they would clash with the coming full block closures.

See attached to see what the new class definition would look like. By employing Eliot's trick of re-using startpc, they can even return a non-nil constant or other literals, and thus can even cover a little more than just purely empty blocks. And we could also reuse these literals, there is no point in creating multiple instances for []. 
Given how often we use "at: aKey ifAbsent: []" or "detect: [:| ...] ifNone: []", I think it would be worth it, but let's see what you guys think.
Worst case, I'll just implement it in GraalSqueak :). Just kidding, I'll do it in GraalSqueak anyway :)

Florin


--
Reply | Threaded
Open this post in threaded view
|

Re: empty blocks optimization

Florin Mateoc-4
 
Hi Clement,

Thank you for your reply. I know what clean, copying and full block closures are, I was intentionally only targeting a specific subset of clean blocks. Perhaps I should have called them true literal blocks. It is the same difference as the one between literal arrays and ones constructed at runtime (#() vs {}).
You don't want or need to step into [] or [1] or [#1]. And for the same reason that you wouldn't ever step into such a literal block, why would you need a non-nil outerContext? Since you would never see it in the debugger or on the context stack, when would you need to inspect it or explore its outerContext?
And I simply don't understand what you are trying to say about identity. Sure, every singleton has the "problem" that you describe. It is a feature, not a bug

The literal would be passed around as an object, and it would have a (real) method value (plus variants), just like any other blocks, why would Cog be confused about it?

GraalSqueak already has compiledBlock objects, although not as literals, but I would expect less bang for the bug to implement something for clean blocks in general.

I am not sure if the attachment made it to the list or not, if it didn't and you would like to see the tentative implementation, I can paste it inline

Thank you,
Florin



On Sat, Jan 11, 2020 at 4:47 AM Clément Béra <[hidden email]> wrote:
 
Hi Florin,

What you're describing is a subset of the block optimizations present in other Smalltalk.

What typically happens in some other Smalltalks:
1 Blocks without any uses of the enclosing environment (variables or non local returns) are called clean block, the BlockClosure instance can be created at compiled time, such blocks do not require any allocation at runtime, this is what you described for empty blocks.
2 Blocks without non local returns are called copying block, the BlockClosure instance has to be created at runtime to hold copied values and the indirection vector if required, but does not need to hold a reference to the outerContext, which in practice allows some optimizations (in Cog it avoids allocation an object to marry the frame, in GraalSqueak it would allow to avoid having a virtual frame for the block outer context in some cases).
3 Blocks with non local returns are full blocks, and everything is required.

By default Squeak only uses 3, the reason is that 1 and 2 decrease debugging possibilities, and 1 also messes up identity. Your optimization has both flaws. Of course, if you do the optimization only for empty blocks, and not for all blocks without access to the enclosing environment, you limit the damage done reducing debugging possibilities (when inspecting the empty block the outer context reference would be nil instead of the outer context, but you cannot step in the debugger into empty blocks, so damage is limited). You still mess up identity. Here are code snippets showing the 2 problems:
Debugging
DoIt>> [ ] outerContext == thisContext.
With your optimization the DoIt now yields a different result (false instead of true)
Identity
A>>foo
  ^ [ ]
DoIt>> A new foo == A new foo.
With your optimization the DoIt now yields a different result (true instead of false)

One idea by moving from BlockClosure to FullBlockClosure (with a dedicated compiled block per closure) was, among other things, to allow block 1 and 2 to be implemented easily. There are various issues with the startpc hack that FullBlockClosure solves in a nice way. For example, if you try to do your optimization in Cog, Cog's JIT may get confused on where to find the bytecode of the BlockClosure stored in the literal and the optimization may not work (Cog does not JIT unreachable bytecode). The outerContext reference can also be nil instead of a fake context with full blocks, there's a bit in the full block closure instruction for this case.

It would be also interesting to implement Block 2 in GraalSqueak and see how much it reduces deoptimization metadata. I believe Tim did some similar experiment in RSqueak.

Good luck implementing your optimization. Have fun :-)

On Sat, Jan 11, 2020 at 7:42 AM Florin Mateoc <[hidden email]> wrote:
 
Hi,

I was thinking of implementing a relatively simple optimization for empty blocks - there's something in the air about blocks lately :)

The compiler would create them as a new kind of literal (a subclass of BlockClosure), so there would be no block creation at runtime. Even the bytecodes for them would be simplified - it would be just a load literal. As far as I can tell, this would also not require any VM or bytecode changes.
I also don't think they would clash with the coming full block closures.

See attached to see what the new class definition would look like. By employing Eliot's trick of re-using startpc, they can even return a non-nil constant or other literals, and thus can even cover a little more than just purely empty blocks. And we could also reuse these literals, there is no point in creating multiple instances for []. 
Given how often we use "at: aKey ifAbsent: []" or "detect: [:| ...] ifNone: []", I think it would be worth it, but let's see what you guys think.
Worst case, I'll just implement it in GraalSqueak :). Just kidding, I'll do it in GraalSqueak anyway :)

Florin


--
Reply | Threaded
Open this post in threaded view
|

Re: empty blocks optimization

Clément Béra
 


On Sat, Jan 11, 2020 at 5:10 PM Florin Mateoc <[hidden email]> wrote:
 
Hi Clement,

Thank you for your reply. I know what clean, copying and full block closures are, I was intentionally only targeting a specific subset of clean blocks. Perhaps I should have called them true literal blocks. It is the same difference as the one between literal arrays and ones constructed at runtime (#() vs {}).
You don't want or need to step into [] or [1] or [#1]. And for the same reason that you wouldn't ever step into such a literal block, why would you need a non-nil outerContext? Since you would never see it in the debugger or on the context stack, when would you need to inspect it or explore its outerContext?  
And I simply don't understand what you are trying to say about identity. Sure, every singleton has the "problem" that you describe. It is a feature, not a bug
 
Legacy code may rely on identity. It may be non trivial to figure out when an application crashes why it comes from this change.
There's two syntax for #() and {}, and you can pick which identity logic you want. Now you have one syntax for blocks and [ ] has different identity logic than [ self foo ]. 
This is very confusing if when one does a change in the code then suddenly identity breaks other parts of the app.
 
Same thing for outerContext identity, but for outerContext I guess you're right that it does not really matter since most likely no code is relying on that.


The literal would be passed around as an object, and it would have a (real) method value (plus variants), just like any other blocks, why would Cog be confused about it?

GraalSqueak already has compiledBlock objects, although not as literals, but I would expect less bang for the bug to implement something for clean blocks in general.

I am not sure if the attachment made it to the list or not, if it didn't and you would like to see the tentative implementation, I can paste it inline

 
Ah, ok, just looked at the implementation. The startpc directly holds the constant to return, that's very neat, never thought about doing it that way. That should work on Cog. 
 
Thank you,
Florin



On Sat, Jan 11, 2020 at 4:47 AM Clément Béra <[hidden email]> wrote:
 
Hi Florin,

What you're describing is a subset of the block optimizations present in other Smalltalk.

What typically happens in some other Smalltalks:
1 Blocks without any uses of the enclosing environment (variables or non local returns) are called clean block, the BlockClosure instance can be created at compiled time, such blocks do not require any allocation at runtime, this is what you described for empty blocks.
2 Blocks without non local returns are called copying block, the BlockClosure instance has to be created at runtime to hold copied values and the indirection vector if required, but does not need to hold a reference to the outerContext, which in practice allows some optimizations (in Cog it avoids allocation an object to marry the frame, in GraalSqueak it would allow to avoid having a virtual frame for the block outer context in some cases).
3 Blocks with non local returns are full blocks, and everything is required.

By default Squeak only uses 3, the reason is that 1 and 2 decrease debugging possibilities, and 1 also messes up identity. Your optimization has both flaws. Of course, if you do the optimization only for empty blocks, and not for all blocks without access to the enclosing environment, you limit the damage done reducing debugging possibilities (when inspecting the empty block the outer context reference would be nil instead of the outer context, but you cannot step in the debugger into empty blocks, so damage is limited). You still mess up identity. Here are code snippets showing the 2 problems:
Debugging
DoIt>> [ ] outerContext == thisContext.
With your optimization the DoIt now yields a different result (false instead of true)
Identity
A>>foo
  ^ [ ]
DoIt>> A new foo == A new foo.
With your optimization the DoIt now yields a different result (true instead of false)

One idea by moving from BlockClosure to FullBlockClosure (with a dedicated compiled block per closure) was, among other things, to allow block 1 and 2 to be implemented easily. There are various issues with the startpc hack that FullBlockClosure solves in a nice way. For example, if you try to do your optimization in Cog, Cog's JIT may get confused on where to find the bytecode of the BlockClosure stored in the literal and the optimization may not work (Cog does not JIT unreachable bytecode). The outerContext reference can also be nil instead of a fake context with full blocks, there's a bit in the full block closure instruction for this case.

It would be also interesting to implement Block 2 in GraalSqueak and see how much it reduces deoptimization metadata. I believe Tim did some similar experiment in RSqueak.

Good luck implementing your optimization. Have fun :-)

On Sat, Jan 11, 2020 at 7:42 AM Florin Mateoc <[hidden email]> wrote:
 
Hi,

I was thinking of implementing a relatively simple optimization for empty blocks - there's something in the air about blocks lately :)

The compiler would create them as a new kind of literal (a subclass of BlockClosure), so there would be no block creation at runtime. Even the bytecodes for them would be simplified - it would be just a load literal. As far as I can tell, this would also not require any VM or bytecode changes.
I also don't think they would clash with the coming full block closures.

See attached to see what the new class definition would look like. By employing Eliot's trick of re-using startpc, they can even return a non-nil constant or other literals, and thus can even cover a little more than just purely empty blocks. And we could also reuse these literals, there is no point in creating multiple instances for []. 
Given how often we use "at: aKey ifAbsent: []" or "detect: [:| ...] ifNone: []", I think it would be worth it, but let's see what you guys think.
Worst case, I'll just implement it in GraalSqueak :). Just kidding, I'll do it in GraalSqueak anyway :)

Florin


--


--