[BUG] Cannot compile cascade sent to block


[BUG] Cannot compile cascade sent to block

Christoph Thiede

Hi all! :-)


Steps to reproduce:

Do it:

[true]
	whileTrue;
	whileFalse

Expected behavior:
The script is compiled and executed (and an infinite loop occurs).

Actual behavior:
<pastedImage.png>
Further investigations:
In #analyseTempsWithin:rootNode:assignmentPools:, the inner BlockNode [true] never gets a blockExtent computed for it because it is optimized.
This leads to a nil key in the ByteCodeEncoder's blockExtentsToLocals.
A fast solution would be to skip nil blockExtents in #noteBlockExtent:hasLocals:, but that would be a dirty workaround.
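
For illustration, that workaround would amount to roughly the following guard (just a sketch; the at:put: line guesses at how blockExtentsToLocals is populated):

ByteCodeEncoder >> noteBlockExtent: blockExtent hasLocals: tempNodes
	"Sketch of the 'dirty workaround' only: skip optimized blocks that never
	 received a blockExtent instead of recording a nil key."
	blockExtent ifNil: [^self].
	blockExtentsToLocals at: blockExtent put: tempNodes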

Similar bug:
Try to do:
Object newSubclass compile: 'iWontCompile: foo
	foo.
	[true]
		whileTrue;
		whileFalse'
Fails with:
<pastedImage.png>
Again, the problem is that blockExtent isNil.

Workaround:
Send #yourself to the block before sending the cascade. This prevents the receiver from being optimized.
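
For example, this variant compiles (and then loops forever, as expected):

[true] yourself
	whileTrue;
	whileFalse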

Please note that this problem does not occur only in the case of special blocks. For example, [Sensor anyButtonPressed] whileTrue; whileFalse fails as well.


How to fix this appropriately? I'm not sure whether I understand the whole idea of blockExtent so far. According to the documentation comment, it should be okay for it to be nil; however, the #blockExtent comment does not mention this fact.
Please see the attached changeset for an approach to fixing the issue. All tests from KernelTests-Methods and most tests from Test-Compiler pass, except for the testAllNodePCs* tests, which time out even in my fresh image. It would be great to hear whether this is the right way to solve the bug or just extends a workaround :-)

Best,
Christoph



Attachment: bugfix compiling block cascades.2.cs (2K)

Re: [BUG] Cannot compile cascade sent to block

Nicolas Cellier
Hi Christoph,
it's possible that I've already fixed that a few years ago, but it was not considered useful enough.
Is this really a problem?


Re: [BUG] Cannot compile cascade sent to block

Christoph Thiede

Hi Nicolas,


> Is this really a problem?


From my point of view, this is clearly a bug. Otherwise, it would be an unnecessary restriction. Personally, it hurts my pride in the wonderful Smalltalk implementation a bit that such an edge case does not work properly ;)

Why shouldn't we fix a bug? Or why would you not consider it a bug? We already check blockExtent for nil in some other places. And the changeset is really small.

Best,
Christoph



Re: [BUG] Cannot compile cascade sent to block

Levente Uzonyi
In reply to this post by Nicolas Cellier
On Sat, 28 Dec 2019, Nicolas Cellier wrote:

> Hi Christoph,
> it's possible that I've already fixed that a few years ago, but it was not considered useful enough.
> Is this really a problem?

This time it's a different problem. The second special selector (#whileFalse in this case) leaves the BlockNode with a nil blockExtent, which causes various issues depending on how you try to compile the code.

IIRC, what you fixed was the case:

true
  ifTrue: [ 1 ];
  ifFalse: [ 2 ].


Levente


Re: [BUG] Cannot compile cascade sent to block

Eliot Miranda-2
In reply to this post by Christoph Thiede
Hi Christoph,

On Dec 28, 2019, at 6:53 AM, Thiede, Christoph <[hidden email]> wrote:



How to fix this appropriately? I'm not sure whether I understand the whole idea of blockExtent so far.

blockExtents are the model of blocks that the compiler uses to analyze temporary references so that temporaries are either local or remote.  Local temporaries are temporaries on a Context’s own stack.  Remote temporaries are temporaries stored in an array which is on a Context’s own stack.

The local/remote distinction is to ensure that a Context _never_ has to reference another Context’s stack to access an outer temporary.  Why? In the VM, Contexts serve as proxies for active stack frames, with most stack frames not having a context unless the programmer accesses it by accessing some other context’s sender. This optimization is key to the VM’s performance. Cog is fast because it does context-to-stack mapping and creates contexts lazily.

If the execution model (defined by the bytecode set) for temporary access implements access to outer temporaries via direct access to outer contexts’ stacks, then there are two states an outer context can be in. The common case is that an outer context is associated with a stack frame and the temporaries exist in the stack frame; any slots in the context object are stale. But if the block is long-lived and runs after its enclosing context has been returned from, then the temporaries will be in the context.  To keep the slots in the context up to date, the stack frame slots must be copied to the context on return, and *before* the stack frame is torn down. Every outer temp access must check what state the outer context is in. This is what Peter Deutsch’s first bytecode set for HPS (ObjectWorks 2.5) did (with the optimization that an outer context would be converted into having a stack frame if it didn’t, so that temporary access only deals with the common case). And the cost and complexity are very high.  (Forgive the deep dive.)

The alternative, which I introduced in VW 5i and which has been in Cog from the start, is to never have contexts access outer temps directly.  This is done using a standard closure implementation model from Lisp.  Any outer temporary which cannot possibly change during the execution of an inner block is copied into the closure and copied onto the block context/frame stack, where it is effectively read-only (an r-value). These are called “copied values”.

Any outer temp which is written to in the block, or possibly modified after being closed over (i.e. assigned to lexically after the block), must be put in an array (an indirection vector). Since the temporary holding the indirection vector itself is not written to after initialization, indirection vectors are always copied values.

So, for example, in inject:into: nextValue ends up in an indirection vector but aBlock is a copied value (look at the bytecode for the method; I’m on my phone).  Now the VM’s context-to-stack mapping is greatly simplified (no need to intercept returns, no need to check outer temp access) and _much_ faster (and less buggy, known from hard experience with HPS).
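
For reference, Collection >> #inject:into: looks roughly like this (quoting from memory, so the exact temp names may differ):

inject: thisValue into: binaryBlock
	"nextValue is assigned inside the do: block, so it must live in an indirection
	 vector; binaryBlock is only read inside the block, so it can be a copied value."
	| nextValue |
	nextValue := thisValue.
	self do: [:each | nextValue := binaryBlock value: nextValue value: each].
	^nextValue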

But to do this the bytecode compiler must identify when closed-over temporaries can be treated as copied values or must be put in indirection vectors.  Block extents model block nesting and allow the bytecode compiler to check, whenever a temporary is read, whether it is being read in an inner block (i.e. is being closed over), and to check, whenever a temporary is written, whether it is being written to in an inner block or, if it has been closed over, is being written to after a block that closed over it.  In either of these two cases (written to during the block with that write needing to be visible in the outer context, or written to after being closed over with that write needing to be visible within the block), the temp must live in an indirection vector.
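
A minimal illustration of the two cases (just a sketch):

| count factor |
count := 0.
#(1 2 3) do: [:each | count := count + each].
"count is assigned inside the block, so it must go into an indirection vector."
factor := 10.
#(1 2 3) collect: [:each | each * factor].
"factor is only read inside the block and never assigned after being closed over,
 so it can be a copied value."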

Christoph, you *have* to understand this to be able to be effective in working on the execution machinery (e.g. the runUntil... debugger machinery).  If you do not understand this then the fault is mine (ENTIRELY!), and the fault is that my explanation doesn’t make sense. So take as long as you need and ask yourself whether you do or do not understand it.  If you don’t, please ask questions and I’ll try my best to explain things better.  I introduced this machinery and it is my responsibility for it not to be a burden.  This is a case of moving complexity out of the VM into the image, and it is justified on performance and reliability grounds.  But it is a severe cognitive burden on those using the bytecode compiler where, at the Smalltalk level, it seems that accessing outer temporaries directly is simple and all the block extent machinery horribly complex.  I assure you that the block extent machinery is trivial compared to the support required in the VM to implement direct access to outer temps as efficiently as possible (which is way less efficient than our current architecture allows).

According to the documentation comment, it should be okay to be nil; however, the #blockExtent comment does not mention this fact.
Please see the attached changeset for an approach to fix the issue. All tests from KernelTests-Methods and most tests from Test-Compiler pass it, except of the testAllNodePCs* which time out in my fresh image. It would be great to hear if this is the right way to solve the bug, or just extending a workaround :-)

I’ll take a look some time today.  But I’d be much happier if you can understand the long-winded explanation above and answer your own question.  Great bug though :-)


Re: [BUG] Cannot compile cascade sent to block

Christoph Thiede

Hi Eliot, thank you very much for the detailed and interesting explanation! I think I have got a rough idea of the blockExtent concept; certainly, it will become clearer as I continue to work with it.


Concretely, for this bug, I suppose that optimized blocks don't need their own blockExtent because they store all their temps directly on their declaring context, don't they?
This would explain the following outputs:

[| x | x := true] whileFalse. "will be optimized"
thisContext tempNames. "==> #('x')"

compared to:

[| x | x := true] yourself whileFalse. "won't be optimized"
thisContext tempNames. "==> #()"


So I would say that blockExtent may indeed be nil; I would at least note this fact in BlockNode >> #blockExtent. However, #computeCopiedValues: is called by #constructClosureCreationNode:, and a closure should only be created for non-optimized BlockNodes. Equally, the senders of #reindexingLocalsDo:encoder: should only be activated for non-optimized BlockNodes ... So better to forget the rest of my first changeset, sigh.


It is interesting that in this example, the receiver BlockNode stays flagged as optimized (whereas the bytecode produces a separate closure):

[self confirm: #foo]
	whileTrue;
	whileFalse

First, the compiler reads only the first message ("[self confirm: #foo] whileTrue") and optimizes it, and only then finds the second message and discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat:, MessageNode >> #receiver:selector:arguments:precedence:from:, and #transform:.)
But it does not reset the optimized flag of the BlockNode! To me, this appears to be the actual bug!
There is #ensureCanCascade:, which should specifically solve this problem by resetting the optimized flag. However, during creation of the MessageNode, originalReceiver is assigned a copy of the actual receiver. This leads to the problem that while the receiver block is deoptimized in each temporary MessageNode, the tempvar rcvr in #cascade holds the copied original receiver, which never gets deoptimized. For what purpose do we copy the receiver into originalReceiver?
Did I explain this understandably? :-)

By applying the following single patch, I can fix the problem:

Parser >> #cascade
	" {; message} => CascadeNode."

-	| rcvr msgs |
+	| rcvr cascadeRcvr msgs |
	parseNode canCascade ifFalse:
		[^self expected: 'Cascading not'].
	parseNode ensureCanCascade: encoder.
	rcvr := parseNode cascadeReceiver.
+	cascadeRcvr := rcvr copy.
	msgs := OrderedCollection with: parseNode.
	[self match: #semicolon]
		whileTrue:
			[parseNode := rcvr.
			(self messagePart: 3 repeat: false)
				ifFalse: [^self expected: 'Cascade'].
			parseNode canCascade ifFalse:
				[^self expected: '<- No special messages'].
			parseNode ensureCanCascade: encoder.
			parseNode cascadeReceiver.
			msgs addLast: parseNode].
-	parseNode := CascadeNode new receiver: rcvr messages: msgs
+	parseNode := CascadeNode new receiver: cascadeRcvr messages: msgs

All relevant tests pass (but I don't know how good their coverage is).
Is there any need to respect possible side effects on rcvr that might occur while creating the next cascade message (via #messagePart:repeat:)?
If not, how would you think about this change? :-)

(Sorry for the long text, I did my best to maximize the compression ratio!)

Best,
Christoph



Re: [BUG] Cannot compile cascade sent to block

Eliot Miranda-2
Hi Christoph,

On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <[hidden email]> wrote:

Hi Eliot, thank you very much for the detailed and interesting explanation! I think I got a rough idea of the blockExtent concept; certainly, it will become more clear as I continue to work with it.


Concretely to this bug, I suppose that optimized blocks don't need their own blockExtent, because they store all their temps directly on their declaring context, don't they?


Exactly.  They're just basic blocks and jumps, within either proper methods or proper blocks.
 

This would explain the following outputs:

[|x| x:=true] whileFalse. "will be optimized"
thisContext tempNames. #('x')
  compared to:
[|x| x:=true] yourself whileFalse. "wont be optimized"
thisContext tempNames #()

Exactly.
 

So I would say that blockExtent may indeed be nil, I would at least note this fact in BlockNode >> #blockExtent.

And feel free to take as much of my explanation as you like, with rewrites, and add it to the relevant parts of the bytecode compiler.  I like discursive documentation like this in class comments, but that implies that a pointer to the class comment from a method is the right thing to do (as in "See the class comment in ...").
 
However, #computeCopiedValues: is called by #constructClosureCreationNode:, and a closure should be only created of non-optimized BlockNodes. Equally, the senders of #reindexingLocalsDo:encoder: should be only activated in case of a non-optimized BlockNode ... So better forget the rest of my first changeset, sigh.

Right.  So what may be happening is that the system is realizing that it can't inline the block because it is being used in a cascade.  So the attempt to optimize it in the first place should fail.  We should only consider for optimization blocks that are sent one of the inline messages, *but not in a cascade*.  This is probably why Nicolas expects we should have already fixed this bug. Nicolas, if I neglected to integrate your fix, I apologize.  I invite you to commit straight to trunk; maybe ask Christoph for a review?

It is interesting that in the example, the receiver BlockNode keeps flagged as optimized (whereas the byte code produces a separate closure):

[self confirm: #foo]
whileTrue;
whileFalse

Right.  I would guess that the block is marked as being optimizable when the whileTrue is processed, but everything goes pear-shaped when the whileFalse is processed.  What we need to do is forestall the optimization caused by processing the whileTrue send iff the whileTrue is part of a cascade (which is probably exactly what Nicolas' as-yet-unintegrated changes effect).
 

First, the compiler reads the first message only ("[self confirm: #foo] whileTrue") and optimizes it, and only then finds the second message and discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat, MessageNode >> #receiver:selector:arguments:precedence:from:, #transform:.)
But it does not reset the optimized flag of the BlockNode! To me, this appears to be the actual bug!

You have it.
 

(Sorry for the long text, I did my best to maximize the compression ratio!)

Um, it's a pleasure to read your writing, so don't worry about being prolix.  I can hardly complain ;-)

 


--
_,,,^..^,,,_
best, Eliot



Re: [BUG] Cannot compile cascade sent to block

Christoph Thiede

Hi Eliot,


Of course, a changeset that defers the optimization of ParseNodes until a cascade can be ruled out would be perfect, also with respect to compiler performance. I would be very interested to read such a changeset!


However, if we can't patch this quickly, would you consider my proposal an appropriate workaround? Should we maybe include it in the Trunk, just for 5.3, until we have the perfect solution from Nicolas?


Best,

Christoph



Re: [BUG] Cannot compile cascade sent to block

Eliot Miranda-2
Hi Christoph, Hi Nicolas,

On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <[hidden email]> wrote:



However, if we can't patch this quickly, would you consider my proposal an appropriate workaround? Should we maybe include it into Trunk, just for 5.3, until we have the perfect solution from Nicolas?


As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.
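
A sketch of what such a test might look like (hypothetical names; the reported example can only be checked for successful compilation, since executing it would loop forever):

CompilerTest >> #testCompileCascadeToBlockReceiver
	"Sketch of a regression test: compiling a cascade whose receiver is a
	 literal block must not raise an error. Execution is skipped because the
	 reported example loops forever."
	| class |
	class := Object newSubclass.
	[self
		shouldnt:
			[class compile: 'loopForever
				[true]
					whileTrue;
					whileFalse']
		raise: Error]
		ensure: [class removeFromSystem]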

That suggests we switch over to a variation of inbox for the release process.  We could provide releasebox or some such (I would call it release) and only commit things good for the release.  But thinking about it, it’s a bad idea. It means a lot of work to change one’s working images from trunk to releasebox and back.  However, if in 5.3 we provided a bulk change operation for the Monticello browser, I’d consider it.  Has anything like this been discussed? (We should change the subject line to continue...)


Best,

Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Sonntag, 29. Dezember 2019 21:01:04
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph,

On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <[hidden email]> wrote:

Hi Eliot, thank you very much for the detailed and interesting explanation! I think I got a rough idea of the blockExtent concept; certainly, it will become more clear as I continue to work with it.


Concretely to this bug, I suppose that optimized blocks don't need their own blockExtent, because they store all their temps directly on their declaring context, don't they?


Exactly.  They're just basic blocks and jumps, within either proper methods or proper blocks.
 

This would explain the following outputs:

[|x| x:=true] whileFalse. "will be optimized"
thisContext tempNames. #('x')
  compared to:
[|x| x:=true] yourself whileFalse. "wont be optimized"
thisContext tempNames #()

Exactly.
 

So I would say that blockExtent may indeed be nil, I would at least note this fact in BlockNode >> #blockExtent.

And feel free to take as much of my explanation, with rewrites, and add it to the relevant parts of the bytecode compiler.  I like discursive doc like this in class comments, but that implies that a pointer to the class comment from a method is the right thing to do (as in "See the class comment in ...").
 
However, #computeCopiedValues: is called by #constructClosureCreationNode:, and a closure should be only created of non-optimized BlockNodes. Equally, the senders of #reindexingLocalsDo:encoder: should be only activated in case of a non-optimized BlockNode ... So better forget the rest of my first changeset, sigh.

Right.  So what maybe happening is that the system is realizing that it can't inline the block because it is being used in a cascade.  So the attempt to optimize it in the first place should fail.  We should only consider for optimization blocks that are sent one of the inline messages, *but not in a cascade*.  This is probably why Nicolas expects we should have already fixed this bug. Nicolas if I neglected to integrate your fix I apologize.  I invite you to commit straight to trunk; maybe ask Christoph for review?.

It is interesting that in the example, the receiver BlockNode keeps flagged as optimized (whereas the byte code produces a separate closure):

[self confirm: #foo]
whileTrue;
whileFalse

Right.  I would guess that the block is marked as being optimizable when the whileTrue is processed, but everything goes pear shaped when the whileFalse is processed.  What we need to do is forestall the optimization caused by processing the whileTrue send iff the whileTrue is part of a cascade (which is probably exactly what Nicolas' as-yet-unintegrated changes effect).
 

First, the compiler reads the first message only ("[self confirm: #foo] whileTrue") and optimizes it, and only then finds the second message and discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat, MessageNode >> #receiver:selector:arguments:precedence:from:, #transform:.)
But it does not reset the optimized flag of the BlockNode! To me, this appears to be the actual bug!

You have it.
 
There is #ensureCanCascade: which should specifically solve this problem by resetting the optimized flag. However, during creation of MessageNode, originalReceiver is assigned a copy of the actual receiver. This leads to the problem that while in each temporary MessageNode, the receiver block is deoptimized, the tempvar rcrv in #cascade holds the copied original receiver which never got deoptimized. For what purpose do we copy the receiver into originalReceiver?
Did I explain this understandable? :-)

By applying the following single patch, I can fix the problem:

Parser >> #cascade
" {; message} => CascadeNode."

- | rcvr msgs |
+ | rcvr cascadeRcvr msgs|
parseNode canCascade ifFalse:
[^self expected: 'Cascading not'].
parseNode ensureCanCascade: encoder.
rcvr := parseNode cascadeReceiver.
cascadeRcvr := rcvr copy.
msgs := OrderedCollection with: parseNode.
[self match: #semicolon]
whileTrue: 
[parseNode := rcvr.
(self messagePart: 3 repeat: false)
ifFalse: [^self expected: 'Cascade'].
parseNode canCascade ifFalse:
[^self expected: '<- No special messages'].
parseNode ensureCanCascade: encoder.
parseNode cascadeReceiver.
msgs addLast: parseNode].
- parseNode := CascadeNode new receiver: rcvr messages: msgs
parseNode := CascadeNode new receiver: cascadeRcvr messages: msgs

All relevant tests pass (but I don't know how good their coverage is).
Is there any need for respecting possible side-effects on rcvr that might occur while creating the next cascade message (via #messagePart:repeat:)?
If not, how would you think about this change? :-)

(Sorry for the long text, I did my best to maximize the compression ratio!)

Um, it's a pleasure to read your writing, so don't worry about being prolix.  I can hardly complain ;-)

 

Best,
Christoph


--
_,,,^..^,,,_
best, Eliot




Re: [BUG] Cannot compile cascade sent to block

Christoph Thiede

As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.


Alright, done, hope you like it :)
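For reference, here is a minimal sketch of what such a test could look like, assuming a TestCase subclass in the compiler test package and the classic Compiler class>>evaluate: entry point; the selector and the terminating loop are my own illustration, not necessarily what Compiler-ct.414 actually contains:

testCanCompileAndRunCascadeSentToBlock
	"Exercise both compilation and, if it succeeds, execution; the loops are
	 written so that both of them terminate."
	| result |
	result := Compiler evaluate:
		'| i | i := 0. [i := i + 1. i < 3] whileFalse; whileTrue. i'.
	self assert: result = 3

Without the fix, the evaluation fails during bytecode generation (blockExtent isNil), so the assertion is never reached and the test errors out, which is exactly what it should capture.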

I still don't see why cascadeReceiver is a copy, though ...





Carpe Squeak!

Re: [BUG] Cannot compile cascade sent to block

Nicolas Cellier
Hi Christoph,
Because sending a message pops the receiver off the stack and replaces it with the returned object, we need a dup of the receiver for each cascaded message.
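To make the dup concrete, here is a rough workspace sketch (the throwaway selector and the compile-and-remove dance are just for illustration; the exact bytecodes printed depend on the bytecode set in the image):

| method |
"Compile a throwaway method whose body is a two-message cascade, print its
 bytecodes, then remove it again."
Object compile: 'cascadeBytecodeExample
	| s |
	s := WriteStream on: String new.
	s nextPutAll: ''a''; nextPutAll: ''b''.
	^s contents' classified: 'examples'.
method := Object >> #cascadeBytecodeExample.
Transcript show: method symbolic; cr.
Object removeSelector: #cascadeBytecodeExample.
"In the printed bytecodes the receiver s is pushed once and dup'ed before the
 first nextPutAll: send, so the second nextPutAll: still finds it on the stack."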






Re: [BUG] Cannot compile cascade sent to block

Nicolas Cellier
Oops, I probably misunderstood the question.
You mean why you need to copy the cascadeReceiver...

Well, it's simple: the rcvr is shared by all cascade messages, because we do:

    parseNode := rcvr.
    (self messagePart: 3 repeat: false)

So [true] whileTrue is optimized.
Then in #cascade, after #ensureCanCascade:, it gets #deoptimize(d).
That's when we get the deoptimized rcvr := parseNode cascadeReceiver.

Then for the second message, [true] whileFalse, it gets optimized again.
The receiver of the second message is deoptimized again, but it's the originalReceiver (a copy) that gets deoptimized, not the receiver!
That's a change Bert made to #deoptimize in 2015 for some reason (I guess the receiver was changed by some transform).

So your rcvr copy is protected from re-optimization.
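A quick way to see this divergence in an image without the fix is to look at the parse tree directly; the snippet below assumes the usual Parser>>parse:class: entry point and relies on an explorer, because the node accessors differ between versions:

| methodNode |
methodNode := Parser new
	parse: 'broken
		[true] whileTrue; whileFalse'
	class: Object.
methodNode explore
"In the explorer, the CascadeNode's shared receiver is the BlockNode that is
 still flagged as optimized, while each MessageNode's originalReceiver holds
 a deoptimized copy; that is exactly the divergence described above."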






Re: [BUG] Cannot compile cascade sent to block

Nicolas Cellier
Ah, yes, here is why:

Name: Compiler-bf.293
Author: bf
Time: 19 February 2015, 6:22:49.349 pm
UUID: 737af1fb-e8f0-4653-851b-0c1b3ed0f65f
Ancestors: Compiler-topa.292

Fix deoptimization of ifNil: etc.

--------

Forgot to say, your changes in Compiler-ct.414 are correct.
But there are other typos in BlockNode comments, I see at least 2:

cose over -> close over
throguh -> through

Le mar. 31 déc. 2019 à 00:56, Nicolas Cellier <[hidden email]> a écrit :
Oups, I probably missunderstood the question.
You mean why you need to copy the cascadeReceiver...

Well, it's simple: the rcvr is shared by all cascade message, because we do:

    parseNode := rcvr.
    (self messagePart: 3 repeat: false)

So [true] whileTrue is optimized.
Then in #cascade, after #ensureCanCascade: it is #deoptimize(d).
That's when we get the deoptimized rcvr := parseNode cascadeReceiver.

Then in second message [true] whileFalse, it gets optimized again.
The receiver of second message is deoptimized again, but it's the originalReceiver that gets deoptimized (a copy) not the receiver!
That's a change Bert made to #deoptimize in 2015 for some reason (I guess for some transform, the receiver was changed).

So your rcvr copy is preserved from re-optimization.

Le lun. 30 déc. 2019 à 19:36, Nicolas Cellier <[hidden email]> a écrit :
Hi Christoph,
Because sending a message unstack receiver, and replace it with returned object. So we need dup for each cascade.

Le lun. 30 déc. 2019 à 19:22, Thiede, Christoph <[hidden email]> a écrit :

As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.


Alright, done, hope you like it :)

I still don't see why cascadeReceiver is a copy, though ...


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Montag, 30. Dezember 2019 17:51:36
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph, Hi Nicolas,

On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <[hidden email]> wrote:



Hi Eliot,


Of course, a changeset that defers the optimization of ParseNodes until a cascade can be precluded would be perfect - also with respect to the compiler performance. I would be very interested to read such a changeset!


However, if we can't patch this quickly, would you consider my proposal an appropriate workaround? Should we maybe include it into Trunk, just for 5.3, until we have the perfect solution from Nicolas?


As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.

That suggests we switch over to a variation of inbox for the release process.  We could provide releasebox or done such (I would call it release) and only commit things good for the release.  But in thinking about it it’s a bad idea.  It means a lot of work to change ones working images from trunk to releasebox and back.  However, if in 5.3 we provided a bulk change operation for the Monticello browser I’d consider it.  Has anything like this been discussed? (we should change subject to continue...)


Best,

Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Sonntag, 29. Dezember 2019 21:01:04
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph,

On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <[hidden email]> wrote:

Hi Eliot, thank you very much for the detailed and interesting explanation! I think I got a rough idea of the blockExtent concept; certainly, it will become more clear as I continue to work with it.


Concretely to this bug, I suppose that optimized blocks don't need their own blockExtent, because they store all their temps directly on their declaring context, don't they?


Exactly.  They're just basic blocks and jumps, within either proper methods or proper blocks.
 

This would explain the following outputs:

[|x| x:=true] whileFalse. "will be optimized"
thisContext tempNames. #('x')
  compared to:
[|x| x:=true] yourself whileFalse. "wont be optimized"
thisContext tempNames #()

Exactly.
 

So I would say that blockExtent may indeed be nil, I would at least note this fact in BlockNode >> #blockExtent.

And feel free to take as much of my explanation, with rewrites, and add it to the relevant parts of the bytecode compiler.  I like discursive doc like this in class comments, but that implies that a pointer to the class comment from a method is the right thing to do (as in "See the class comment in ...").
 
However, #computeCopiedValues: is called by #constructClosureCreationNode:, and a closure should be only created of non-optimized BlockNodes. Equally, the senders of #reindexingLocalsDo:encoder: should be only activated in case of a non-optimized BlockNode ... So better forget the rest of my first changeset, sigh.

Right.  So what maybe happening is that the system is realizing that it can't inline the block because it is being used in a cascade.  So the attempt to optimize it in the first place should fail.  We should only consider for optimization blocks that are sent one of the inline messages, *but not in a cascade*.  This is probably why Nicolas expects we should have already fixed this bug. Nicolas if I neglected to integrate your fix I apologize.  I invite you to commit straight to trunk; maybe ask Christoph for review?.

It is interesting that in the example, the receiver BlockNode keeps flagged as optimized (whereas the byte code produces a separate closure):

[self confirm: #foo]
whileTrue;
whileFalse

Right.  I would guess that the block is marked as being optimizable when the whileTrue is processed, but everything goes pear shaped when the whileFalse is processed.  What we need to do is forestall the optimization caused by processing the whileTrue send iff the whileTrue is part of a cascade (which is probably exactly what Nicolas' as-yet-unintegrated changes effect).
 

First, the compiler reads the first message only ("[self confirm: #foo] whileTrue") and optimizes it, and only then finds the second message and discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat, MessageNode >> #receiver:selector:arguments:precedence:from:, #transform:.)
But it does not reset the optimized flag of the BlockNode! To me, this appears to be the actual bug!

You have it.
 
There is #ensureCanCascade:, which should specifically solve this problem by resetting the optimized flag. However, during the creation of a MessageNode, originalReceiver is assigned a copy of the actual receiver. This leads to the problem that, while the receiver block is deoptimized in each temporary MessageNode, the temp var rcvr in #cascade holds the copied original receiver, which never got deoptimized. For what purpose do we copy the receiver into originalReceiver?
Did I explain this understandably? :-)

By applying the following single patch, I can fix the problem:

Parser >> #cascade
" {; message} => CascadeNode."

-	| rcvr msgs |
+	| rcvr cascadeRcvr msgs |
	parseNode canCascade ifFalse:
		[^self expected: 'Cascading not'].
	parseNode ensureCanCascade: encoder.
	rcvr := parseNode cascadeReceiver.
+	cascadeRcvr := rcvr copy.
	msgs := OrderedCollection with: parseNode.
	[self match: #semicolon]
		whileTrue: 
			[parseNode := rcvr.
			(self messagePart: 3 repeat: false)
				ifFalse: [^self expected: 'Cascade'].
			parseNode canCascade ifFalse:
				[^self expected: '<- No special messages'].
			parseNode ensureCanCascade: encoder.
			parseNode cascadeReceiver.
			msgs addLast: parseNode].
-	parseNode := CascadeNode new receiver: rcvr messages: msgs
+	parseNode := CascadeNode new receiver: cascadeRcvr messages: msgs

All relevant tests pass (but I don't know how good their coverage is).
Is there any need to respect possible side effects on rcvr that might occur while creating the next cascade message (via #messagePart:repeat:)?
If not, how would you think about this change? :-)

(Sorry for the long text, I did my best to maximize the compression ratio!)

Um, it's a pleasure to read your writing, so don't worry about being prolix.  I can hardly complain ;-)

 

Best,
Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Sonntag, 29. Dezember 2019 16:59 Uhr
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph,

On Dec 28, 2019, at 6:53 AM, Thiede, Christoph <[hidden email]> wrote:



Hi all! :-)


Steps to reproduce:

Do it:

[true]
whileTrue;
whileFalse

Expected behavior:
The script is compiled and executed (and an infinite loop occurs).

Actual behavior:
<pastedImage.png>


Further investigations:
In #analyseTempsWithin:rootNode:assignmentPools:, the inner BlockNode [true] is not computed a blockExtent, because it is optimized.
This leads to a nil key in the ByteCodeEncoder's blockExtentsToLocals.
A fast solution would be to skip nil blockExtents in #noteBlockExtent:hasLocals:, but that's a dirty workaround.

Similar bug:
Try to do:
Object newSubclass compile:  'iWontCompile: foo
foo.
[true]
whileTrue;
whileFalse'
Fails with:
<pastedImage.png>

Again, the problem is that blockExtent isNil.

Workaround:
Send #yourself to the block before sending the cascade. This prevents the receiver from being optimized.

Please note that this problem does not only occur in case of special blocks. For example, [Sensor anyButtonPressed] whileTrue; whileFalse fails as well.


How to fix this appropriately? I'm not sure whether I understand the whole idea of blockExtent so far.

blockExtents are the model of blocks that the compiler uses to analyze temporary references so that temporaries are either local or remote.  Local temporaries are temporaries on a Context’s own stack.  Remote temporaries are temporaries stored in an array which is itself held on a Context’s own stack.

The local/remote distinction is to ensure that a Context _never_ has to reference another Context’s stack to access an outer temporary.  Why? In the VM, Contexts serve as proxies for active stack frames, with most stack frames not having a context unless the programmer accesses it by accessing some other context’s sender. This optimization is key to the VM’s performance. Cog is fast because it does context-to-stack mapping and creates contexts lazily.

If the execution model (defined by the bytecode set) for temporary access implements access to outer temporaries via direct access to outer contexts’ stacks, then there are two states an outer context can be in. The common case is that an outer context is associated with a stack frame and the temporaries exist in the stack frame; any slots in the context object are stale. But if the block is long lived and runs after its enclosing context has been returned from, then the temporaries will be in the context.  To keep the slots in the context up-to-date, the stack frame slots must be copied to the context on return, and *before* the stack frame is torn down. Every outer temp access must check what state the outer context is in. This is what Peter Deutsch’s first bytecode set for HPS (ObjectWorks 2.5) did (with the optimization that an outer context would be converted into having a stack frame if it didn’t, so that temporary access only deals with the common case). And the cost and complexity are very high.  (Forgive the deep dive).

The alternative, which I introduced in VW 5i and which has been in Cog from the start, is to never have contexts access outer temps directly.  This is done using a standard closure implementation model from Lisp.  Any outer temporary which cannot possibly change during the execution of an inner block is copied into the closure and copied onto the block context/frame stack, where it is effectively read only (an r-value). These are called “copied values”.

Any outer temp which is written to in the block or possibly modified after being closed over (i.e. assigned to lexically after the block) must be put in an array (an indirection vector). Since the temporary holding the indirection vector itself is not written to after initialization, indirection vectors are always copied values.
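As a small illustration of the two cases (a sketch, runnable as a do-it; the variable names are made up):

| copied counter |
copied := 42.
counter := 0.
[copied + 1] value.	"copied is only read inside the block and never written after capture, so it can be a copied value"
[counter := counter + 1] value.
counter	"counter is written inside the block and read afterwards outside it, so it must live in an indirection vector"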

So, for example, in #inject:into: nextValue ends up in an indirection vector but aBlock is a copied value (look at the bytecode for the method; I’m on my phone).  Now the VM’s context-to-stack mapping is greatly simplified (no need to intercept returns, no need to check outer temp access) and _much_ faster (and less buggy, as known from harsh experience with HPS).
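For readers following along in an image, that bytecode can be inspected with a do-it like this (a sketch; #symbolic prints a CompiledMethod's bytecodes):

(Collection >> #inject:into:) symbolic
"In the printed bytecodes you should see an indirection vector being created for nextValue
 (an Array of size 1 pushed before the closure), while aBlock is simply copied into the closure."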

But to do this the bytecode compiler must identify when closed over temporaries can be treated as copied values or must be put in indirection vectors.  Block extents model block nesting and allow the bytecode compiler to check, whenever a temporary is read, whether it is being read in an inner block (i.e. is being closed over), and to check, whenever a temporary is written, whether it is being written to in an inner block, or, if it has been closed over, whether it is being written to after a block that closed over it.  In either of these two cases (written to during the block with that write needing to be visible in the outer context, or written to after being closed over with that write needing to be visible within the block), the temp must live in an indirection vector.

Christoph, you *have* to understand this to be able to be effective in working in the execution machinery (e.g. the runUntil... debugger machinery).  If you do not understand this then the fault is mine (ENTIRELY!), and the fault is that my explanation doesn’t make sense. So take as long as you need and ask yourself if you do or do not understand it.  If you don’t, please ask questions and I’ll try my best to explain things better.  I introduced this machinery and it is my responsibility for it not to be a burden.  This is a case of moving complexity out of the VM into the image, and is justified on performance and reliability grounds.  But it is a severe cognitive burden on those using the bytecode compiler where, at the Smalltalk level, it seems that accessing outer temporaries directly is simple and all the block extent machinery horribly complex.  I assure you that the block extent machinery is trivial compared to the support required in the VM to implement direct access to outer temps as efficiently as possible (which is way less efficient than our current architecture allows).

According to the documentation comment, it should be okay to be nil; however, the #blockExtent comment does not mention this fact.
Please see the attached changeset for an approach to fix the issue. All tests from KernelTests-Methods and most tests from Test-Compiler pass it, except of the testAllNodePCs* which time out in my fresh image. It would be great to hear if this is the right way to solve the bug, or just extending a workaround :-)

I’ll take a look some time today.  But I’d be much happier if you can understand the long winded explanation above and answer your own question.  Great bug though :-)

Best,
Christoph
<bugfix compiling block cascades.2.cs>




--
_,,,^..^,,,_
best, Eliot





Re: [BUG] Cannot compile cascade sent to block

Christoph Thiede

Ooooh, my patch was erroneous ... I discovered this while working on a totally different thing, and it took me several hours to trace recurring VM crashes back to this change ...


Compare the following before and after loading my commit:

PreferenceWizardMorph compile: (PreferenceWizardMorph sourceCodeAt: #updateWindowBounds).
(PreferenceWizardMorph >> #updateWindowBounds) symbolic edit
There is a difference!

(blue is correct)
I don't get the full explanation at the moment, but apparently some relevant transformations *are* made to the receiver, which offsets the temp index because of the nested closure or something. Transformations are more than optimization, I guess.
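A rough way to reproduce that comparison programmatically (a sketch following the snippet above; it assumes the suspect compiler change is loaded when the method is recompiled):

| before after |
before := (PreferenceWizardMorph >> #updateWindowBounds) symbolic.
PreferenceWizardMorph compile: (PreferenceWizardMorph sourceCodeAt: #updateWindowBounds).
after := (PreferenceWizardMorph >> #updateWindowBounds) symbolic.
before = after	"expected to be true with a healthy compiler; false if the loaded patch changes the generated bytecode"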

The following change appears to fix the current problem:

But I won't commit this to the inbox until I have made sure that I understand whether and why this is (hopefully) correct. Until then, please move Compiler-ct.414 to the Treated Inbox. So sorry!


> cose over -> close over
> throguh -> through

Thank you, will fix them with my next commit.

Best,
Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Nicolas Cellier <[hidden email]>
Gesendet: Dienstag, 31. Dezember 2019 01:04 Uhr
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Ah, yes, here is why:

Name: Compiler-bf.293
Author: bf
Time: 19 February 2015, 6:22:49.349 pm
UUID: 737af1fb-e8f0-4653-851b-0c1b3ed0f65f
Ancestors: Compiler-topa.292

Fix deoptimization of ifNil: etc.

--------

Forgot to say, your changes in Compiler-ct.414 are correct.
But there are other typos in BlockNode comments, I see at least 2:

cose over -> close over
throguh -> through

Le mar. 31 déc. 2019 à 00:56, Nicolas Cellier <[hidden email]> a écrit :
Oops, I probably misunderstood the question.
You mean why you need to copy the cascadeReceiver...

Well, it's simple: the rcvr is shared by all cascade messages, because we do:

    parseNode := rcvr.
    (self messagePart: 3 repeat: false)

So [true] whileTrue is optimized.
Then in #cascade, after #ensureCanCascade: it is #deoptimize(d).
That's when we get the deoptimized rcvr := parseNode cascadeReceiver.

Then in the second message, [true] whileFalse, it gets optimized again.
The receiver of the second message is deoptimized again, but it's the originalReceiver (a copy) that gets deoptimized, not the receiver!
That's a change Bert made to #deoptimize in 2015 for some reason (I guess for some transform, the receiver was changed).

So your rcvr copy is preserved from re-optimization.

Le lun. 30 déc. 2019 à 19:36, Nicolas Cellier <[hidden email]> a écrit :
Hi Christoph,
Because sending a message pops the receiver off the stack and replaces it with the returned object. So we need a dup for each cascade.
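That dup is easy to spot in the bytecode of any ordinary cascade (a sketch; the scratch class and selector are made up, #symbolic prints the bytecodes):

| scratch |
scratch := Object newSubclass.
scratch compile: 'cascadeExample ^OrderedCollection new add: 1; add: 2; yourself'.
(scratch >> #cascadeExample) symbolic
"The output shows a dup before every cascaded send except the last one: each send consumes
 the receiver from the stack, so each cascaded message needs its own copy of it."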

Le lun. 30 déc. 2019 à 19:22, Thiede, Christoph <[hidden email]> a écrit :

As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.


Alright, done, hope you like it :)

I still don't see why cascadeReceiver is a copy, though ...


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Montag, 30. Dezember 2019 17:51:36
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph, Hi Nicolas,

On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <[hidden email]> wrote:



Hi Eliot,


Of course, a changeset that defers the optimization of ParseNodes until a cascade can be precluded would be perfect - also with respect to the compiler performance. I would be very interested to read such a changeset!


However, if we can't patch this quickly, would you consider my proposal an appropriate workaround? Should we maybe include it into Trunk, just for 5.3, until we have the perfect solution from Nicolas?


As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.
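A minimal sketch of such a test (assuming a TestCase subclass such as CompilerTest; the selector, the loop counter, and the use of Compiler class >> #evaluate: are illustrative, not part of any existing changeset):

testCascadeSentToBlockCompilesAndRuns
	| result |
	"Compiling this expression used to fail because the cascade receiver kept its optimized flag;
	 the expression is chosen so that both cascaded loops terminate when it is executed."
	result := Compiler evaluate: '| n | n := 0. [n := n + 1. n odd] whileTrue; whileFalse. n'.
	self assert: result = 3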

That suggests we switch over to a variation of inbox for the release process.  We could provide releasebox or some such (I would call it release) and only commit things good for the release.  But on thinking about it, it’s a bad idea.  It means a lot of work to change one’s working images from trunk to releasebox and back.  However, if in 5.3 we provided a bulk change operation for the Monticello browser, I’d consider it.  Has anything like this been discussed? (We should change the subject to continue...)


Best,

Christoph


Carpe Squeak!

Re: [BUG] Cannot compile cascade sent to block

Nicolas Cellier
Hi Christoph,
what about removing the copy in originalReceiver:

receiver: rcvr arguments: args precedence: p
     receiver := rcvr.
     originalReceiver := rcvr.
     arguments := args.
     originalArguments := arguments copy.
     sizes := Array new: arguments size.
     precedence := p


Change management and version control during releases

Jakob Reschke
In reply to this post by Eliot Miranda-2
Hello,

On 2019-12-30T17:51 Eliot Miranda <[hidden email]> wrote under the subject "[BUG] Cannot compile cascade sent to block":

That suggests we switch over to a variation of inbox for the release process.  We could provide releasebox or done such (I would call it release) and only commit things good for the release.  But in thinking about it it’s a bad idea.  It means a lot of work to change ones working images from trunk to releasebox and back.  However, if in 5.3 we provided a bulk change operation for the Monticello browser I’d consider it.  Has anything like this been discussed? (we should change subject to continue...)

As long as the number of commits to manage during and after a release is not too big to look through, I think there is no immediate need to act.

Other systems use branches to deal with this. Either in the version control system or in the code. Many Git workflows have some kind of release branches on which you finish releases, and some kind of development mainline which comes closest to the non-release function of Trunk.

It seems like branch management is easier in Monticello with separated repositories. Same work in different repositories on the same platform sounds like forks of a project on GitHub to me. Eliot's concern is about changing images to follow a different repository. I think this is conceptually equal to switching the branch that the image should follow (update stream), including the loading of changes to check out the target branch. 

Of course, the switching would fail if one is not careful to provide proper migration code together with breaking changes.

This is only relevant for long-lived/valuable images. In a workflow with throw-away images you would just build a new one that is based on the other repository/branch.

Monticello configurations that can be loaded and implicit configurations for the update stream are already there. What would be missing is a simple UI to perform the switching. Anything else? 

It might make sense to recollect the requirements for such a facility, also to assess its value. For example, should it support only to "upgrade" an image (from Trunk to a release with a higher aggregate version number, from a release to Trunk, from release to a later release) or also to "downgrade" an image (from Trunk to any release, from one release to a previous one)?

Kind regards,
Jakob




Re: [BUG] Cannot compile cascade sent to block

Christoph Thiede
In reply to this post by Nicolas Cellier

Hi Nicolas,


this appears to work as well. Could you - or someone else - imagine why Bert added a copy in "Compiler-bf.293"? Don't want to break something else.


Best,

Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Nicolas Cellier <[hidden email]>
Gesendet: Dienstag, 31. Dezember 2019 23:59:40
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph,
what about removing the copy in originalReceiver:

receiver: rcvr arguments: args precedence: p
     receiver := rcvr.
     originalReceiver := rcvr.
     arguments := args.
     originalArguments := arguments copy.
     sizes := Array new: arguments size.
     precedence := p

Le mar. 31 déc. 2019 à 02:08, Thiede, Christoph <[hidden email]> a écrit :

Ooooh, my patch was erroneous ... I discovered this while working at a total different thing and it took me several hours to trace recurring VM crashes back to this change ...


Compare the following before and after loading my commit:

PreferenceWizardMorph compile: (PreferenceWizardMorph sourceCodeAt: #updateWindowBounds).
(PreferenceWizardMorph >> #updateWindowBounds) symbolic edit
There is a difference!

(blue is correct)
I don't get the full explanation at the moment, but apparently, there *are* made some relevant transformations to the receiver, which is offsetting the temp index because of the nested closure or something. Transformations are more than optimization, I guess.

The following change appears to fix the current problem:

But I won't commit this to the inbox until I will have made sure that I understand whether and why this is (hopefully) correct. So long, please move Compiler-ct.414 to the Treated Inbox. So sorry!


> cose over -> close over
> throguh -> through

Thank you, will fix them with my next commit.

Best,
Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Nicolas Cellier <[hidden email]>
Gesendet: Dienstag, 31. Dezember 2019 01:04 Uhr
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Ah, yes, here is why:

Name: Compiler-bf.293
Author: bf
Time: 19 February 2015, 6:22:49.349 pm
UUID: 737af1fb-e8f0-4653-851b-0c1b3ed0f65f
Ancestors: Compiler-topa.292

Fix deoptimization of ifNil: etc.

--------

Forgot to say, your changes in Compiler-ct.414 are correct.
But there are other typos in BlockNode comments, I see at least 2:

cose over -> close over
throguh -> through

Le mar. 31 déc. 2019 à 00:56, Nicolas Cellier <[hidden email]> a écrit :
Oups, I probably missunderstood the question.
You mean why you need to copy the cascadeReceiver...

Well, it's simple: the rcvr is shared by all cascade message, because we do:

    parseNode := rcvr.
    (self messagePart: 3 repeat: false)

So [true] whileTrue is optimized.
Then in #cascade, after #ensureCanCascade: it is #deoptimize(d).
That's when we get the deoptimized rcvr := parseNode cascadeReceiver.

Then in second message [true] whileFalse, it gets optimized again.
The receiver of second message is deoptimized again, but it's the originalReceiver that gets deoptimized (a copy) not the receiver!
That's a change Bert made to #deoptimize in 2015 for some reason (I guess for some transform, the receiver was changed).

So your rcvr copy is preserved from re-optimization.

Le lun. 30 déc. 2019 à 19:36, Nicolas Cellier <[hidden email]> a écrit :
Hi Christoph,
Because sending a message unstack receiver, and replace it with returned object. So we need dup for each cascade.

Le lun. 30 déc. 2019 à 19:22, Thiede, Christoph <[hidden email]> a écrit :

As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.


Alright, done, hope you like it :)

I still don't see why cascadeReceiver is a copy, though ...


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Montag, 30. Dezember 2019 17:51:36
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph, Hi Nicolas,

On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <[hidden email]> wrote:



Hi Eliot,


Of course, a changeset that defers the optimization of ParseNodes until a cascade can be precluded would be perfect - also with respect to the compiler performance. I would be very interested to read such a changeset!


However, if we can't patch this quickly, would you consider my proposal an appropriate workaround? Should we maybe include it into Trunk, just for 5.3, until we have the perfect solution from Nicolas?


As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.

That suggests we switch over to a variation of inbox for the release process.  We could provide releasebox or done such (I would call it release) and only commit things good for the release.  But in thinking about it it’s a bad idea.  It means a lot of work to change ones working images from trunk to releasebox and back.  However, if in 5.3 we provided a bulk change operation for the Monticello browser I’d consider it.  Has anything like this been discussed? (we should change subject to continue...)


Best,

Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Sonntag, 29. Dezember 2019 21:01:04
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph,

On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <[hidden email]> wrote:

Hi Eliot, thank you very much for the detailed and interesting explanation! I think I got a rough idea of the blockExtent concept; certainly, it will become more clear as I continue to work with it.


Concretely to this bug, I suppose that optimized blocks don't need their own blockExtent, because they store all their temps directly on their declaring context, don't they?


Exactly.  They're just basic blocks and jumps, within either proper methods or proper blocks.
 

This would explain the following outputs:

[|x| x:=true] whileFalse. "will be optimized"
thisContext tempNames. #('x')
  compared to:
[|x| x:=true] yourself whileFalse. "wont be optimized"
thisContext tempNames #()

Exactly.
 

So I would say that blockExtent may indeed be nil, I would at least note this fact in BlockNode >> #blockExtent.

And feel free to take as much of my explanation, with rewrites, and add it to the relevant parts of the bytecode compiler.  I like discursive doc like this in class comments, but that implies that a pointer to the class comment from a method is the right thing to do (as in "See the class comment in ...").
 
However, #computeCopiedValues: is called by #constructClosureCreationNode:, and a closure should be only created of non-optimized BlockNodes. Equally, the senders of #reindexingLocalsDo:encoder: should be only activated in case of a non-optimized BlockNode ... So better forget the rest of my first changeset, sigh.

Right.  So what maybe happening is that the system is realizing that it can't inline the block because it is being used in a cascade.  So the attempt to optimize it in the first place should fail.  We should only consider for optimization blocks that are sent one of the inline messages, *but not in a cascade*.  This is probably why Nicolas expects we should have already fixed this bug. Nicolas if I neglected to integrate your fix I apologize.  I invite you to commit straight to trunk; maybe ask Christoph for review?.

It is interesting that in the example, the receiver BlockNode keeps flagged as optimized (whereas the byte code produces a separate closure):

[self confirm: #foo]
whileTrue;
whileFalse

Right.  I would guess that the block is marked as being optimizable when the whileTrue is processed, but everything goes pear shaped when the whileFalse is processed.  What we need to do is forestall the optimization caused by processing the whileTrue send iff the whileTrue is part of a cascade (which is probably exactly what Nicolas' as-yet-unintegrated changes effect).
 

First, the compiler reads the first message only ("[self confirm: #foo] whileTrue") and optimizes it, and only then finds the second message and discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat, MessageNode >> #receiver:selector:arguments:precedence:from:, #transform:.)
But it does not reset the optimized flag of the BlockNode! To me, this appears to be the actual bug!

You have it.
 
There is #ensureCanCascade: which should specifically solve this problem by resetting the optimized flag. However, during creation of MessageNode, originalReceiver is assigned a copy of the actual receiver. This leads to the problem that while in each temporary MessageNode, the receiver block is deoptimized, the tempvar rcrv in #cascade holds the copied original receiver which never got deoptimized. For what purpose do we copy the receiver into originalReceiver?
Did I explain this understandable? :-)

By applying the following single patch, I can fix the problem:

Parser >> #cascade
" {; message} => CascadeNode."

- | rcvr msgs |
+ | rcvr cascadeRcvr msgs|
parseNode canCascade ifFalse:
[^self expected: 'Cascading not'].
parseNode ensureCanCascade: encoder.
rcvr := parseNode cascadeReceiver.
cascadeRcvr := rcvr copy.
msgs := OrderedCollection with: parseNode.
[self match: #semicolon]
whileTrue: 
[parseNode := rcvr.
(self messagePart: 3 repeat: false)
ifFalse: [^self expected: 'Cascade'].
parseNode canCascade ifFalse:
[^self expected: '<- No special messages'].
parseNode ensureCanCascade: encoder.
parseNode cascadeReceiver.
msgs addLast: parseNode].
- parseNode := CascadeNode new receiver: rcvr messages: msgs
parseNode := CascadeNode new receiver: cascadeRcvr messages: msgs

All relevant tests pass (but I don't know how good their coverage is).
Is there any need for respecting possible side-effects on rcvr that might occur while creating the next cascade message (via #messagePart:repeat:)?
If not, how would you think about this change? :-)

(Sorry for the long text, I did my best to maximize the compression ratio!)

Um, it's a pleasure to read your writing, so don't worry about being prolix.  I can hardly complain ;-)

 

Best,
Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Sonntag, 29. Dezember 2019 16:59 Uhr
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph,

On Dec 28, 2019, at 6:53 AM, Thiede, Christoph <[hidden email]> wrote:



Hi all! :-)


Steps to reproduce:

Do it:

[true]
whileTrue;
whileFalse

Expected behavior:
The script is compiled and executed (and an infinite loop occurs).

Actual behavior:
<pastedImage.png>


Further investigations:
In #analyseTempsWithin:rootNode:assignmentPools:, the inner BlockNode [true] is not computed a blockExtent, because it is optimized.
This leads to a nil key in the ByteCodeEncoder's blockExtentsToLocals.
A fast solution would be to skip nil blockExtents in #noteBlockExtent:hasLocals:, but that's a dirty workaround.

Similar bug:
Try to do:
Object newSubclass compile:  'iWontCompile: foo
foo.
[true]
whileTrue;
whileFalse'
Fails with:
<pastedImage.png>

Again, the problem is that blockExtent isNil.

Workaround:
Send #yourself to the block before sending the cascade. This prevents the receiver from being optimized.

Please note that this problem does not only occur in case of special blocks. For example, [Sensor anyButtonPressed] whileTrue; whileFalse fails as well.


How to fix this appropriately? I'm not sure whether I understand the whole idea of blockExtent so far.

blockExtents are the model of blocks that the compiler uses to analyze temporary references so that temporaries are either local or remote.  Local temporaries are temporaries on a Context’s own stack.  Remote temporaries are temporaries stored in an array which is on a Context’s own stack.

The local/remote distinction is to ensure that a Context _never_ has to reference another Context’s stack to access an outer temporary.  Why? In the VM Contexts serve as proxies for active stack frames, with most stack frames not having a context unless the programmer accesses it by accessing some other context’s sender. This optimization is key to the VM’s performance. Cog is fast because it does context-to-stack mapping and creates contexts lazily.

If the execution model (defined by the bytecode set) for temporary access implements access to outer temporaries via direct access to outer contexts' stacks, then there are two states an outer context can be in. The common case is that an outer context is associated with a stack frame and the temporaries exist in the stack frame; any slots in the context object are stale. But if the block is long-lived and runs after its enclosing context has been returned from, then the temporaries will be in the context.  To keep the slots in the context up-to-date, the stack frame slots must be copied to the context on return, and *before* the stack frame is torn down. Every outer temp access must check what state the outer context is in. This is what Peter Deutsch's first bytecode set for HPS (ObjectWorks 2.5) did (with the optimization that an outer context would be converted into having a stack frame if it didn't have one, so that temporary access only deals with the common case). And the cost and complexity are very high.  (Forgive the deep dive.)

The alternative, which I introduced in VW 5i, and which has been in Cog from the start, is to never have contexts access outer temps directly.  This is done using a standard closure implementation model from Lisp.  Any outer temporary which cannot possibly change during the execution of an inner block is copied into the closure and copied onto the block context/frame stack, where it is effectively read-only (an r-value). These are called "copied values".

Any outer temp which is written to in the block, or possibly modified after being closed over (i.e. assigned to lexically after the block), must be put in an array (an indirection vector). Since the temporary holding the indirection vector itself is not written to after initialization, indirection vectors are always copied values.

So, for example, in inject:into: nextValue ends up in an indirection vector but aBlock is a copied value (look at the bytecode for the method; I'm on my phone).  Now the VM's context-to-stack mapping is greatly simplified (no need to intercept returns, no need to check outer temp access) and _much_ faster (and less buggy, as known from hard experience with HPS).
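For illustration, here is an inject:into:-style method (a sketch with approximate names, not necessarily the exact library source) annotated with how the analysis would classify its temporaries:

inject: thisValue into: binaryBlock
	"Sketch only. nextValue is written to inside the inner block and read again
	afterwards, so it must live in an indirection vector; binaryBlock is only ever
	read inside the block, so it can be a copied value."
	| nextValue |
	nextValue := thisValue.
	self do: [:each | nextValue := binaryBlock value: nextValue value: each].
	^ nextValue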

But to do this the bytecode compiler must identify when closed-over temporaries can be treated as copied values or must be put in indirection vectors.  Block extents model block nesting and allow the bytecode compiler to check, whenever a temporary is read, if it is being read in an inner block (i.e. is being closed over), and to check, whenever a temporary is written, if it is being written to in an inner block, or, if it has been closed over, is being written to after a block that closed over it.  In either of these two cases, written to during the block with that write needing to be visible in the outer context, or written to after being closed over with that write needing to be visible within the block, the temp must live in an indirection vector.
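As a concrete, made-up example covering both cases (not code from the image):

exampleExtentAnalysis
	"base can be a copied value: it is never written to after the block closes over it.
	shared must go into an indirection vector: it is written to inside the block (and that
	write must be visible out here), and it is also assigned to after being closed over
	(and that write must be visible inside the block)."
	| base shared block |
	base := 40.
	shared := 0.
	block := [shared := shared + base].
	shared := 2.		"write after closing over shared"
	block value.		"write inside the block"
	^ shared		"==> 42"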

Christoph, you *have* to understand this to be able to be effective in working on the execution machinery (e.g. the runUntil... debugger machinery).  If you do not understand this then the fault is mine (ENTIRELY!), and the fault is that my explanation doesn't make sense. So take as long as you need and ask yourself if you do or do not understand it.  If you don't, please ask questions and I'll try my best to explain things better.  I introduced this machinery and it is my responsibility for it not to be a burden.  This is a case of moving complexity out of the VM into the image, and is justified on performance and reliability grounds.  But it is a severe cognitive burden on those using the bytecode compiler, where, at the Smalltalk level, it seems that accessing outer temporaries directly is simple and all the block extent machinery horribly complex.  I assure you that the block extent machinery is trivial compared to the support required in the VM to implement direct access to outer temps as efficiently as possible (which is still way less efficient than our current architecture allows).

According to the documentation comment, it should be okay to be nil; however, the #blockExtent comment does not mention this fact.
Please see the attached changeset for an approach to fix the issue. All tests from KernelTests-Methods and most tests from Test-Compiler pass it, except of the testAllNodePCs* which time out in my fresh image. It would be great to hear if this is the right way to solve the bug, or just extending a workaround :-)

I’ll take a look some time today.  But I’d be much happier if you can understand the long winded explanation above and answer your own question.  Great bug though :-)

Best,
Christoph
<bugfix compiling block cascades.2.cs>




--
_,,,^..^,,,_
best, Eliot





Carpe Squeak!

Re: [BUG] Cannot compile cascade sent to block

Nicolas Cellier


On Wed, 1 Jan 2020 at 20:12, Thiede, Christoph <[hidden email]> wrote:

Hi Nicolas,


this appears to work as well. Could you, or someone else, imagine why Bert added a copy in "Compiler-bf.293"? I don't want to break anything else.


I can't speak for Bert, but IMO it was either by analogy with originalArguments, or out of fear of possible side effects from sharing.

Best,

Christoph


From: Squeak-dev <[hidden email]> on behalf of Nicolas Cellier <[hidden email]>
Sent: Tuesday, 31 December 2019, 23:59:40
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph,
what about removing the copy in originalReceiver:

receiver: rcvr arguments: args precedence: p
     receiver := rcvr.
     originalReceiver := rcvr.
     arguments := args.
     originalArguments := arguments copy.
     sizes := Array new: arguments size.
     precedence := p

On Tue, 31 Dec 2019 at 02:08, Thiede, Christoph <[hidden email]> wrote:

Ooooh, my patch was erroneous ... I discovered this while working on a totally different thing, and it took me several hours to trace recurring VM crashes back to this change ...


Compare the following before and after loading my commit:

PreferenceWizardMorph compile: (PreferenceWizardMorph sourceCodeAt: #updateWindowBounds).
(PreferenceWizardMorph >> #updateWindowBounds) symbolic edit
There is a difference!

(blue is correct)
I don't have the full explanation at the moment, but apparently there *are* some relevant transformations made to the receiver, which offset the temp index because of the nested closure or something. Transformation is more than just optimization, I guess.
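To see what kind of difference such transformations can make, here is a made-up illustration (nothing to do with the PreferenceWizardMorph code itself) of how inlining a special selector moves temps around:

sumInlined
	"When to:do: is inlined, the loop variable i is hoisted into the method's own frame
	and becomes one of the method's temps, so the temp layout changes."
	| sum |
	sum := 0.
	1 to: 3 do: [:i | sum := sum + i].
	^ sum		"==> 6"

sumNotInlined
	"When the block argument is not a literal block, to:do: is sent as a real message:
	i stays a block argument inside a genuine closure, and sum, being written inside
	that closure, ends up in an indirection vector instead."
	| sum block |
	sum := 0.
	block := [:i | sum := sum + i].
	1 to: 3 do: block.
	^ sum		"==> 6"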

The following change appears to fix the current problem:

But I won't commit this to the inbox until I have made sure that I understand whether and why it is (hopefully) correct. In the meantime, please move Compiler-ct.414 to the Treated Inbox. So sorry!


> cose over -> close over
> throguh -> through

Thank you, will fix them with my next commit.

Best,
Christoph


From: Squeak-dev <[hidden email]> on behalf of Nicolas Cellier <[hidden email]>
Sent: Tuesday, 31 December 2019, 01:04
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Ah, yes, here is why:

Name: Compiler-bf.293
Author: bf
Time: 19 February 2015, 6:22:49.349 pm
UUID: 737af1fb-e8f0-4653-851b-0c1b3ed0f65f
Ancestors: Compiler-topa.292

Fix deoptimization of ifNil: etc.

--------

Forgot to say, your changes in Compiler-ct.414 are correct.
But there are other typos in BlockNode comments, I see at least 2:

cose over -> close over
throguh -> through

On Tue, 31 Dec 2019 at 00:56, Nicolas Cellier <[hidden email]> wrote:
Oops, I probably misunderstood the question.
You mean why you need to copy the cascadeReceiver...

Well, it's simple: the rcvr is shared by all cascade messages, because we do:

    parseNode := rcvr.
    (self messagePart: 3 repeat: false)

So [true] whileTrue is optimized.
Then in #cascade, after #ensureCanCascade: it is #deoptimize(d).
That's when we get the deoptimized rcvr := parseNode cascadeReceiver.

Then, in the second message [true] whileFalse, it gets optimized again.
The receiver of the second message is deoptimized again, but it's the originalReceiver (a copy) that gets deoptimized, not the receiver!
That's a change Bert made to #deoptimize in 2015 for some reason (I guess the receiver was changed by some transform).

So your copy of rcvr is safe from re-optimization.

On Mon, 30 Dec 2019 at 19:36, Nicolas Cellier <[hidden email]> wrote:
Hi Christoph,
Because sending a message pops the receiver off the stack and replaces it with the returned object. So we need a dup for each cascade message.
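Roughly (a from-memory sketch, not verbatim #symbolic output), the bytecode for a two-message cascade looks like this:

	"Transcript show: 'hi'; cr"
	pushLit: Transcript
	dup			"keep the receiver around for the next cascade message"
	pushConstant: 'hi'
	send: show:
	pop			"discard show:'s result; the dup'ed receiver is on top again"
	send: cr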

On Mon, 30 Dec 2019 at 19:22, Thiede, Christoph <[hidden email]> wrote:

As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.


Alright, done, hope you like it :)
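For the record, a sketch of roughly what such a test can look like (selector, assertions, and the use of Compiler evaluate: are my assumptions here, not necessarily what was committed):

testCascadeToBlockReceiver
	"The cascade below used to raise a compiler error because the receiver block
	remained flagged as optimized. Compile it and run it."
	| result |
	result := Compiler evaluate: '[false] whileTrue; yourself'.
	self assert: (result isKindOf: BlockClosure).	"the cascade answers the last message's result"
	self deny: result value		"the deoptimized block still evaluates normally"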

I still don't see why cascadeReceiver is a copy, though ...


From: Squeak-dev <[hidden email]> on behalf of Eliot Miranda <[hidden email]>
Sent: Monday, 30 December 2019, 17:51:36
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph, Hi Nicolas,

On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <[hidden email]> wrote:



Hi Eliot,


Of course, a changeset that defers the optimization of ParseNodes until a cascade can be ruled out would be perfect, also with respect to compiler performance. I would be very interested to read such a changeset!


However, if we can't patch this quickly, would you consider my proposal an appropriate workaround? Should we maybe include it in Trunk, just for 5.3, until we have the perfect solution from Nicolas?


As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add what you see fit.  And yes, trunk is the right place, which means inbox until after the release.

That suggests we switch over to a variation of inbox for the release process.  We could provide releasebox or some such (I would call it release) and only commit things good for the release.  But thinking about it, it's a bad idea.  It means a lot of work to change one's working images from trunk to releasebox and back.  However, if in 5.3 we provided a bulk change operation for the Monticello browser, I'd consider it.  Has anything like this been discussed? (We should change the subject to continue...)


Best,

Christoph


From: Squeak-dev <[hidden email]> on behalf of Eliot Miranda <[hidden email]>
Sent: Sunday, 29 December 2019, 21:01:04
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
 
Hi Christoph,

On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <[hidden email]> wrote:

Hi Eliot, thank you very much for the detailed and interesting explanation! I think I got a rough idea of the blockExtent concept; certainly, it will become more clear as I continue to work with it.


Coming back to this bug concretely, I suppose that optimized blocks don't need their own blockExtent because they store all their temps directly on their declaring context, don't they?


Exactly.  They're just basic blocks and jumps, within either proper methods or proper blocks.
 

This would explain the following outputs:

[| x | x := true] whileFalse. "will be optimized"
thisContext tempNames. "==> #('x')"

compared to:

[| x | x := true] yourself whileFalse. "won't be optimized"
thisContext tempNames. "==> #()"

Exactly.
 

So I would say that blockExtent may indeed be nil; I would at least note this fact in BlockNode >> #blockExtent.

And feel free to take as much of my explanation as you like, with rewrites, and add it to the relevant parts of the bytecode compiler.  I like discursive doc like this in class comments, but that implies that a pointer to the class comment from a method is the right thing to do (as in "See the class comment in ...").
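A sketch of the kind of note that could go on the accessor (exact wording and method body assumed, so check the actual trunk version):

BlockNode >> blockExtent
	"Answer the extent computed for this block during the closure analysis
	(analyseTempsWithin:rootNode:assignmentPools:). Note that this is nil for
	optimized (inlined) blocks, which get no extent of their own because their
	temps live in the enclosing method or block."
	^ blockExtent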
 
However, #computeCopiedValues: is called by #constructClosureCreationNode:, and a closure should only be created for non-optimized BlockNodes. Equally, the senders of #reindexingLocalsDo:encoder: should only be activated for a non-optimized BlockNode ... So better to forget the rest of my first changeset, sigh.

Right.  So what may be happening is that the system is realizing that it can't inline the block because it is being used in a cascade.  So the attempt to optimize it in the first place should fail.  We should only consider for optimization blocks that are sent one of the inline messages, *but not in a cascade*.  This is probably why Nicolas expects we should have already fixed this bug. Nicolas, if I neglected to integrate your fix, I apologize.  I invite you to commit straight to trunk; maybe ask Christoph for review?

It is interesting that in the example, the receiver BlockNode stays flagged as optimized (whereas the bytecode produces a separate closure):

[self confirm: #foo]
whileTrue;
whileFalse

Right.  I would guess that the block is marked as being optimizable when the whileTrue is processed, but everything goes pear shaped when the whileFalse is processed.  What we need to do is forestall the optimization caused by processing the whileTrue send iff the whileTrue is part of a cascade (which is probably exactly what Nicolas' as-yet-unintegrated changes effect).
 

First, the compiler reads the first message only ("[self confirm: #foo] whileTrue") and optimizes it, and only then finds the second message and discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat:, MessageNode >> #receiver:selector:arguments:precedence:from:, #transform:.)