Bug in Process>>#terminate | Returning from unwind contexts

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug in Process>>#terminate | Returning from unwind contexts

Christoph Thiede

Hi all,


besides my proposed solution approach for the never-returning sends to Context>>#runUntilErrorReturnFrom: [1], I recently stumbled upon another problem with context manipulation connected to #runUntilErrorReturnFrom:. But I think the cause is a different one and this one brings up a more general design discussion. Let me explain:


The bug itself is pretty simple to reproduce:


1. Do it: [self error] ensure: [^2]

2. Abandon the Error debugger

3. Do it: self error
Expected behavior: A second error debugger is shown
Actual behavior: An unwind error (Unwind error during termination) is shown!

Here's an explanation: If you inspect your UI process after step 2, you will see that two guard contexts (BlockClosure>>on:do: and BlockClosure>>ensure:) are installed right before [] in BlockClosure>>newProcess on the bottom of the sender stack. These contexts have been created by #runUntilErrorOrReturnFrom: as part of the context popping-down performed by Process>>#terminate when abandoning the debugger. Normally, they would have been removed again after unwinding all contexts from the process (and the UI process would have been terminated and been replaced by a new one, see #spawnNewProcessIfThisIsUI:). But because our "ensure: [^2]" has stopped the unwinding, #runUntilErrorOrReturnFrom: and senders are still alive and waiting for the unwinding to complete while the UI process is continuing its everyday world cycles. When the next error is raised at any time later, the UnhandledError guard gets invoked and the signaler context is returned back from Process>>#popTo:, and Process>>#terminate complains that the unwinding has not been successful.

So far about the problem, here are some thoughts and questions:

1. Should it be a legal operation at all to return from an unwind context? (I think it should, because a) it works fine unless Process>>#terminate gets involved, and b) it is a powerful instrument for certain control flows that need to ignore all exceptions.)
(1.1 If you think we should criminalize this operation, should we make efforts to exclude this behavior in #aboutToReturn:through: and other places? I fear this could become quite complex ...)
2. I'm attaching a small changeset to fix this issue by decriminalizing unwind errors and continuing the execution (with the new error) instead of showing an unwind error. It's a small patch of three lines that works surprisingly well for my use case, but I'm not sure whether there could be any other justified reasons for showing an unwind error. Do you know some other relevant reasons for unwinding errors?
3. Also, the solution approach begs a new question to me: Should the #terminate send indeed freeze for potentially hours of image time or millions of bytecodes, or are we required to find another way for this? I hope for the first because I really can't imagine an elegant alternative at the moment.

Looking forward to your thoughts! :-)

CC'ing Jaromir because you're working on the same method at the moment. Afaik you are only reworking the definition of #isTerminated, but probably you can share some interesting insights on this issue, too. :-)


Best,

Christoph


(PS: This message, as my previous one, has become quite long, I know this. General suggestions for reasonable shortening of my texts will be appreciated, I'm always trying to waste as little as possible of your time ... :-))


[1] http://forum.world.st/BUG-REGRESSION-while-debugging-Generator-nextPut-tp5108125p5127567.html






unwind-errors.1.cs (4K) Download Attachment
Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
Hi Christopher,

I've just been wondering what that part of #terminate was for... Thanks for
the answer :)

I tried a recursive version of your example and it behaves even worse:

   [self error: 'x1'] ensure: [[self error: 'x2'] ensure: [[[[self error:
'x3'] ensure: [^2]] ensure: [^2]] ensure: [^2]]]

And this one's bad too - hit Abandon and you get the unwind error right
away:

   [self error] ensure: [1/0]

After your fix/workaround they miraculously seem to behave again :)

A question: why not simply `ctxt restart`? Without assigning to
suspendedContext. Like this:

#terminate
                "... lots of code"
                ctxt := self popTo: suspendedContext bottomContext.
                ctxt == suspendedContext bottomContext ifFalse:
>>>    [ctxt restart.
                        "self debug: ctxt title: 'Unwind error during termination'"].
                "Set the context to its endPC for the benefit of isTerminated."
                ctxt pc: ctxt endPC]


And finally, I noticed #valueUninterruptably uses return from an unwind
context as well and it doesn't seem to function properly due to the bug (ok
after your fix):

   [1/0] valueUninterruptably

First doit generates ZeroDivide, hit Abandon and the next doit shows unwind
error but Proceed will just end the process (I'd expect another ZeroDivide -
am I right?)

#valueUninterruptably
        "Prevent remote returns from escaping the sender.  Even attempts to
terminate (unwind) this process will be halted and the process will resume
here.  A terminate message is needed for every one of these in the sender
chain to get the entire process unwound."

        ^ self ifCurtailed: [^ self]


Besides #isTerminated I've been trying to refactor #terminate and
#newProcess to avoid 'cannot return' errors and unify different kinds of
termination (active process termination and suspended process termination).
I've sent it to the Inbox for discussion - see here:
http://forum.world.st/The-Inbox-Kernel-jar-1380-mcz-td5127524.html

Thanks,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Christoph Thiede

Hi Jaromir,


thanks a lot for your answer! Great to hear that you have found some other examples that would benefit from my patch as well. :-)


> A question: why not simply `ctxt restart`? Without assigning to suspendedContext. Like this:
> #terminate
>                 "... lots of code"
>                 ctxt := self popTo: suspendedContext bottomContext.
>                 ctxt == suspendedContext bottomContext ifFalse:
> >>>                 [ctxt restart.
>                         "self debug: ctxt title: 'Unwind error during termination'"].
>                 "Set the context to its endPC for the benefit of isTerminated."
>                 ctxt pc: ctxt endPC]

The "ctxt restart" was necessary to have the raised exception handled again after it has been absorbed by #runUntilErrorOrReturnFrom:.
I'm not sure about the exact semantics of suspendedContext, but shouldn't it always point to the latest context frame unless the process is running? Should we rather set suspendedContext to nil here?

That's another point I find questionary about my changeset - if the process refuses to terminate, it will be resumed again, even if it has been suspended before. Can or should we change that or is it already the most logical way (because every "unwinding" essentially means to "resume" the process anyway)?
Anyway, I think the method return after restarting the context is necessary because otherwise, its pc will be manipulated which we don't want unless the process has really been terminated.

#valueUninterruptably is also an interesting example. But this should also have been fixed using my proposal, shouldn't it? :-)

Besides #isTerminated I've been trying to refactor #terminate and #newProcess to avoid 'cannot return' errors and unify different kinds of termination (active process termination and suspended process termination).

Nice work!

Best,
Christoph

Von: Squeak-dev <[hidden email]> im Auftrag von Jaromir Matas <[hidden email]>
Gesendet: Dienstag, 9. März 2021 10:54:02
An: [hidden email]
Betreff: Re: [squeak-dev] Bug in Process>>#terminate | Returning from unwind contexts
 
Hi Christopher,

I've just been wondering what that part of #terminate was for... Thanks for
the answer :)

I tried a recursive version of your example and it behaves even worse:

   [self error: 'x1'] ensure: [[self error: 'x2'] ensure: [[[[self error:
'x3'] ensure: [^2]] ensure: [^2]] ensure: [^2]]]

And this one's bad too - hit Abandon and you get the unwind error right
away:

   [self error] ensure: [1/0]

After your fix/workaround they miraculously seem to behave again :)

A question: why not simply `ctxt restart`? Without assigning to
suspendedContext. Like this:

#terminate
                "... lots of code"
                ctxt := self popTo: suspendedContext bottomContext.
                ctxt == suspendedContext bottomContext ifFalse:
>>>                 [ctxt restart.
                        "self debug: ctxt title: 'Unwind error during termination'"].
                "Set the context to its endPC for the benefit of isTerminated."
                ctxt pc: ctxt endPC]


And finally, I noticed #valueUninterruptably uses return from an unwind
context as well and it doesn't seem to function properly due to the bug (ok
after your fix):

   [1/0] valueUninterruptably

First doit generates ZeroDivide, hit Abandon and the next doit shows unwind
error but Proceed will just end the process (I'd expect another ZeroDivide -
am I right?)

#valueUninterruptably
        "Prevent remote returns from escaping the sender.  Even attempts to
terminate (unwind) this process will be halted and the process will resume
here.  A terminate message is needed for every one of these in the sender
chain to get the entire process unwound."

        ^ self ifCurtailed: [^ self]


Besides #isTerminated I've been trying to refactor #terminate and
#newProcess to avoid 'cannot return' errors and unify different kinds of
termination (active process termination and suspended process termination).
I've sent it to the Inbox for discussion - see here:
http://forum.world.st/The-Inbox-Kernel-jar-1380-mcz-td5127524.html

Thanks,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html



Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
Hi Christoph,

I'd like to recommend to use the fixed #isTerminated: (it's not in the Trunk
yet because it can be done nicer but this one is at least working ok).

isTerminated
        "Answer if the receiver is terminated, or at least terminating."
        self isActiveProcess ifTrue: [^ false].
        ^suspendedContext isNil
          or: ["If the suspendedContext is the bottomContext and the pc is at the
endPC,
                then there is nothing more to do."
                suspendedContext isBottomContext
                and: [suspendedContext pc >= suspendedContext endPC
                        or: [suspendedContext closure isNil
                                and: [suspendedContext methodClass == Process
                                and: [suspendedContext selector == #terminate]]]]]

Originally I noticed the bug on a nonsensical example but in your scenario
it also plays a role! Some processes will be rightly considered terminated
and disappear from Process Browser (#runUntilErrorOrReturnFrom among others,
in my scenarios).

It seems to me after `ctxt restart` the thread never returns back so the
assignment is moot?

At any rate it seems to me the problem is not related to non-local returns
exclusively. Other scenario I have now:
x:=nil. [self error: 'x1'] ensure: [[self error: 'x2'] ensure: [[self error:
'x3'] ensure: [x:=3]. x:=2]. x:=1].
x

Unfortunately I've got to run now but I'll continue investigating tomorrow.
Very interesting! :)
Thanks,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
Hi Christoph, hi all,

There's a couple of (probably) independent issues:

1. broken #isTerminated - fix suggested in the previous post, will send to
The Inbox

2. Debugger leaving unterminated processes behind (see in Process Browser).
The problem is in my view in Debugger >> windowIsClosing:

        Debugger >> windowIsClosing
                "My window is being closed; clean up. Restart the low space watcher."

                interruptedProcess == nil ifTrue: [^ self].
                interruptedProcess terminate.

>>> at this point the interruptedProcess may still not be terminated, so I
suggest to replace the above line by:

>>> [interruptedProcess isTerminated] whileFalse: [interruptedProcess
terminate].

                interruptedProcess := nil.
               
                contextStack := nil.
                receiverInspector := nil.
                contextVariablesInspector := nil.
                Smalltalk installLowSpaceWatcher.  "restart low space handler"

This patch would cleanup unterminated processes when closing the debugger
and resolve my examples without non-local returns but your example with
non-local return would still remain unresolved.

Note: I know nothing about the Debugger so take my patch as a desperate
attempt :)

3. your fix `ctxt restart` applied for this recursive scenario:

        x:=nil. [self error: 'x1'] ensure: [
                        [self error: 'x2'] ensure: [
                                [self error: 'x3'] ensure: [x:=3].
                                x:=2].
                        x:=1].

... leads me to a question what the expected beahvior actually is: If you
Abandon the FIRST error debugger - do you expect to continue to the second
debugger, abandon it and continue to the third debugger or rather quit right
after the FIRST debugger Abandon? If the former then what would the
difference between Proceed and Abandon be? I'm no longer sure of anything :)
Your fix leads to the former.

Please let me know what you think.
Thanks,




-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
Hi Christoph and all,

What is the expected semantics of the Debugger's Abandon? To quit and do
nothing more to the debugged process or continue unwinding?

I thought the former was right but the Debugger does the latter. Is it
really necessary to run all unwinds when closing the abandoned Debugger
window?

Because that is the cause of the Unwind errors... Normally, when hitting
Abandon, the debugged process terminates and the current UI continues.

But if there are unfinished unwinds in the debugged process and we hit
Abandon, the control jumps from the UI process to the debugged one (as
Christoph described) and stays there as an active UI process which creates
an impossible situation where we have:

- an active process being the former debugged one which is not the real
active one but only referred to as an effective process
- the real active process (genuine process) is the UI process previously
controlling the debugging with the effective process variable set
- this real active process is even erroneously answering true to
#isTerminated (because it's not 'active' and its suspendedContext is nil)
- and this real active process will eventually become the active UI again...
- and all this further leads to two simultaneous UI processes alternating
control... and more... a nightmare ;)

Check the enclosed printscreen...
<http://forum.world.st/file/t372955/After_NLR_mixup_-_genuine2.png>

Long story short if I'm not wrong and the Debugger shouldn't do all the
unwinds when Abandoned, then the fix is a oneliner:
(Instead of termintating the debugged process via #terminate which runs the
unwinds, let's just 'make it' terminated)

Debugger >> windowIsClosing
        "My window is being closed; clean up. Restart the low space watcher."

        interruptedProcess == nil ifTrue: [^ self].
>>> "Now terminate the interruptedProcess but without unwinds!"
>>> interruptedProcess suspendedContext terminate.
        interruptedProcess := nil.
       
        contextStack := nil.
        receiverInspector := nil.
        contextVariablesInspector := nil.
        Smalltalk installLowSpaceWatcher.  "restart low space handler"


Thanks,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Levente Uzonyi
Hi Jaromir,

If we just terminated the process as you suggested without doing proper
unwinding, resource management would become impossible and there would be
leaked resources (files, sockets, etc) everywhere.


Levente

On Sat, 13 Mar 2021, Jaromir Matas wrote:

> Hi Christoph and all,
>
> What is the expected semantics of the Debugger's Abandon? To quit and do
> nothing more to the debugged process or continue unwinding?
>
> I thought the former was right but the Debugger does the latter. Is it
> really necessary to run all unwinds when closing the abandoned Debugger
> window?
>
> Because that is the cause of the Unwind errors... Normally, when hitting
> Abandon, the debugged process terminates and the current UI continues.
>
> But if there are unfinished unwinds in the debugged process and we hit
> Abandon, the control jumps from the UI process to the debugged one (as
> Christoph described) and stays there as an active UI process which creates
> an impossible situation where we have:
>
> - an active process being the former debugged one which is not the real
> active one but only referred to as an effective process
> - the real active process (genuine process) is the UI process previously
> controlling the debugging with the effective process variable set
> - this real active process is even erroneously answering true to
> #isTerminated (because it's not 'active' and its suspendedContext is nil)
> - and this real active process will eventually become the active UI again...
> - and all this further leads to two simultaneous UI processes alternating
> control... and more... a nightmare ;)
>
> Check the enclosed printscreen...
> <http://forum.world.st/file/t372955/After_NLR_mixup_-_genuine2.png>
>
> Long story short if I'm not wrong and the Debugger shouldn't do all the
> unwinds when Abandoned, then the fix is a oneliner:
> (Instead of termintating the debugged process via #terminate which runs the
> unwinds, let's just 'make it' terminated)
>
> Debugger >> windowIsClosing
> "My window is being closed; clean up. Restart the low space watcher."
>
> interruptedProcess == nil ifTrue: [^ self].
>>>> "Now terminate the interruptedProcess but without unwinds!"
>>>> interruptedProcess suspendedContext terminate.
> interruptedProcess := nil.
>
> contextStack := nil.
> receiverInspector := nil.
> contextVariablesInspector := nil.
> Smalltalk installLowSpaceWatcher.  "restart low space handler"
>
>
> Thanks,
>
>
>
> -----
> ^[^ Jaromir
> --
> Sent from: http://forum.world.st/Squeak-Dev-f45488.html

Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
Hi Levente,
Thanks a lot for clarifying my confusion, my bad ;)



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
In reply to this post by Christoph Thiede

> Hi Jaromir,


> thanks a lot for your answer! Great to hear that you have found some other
> examples that would benefit > from my patch as well. :-)

Hi Christoph,

I've realized there's one disadvantage to your patch. It quite successfully
masks the real issue but every occurrence/application of the patch leaves
one suspended process behind - the one that was left prematurely through the
non-local return - see the enclosed printscreen after two iterations of your
example:

<http://forum.world.st/file/t372955/Christoph_fix_disadvantage.png>

Purple highligted processes are being left behind every occurrence of the
patch application.

The other problem I raised in
http://forum.world.st/Please-try-out-Fixes-for-debugger-invocation-during-code-simulation-tp5127684p5127748.html
is that the really executing process (which really is the genuine process)
is invisible to the user because it responds true to isTerminated! In order
to make this process visible I tweaked a bit the isTerminated as you can see
at the top of the printscreen.

The scary part is after one occurrence of the issue the image may happily
run with the invisible UI forever - or to be precise until another error
occurs and the invisible becomes visible :)

Best,
 




-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

marcel.taeumel
Hi Jaromir,

> The other problem I raised in [...]
> is that the really executing process (which really is the genuine process)
> is invisible to the user because it responds true to isTerminated! In order
> to make this process visible I tweaked a bit the isTerminated as you can see
> at the top of the printscreen.

In this discussion, you should always specify whether "responds to" involves a sender within a simulated or genuine "context". ;-) I am still not convinced that this is an actual problem.

Best,
Marcel

Am 15.03.2021 09:35:15 schrieb Jaromir Matas <[hidden email]>:


> Hi Jaromir,


> thanks a lot for your answer! Great to hear that you have found some other
> examples that would benefit > from my patch as well. :-)

Hi Christoph,

I've realized there's one disadvantage to your patch. It quite successfully
masks the real issue but every occurrence/application of the patch leaves
one suspended process behind - the one that was left prematurely through the
non-local return - see the enclosed printscreen after two iterations of your
example:



Purple highligted processes are being left behind every occurrence of the
patch application.

The other problem I raised in
http://forum.world.st/Please-try-out-Fixes-for-debugger-invocation-during-code-simulation-tp5127684p5127748.html
is that the really executing process (which really is the genuine process)
is invisible to the user because it responds true to isTerminated! In order
to make this process visible I tweaked a bit the isTerminated as you can see
at the top of the printscreen.

The scary part is after one occurrence of the issue the image may happily
run with the invisible UI forever - or to be precise until another error
occurs and the invisible becomes visible :)

Best,





-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html



Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
Hi Marcel,

>> In this discussion, you should always specify whether "responds to"
>> involves a sender within a simulated or genuine "context". ;-) I am still
>> not convinced that this is an actual problem.

Sorry for confusing you, this all new to me :)

How can I convince you? I debug quite naively, no fancy stuff. I'll try
again to show the exact steps replicating the issue using Christoph's
example:

0. downloaded the latest trunk image
Squeak6.0alpha-20304-64bit-202003021730-Windows with Kernel-mt.1381
1. open Process Browser with auto-update on
2. open an explorer window on the current UI process to watch it
3. do-it `[self error: 'e1'] ensure: [^2]` in a Workspace

Now three things happened, as expected (screenshot 1):
a) an Error debugger window popped up
a) the UI process has become suspended
b) a new UI has been created

4. open an explorer window on the new UI process to watch it
5. Abandon the Error window

And now a couple of highly surprising things happened (screenshot 2):
d) Error window disapeared - that's still expected but
e) the new UI process (88230) disappeared from the Process Browser - NOT
expected
    AND became the genuineProcess - expected
    AND it responds true to isTerminated sent to self in its Explorer window
- NOT expected
f) the old UI process (48259) became the current UI process - NOT expected
     AND became the effectiveProcess - NOT expected
     AND responds true to isActiveProcess sent to self in its Explorer
window - confusing

At this point however the genuine process is indeed not terminated but
executing. It is not shown on the Process Browser however because it filters
out terminated processes. You can verify this by changing first line in
isTerminated, for a moment, to:

    (self isActiveProcess or: [self effectiveProcess == Processor
activeProcess]) ifTrue: [^ false].

And now everything runs as usual as if nothing happened except the really
executing process is hidden from your view and the formerly debugged process
acts as the active process but is not really executing (see screenshot 3
with the invisible process 88230 shown).

6. now I reverted isTerminated back to original and did Christoph's do-it
`self error`

And the rest went as Christoph described; I'd just like to point out the
hidden process 88230 now reappears as the current UI process and correctly
responds true to isActiveProcess e.g. in the Explorer window. In case you
hit Proceed on the Unwind error window the two UI processes above will start
alternating because they are now both runnable in the priority 40 queue.

I find this course of events very unexpected; trying to follow using the
standard tools is very confusing - namely the disappearing genuine process.
I know it's due to the bug Christoph discovered and described but the tools
should rather reveal, not hide information vital for understanding what's
happening :)

I apologize for such a long description and sincerely hope I haven't made
any error here to waste your time.

Thanks,

Screenshot1.png <http://forum.world.st/file/t372955/Screenshot1.png>  
Screenshot2.png <http://forum.world.st/file/t372955/Screenshot2.png>  
Screenshot3.png <http://forum.world.st/file/t372955/Screenshot3.png>  



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

marcel.taeumel
Hi Jaromir.

How can I convince you?

Well, I understand the issue that you described. However, I think it is very bad style to return from within an ensure-block. That's why I would like to set the urgency for this to "rather low". :-)

The idiom for using #ensure: in the return-expression is this:

^ aBlock ensure: ensureBlock
^ [ self doTrickyStuff ] ensure: [ self cleanUp ]

That is, you should only use *inner* returns to get out early. And then, the ensure will clean up after you.

If you really would want to "ensure" a certain return value, you could do it like this:

| result |
[ self doTrickyStuff ] ensure: [ result := 42 ].
^ result

But why using #ensure: at all for such a case? If you think that exceptions might happen, you would have to wrap it this way:

| result |
result := 42.
[ self doTrickyStuff ] on: KnownError do: [ :ex | "Ignore." ].
^ result

Well, doesn't make much sense to me. :-) You clean up resources (i.e. state) via #ensure: when facing exceptions. You do not manage control flow through it. The exception handlers #on:do: are for that.

Do not put a return "^" into an ensure block.

Best,
Marcel

Am 15.03.2021 21:54:47 schrieb Jaromir Matas <[hidden email]>:

Hi Marcel,

>> In this discussion, you should always specify whether "responds to"
>> involves a sender within a simulated or genuine "context". ;-) I am still
>> not convinced that this is an actual problem.

Sorry for confusing you, this all new to me :)

How can I convince you? I debug quite naively, no fancy stuff. I'll try
again to show the exact steps replicating the issue using Christoph's
example:

0. downloaded the latest trunk image
Squeak6.0alpha-20304-64bit-202003021730-Windows with Kernel-mt.1381
1. open Process Browser with auto-update on
2. open an explorer window on the current UI process to watch it
3. do-it `[self error: 'e1'] ensure: [^2]` in a Workspace

Now three things happened, as expected (screenshot 1):
a) an Error debugger window popped up
a) the UI process has become suspended
b) a new UI has been created

4. open an explorer window on the new UI process to watch it
5. Abandon the Error window

And now a couple of highly surprising things happened (screenshot 2):
d) Error window disapeared - that's still expected but
e) the new UI process (88230) disappeared from the Process Browser - NOT
expected
AND became the genuineProcess - expected
AND it responds true to isTerminated sent to self in its Explorer window
- NOT expected
f) the old UI process (48259) became the current UI process - NOT expected
AND became the effectiveProcess - NOT expected
AND responds true to isActiveProcess sent to self in its Explorer
window - confusing

At this point however the genuine process is indeed not terminated but
executing. It is not shown on the Process Browser however because it filters
out terminated processes. You can verify this by changing first line in
isTerminated, for a moment, to:

(self isActiveProcess or: [self effectiveProcess == Processor
activeProcess]) ifTrue: [^ false].

And now everything runs as usual as if nothing happened except the really
executing process is hidden from your view and the formerly debugged process
acts as the active process but is not really executing (see screenshot 3
with the invisible process 88230 shown).

6. now I reverted isTerminated back to original and did Christoph's do-it
`self error`

And the rest went as Christoph described; I'd just like to point out the
hidden process 88230 now reappears as the current UI process and correctly
responds true to isActiveProcess e.g. in the Explorer window. In case you
hit Proceed on the Unwind error window the two UI processes above will start
alternating because they are now both runnable in the priority 40 queue.

I find this course of events very unexpected; trying to follow using the
standard tools is very confusing - namely the disappearing genuine process.
I know it's due to the bug Christoph discovered and described but the tools
should rather reveal, not hide information vital for understanding what's
happening :)

I apologize for such a long description and sincerely hope I haven't made
any error here to waste your time.

Thanks,

Screenshot1.png
Screenshot2.png
Screenshot3.png



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html



Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Christoph Thiede
Hi Jaromir,

again, very sorry for keeping you waiting for so long time.

I'll try to go through your questions step by step, hoping everything is
still relevant: :-)

> It seems to me after `ctxt restart` the thread never returns back so the
> assignment is moot?

Fair question. After evaluating "[self error] ensure: [^1]", abandoning the
debugger, and raising another error, the relevant #terminate context is
indeed dead, so somehow the "ctxt restart" appears to have returned indeed
... But I don't know why. :)

(By the way, in case you did not yet find this out by yourself, you can find
the activated #terminate context by exploring the second-bottommost context
(#ensure:) of the UI process and navigating up in the sender stack of its
aBlock argument's outerContext ...

<http://forum.world.st/file/t372205/inspect_%23terminate.png> )

> At any rate it seems to me the problem is not related to non-local returns
> exclusively. Other scenario I have now:
> x:=nil. [self error: 'x1'] ensure: [[self error: 'x2'] ensure: [[self
> error: 'x3'] ensure: [x:=3]. x:=2]. x:=1].
> x

Oh no. Important findings, though! This enlarges the issue we are discussing
here. Returning from unwind contexts *could* be considered as something
stupid and illegal as I wrote above, but raising errors from them is
something much more likely. Even non-Smalltalkish languages such as Python
support this and receiving unwind errors in these situations has always
appeared as a sad limitation to me.

Hm ... in this example, my changeset raises a #cannotReturn: after
abandoning all the debuggers. :/

((Side note: Maybe we should tackle ExceptionTests>>testHandlerFromAction
these days, too. However, not sure whether they are later to each other,
thus the double brackets :)))

> If you Abandon the FIRST error debugger - do you expect to continue to the
> second debugger, abandon it and continue to the third debugger or rather
> quit right after the FIRST debugger Abandon? If the former then what would
> the difference between Proceed and Abandon be?

In my mental model, these debuggers form a (virtual) stack. Maybe thinks
would get clearer if we would only delete the debugger window after the
termination had been completed. I tell the first debugger to abandon -
during the abandoning, another error occurs, and a second debugger informs
me about this incident - I tell it to abandon, too - so what exactly did I
abandon now? The unwind handling (i.e. a single unwind block) or rather the
unwinding activity itself (which would be equivalent to continue the regular
execution)? Let's assume the first one for now ... And finally, the third
debugger appears because a second unwind block raised an error - I tell it
to continue, meaning that the unwind handling should be resumed right from
the place where the error has occurred, so in the end, the termination
should be continued. Phew! What do you think about this example? ;-)

> What is the expected semantics of the Debugger's Abandon? To quit and do
> nothing more to the debugged process or continue unwinding?

I have to agree with Levente here that unwinding should not be skipped here
definitively. Abandoning is used every often when an error occurs and it's a
fundamental meaning of #ensure: contexts to be executed even if the regular
execution is not being continued, giving their senders the chance to clean
up things.

And regarding your proposal of enforcing termination:

> >>> [interruptedProcess isTerminated] whileFalse: [interruptedProcess
terminate].

I'm not sure whether this would be a good idea, though ... The debugger is
only a graphical tool that makes process control more easily. IMHO Debugger
>> #abandon should have exactly the same effect as Process >> #terminate.

So the next question would be whether we would want such an enforce loop
directly in Processor >> #terminate. I don't know.

tl;dr: Unwind errors are a problem, but my proposed changeset is not really
suitable to fix all of them. Should we probably just stick with the
UnhandledError handling from within Process >> #terminate? This only seems
to be necessary for calls from the Debugger (see Debugger >> #stepOver).
And with regard to the returns from unwind contexts, here is another vague
idea: Could it help us in any way to raise a second exception from
#aboutToReturn:through: to keep senders such as Process >> #terminate
informed? Or would this just be unnecessary? We already appear to have used
ExceptionAboutToReturn in the past. Can anyone remember its story? Maybe
nested unwinding even has worked better in past Squeak versions before it
has been removed?
Hm ...
I have the feeling that finding a better solution for this problem will keep
occupying us for some more time. Maybe you have some other great ideas! :-)

Best,
Christoph



-----
Carpe Squeak!
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Christoph Thiede
Hi Marcel,

> Do not put a return "^" into an ensure block.

Thanks for the statement. In this case, I have two follow-up questions for
you:

1. So do you think #valueUninterruptably should be deprecated and
disrecommended entirely?

(Also note that in this method comment, Anthony Hannan already proposes a
terminate message "to get the entire process unwound" ... This might be
something similar to the new exception I considered in my previous post.)

2. Note that Jaromir has also found examples for unwind errors when
abandoning an error that has been raised from an unwind context. While they
also include control flow management, I think we cannot avoid such
situations in general.

Best,
Christoph



-----
Carpe Squeak!
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
In reply to this post by Christoph Thiede
Hi Christoph, Marcel,

Thanks for your feedback!

I'm responding jointly also to your Debugger post... again, apologies for
mixing this up with your genuine process work :)

> The only exception from this rule appears to occur when the Processor >>
> #evaluate:onBehalfOf: logic does not get unwound correctly so the
> effectiveProcess is not reset, which is a consequence of the bug I
> described
> [here].

Exactly, and that's precisely why I thought it was serious because the bug
made discovering itself so much harder :) This is when one really needs to
be aware of the genuine process and what it is doing.

> [...] after terminating it, the debugged process keeps answering false
> when sent
> #isTerminated. I think this is also related to the thread you started in
> [3]

Yes, isTerminated (and isSuspended) is fixed by Kernel-jar.1381.mcz in [4].
I'd like to consider it a 'baby step' towards more refined versions in [2]
or [3] you mentioned.

>> What is the expected semantics of the Debugger's Abandon? To quit and do
>> nothing more to the debugged process or continue unwinding?

> I have to agree with Levente here that unwinding should not be skipped
> here
> definitively.

Oh, absolutely! I managed to throw the baby out with the bathwater
spectacularly :)

> IMHO Debugger >> #abandon should have exactly the same effect as Process
> >> #terminate.

Yes, that's my feeling as well now (although I'm not in a position to give
opinions; I have to go back and study the exceptions again more thoroughly).

The examples I gave (without the non-local return) are not as nasty as your
non-local return example:

The main difference between your non-local example:

[self error: 'e1'] ensure: [^2]

and my one without the non-local return:

[self error: 'e1'] ensure: [self error: 'e2']

is that in your case the ^2 interrupts the orderly unwinding/termination and
the UI process continues as if still in simulation (because of the missed
effective process reset) - until another exception happens *somewhen* in the
future - while in my examples the next exception resetting the
effectiveProcess happens right away in the ensure block, presenting "only"
the Unwind error on the outside.

And on top of that, in your case the image is in such a bad state that
running Process or Semaphore tests even crash the image!

> I wonder whether we can and whether we should find a way to heal
> processes again after a missing reset [of the effectiveProcess].

Very interesting question indeed!

To summarize: the non-local return in the ensure block causes a major
disruption to the flow of control. Indeed the easiest and probably
reasonable solution is to somehow prohibit non-local returns in the ensure
blocks. This scenario, however uncovers a weakness in the #isTerminated,
reporting the currently executing process with effectiveProcess assigned, as
terminated and as a result preventing Process Browser to reveal such a
pathological situation (which is precisely when you need to see it).

Best regards,


PS: For debugging this issue I tweaked my #isTerminated with the first line
conditon reading:

    (self isActiveProcess or: [self effectiveProcess == Processor
activeProcess]) ifTrue: [^ false].

instead of just

    self isActiveProcess ifTrue: [^ false].

to make the genuine process always visible...

[1]
http://forum.world.st/Please-try-out-Fixes-for-debugger-invocation-during-code-simulation-tp5127684p5127807.html
[2] http://forum.world.st/The-Inbox-Kernel-jar-1377-mcz-tp5127438.html
[3]
http://forum.world.st/Refactoring-terminate-to-get-rid-of-cannot-return-errors-etc-td5127732.html
[4] http://forum.world.st/The-Inbox-Kernel-jar-1381-mcz-tp5127665.html




-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Christoph Thiede
Hi Jaromir,

> The main difference between your non-local example:
>
> [self error: 'e1'] ensure: [^2]
>
> and my one without the non-local return:
>
> [self error: 'e1'] ensure: [self error: 'e2']
>
> is that in your case the ^2 interrupts the orderly unwinding/termination
> and the UI process continues as if still in simulation (because of the
> missed effective process reset) - until another exception happens
> *somewhen* in the future - while in my examples the next exception
> resetting the effectiveProcess happens right away in the ensure block,
> presenting "only" the Unwind error on the outside.

Very precisely. For this reason, I suggest leaving aside my non-local return
example for the moment and focusing on your much simpler case instead. :-)

I have been sleeping over these problems a few times ... It seems strange to
me that the termination logic needs to involve a send to
#runUntilErrorOrReturnFrom: at all (also via #popTo:). This is usually meant
to be used by the debugger only (that's also why it has this special return
value format). In the current Trunk, the behavior of
#runUntilErrorOrReturnFrom: to catch UnhandledErrors is not even used at all
in #terminate (though I added rudimentary support to it in my previous
changeset). I have the feeling that we should not catch UnhandledErrors in
this place at all since there is not any present debugger that would process
these errors. Ideally, I would like to just hand over control to the process
to be terminated and have it unwind the entire stack down to the bottom
context.

Again from a rather conceptual point of view: How useful is it to terminate
a process "remotely", i.e. simulating it from the outside? Intuitively, I
would rather have said that terminating a process is a request to the
process to "please stop everything you are just doing and only unwind your
current operations". Why can't we just activate the relevant return context
on the process (see #popTo:) and then resume it again? Then everything
exceptional that might happen during the unwinding would behave exactly like
if the process weren't being unwound right now. The difference, though,
would be that the #terminate would return before the process would be
unwound, and we would need to rethink the priority of the process to be
terminated. For example, we could give it the priority of the active process
+ 1 or, in the worst case, use busy waiting in #terminate until the process
is terminated. Probably these considerations have been the reason to
implement #terminate in the current way. But conceptionally, wouldn't such
an approach be closer to the conceptual meaning of terminating a process?

Just thinking aloud ... I'm looking forward to your opinion! :-)

> This scenario, however uncovers a weakness in the #isTerminated, reporting
> the currently executing process with effectiveProcess assigned, as
> terminated and as a result preventing Process Browser to reveal such a
> pathological situation (which is precisely when you need to see it).

I don't think so ... The violated invariant in this situation is that the
effectiveProcess variable has not been reset. In general, Process >>
#isTerminated should indeed consider the effectiveProcess. Debugging
"Processor activeProcess isTerminated" should always work exactly like
executing it outside the debugger. :-)

Best,
Christoph



-----
Carpe Squeak!
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
> Again from a rather conceptual point of view: How useful is it to terminate
> a process "remotely", i.e. simulating it from the outside?
>
> Why can't we just activate the relevant return context
> on the process (see #popTo:) and then resume it again?
>
> But conceptionally, wouldn't such
> an approach be closer to the conceptual meaning of terminating a process?
>
> Just thinking aloud ... I'm looking forward to your opinion! :-)


Hi Christoph,

thanks! yes, I've been thinking along the same lines :)

I think #runUntilErrorOrReturnFrom is not the cause of the problem though;
#popTo is! It uses #evaluate:onBehalf:, the indirect way to unwind the
terminated process. I still don't understand why we should want to unwind
indirectly instead of just making the suspended process active and have it
unwound itself... I created a new top context for it effectively sending it
back to #terminate as an active process with the #activePriority, and
replaced #popTo (and the remainder of #terminate) with:

                suspendedContext := Context
                        sender: suspendedContext
                        receiver: self
                        method: (Process>>#terminate)
                        arguments: {}.
                self priority: Processor activePriority.
                self resume.
                Processor yield

See the whole #terminate method further below.

All Process tests are green. It seems to resolve the 'easy' issues.

Your non-local example seems to work as well with one caveat: It leaves the
image with two UI processes, both in the prio 40 queue, alternating control.

A question is: what is the semantics of the non-local return in

        [self error] ensure: [^2]
or
        [self error] valueUninterruptably

And I think it is exactly what it does now - it escapes the termination! I
think it's called "cheating death"... A powerful and meaningful feature ;)
#valueUninterruptably says: don't let *anything* interrupt me, not even a
debugger Abandon.

More precisesly: `self error` causes an error window is opened and a new UI
process is spawned. Abandon then causes the new UI process sends #terminate
to the previous UI process running the do-it. #terminate then makes the old
UI execute #terminate on itself but during the unwind the non-local return
(^2) causes both the unwind loop and #terminate are jumped over back to the
original do-it and the old UI continues as if nothing happened :) The new
UI, in the meantime, has finished closing the Error (debugger) window and
continues it's everyday UI cycles. So we end up with two normal healthy UIs.

Now my question is: how to remove the extra UI?

One idea I haven't explored yet is why run a do-it withing the UI and not as
a separate process? In this scenario the non-local return would continue or
return from the do-it instead of returning inside the UI itself causing it
survive the expected termination.

But that's a big change. Would there be anything simpler?

As for #runUntilErrorOrReturnFrom: I haven't studied it thorougly yet but my
understanding is it's supposed to deal with unwind blocks caught in the
middle of their unwinding - e.g. when abandoning the debugger (but I'd like
to see some real examples first and I have to think about your remarks).


As for #terminate: I had to do one modification the original unwind code:
according to the N.B. note we do not set `complete` variable to true - but I
can't see why: was a necessity for something or just saving a line? (I hope
the latter but can get around the former too)

>>> Why??? ["N.B. Unlike Context>>unwindTo: we do not set complete (tempAt:
>>> 2) to true."
>>> ctxt tempAt: 2 put: true.        "added, otherwise Abandon doesn't
>>> work"

Without it the debugger Abandon in this example

        [self error: 'e1'] ensure: [self error: 'e2']

wouldn't work: without the `complete` mark the Abandon (#windowIsClosing)
sends #terminate, the process runs unwind, opens a debugger, Abandon sends
#terminate etc.


Process >> terminate
        "Stop the process that the receiver represents forever.
         Unwind to execute pending ensure:/ifCurtailed: blocks before terminating.
         If the process is in the middle of a critical: critical section, release
it properly."

        | ctxt unwindBlock oldList |
        self isActiveProcess ifTrue:
                [ctxt := thisContext.
                 [ctxt := ctxt findNextUnwindContextUpTo: nil.
                  ctxt ~~ nil] whileTrue:
                        [(ctxt tempAt: 2) ifNil:
>>> Why??? ["N.B. Unlike Context>>unwindTo: we do not set complete (tempAt:
>>> 2) to true."
>>> ctxt tempAt: 2 put: true.        "this is added otherwise Abandon
>>> doesn't work"
                                 unwindBlock := ctxt tempAt: 1.
                                 thisContext terminateTo: ctxt.
                                 unwindBlock value]].
                thisContext terminateTo: nil.
                self suspend.
                "If the process is resumed this will provoke a cannotReturn: error.
                 Would self debug: thisContext title: 'Resuming a terminated process' be
better?"
                ^self].

        "Always suspend the process first so it doesn't accidentally get woken up.
         N.B. If oldList is a LinkedList then the process is runnable. If it is a
Semaphore/Mutex et al
         then the process is blocked, and if it is nil then the process is already
suspended."
        oldList := self suspend.
        suspendedContext ifNotNil:
                ["Release any method marked with the <criticalSection> pragma.
                  The argument is whether the process is runnable."
                 self releaseCriticalSection: (oldList isNil or: [oldList class ==
LinkedList]).

                "If terminating a process halfways through an unwind, try to complete that
unwind block first."
                (suspendedContext findNextUnwindContextUpTo: nil) ifNotNil:
                        [:outer|
                         (suspendedContext findContextSuchThat:[:c| c closure == (outer tempAt:
1)]) ifNotNil:
                                [:inner| "This is an unwind block currently under evaluation"
                                 suspendedContext runUntilErrorOrReturnFrom: inner]].

>>> suspendedContext := Context      "now top context"
                        sender: suspendedContext
                        receiver: self
                        method: (Process>>#terminate)
                        arguments: {}.
                self priority: Processor activePriority.
                self resume.
>>> Processor yield          "important when using activePriority"
        ]



>
> > This scenario, however uncovers a weakness in the #isTerminated,
> reporting
> > the currently executing process with effectiveProcess assigned, as
> > terminated and as a result preventing Process Browser to reveal such a
> > pathological situation (which is precisely when you need to see it).
>
> I don't think so ... The violated invariant in this situation is that the
> effectiveProcess variable has not been reset. In general, Process >>
> #isTerminated should indeed consider the effectiveProcess. Debugging
> "Processor activeProcess isTerminated" should always work exactly like
> executing it outside the debugger. :-)
>

Yes, 100% agreed, but that's not the point I'm making :) I'm saying such a
situation (where the effective process has not been reset) is rare but
dangerous and we need to be aware of it rather sooner than later - i.e. for
instance we need to see in the Process Browser there's one more process left
- but now it's hiding as invisible because #isTerminated reports it as
terminated.

To demonstrate how dangerous the scenario is, do-it:

        [self error] ensure: [^2]

and Abandon the ensuing Error window. Now the image becomes 'corrupted' or
'unstable' in the sense that the following do-it will hang the image:

        Semaphore new waitTimeoutMSecs: 50

If you try Test Runner the following Process and Semaphore tests fail and
may even crash the image:

        testWaitTimeoutMSecs
        testProcessFaithfulSimulation
        testProcessFaithfulRunning
        testEvaluateOnBehalfOf

The reason is after abandoning the error the following expressions are no
longer true and some tests get confused:

        Processor activeProcess = Project current uiProcess  "-> FALSE"
        Processor activeProcess = (Processor instVarNamed: 'genuineProcess')  "->
FALSE"

We can observe the anomaly indirectly by testing these expressions but what
I'm suggesting is the Process Browser should be the place to reveal such a
situation.

To summarize: I admit such situations are rare so it's not a pressing issue.


I look forward to your opinion about the #terminate experiment and the twin
UI processes :)






-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
Hi Christoph, All,

I'm enclosing a changeset that could solve the original issue: unwind errors
and image instability after Abandoning the debugger in Christoph's example:

   [self error] ensure: [^2]

The unwind errors also appears in common situations like:

   [self error] ensure: [self error]

The root cause of the problem is that termination of a suspended process is
performed indirectly via #evaluate:onBehalf: in #popTo (as if in
simulation). Christoph described the problem above and elsewhere.

The solution may be to replace the #popTo: mechanism by having the suspended
process terminate itself as an active process by placing a new context atop
its stack to invoke #terminate, and resume the suspended process with the
active process' priority.

This, however leaves us with another problem: when a non-local return is
executed in an unwind block during termination the process may jump over
it's termination back to the original do-it code and continue executing -
which is in my opinion precisely the semantics of the non-local return in
this case (see #valueUninterruptably). But the problem is the original UI
process that started the do-it continues and the new UI process that opened
a debugger continues as well, alternating control.

To avoid this twin UI situation we have to make sure the original UI is
terminated after finishing execution of the do-it code that escaped its
termination via a non-local return. To do this we may insert a context
invoking #teminate under the DoIt context in the stack when we hit Abandon
and the debugger tries to terminate the debugged do-it code. We cannot
modify the stack sooner because we may hit Proceed and let the process
continue (which would close the new debugger process and leave the original
process as the sole UI process).

The change involves two methods only: Debugger>>windowIsClosing a
Process>>terminate. Please review the enclosed changeset with my comments;
I'd appreciate any feedback.
Thanks!

NonLocalReturnTermination.cs
<http://forum.world.st/file/t372955/NonLocalReturnTermination.cs>  









-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Bug in Process>>#terminate | Returning from unwind contexts

Jaromir Matas
This issue has been dealt with (fixed) and discussed here:
http://forum.world.st/Solving-multiple-termination-bugs-summary-amp-proposal-td5128285.html



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir