Solving multiple termination bugs - summary & proposal

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

Solving multiple termination bugs - summary & proposal

Jaromir Matas
Hi All,

here's a summary of bugs in the current termination procedure:

1. #isTerminated and #isSuspended fail to report correctly - in [1]

2. active process termination bug causing an image crash - in [2]

3. nested unwind bug: skipping some ensure blocks - in [3]

4. likely a bug: a failure to complete nested unwind blocks halfway thru
execution - also in [3]

5. Christoph's discovery: a failure to correctly close the debugger in case
of error or non-local return in unwind - in [4]

6. inconsistent semantics of unwinding protected blocks during active vs.
suspended process termination
(not reported yet - different sets of unwind blocks are executed depending
on whether a process terminates itself or whether some other process
initiates its termination; the proposed solution unifies active and
suspended process termination and is consistent with Visual Works semantics
(and goes beyond))

[1]
http://forum.world.st/The-Inbox-Kernel-jar-1376-mcz-td5127335.html#a5127336
[2]
http://forum.world.st/A-bug-in-active-process-termination-crashing-image-td5128186.html
[3]
http://forum.world.st/Another-bug-in-Process-terminate-in-unwinding-contexts-tp5128171p5128178.html
[4]
http://forum.world.st/Bug-in-Process-gt-gt-terminate-Returning-from-unwind-contexts-td5127570.html#a5128095


I'd like to propose a solution - a rewrite of #terminate - addressing all of
the above issues. Main points of the solution are:

(i) why differentiate active and suspended process termination? I propose to
unify the two: to suspend the active process and terminate it as a suspended
process. As a result the semantics/outcome of the unwind algorithm will be
the same in both cases. Fixes 2 and 6.

(ii) the above leads to a susbstantial simplification of #isTerminated - a
process is terminated when at the last instruction of its bottom context.
Fixes 1.

(iii) unwind algorithm for incomplete unwind blocks is extended to execute
the outer-most instead of the inner-most incomplete unwind context. Fixes 3
and 4.

(iv) indirect termination via #popTo is replaced with a analogue of
#unwindTo: using Eliot's ingenious #runUntilErrorOrReturnFrom: the same way
as in (iii). I'd very much appreciate if someone commented this part because
I haven't found any clue why the indirect termination has implemented using
#popTo and not #runUntilErrorOrReturnFrom. Fixes 5.

I enclose a set of examples used to compare the current Squeak semantics
with the proposed one (and with VW) that can be used to build test cases.

Here's a commented code. I'll be happy to provide detailed step-by-step
guidance if you find this whole idea interesting and worth implementing. I'm
convinced at least parts of the proposal should be integrated as simple
fixes of the current bugs. Thank you for your patience if you're still
reading :) Any feedback extremely welcome. A changeset is enclosed for your
convenience here:  Fix_terminate_v2.cs
<http://forum.world.st/file/t372955/Fix_terminate_v2.cs>  

Process >> terminate
        "Stop the process that the receiver represents so that it answers true to
#isTerminated.
         Unwind to execute pending and unfinished ensure:/ifCurtailed: blocks
before terminating.
         If the process is in the middle of a critical: critical section, release
it properly."

        | ctxt unwindBlock oldList outerMost |
        self isActiveProcess ifTrue: [
                "If terminating the active process, suspend it first and terminate it as a
suspended process.
                Nested #terminate messages could derail the termination so let's enclose
it in #ensure."
                [[] ensure: [self terminate]] fork.
                ^self suspend].

        "Always suspend the process first so it doesn't accidentally get woken up.
         N.B. If oldList is a LinkedList then the process is runnable. If it is a
Semaphore/Mutex et al
         then the process is blocked, and if it is nil then the process is already
suspended."
        oldList := self suspend.
        suspendedContext ifNotNil:
                ["Release any method marked with the <criticalSection> pragma.
                  The argument is whether the process is runnable."
                 self releaseCriticalSection: (oldList isNil or: [oldList class ==
LinkedList]).

                "If terminating a process halfways through an unwind, try to complete that
unwind block first;
                if there are multiple such nested unwind blocks, try to complete the
outer-most one; the inner
                blocks will be completed in the process."
                ctxt := suspendedContext.
                [(ctxt := ctxt findNextUnwindContextUpTo: nil) isNil] whileFalse:
                        "Contexts under evaluation have already set their complete (tempAt: 2) to
true."
                        [(ctxt tempAt:2) ifNotNil: [outerMost := ctxt]].
                outerMost ifNotNil: [
                        "This is the outer-most unwind context currently under evaluation;
                        let's find an inner context executing outerMost's argument block (tempAt:
1)"
                        (suspendedContext findContextSuchThat: [:ctx |
                                ctx closure == (outerMost tempAt: 1)]) ifNotNil: [:inner |
                                        "Let's finish the unfinished unwind context only (i.e. up to inner) and
return here"
                                        suspendedContext runUntilErrorOrReturnFrom: inner.
                                        "Update the receiver's suspendedContext (the previous step reset its
sender to nil)"
                                        suspendedContext := outerMost]].

                "Now all unwind blocks caught halfway through have been completed;
                let's execute the ones still pending. Note: #findNextUnwindContextUpTo:
starts
                searching from the receiver's sender but the receiver itself may be an
unwind context."
                ctxt := suspendedContext.
                ctxt isUnwindContext ifFalse: [ctxt := ctxt findNextUnwindContextUpTo:
nil].
                [ctxt isNil] whileFalse: [
                        (ctxt tempAt: 2) ifNil: [
                                ctxt tempAt: 2 put: true.
                                unwindBlock := ctxt tempAt: 1.
                                "Create a context for the unwind block and execute it on the unwind
block's stack.
                                Note: using #value instead of #runUntilErrorOrReturnFrom: would lead to
executing
                                the unwind on the wrong stack preventing the correct execution of
non-local returns."
                                suspendedContext := unwindBlock asContextWithSender: ctxt.
                                suspendedContext runUntilErrorOrReturnFrom: suspendedContext].
                        ctxt := ctxt findNextUnwindContextUpTo: nil].

                "Set the context to its endPC and its sender to nil for the benefit of
isTerminated."
                ctxt := suspendedContext.
                ctxt terminateTo: nil.
                ctxt pc: ctxt endPC]


Process >> isTerminated
        "Answer if the receiver is terminated. A process is considered terminated
        if the suspendedContext is the bottomContext and the pc is at the endPC"

        self isActiveProcess ifTrue: [^ false].
        ^suspendedContext isNil or: [
                suspendedContext isBottomContext and: [
                        suspendedContext isDead or: [suspendedContext atEnd]]]


Process >> isSuspended
        "A process is suspended if it has non-nil suspendedContext (e.g. new or
        previously suspended with the suspend primitive) and is not terminated or
        waiting in a scheduler or a semaphore queue (i.e. is not runnable or
blocked)."
       
        ^myList isNil
                and: [suspendedContext notNil
                and: [self isTerminated not]]


Debugger >> windowIsClosing
        "My window is being closed; clean up. Restart the low space watcher.
        If the debugged process is a do-it, it may do a non-local return escaping
        termination so we need to insert a #terminate context under the do-it
        to make sure the debugged UI process will terminate."

        | doItContext terminateContext |
        (interruptedProcess == nil or: [interruptedProcess suspendedContext isNil])
ifTrue: [^ self].

        "find a do-it context; answer nil if it doesn't exist in the sender chain"
        doItContext := interruptedProcess suspendedContext
                findContextSuchThat: [:ctx |
                        ctx methodClass = UndefinedObject and: [
                        ctx selector = #DoIt and: [
                        ctx closure isNil]]].
        doItContext ifNotNil: [ "it exists so let's insert a #terminate context"
                terminateContext := Context
                        sender: doItContext sender
                        receiver: interruptedProcess
                        method: (Process>>#terminate)
                        arguments: {}.
                doItContext privSender: terminateContext ].

        interruptedProcess terminate.
        interruptedProcess := nil.
       
        contextStack := nil.
        receiverInspector := nil.
        contextVariablesInspector := nil.
        Smalltalk installLowSpaceWatcher.  "restart low space handler"




A few examples to illustrate the idea (many more enclosed here
Some_examples_to_examine_terminate_bugs.txt
<http://forum.world.st/file/t372955/Some_examples_to_examine_terminate_bugs.txt>  
):

Ex. 1:

| p |
"Suspend process inside ensure block and make sure x1 x2 and x3 are printed.
Currently only x1 is printed."
p := [
        [
                [ ] ensure: [
                        [ ] ensure: [
                                Processor activeProcess suspend.
                                Transcript show: 'x1'].
                        Transcript show: 'x2']
        ] ensure: [
                Transcript show: 'x3']
] fork.
Processor yield.
p terminate

---> x1   (instead of x1 x2 x3)


Ex. 2:

| p |
Currently only x3 is printed."
p := [
        [
                [ ] ensure: [
                        [ ] ensure: [
                                Processor activeProcess terminate.
                                Transcript show: 'x1'].
        "terminate active process inside ensure block and make sure x1 x2 and x3
are printed.
                Transcript show: 'x2']
        ] ensure: [
                Transcript show: 'x3']
] fork.
Processor yield

---> x3   (instead of x1 x2 x3)


Ex. 3:

"unwind after error inside ensure block and make sure x1 x2 and x3 are
printed.
[
        [ ] ensure: [
                [ ] ensure: [
                        self error.
                        Transcript show: 'x1'].
                Transcript show: 'x2']
] ensure: [
        ^Transcript show: 'x3']


---> x3   (instead of x1 x2 x3)

Note: nested errors during unwind follow #runUntilErrorOrReturnFrom: logic,
i.e. return exception and let user decide...



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Hi,

The enclosed changeset fixes all bugs/inconsistencies concerning process
termination listed below. It's a simplified version of the previous
proposal. Feedback very welcome indeed!

The idea is to simply extend the existing mechanism for completing
halfway-through unwind blocks and let it deal with *all* unwind blocks.

best,

Fix_terminate_v3.cs <http://forum.world.st/file/t372955/Fix_terminate_v3.cs>  


Jaromir Matas wrote

> Hi All,
>
> here's a summary of bugs in the current termination procedure:
>
> 1. #isTerminated and #isSuspended fail to report correctly - in [1]
>
> 2. active process termination bug causing an image crash - in [2]
>
> 3. nested unwind bug: skipping some ensure blocks - in [3]
>
> 4. likely a bug: a failure to complete nested unwind blocks halfway thru
> execution - also in [3]
>
> 5. Christoph's discovery: a failure to correctly close the debugger in
> case
> of error or non-local return in unwind - in [4]
>
> 6. inconsistent semantics of unwinding protected blocks during active vs.
> suspended process termination





-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Nicolas Cellier
Hi Jaromir,
many thanks for those efforts and for perseverating.
It must be a bit frustrating to not receive any answer to this good
work and deep investment, but there is a good reason for that: the
subject is tough, and not everyone is able to give you a quick answer
without reinvesting an equivalent slot of time as you personally did.

This change-set looks good to my eyes, the level of comment is good
too and gives me some additional confidence.
I'm probably not at your level of expertise yet, but I vote for
integrating this in trunk.
In the meantime, I'll leave with it in my image(s) for a few days.
You can as well submit an inbox contribution as you did previously,
this is the preferred way, unless the changes span many packages.

cheers

Nicolas

Le sam. 10 avr. 2021 à 10:58, Jaromir Matas <[hidden email]> a écrit :

>
> Hi,
>
> The enclosed changeset fixes all bugs/inconsistencies concerning process
> termination listed below. It's a simplified version of the previous
> proposal. Feedback very welcome indeed!
>
> The idea is to simply extend the existing mechanism for completing
> halfway-through unwind blocks and let it deal with *all* unwind blocks.
>
> best,
>
> Fix_terminate_v3.cs <http://forum.world.st/file/t372955/Fix_terminate_v3.cs>
>
>
> Jaromir Matas wrote
> > Hi All,
> >
> > here's a summary of bugs in the current termination procedure:
> >
> > 1. #isTerminated and #isSuspended fail to report correctly - in [1]
> >
> > 2. active process termination bug causing an image crash - in [2]
> >
> > 3. nested unwind bug: skipping some ensure blocks - in [3]
> >
> > 4. likely a bug: a failure to complete nested unwind blocks halfway thru
> > execution - also in [3]
> >
> > 5. Christoph's discovery: a failure to correctly close the debugger in
> > case
> > of error or non-local return in unwind - in [4]
> >
> > 6. inconsistent semantics of unwinding protected blocks during active vs.
> > suspended process termination
>
>
>
>
>
> -----
> ^[^ Jaromir
> --
> Sent from: http://forum.world.st/Squeak-Dev-f45488.html
>

Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Nicolas Cellier
I'll live with it, xcuz the bad anglishe.

Le sam. 10 avr. 2021 à 23:21, Nicolas Cellier
<[hidden email]> a écrit :

>
> Hi Jaromir,
> many thanks for those efforts and for perseverating.
> It must be a bit frustrating to not receive any answer to this good
> work and deep investment, but there is a good reason for that: the
> subject is tough, and not everyone is able to give you a quick answer
> without reinvesting an equivalent slot of time as you personally did.
>
> This change-set looks good to my eyes, the level of comment is good
> too and gives me some additional confidence.
> I'm probably not at your level of expertise yet, but I vote for
> integrating this in trunk.
> In the meantime, I'll leave with it in my image(s) for a few days.
> You can as well submit an inbox contribution as you did previously,
> this is the preferred way, unless the changes span many packages.
>
> cheers
>
> Nicolas
>
> Le sam. 10 avr. 2021 à 10:58, Jaromir Matas <[hidden email]> a écrit :
> >
> > Hi,
> >
> > The enclosed changeset fixes all bugs/inconsistencies concerning process
> > termination listed below. It's a simplified version of the previous
> > proposal. Feedback very welcome indeed!
> >
> > The idea is to simply extend the existing mechanism for completing
> > halfway-through unwind blocks and let it deal with *all* unwind blocks.
> >
> > best,
> >
> > Fix_terminate_v3.cs <http://forum.world.st/file/t372955/Fix_terminate_v3.cs>
> >
> >
> > Jaromir Matas wrote
> > > Hi All,
> > >
> > > here's a summary of bugs in the current termination procedure:
> > >
> > > 1. #isTerminated and #isSuspended fail to report correctly - in [1]
> > >
> > > 2. active process termination bug causing an image crash - in [2]
> > >
> > > 3. nested unwind bug: skipping some ensure blocks - in [3]
> > >
> > > 4. likely a bug: a failure to complete nested unwind blocks halfway thru
> > > execution - also in [3]
> > >
> > > 5. Christoph's discovery: a failure to correctly close the debugger in
> > > case
> > > of error or non-local return in unwind - in [4]
> > >
> > > 6. inconsistent semantics of unwinding protected blocks during active vs.
> > > suspended process termination
> >
> >
> >
> >
> >
> > -----
> > ^[^ Jaromir
> > --
> > Sent from: http://forum.world.st/Squeak-Dev-f45488.html
> >

Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Nicolas Cellier
BTW, I have moved your previous attempts to treated inbox, so please,
do not fear to pollute the inbox again :)

Le sam. 10 avr. 2021 à 23:25, Nicolas Cellier
<[hidden email]> a écrit :

>
> I'll live with it, xcuz the bad anglishe.
>
> Le sam. 10 avr. 2021 à 23:21, Nicolas Cellier
> <[hidden email]> a écrit :
> >
> > Hi Jaromir,
> > many thanks for those efforts and for perseverating.
> > It must be a bit frustrating to not receive any answer to this good
> > work and deep investment, but there is a good reason for that: the
> > subject is tough, and not everyone is able to give you a quick answer
> > without reinvesting an equivalent slot of time as you personally did.
> >
> > This change-set looks good to my eyes, the level of comment is good
> > too and gives me some additional confidence.
> > I'm probably not at your level of expertise yet, but I vote for
> > integrating this in trunk.
> > In the meantime, I'll leave with it in my image(s) for a few days.
> > You can as well submit an inbox contribution as you did previously,
> > this is the preferred way, unless the changes span many packages.
> >
> > cheers
> >
> > Nicolas
> >
> > Le sam. 10 avr. 2021 à 10:58, Jaromir Matas <[hidden email]> a écrit :
> > >
> > > Hi,
> > >
> > > The enclosed changeset fixes all bugs/inconsistencies concerning process
> > > termination listed below. It's a simplified version of the previous
> > > proposal. Feedback very welcome indeed!
> > >
> > > The idea is to simply extend the existing mechanism for completing
> > > halfway-through unwind blocks and let it deal with *all* unwind blocks.
> > >
> > > best,
> > >
> > > Fix_terminate_v3.cs <http://forum.world.st/file/t372955/Fix_terminate_v3.cs>
> > >
> > >
> > > Jaromir Matas wrote
> > > > Hi All,
> > > >
> > > > here's a summary of bugs in the current termination procedure:
> > > >
> > > > 1. #isTerminated and #isSuspended fail to report correctly - in [1]
> > > >
> > > > 2. active process termination bug causing an image crash - in [2]
> > > >
> > > > 3. nested unwind bug: skipping some ensure blocks - in [3]
> > > >
> > > > 4. likely a bug: a failure to complete nested unwind blocks halfway thru
> > > > execution - also in [3]
> > > >
> > > > 5. Christoph's discovery: a failure to correctly close the debugger in
> > > > case
> > > > of error or non-local return in unwind - in [4]
> > > >
> > > > 6. inconsistent semantics of unwinding protected blocks during active vs.
> > > > suspended process termination
> > >
> > >
> > >
> > >
> > >
> > > -----
> > > ^[^ Jaromir
> > > --
> > > Sent from: http://forum.world.st/Squeak-Dev-f45488.html
> > >

Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Hi Nicolas,
Thanks a lot for your encouragement!
Sent to the Inbox. I'll add a set of tests a bit later.

>
> BTW, I have moved your previous attempts to treated inbox, so please,
> do not fear to pollute the inbox again :)

Yes, thanks again, that's exactly what I was fearing - creating more noise
in the Inbox :)

>
> Le sam. 10 avr. 2021 à 23:21, Nicolas Cellier
> <[hidden email]> a écrit :
> >
> > Hi Jaromir,
> > many thanks for those efforts and for perseverating.
> > It must be a bit frustrating to not receive any answer to this good
> > work and deep investment, but there is a good reason for that: the
> > subject is tough, and not everyone is able to give you a quick answer
> > without reinvesting an equivalent slot of time as you personally did.
> >
> > This change-set looks good to my eyes, the level of comment is good
> > too and gives me some additional confidence.
> > I'm probably not at your level of expertise yet, but I vote for
> > integrating this in trunk.
> > In the meantime, I'll leave with it in my image(s) for a few days.
> > You can as well submit an inbox contribution as you did previously,
> > this is the preferred way, unless the changes span many packages.
> >
> > cheers
> >
> > Nicolas

best,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Christoph Thiede
Hi Jaromir,

I am extremely sorry for not giving any feedback on all your fixes and
proposals earlier. A few weeks ago, I was on holiday, then back at home, I
was greeted by a huge inbox with 100+ unread emails, and somehow I just lost
track of all your important updates ... Luckily, Nicolas has shown more
energy than me and has helped your contributions to find their way into the
Trunk! 🎉

Your summary in this post was extremely helpful for me to get an overview of
all the ongoing changes. I think I understood most of it, but here are some
open questions and to-dos from my point of view:

1. Regarding issue no. #5 in your list above ("Bug in Process>>#terminate |
Returning from unwind contexts" [1]): Do you consider this thread resolved
by now or is my answer to it still being expected? At the moment, this
snippet you mentioned fails to unwind completely:

x := nil.
[self error: 'x1'] ensure: [
    [self error: 'x2'] ensure: [
        [self error: 'x3'] ensure: [
            x:=3].
        x:=2].
    x:=1].

2. What is the current state of this thread [2]? If all issues are resolved
from your perspective, there is no need to discuss anything further -
otherwise, I guess it's your turn to answer again. :)

3. I noticed that Process >> #terminate contains multiple direct sends to
#runUntilErrorOrReturnFrom:. There are multiple real and potential problems
with this:

3.1 Consider the following snippet:

| p |
p := Processor activeProcess.
Transcript showln: p == Processor activeProcess.
[Transcript showln: p == Processor activeProcess] ensure: [
        Transcript showln: p == Processor activeProcess].
p

Debug it, then step into the first block, and abandon the debugger. We would
expect to see another "true" in the Transcript, but instead, we see a
"false". This is because #runUntilErrorOrReturnFrom: does not honor
process-faithful debugging. The protocol on Process, on the other hand, does
so. So probably we would want to wrap these sends into
#evaluate:onBehalfOf:.

3.2 As I think I mentioned somewhere else already, the result of
#runUntilErrorOrReturnFrom: *must* be checked to make sure that the
execution or the unwinding has not halted halfway. I don't see this in
Process >> #terminate either. This might be the cause of the bug I mentioned
in #1 of this post. Probably it's the best idea to discuss this in [1], too.
:-)

4. What's the current state of tests? Have you contributed tests for all the
issues you mentioned above? This would be awesome. :-)

Organizational notes: For sake of overview, I propose to keep this thread of
the current level of abstraction and discuss all the implementation details
in separate threads such as [1]. Ideally, we should also "close" all the
different threads about Process issues by adding another message to each of
them in order to help our future selves to keep an overview of their
solutions ...

Happy processing!

Best,
Christoph

[1]
http://forum.world.st/Bug-in-Process-gt-gt-terminate-Returning-from-unwind-contexts-td5127570.html
[2]
http://forum.world.st/Refactoring-terminate-to-get-rid-of-cannot-return-errors-etc-td5127732.html



-----
Carpe Squeak!
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Hi Christoph!

I apologize for not responding earlier to your great comments. I had to
educate myself in error handling first :)

> 1. Regarding issue no. #5 in your list above ("Bug in Process>>#terminate
> |
> Returning from unwind contexts" [1]): Do you consider this thread resolved
> by now or is my answer to it still being expected? At the moment, this
> snippet you mentioned fails to unwind completely:
>
> x := nil.
> [self error: 'x1'] ensure: [
>     [self error: 'x2'] ensure: [
>         [self error: 'x3'] ensure: [
>             x:=3].
>         x:=2].
>     x:=1].

I reread your remarks regarding how to interpret a situation like above:
what do we actually abandon when new errors appear during termination and we
abandon the nested debuggers? I've enclosed a changeset that makes
abandoning the debuggers equivalent to terminating the debugged process
(including unwind) - i.e. in the example above we'll get the first debugger,
abandon it which causes a process termination, encounter the second error
and start the second debugger, abandon it which again causes another
termination, etc. As a result all assignments will be executed (imagine a
`stream close` instead of `x:=1` so I guess it's justified).

However, because this is happening in the ensure block during unwind, it
seems that abandoning is almost equivalent to proceeding :) (Not entirely
though: proceed would continue after unwinding, abandon only proceeds within
the unwind scope). This poses a new challenge however - how to kill a
debugger if we deliberately want or have to stop debugging a process
immediately, i.e. without unwinding? Consider this example:

`[] ensure: [self gotcha]`

We'd get a debugger with a MNU error (Message Not Understood), abandon it
and get another debugger with the same error creating an infinite recursion
(due to how #doesNotUnderstand is written). This particular example is taken
care of in the changeset but in general I miss a Kill button - has this been
ever considered? Note: the infinite recursion danger is present even in the
current implementation but neutralized by allowing just one error during
unwinding halfway through ensure blocks :)

There's also a file Kernel-jar.1403 in the Inbox:
http://forum.world.st/The-Inbox-Kernel-jar-1403-mcz-td5129607.html

There are some additional changes to #terminate - mostly cleaning and
simplifying the code. And more comments.



> 2. What is the current state of this thread [2]? If all issues are
> resolved
> from your perspective, there is no need to discuss anything further -
> otherwise, I guess it's your turn to answer again. :)

No progress on my side but I look forward to getting to it and responding :)



> 3.1 Consider the following snippet:
>
> | p |
> p := Processor activeProcess.
> Transcript showln: p == Processor activeProcess.
> [Transcript showln: p == Processor activeProcess] ensure: [
>     Transcript showln: p == Processor activeProcess].
> p
>
> Debug it, then step into the first block, and abandon the debugger. We
> would
> expect to see another "true" in the Transcript, but instead, we see a
> "false". This is because #runUntilErrorOrReturnFrom: does not honor
> process-faithful debugging. The protocol on Process, on the other hand,
> does
> so. So probably we would want to wrap these sends into
> #evaluate:onBehalfOf:.

This is not a new issue; if you step further - into the ensure (i.e. the
argument block) block and then abandon the debugger, you will see false
instead of true even in images before the change I introduced. The reason is
precisely what you described - the use of #runUntilErrorOrReturnFrom which
operates on a context stack belonging to an other process and this way
guarantees a correct execution of non-local returns on that process's
context stack (for the price of losing process-faithful debugging).

I'm aware of the process-faithful debugging issue and I'd love to fix it,
but I'm afraid my debugger implementation knowledge is presently next to
none; I'll have to put it on my to-do list ;) I'd expect though simple
wrapping into #evaluate:onBehalfOf: may reintroduce the original nasty
non-local error bug. Would you have an idea how to wrap it so that non-local
returns still worked? That would be awesome.



> 3.2 As I think I mentioned somewhere else already, the result of
> #runUntilErrorOrReturnFrom: *must* be checked to make sure that the
> execution or the unwinding has not halted halfway. I don't see this in
> Process >> #terminate either. This might be the cause of the bug I
> mentioned
> in #1 of this post. Probably it's the best idea to discuss this in [1],
> too.
> :-)

The cause of the bug in [1] (i.e. the disastrous behavior of `[self error]
ensure: [^2]`) was caused by executing the non-local return (`^2`) on a
wrong context stack which happened as a result of using #popTo (and
consequently #evaluate:onBehalfOf:) for evaluation of the said non-local
return.

Checking the return value of #runUntilErrorOrReturnFrom is the main idea
behind the fix presented in this changeset. If the execution of the unwind
block gets halted by an error, #runUntilErrorOrReturnFrom returns from the
"foreign" stack and reports the error. Because we are operating in an unwind
block I suggest the execution of the unwind block continues by opening a new
debugger window for the found error, and the user will decide what to de
next. To achieve this the implementation of #runUntilErrorOrReturnFrom must
be modified slightly to resignal the caught exception rather than resume it
- see the enclosed changeset. No other methods use
#runUntilErrorOrReturnFrom so let's either accept the suggested modification
or create a version of #runUntilErrorOrReturnFrom with the modified
behavior.



> 4. What's the current state of tests? Have you contributed tests for all
> the
> issues you mentioned above? This would be awesome. :-)

There's a set of basic semantics tests Tests-jar.448 in the Inbox. I
realized I don't know how to "simulate" pressing debugger's Abandon in a
test but I'll add more when I figure it out :) Plus more test will come if
the change proposed here is accepted.



> Organizational notes: For sake of overview, I propose to keep this thread
> of
> the current level of abstraction and discuss all the implementation
> details
> in separate threads such as [1]. Ideally, we should also "close" all the
> different threads about Process issues by adding another message to each
> of
> them in order to help our future selves to keep an overview of their
> solutions ...

Absolutely, I'll go through all relevant discussions and update them.

Thanks again very much for all your comments,

best,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Sorry, forgot to enclose the changeset:  Fix_terminate_v5.cs
<http://forum.world.st/file/t372955/Fix_terminate_v5.cs>  



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
In reply to this post by Jaromir Matas
Hi Christoph, hi all,

I've added a tweak to #cannotReturn: that would make sure the method won't
return to the block that attempted an illegal non-local return. This would
prevent crashing the image/VM when accidentally pressing Proceed or stepping
over it. #terminate can take an advantage of this improvement as well - it'd
be able to safely deal with illegal non-local returns and continue
unwinding.

The latest version of #terminate is enclosed and in the Inbox here:
http://forum.world.st/The-Inbox-Kernel-jar-1405-mcz-td5129644.html

Thanks for reviewing/commenting.

Fix_terminate_v6.cs <http://forum.world.st/file/t372955/Fix_terminate_v6.cs>  

best,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Hi Christoph & all,

One more update:


Christoph Thiede wrote

> 3.1 Consider the following snippet:
>
> | p |
> p := Processor activeProcess.
> Transcript showln: p == Processor activeProcess.
> [Transcript showln: p == Processor activeProcess] ensure: [
>         Transcript showln: p == Processor activeProcess].
> p
>
> Debug it, then step into the first block, and abandon the debugger. We
> would
> expect to see another "true" in the Transcript, but instead, we see a
> "false". This is because #runUntilErrorOrReturnFrom: does not honor
> process-faithful debugging. The protocol on Process, on the other hand,
> does
> so. So probably we would want to wrap these sends into
> #evaluate:onBehalfOf:.

Yes, you were right, wrapping #runUntilErrorOrReturnFrom: into
#evaluate:onBehalfOf: works well and your example now returns true as
expected. Thanks very much! Here's a changeset:  Fix_terminate_v7.cs
<http://forum.world.st/file/t372955/Fix_terminate_v7.cs>  

A new version in the Inbox is:
http://forum.world.st/The-Inbox-Kernel-jar-1406-mcz-td5129657.html 

best,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Christoph Thiede
This post was updated on .
Hi Jaromir,

sorry for the late reply. I'm always impressed again by your energy and
efforts for Squeak! :-) So, now let's hope that I'll be able to obtain an
overview of all the new stuff ...

First of all, please read my message in [1] first if you have not already
done it. :-) I am quite sure that halfway-executed unwind contexts should
*not* be resumed when terminating a process.

After studying your changeset, I have rewritten #terminate in the attached
changeset to 1) not resume halfway-executed contexts any longer and b) clean
up the implementation IMHO significantly. Instead of reinventing the
unwinding wheel in Process, I reused the existing logic from Context which
is important deduplication. Process-faithful debugging now also works
consistently during termination. I have also changed #testNestedUnwinding as
proposed in [1].
My implementation passes all process tests, non-local returns seem to work
as expected, and, in particular, nested debuggers are spawned and terminated
correctly for the hot example from above:

> x := nil.
> [self error: 'x1'] ensure: [
>     [self error: 'x2'] ensure: [
>         [self error: 'x3'] ensure: [
>             x:=3].
>         x:=2].
>     x:=1].

Hope you do not find any regressions. :-) I am looking forward to your
feedback!

---

Even though I have rewritten #terminate entirely, I wanted to leave you some
general comments on your approach, maybe they are helpful for your next
important contribution: :-)

- I think that the fact that you needed to skip certain exceptions manually
was a giant suboptimal hack. :-) It was a consequence of the older behavior
of #terminate to resume halfway-executed unwinded contexts. I think I have
explained in [1] why this was not a good idea.
- Instead of modifying #runUntilErrorOrReturnFrom:, I have moved the logic
to re-signal the UnhandledError into Process >> #complete:ifError:. The
reason is that roerf seems to me like something which we should change as
rarely as possible - because of its mind-blowing complexity and because it
is used in regular debugging as well. The #resumeUnchecked: part could
actually be relevant if there occurs a second UnhandledError while jumping
out of reorf.
- Some minor comments regarding coding style: I always recommend using as
many block-local temps as possible, this makes it easier to understand their
scope. In case you haven't heard it before, you might also want to google
Guard Clause. :-) It's a very helpful idiom to avoid unnecessarily indented
structures. But that's minor remarks only, of course. :-)

---

Regarding your other ideas:

> This poses a new challenge however - how to kill a debugger if we
> deliberately want or have to stop debugging a process immediately, i.e.
> without unwinding? Consider this example:
>
> `[] ensure: [self gotcha]`
>
> We'd get a debugger with a MNU error (Message Not Understood), abandon it
> and get another debugger with the same error creating an infinite
> recursion (due to how #doesNotUnderstand is written). This particular
> example is taken care of in the changeset but in general I miss a Kill
> button - has this been ever considered?

If you got an infinite recursion, that must have been another consequence of
resuming halfway-executed unwind contexts. Normally, this shouldn't happen
(and I also could not reproduce this). But yes, there might be - though very
exotic - situations in which you actually want to force-kill a process
without executing any unwind contexts. In such situations, I usually
manually set the suspendedContext's sender to nil, you can do this directly
from the debugger and it does what it should. :-) But yes, I am already
planning to add a few more secondary buttons to the debugger (they did
something similar in Pharo some time ago, too), and a small Kill button next
to Proceed/Abandon could indeed be a nice extension.

> I'm afraid my debugger implementation knowledge is presently next to none;
> I'll have to put it on my to-do list ;)

Learning by doing and debugging it. :-) If you have any questions, please
feel free to ask them on the list ...

> There's a set of basic semantics tests Tests-jar.448 in the Inbox. I
> realized I don't know how to "simulate" pressing debugger's Abandon in a
> test but I'll add more when I figure it out :) Plus more test will come if
> the change proposed here is accepted.

Great work! They're unfortunately deprecated if we do not want to resume
halfway-executed unwind contexts, but you could update them if you agree.
Apart from that, you could also take a look at DebuggerTests if you want to
learn how to write E2E tests for the debugger. But I think that your tests
already operate at a good level of abstraction.

Best,
Christoph

fix-Process-terminate.cs
<http://forum.world.st/file/t372205/fix-Process-terminate.cs
[1]
http://forum.world.st/The-semantics-of-halfway-executed-unwind-contexts-during-process-termination-td5129800.html



-----
Carpe Squeak!
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Hi Christoph, hi all,

I've updated my #terminate fix to a "final" version Kernel-jar.1409 in the
Inbox:

I'm still 100% convinced completing unwind block halfway-through their
execution is a good idea; and it's not my idea ;)

If a process gets interrupted or is suspended in the middle of unwinding, it
should be allowed to finish all unwind blocks - both halfway-through and
not-yet-started. It's not necessarily a case of a raised error or abandoning
the debugger - it's a general termination procedure. If we say Abandoning a
debugger equals terminating the debugged process then we should terminate it
including all unwinds. If we don't want to equate Abandon to general
termination that doesn't mean we have to change the termination logic - we
should rather change the Abandoning logic in such case :)

I think ProcessTest >> #testNestedUnwind illustrates the idea clearly:

x1 := x2 := x3 := nil.
p:=[
                [
                        [ ] ensure: [ "halfway through completion when suspended"
                                [ ] ensure: [ "halfway through completion when suspended"
                                        Processor activeProcess suspend. "here the process gets terminated"
                                        x1 := true].
                                x2 := true]
                ] ensure: [ "not started yet when suspended"
                        x3 := true]
] fork.
Processor yield.
p terminate
self assert: x1 & x2 & x3.

There was nothing wrong with the process p when it got terminated in the
first place and thus there's no reason to prevent it from finishing all its
unwind blocks.



> [...] the fact that an error has been signaled means that the
> signalerContext is "infected" so under no circumstances, abandoning the
> process should resume the execution of this infected context!

You cannot know whether an error will be raised during termination - so you
should not be changing the general termination logic but rather adjusting
the "debugger abandon/termination" logic; currently the consensus is:
debugger Abandon equals process termination but that's not necessarily so...



> Instead of reinventing the unwinding wheel in Process, I reused the
> existing logic from Context which
> is important deduplication.

Well, actually I didn't reinvent the unwind pattern but intentionally reused
it with as few changes as possible - I think it improves readability because
people easily recognize this pattern from #resume:, #resume:through:,
#unwindTo and even the previous #terminate used the exact same pattern for
an active process termination. Besides, using the same pattern for achieving
a similar goal feels "safer" to me.



> Instead of modifying #runUntilErrorOrReturnFrom:, I have moved the logic
> to re-signal the UnhandledError into Process >> #complete:ifError:. [...]
> The #resumeUnchecked: part could
> actually be relevant if there occurs a second UnhandledError while jumping
> out of reorf.

Yes indeed, I made a silly assumption but reverted the change back already
in Kernel-jar.1408. I use its modified replica instead.



> I think that the fact that you needed to skip certain exceptions manually
> was a giant suboptimal hack. :-)

Yes, it was a sort of my to-do list ;) I've sorted out the
MessageNotUnderstood but BlockCannotReturn is still a hot topic :)


> I always recommend using as many block-local temps as possible, this makes
> it easier to understand their
> scope. In case you haven't heard it before, you might also want to google
> Guard Clause. :-)

Again, I wanted to make as few changes as possible; but agreed absolutely :)

Thanks again very much for your comments.
best regards,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Hi all,

there's one more "final" version (Kernel-jar.1411) of #terminate fixing
unwind in a situation where #ensure is the top context when a process is
terminated.

More tests covering unwind from non-local returns, unwind from nested errors
and a stress test #testTerminateInEnsure presented by Martin McClure at 2019
Smalltalk conference have been added:
Tests-jar.465
ToolsTests-jar.105
KernelTests-jar.405
Finally managed to add debugger tests - thanks to Marcel and Christoph for
inspiration!

A patch from Kernel-jar.1410 solving catastrophic returns and infinite loops
caused by BlockCannotReturn has been integrated here as well.

best,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Christoph Thiede
Hi Jaromir,

you convinced me with regard to the behavior of #terminate as well as the current definition of #testNestedUnwind - see [1]. :-) I still think my counter-proposal is relevant, we just should put it into a method with a different name. Debugger's Abandon should then use this method instead of #terminate. But please let's discuss in [1], it's already hard enough to keep an overview. :D

Regarding to your proposal:

Please see my comments in [2] about your proposed change to BlockCannotReturn.

> > Instead of reinventing the unwinding wheel in Process, I reused the existing logic from Context which is important deduplication.
> Well, actually I didn't reinvent the unwind pattern but intentionally reused it with as few changes as possible - I think it improves readability because people easily recognize this pattern from #resume:, #resume:through:, #unwindTo and even the previous #terminate used the exact same pattern for
an active process termination. Besides, using the same pattern for achieving a similar goal feels "safer" to me.

A pattern is good, but reusing the same code is even better. :-) I still see some signification duplication between #runUntilErrorOrReturnFrom: and #runUnwindUntilErrorOrReturnFrom: as well as between Process >> #terminate and Context >> #unwindTo:. But Kernel-jar.1411 already is a good step into the right direction as far as I can tell. :-)

What remains unacceptable or dangerous to me are your hard-coded exceptions in Process >> #complete:to:. If this is crucial to prevent akwards infinite recursions, we might not be immune against similar incidents for other kinds of recursion as well. Object >> #at:, for example, is no better than Object >> #doesNotUnderstand:. Actually, any exception or exception handler might produce a similar behavior. Could you provide a few concrete examples where this check is needed? Maybe we can find a more holistic solution to this issue.

> Again, I wanted to make as few changes as possible; but agreed absolutely :)

That is also a very reasonable goal which I had to learn myself the hard way. :) Keep going! :-)

Best,
Christoph

[1] http://forum.world.st/The-semantics-of-halfway-executed-unwind-contexts-during-process-termination-tp5129800p5130110.html
[2] http://forum.world.st/The-Inbox-Kernel-ct-1405-mcz-tp5129706p5130114.html

Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Hi Christoph,


Christoph Thiede wrote

>> > Instead of reinventing the unwinding wheel in Process, I reused the
>> existing logic from Context which is important deduplication.
>> Well, actually I didn't reinvent the unwind pattern but intentionally
>> reused it with as few changes as possible - I think it improves
>> readability because people easily recognize this pattern from #resume:,
>> #resume:through:, #unwindTo and even the previous #terminate used the
>> exact same pattern for
> an active process termination. Besides, using the same pattern for
> achieving a similar goal feels "safer" to me.
>
> A pattern is good, but reusing the same code is even better. :-) I still
> see some signification duplication between #runUntilErrorOrReturnFrom: and
> #runUnwindUntilErrorOrReturnFrom: as well as between Process >> #terminate
> and Context >> #unwindTo:. But Kernel-jar.1411 already is a good step into
> the right direction as far as I can tell. :-)

Yes, I was wondering why I couldn't get rid of the duplication and now I
think it's because there really are two distinct unwind semantics : one
"light" for regular returns and one "heavy" for termination. Both are very
similar yet each require a slightly different behavior - that's why the
duality #runUntilErrorOrReturnFrom / #runUnwindUntilErrorOrReturnFrom or
#complete: / #complete:to: and #unwindTo: / #terminate.

With regards to #unwindTo: - I haven't tested it yet but I'm wondering
whether it wouldn't have the same unwind problem with non-local returns as
the original #terminate and require a similar fix?


Christoph Thiede wrote
> What remains unacceptable or dangerous to me are your hard-coded
> exceptions in Process >> #complete:to:. If this is crucial to prevent
> akwards infinite recursions, we might not be immune against similar
> incidents for other kinds of recursion as well. Object >> #at:, for
> example, is no better than Object >> #doesNotUnderstand:. Actually, any
> exception or exception handler might produce a similar behavior. Could you
> provide a few concrete examples where this check is needed? Maybe we can
> find a more holistic solution to this issue.

Yes, this bothers me as well. I consider two common sources of infinite
recursions:
(1) MessageNotUnderstood - #doesNotUnderstand is intentionally written so
that it resends the unknown message to facilitate writing new methods while
debugging. So for the moment to recover termination from this error I
suggested to deal with it on an individual basis - i.e. skip the unwind
block with the error. (and yes, you're right this only applies to the
"heavy" version of unwinding)
(2) BlockCannonReturn - we'll discuss this in [2]

But in general - yes, any method/exception purposefully (or not) written to
create a loop will break this patch (I admit it is just a patch really). I
extracted it to #complete:to: to make #terminate clean; this is a WIP; I
wish there was a holistic solution to this - maybe checking for exception
recursion by default? :)


Christoph Thiede wrote
>> Again, I wanted to make as few changes as possible; but agreed absolutely
>> :)
>
> That is also a very reasonable goal which I had to learn myself the hard
> way. :) Keep going! :-)
>
> Best,
> Christoph

[1]
http://forum.world.st/The-semantics-of-halfway-executed-unwind-contexts-during-process-termination-tp5129800p5130110.html
[2]
http://forum.world.st/The-Inbox-Kernel-ct-1405-mcz-tp5129706p5130114.html





-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Hi All,
I've sent an updated version of #teminate integrating Christoph's solution
of BlockCannotReturn recursion problem (in [1]), along with a battery of
tests exploring termination of nested ensure and cascading errors behavior
(Debugger tests are for info and a final version can wait until releasing
Christoph's proposal in [2]).

It's pretty much final, I hope...

Any complaints about #terminate - please shout ;)

[1] http://forum.world.st/The-Inbox-Kernel-ct-1405-mcz-td5129706.html
[2]
http://forum.world.st/The-semantics-of-halfway-executed-unwind-contexts-during-process-termination-tp5129800p5130110.html

best,



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir
Reply | Threaded
Open this post in threaded view
|

Re: Solving multiple termination bugs - summary & proposal

Jaromir Matas
Jaromir Matas wrote

> Hi All,
> I've sent an updated version of #teminate integrating Christoph's solution
> of BlockCannotReturn recursion problem (in [1]), along with a battery of
> tests exploring termination of nested ensure and cascading errors behavior
> (Debugger tests are for info and a final version can wait until releasing
> Christoph's proposal in [2]).
>
> It's pretty much final, I hope...
>
> Any complaints about #terminate - please shout ;)
>
> [1] http://forum.world.st/The-Inbox-Kernel-ct-1405-mcz-td5129706.html
> [2]
> http://forum.world.st/The-semantics-of-halfway-executed-unwind-contexts-during-process-termination-tp5129800p5130110.html
>
> best,

Here's the link:
http://forum.world.st/The-Inbox-Kernel-jar-1414-mcz-td5130198.html



-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

^[^ Jaromir