trunk image locked (was: The Inbox: Kernel-eem.1185.mcz)

trunk image locked (was: The Inbox: Kernel-eem.1185.mcz)

Chris Muller-3
My image just locked after it tried to unwind normally from a network
timeout.   I had to kill it from the OS.

Eliot, would you guide me on how to roll my trunk image back to a
state without this fix, so I can try to get back to where I was and see
if it's related?

Should I just load Kernel-eem.1184, an earlier one,
or something else?



On Fri, Jul 27, 2018 at 1:08 PM,  <[hidden email]> wrote:

> A new version of Kernel was added to project The Inbox:
> http://source.squeak.org/inbox/Kernel-eem.1185.mcz
>
> ==================== Summary ====================
>
> Name: Kernel-eem.1185
> Author: eem
> Time: 27 July 2018, 11:08:12.641836 am
> UUID: 7296ad00-708d-4cef-b6bc-ceb24d897a70
> Ancestors: Kernel-eem.1184
>
> In releaseCriticalSection: use isUnwindContext instead of the more fragile context selector == #ensure:.  Now it will handle ifCurtailed: too.  Thanks Tobias!
>
> Reformat Process>>#terminate.  It was baaad.
>
> =============== Diff against Kernel-eem.1184 ===============
>
> Item was changed:
>   ----- Method: Process>>releaseCriticalSection: (in category 'private') -----
>   releaseCriticalSection: runnable
>         "Figure out if we are terminating a process that is in the ensure: block of a critical section.
>          In this case, if the block has made progress, pop the suspendedContext so that we leave the
>          ensure: block inside the critical: without signaling the semaphore/exiting the primitive section,
>          since presumably this has already happened.  But if it hasn't made progress but is beyond the
> +        wait (which we can tell by the oldList being one of the runnable lists, i.e. a LinkedList, not a
> -        wait (which we can tell my the oldList being one of the runnable lists, i.e. a LinkedList, not a
>          Semaphore or Mutex, et al), then the ensure: block needs to be run."
>         | selectorJustSent |
>         (suspendedContext method pragmaAt: #criticalSection) ifNil: [^self].
>         selectorJustSent := suspendedContext selectorJustSentOrSelf.
>
>         "Receiver and/or argument blocks of ensure: in Semaphore>>critical: or Mutex>>#critical:"
>         suspendedContext isClosureContext ifTrue:
> +               [suspendedContext sender isUnwindContext ifTrue:
> -               [suspendedContext sender selector == #ensure: ifTrue:
>                         [| notWaitingButMadeNoProgress |
>                         "Avoid running the ensure: block twice, popping it if it has already been run. If runnable
>                          but at the wait, leave it in place. N.B. No need to check if the block receiver of ensure: has
>                          not started to run (via suspendedContext pc = suspendedContext startpc) because ensure:
>                          uses valueNoContextSwitch, and so there is no suspension point before the wait."
>                          notWaitingButMadeNoProgress :=
>                                 runnable
>                                 and: [selectorJustSent == #wait
>                                 and: [suspendedContext sender selectorJustSentOrSelf == #valueNoContextSwitch]].
>                          notWaitingButMadeNoProgress ifFalse:
>                                 [suspendedContext := suspendedContext home]].
>                  ^self].
>
>         "Either Semaphore>>critical: or Mutex>>#critical:.  Is the process still blocked?  If so, nothing further to do."
>         runnable ifFalse: [^self].
>
>         "If still at the wait the ensure: block has not been activated, so signal to restore."
>         selectorJustSent == #wait ifTrue:
>                 [suspendedContext receiver signal].
>
>         "If still at the lock primitive and the lock primitive just acquired ownership (indicated by it answering false)
>          then the ensure block has not been activated, so explicitly primitiveExitCriticalSection to unlock."
>         (selectorJustSent == #primitiveEnterCriticalSection
>          or: [selectorJustSent == #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue:
>                 [(suspendedContext stackPtr > 0
>                   and: [suspendedContext top == false]) ifTrue:
>                         [suspendedContext receiver primitiveExitCriticalSection]]!
>
> Item was changed:
>   ----- Method: Process>>terminate (in category 'changing process state') -----
>   terminate
>         "Stop the process that the receiver represents forever.
>          Unwind to execute pending ensure:/ifCurtailed: blocks before terminating.
>          If the process is in the middle of a critical: critical section, release it properly."
>
>         | ctxt unwindBlock oldList |
> +       self isActiveProcess ifTrue:
> +               [ctxt := thisContext.
> +                [ctxt := ctxt findNextUnwindContextUpTo: nil.
> +                 ctxt ~~ nil] whileTrue:
> +                       [(ctxt tempAt: 2) ifNil:
> +                               ["N.B. Unlike Context>>unwindTo: we do not set complete (tempAt: 2) to true."
> +                                unwindBlock := ctxt tempAt: 1.
> +                                thisContext terminateTo: ctxt.
> +                                unwindBlock value]].
> -       self isActiveProcess ifTrue: [
> -               ctxt := thisContext.
> -               [       ctxt := ctxt findNextUnwindContextUpTo: nil.
> -                       ctxt isNil
> -               ] whileFalse: [
> -                       (ctxt tempAt: 2) ifNil:[
> -                               ctxt tempAt: 2 put: nil.
> -                               unwindBlock := ctxt tempAt: 1.
> -                               thisContext terminateTo: ctxt.
> -                               unwindBlock value].
> -               ].
>                 thisContext terminateTo: nil.
>                 self suspend.
> +               "If the process is resumed this will provoke a cannotReturn: error.
> +                Would self debug: thisContext title: 'Resuming a terminated process' be better?"
> +               ^self].
> -       ] ifFalse:[
> -               "Always suspend the process first so it doesn't accidentally get woken up.
> -                N.B. If oldList is a LinkedList then the process is runnable. If it is a Semaphore/Mutex et al
> -                then the process is blocked, and if it is nil then the process is already suspended."
> -               oldList := self suspend.
> -               suspendedContext ifNotNil:
> -                       ["Release any method marked with the <criticalSection> pragma.
> -                         The argument is whether the process is runnable."
> -                        self releaseCriticalSection: (oldList isNil or: [oldList class == LinkedList]).
>
> +       "Always suspend the process first so it doesn't accidentally get woken up.
> +        N.B. If oldList is a LinkedList then the process is runnable. If it is a Semaphore/Mutex et al
> +        then the process is blocked, and if it is nil then the process is already suspended."
> +       oldList := self suspend.
> +       suspendedContext ifNotNil:
> +               ["Release any method marked with the <criticalSection> pragma.
> +                 The argument is whether the process is runnable."
> +                self releaseCriticalSection: (oldList isNil or: [oldList class == LinkedList]).
> -                       "If terminating a process halfways through an unwind, try to complete that unwind block first."
> -                       (suspendedContext findNextUnwindContextUpTo: nil) ifNotNil:
> -                               [:outer|
> -                               (suspendedContext findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
> -                                       [:inner| "This is an unwind block currently under evaluation"
> -                                       suspendedContext runUntilErrorOrReturnFrom: inner]].
>
> +               "If terminating a process halfways through an unwind, try to complete that unwind block first."
> +               (suspendedContext findNextUnwindContextUpTo: nil) ifNotNil:
> +                       [:outer|
> +                        (suspendedContext findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
> +                               [:inner| "This is an unwind block currently under evaluation"
> +                                suspendedContext runUntilErrorOrReturnFrom: inner]].
> +
> +               ctxt := self popTo: suspendedContext bottomContext.
> +               ctxt == suspendedContext bottomContext ifFalse:
> +                       [self debug: ctxt title: 'Unwind error during termination'].
> +               "Set the context to its endPC for the benefit of isTerminated."
> +               ctxt pc: ctxt endPC]!
> -                       ctxt := self popTo: suspendedContext bottomContext.
> -                       ctxt == suspendedContext bottomContext ifFalse:
> -                               [self debug: ctxt title: 'Unwind error during termination'].
> -                       "Set the context to its endPC for the benefit of isTerminated."
> -                       ctxt pc: ctxt endPC]]!
>
>


Re: trunk image locked (was: The Inbox: Kernel-eem.1185.mcz)

Eliot Miranda-2
Hi Chris,

On Wed, Aug 1, 2018 at 6:55 PM, Chris Muller <[hidden email]> wrote:
> My image just locked after it tried to unwind normally from a network
> timeout.   I had to kill it from the OS.
>
> Eliot would you guide me how to get my trunk image rolled back to a
> state without this fix so I can try to get back to where I was to see
> if its related?
>
> Should I just load Kernel-eem.1184, an earlier one,
> or something else?

1182 is the last one before any of the critical: termination changes.  
1183 has the critical: termination changes but doesn't change Semaphore>>critical: to the simple version.
1184 has the simpler Semaphore>>critical:
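
A minimal workspace sketch of loading one of these versions through Monticello follows. It is an illustration, not a tested rollback procedure. It assumes that the version is stored in the trunk repository at the URL shown (the thread itself only gives the inbox URL), that `versionNamed:` is available on the repository in this image, and that loading an older Kernel over a newer one is safe in your image. Save a backup copy of the image first.

```smalltalk
"Sketch only: load an older Kernel version from a Monticello HTTP repository.
 ASSUMPTIONS (not stated in the thread): the trunk repository URL below,
 and that Kernel-eem.1182 is stored in that repository."
| repository version |
repository := MCHttpRepository
	location: 'http://source.squeak.org/trunk'
	user: ''
	password: ''.
"File-based repositories expect the .mcz extension in the version name."
version := repository versionNamed: 'Kernel-eem.1182.mcz'.
"Loading replaces the Kernel methods currently in the image with those of this version."
version load.
```

To try 1183 or 1184 instead, only the version name in the string changes.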

I would like to have a look at this image.  My changes look good to me and others.  One possible issue is that the send of releaseCriticalSection: in Process>>terminate considers a double attempt at terminating a process (for example sending terminate within an unwind block that is being run during terminate) an attempt to terminate a live process.  That could be a mistake, and instead the send should only consider the process live if suspend answers a LinkedList.  So you could experiment with changing the line

self releaseCriticalSection: (oldList isNil or: [oldList class == LinkedList]).

to
self releaseCriticalSection: oldList class == LinkedList.

But if it is possible to reproduce this lockup then I want to have a look and try to see what is going on.
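
The difference between the two tests can be seen in a workspace. The sketch below relies on the behaviour described in the comment in Process>>terminate: suspend answers a LinkedList for a runnable process, the Semaphore or Mutex for a blocked one, and nil for one that is already suspended. The variable names are illustrative, and the results assume an image and VM of this period; other VM versions may answer differently.

```smalltalk
"Sketch: what Process>>suspend answers in the three cases that
 Process>>terminate distinguishes. All variable names are hypothetical."
| sem blocked runnable blockedList runnableList suspendedList currentTest suggestedTest |
sem := Semaphore new.
"Higher priority than the caller: runs at once and blocks in sem wait."
blocked := [sem wait] forkAt: Processor activePriority + 1.
"Lower priority than the caller: stays on a run queue while this code runs."
runnable := [nil] forkAt: Processor activePriority - 1.

blockedList := blocked suspend.      "the process was waiting on sem"
runnableList := runnable suspend.    "the process was on a run queue"
suspendedList := runnable suspend.   "second suspend: the process is on no list"

"For an already suspended process the two argument expressions disagree."
currentTest := suspendedList isNil or: [suspendedList class == LinkedList].
suggestedTest := suspendedList class == LinkedList.

"Clean up the two helper processes."
blocked terminate.
runnable terminate.
```

Inspecting `currentTest` and `suggestedTest` afterwards shows which of the two expressions would report an already suspended process as runnable to releaseCriticalSection:.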

 







--
_,,,^..^,,,_
best, Eliot



Re: trunk image locked (was: The Inbox: Kernel-eem.1185.mcz)

Levente Uzonyi
In reply to this post by Chris Muller-3
Hi Chris,

Did you try to send SIGUSR1 to this image before killing it? (assuming it
didn't happen on Windows)

Levente

On Wed, 1 Aug 2018, Chris Muller wrote:

> My image just locked after it tried to unwind normally from a network
> timeout.   I had to kill it from the OS.
>
> Eliot would you guide me how to get my trunk image rolled back to a
> state without this fix so I can try to get back to where I was to see
> if its related?
>
> Should I just load Kernel-eem.1184, an earlier one,
> or something else?
>


Re: trunk image locked (was: The Inbox: Kernel-eem.1185.mcz)

Chris Muller-3
Hi guys,

Thanks, Eliot, and please accept my apologies -- when I later rebooted
the image it still only had 1182, so I'm not 100% certain whether I had
never updated that image or simply hadn't saved after I updated.  I am
going to update and save everything and keep moving forward normally to
see if it happens again.  The 201807260206 was flawless for me in the
first big stress test; this time I'll be sure to save my updated trunk
from the get-go.

Thanks for that reminder about kill SIGUSR1, Levente.  I don't use
post-it notes but that's so useful I probably should get one for
that..   :)

Best,
  Chris

On Thu, Aug 2, 2018 at 6:52 AM, Levente Uzonyi <[hidden email]> wrote:

> Hi Chris,
>
> Did you try to send SIGUSR1 to this image before killing it? (assuming it
> didn't happen on Windows)
>
> Levente
>
>
> On Wed, 1 Aug 2018, Chris Muller wrote:
>
>> My image just locked after it tried to unwind normally from a network
>> timeout.   I had to kill it from the OS.
>>
>> Eliot would you guide me how to get my trunk image rolled back to a
>> state without this fix so I can try to get back to where I was to see
>> if its related?
>>
>> Should I just load Kernel-eem.1184, an earlier one,
>> or something else?
>>
>>
>>
>> On Fri, Jul 27, 2018 at 1:08 PM,  <[hidden email]> wrote:
>>>
>>> A new version of Kernel was added to project The Inbox:
>>> http://source.squeak.org/inbox/Kernel-eem.1185.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Kernel-eem.1185
>>> Author: eem
>>> Time: 27 July 2018, 11:08:12.641836 am
>>> UUID: 7296ad00-708d-4cef-b6bc-ceb24d897a70
>>> Ancestors: Kernel-eem.1184
>>>
>>> In releaseCriticalSection: use isUnwindContext instead of the more
>>> fragile context selector == #ensure:.  Now it will handle ifCurtailed: too.
>>> Thanks Tobias!
>>>
>>> Reformat Process>>#terminate.  It was baaad.
>>>
>>> =============== Diff against Kernel-eem.1184 ===============
>>>
>>> Item was changed:
>>>   ----- Method: Process>>releaseCriticalSection: (in category 'private')
>>> -----
>>>   releaseCriticalSection: runnable
>>>         "Figure out if we are terminating a process that is in the
>>> ensure: block of a critical section.
>>>          In this case, if the block has made progress, pop the
>>> suspendedContext so that we leave the
>>>          ensure: block inside the critical: without signaling the
>>> semaphore/exiting the primitive section,
>>>          since presumably this has already happened.  But if it hasn't
>>> made progress but is beyond the
>>> +        wait (which we can tell by the oldList being one of the runnable
>>> lists, i.e. a LinkedList, not a
>>> -        wait (which we can tell my the oldList being one of the runnable
>>> lists, i.e. a LinkedList, not a
>>>          Semaphore or Mutex, et al), then the ensure: block needs to be
>>> run."
>>>         | selectorJustSent |
>>>         (suspendedContext method pragmaAt: #criticalSection) ifNil:
>>> [^self].
>>>         selectorJustSent := suspendedContext selectorJustSentOrSelf.
>>>
>>>         "Receiver and/or argument blocks of ensure: in
>>> Semaphore>>critical: or Mutex>>#critical:"
>>>         suspendedContext isClosureContext ifTrue:
>>> +               [suspendedContext sender isUnwindContext ifTrue:
>>> -               [suspendedContext sender selector == #ensure: ifTrue:
>>>                         [| notWaitingButMadeNoProgress |
>>>                         "Avoid running the ensure: block twice, popping
>>> it if it has already been run. If runnable
>>>                          but at the wait, leave it in place. N.B. No need
>>> to check if the block receiver of ensure: has
>>>                          not started to run (via suspendedContext pc =
>>> suspendedContext startpc) because ensure:
>>>                          uses valueNoContextSwitch, and so there is no
>>> suspension point before the wait."
>>>                          notWaitingButMadeNoProgress :=
>>>                                 runnable
>>>                                 and: [selectorJustSent == #wait
>>>                                 and: [suspendedContext sender
>>> selectorJustSentOrSelf == #valueNoContextSwitch]].
>>>                          notWaitingButMadeNoProgress ifFalse:
>>>                                 [suspendedContext := suspendedContext
>>> home]].
>>>                  ^self].
>>>
>>>         "Either Semaphore>>critical: or Mutex>>#critical:.  Is the
>>> process still blocked?  If so, nothing further to do."
>>>         runnable ifFalse: [^self].
>>>
>>>         "If still at the wait the ensure: block has not been activated,
>>> so signal to restore."
>>>         selectorJustSent == #wait ifTrue:
>>>                 [suspendedContext receiver signal].
>>>
>>>         "If still at the lock primitive and the lock primitive just
>>> acquired ownership (indicated by it answering false)
>>>          then the ensure block has not been activated, so explicitly
>>> primitiveExitCriticalSection to unlock."
>>>         (selectorJustSent == #primitiveEnterCriticalSection
>>>          or: [selectorJustSent ==
>>> #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue:
>>>                 [(suspendedContext stackPtr > 0
>>>                   and: [suspendedContext top == false]) ifTrue:
>>>                         [suspendedContext receiver
>>> primitiveExitCriticalSection]]!
>>>
>>> Item was changed:
>>>   ----- Method: Process>>terminate (in category 'changing process state')
>>> -----
>>>   terminate
>>>         "Stop the process that the receiver represents forever.
>>>          Unwind to execute pending ensure:/ifCurtailed: blocks before
>>> terminating.
>>>          If the process is in the middle of a critical: critical section,
>>> release it properly."
>>>
>>>         | ctxt unwindBlock oldList |
>>> +       self isActiveProcess ifTrue:
>>> +               [ctxt := thisContext.
>>> +                [ctxt := ctxt findNextUnwindContextUpTo: nil.
>>> +                 ctxt ~~ nil] whileTrue:
>>> +                       [(ctxt tempAt: 2) ifNil:
>>> +                               ["N.B. Unlike Context>>unwindTo: we do
>>> not set complete (tempAt: 2) to true."
>>> +                                unwindBlock := ctxt tempAt: 1.
>>> +                                thisContext terminateTo: ctxt.
>>> +                                unwindBlock value]].
>>> -       self isActiveProcess ifTrue: [
>>> -               ctxt := thisContext.
>>> -               [       ctxt := ctxt findNextUnwindContextUpTo: nil.
>>> -                       ctxt isNil
>>> -               ] whileFalse: [
>>> -                       (ctxt tempAt: 2) ifNil:[
>>> -                               ctxt tempAt: 2 put: nil.
>>> -                               unwindBlock := ctxt tempAt: 1.
>>> -                               thisContext terminateTo: ctxt.
>>> -                               unwindBlock value].
>>> -               ].
>>>                 thisContext terminateTo: nil.
>>>                 self suspend.
>>> +               "If the process is resumed this will provoke a
>>> cannotReturn: error.
>>> +                Would self debug: thisContext title: 'Resuming a
>>> terminated process' be better?"
>>> +               ^self].
>>> -       ] ifFalse:[
>>> -               "Always suspend the process first so it doesn't
>>> accidentally get woken up.
>>> -                N.B. If oldList is a LinkedList then the process is
>>> runnable. If it is a Semaphore/Mutex et al
>>> -                then the process is blocked, and if it is nil then the
>>> process is already suspended."
>>> -               oldList := self suspend.
>>> -               suspendedContext ifNotNil:
>>> -                       ["Release any method marked with the
>>> <criticalSection> pragma.
>>> -                         The argument is whether the process is
>>> runnable."
>>> -                        self releaseCriticalSection: (oldList isNil or:
>>> [oldList class == LinkedList]).
>>>
>>> +       "Always suspend the process first so it doesn't accidentally get
>>> woken up.
>>> +        N.B. If oldList is a LinkedList then the process is runnable. If
>>> it is a Semaphore/Mutex et al
>>> +        then the process is blocked, and if it is nil then the process
>>> is already suspended."
>>> +       oldList := self suspend.
>>> +       suspendedContext ifNotNil:
>>> +               ["Release any method marked with the <criticalSection>
>>> pragma.
>>> +                 The argument is whether the process is runnable."
>>> +                self releaseCriticalSection: (oldList isNil or: [oldList
>>> class == LinkedList]).
>>> -                       "If terminating a process halfways through an
>>> unwind, try to complete that unwind block first."
>>> -                       (suspendedContext findNextUnwindContextUpTo: nil)
>>> ifNotNil:
>>> -                               [:outer|
>>> -                               (suspendedContext
>>> findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
>>> -                                       [:inner| "This is an unwind block
>>> currently under evaluation"
>>> -                                       suspendedContext
>>> runUntilErrorOrReturnFrom: inner]].
>>>
>>> +               "If terminating a process halfways through an unwind, try
>>> to complete that unwind block first."
>>> +               (suspendedContext findNextUnwindContextUpTo: nil)
>>> ifNotNil:
>>> +                       [:outer|
>>> +                        (suspendedContext findContextSuchThat:[:c| c
>>> closure == (outer tempAt: 1)]) ifNotNil:
>>> +                               [:inner| "This is an unwind block
>>> currently under evaluation"
>>> +                                suspendedContext
>>> runUntilErrorOrReturnFrom: inner]].
>>> +
>>> +               ctxt := self popTo: suspendedContext bottomContext.
>>> +               ctxt == suspendedContext bottomContext ifFalse:
>>> +                       [self debug: ctxt title: 'Unwind error during
>>> termination'].
>>> +               "Set the context to its endPC for the benefit of
>>> isTerminated."
>>> +               ctxt pc: ctxt endPC]!
>>> -                       ctxt := self popTo: suspendedContext
>>> bottomContext.
>>> -                       ctxt == suspendedContext bottomContext ifFalse:
>>> -                               [self debug: ctxt title: 'Unwind error
>>> during termination'].
>>> -                       "Set the context to its endPC for the benefit of
>>> isTerminated."
>>> -                       ctxt pc: ctxt endPC]]!
>>>
>>>
>

Re: trunk image locked (was: The Inbox: Kernel-eem.1185.mcz)

Eliot Miranda-2
Hi Chris,

On Thu, Aug 2, 2018 at 4:03 PM, Chris Muller <[hidden email]> wrote:

> Hi guys,
>
> Thanks, Eliot, and please accept my apologies -- when I later rebooted
> the image it still only had 1182, and so I'm not 100% certain I had
> updated that image or simply hadn't saved after I updated.

If you look at the changes file you should be able to see whether the changes from 1183, 1184 et al. are present or not.
The package name is logged in the changes file as a comment doit, e.g.
"Kernel"!

If people think it's a good idea we could also log update beginnings and endings.  It would be nice to see
"==========  Update completed:  18150 -> 18163 =========="!
logged on the changes file.
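
For anyone who wants to script the changes-file check described above, here is a minimal sketch in Python, run outside the image. It assumes each package marker is a comment doit on a line of its own, as in the "Kernel"! example; the function name find_package_markers and the sample text are made up for illustration, and a real changes file may lay its chunks out differently.

```python
import re
from typing import List


def find_package_markers(changes_text: str, package: str) -> List[int]:
    """Return the 1-based line numbers of lines that consist only of the
    quoted package name followed by '!', e.g. "Kernel"!

    Assumption: the marker sits on a line of its own. str.splitlines()
    treats CR, LF and CRLF as line ends, so CR-delimited files work too.
    """
    marker = re.compile(r'^\s*"' + re.escape(package) + r'"!\s*$')
    return [
        number
        for number, line in enumerate(changes_text.splitlines(), start=1)
        if marker.match(line)
    ]


# Made-up sample in the general shape of a changes file (not real log data).
sample = (
    '"Collections"!\r'
    "!Object methodsFor: 'testing'!\r"
    "isFoo\r"
    "\t^false! !\r"
    '"Kernel"!\r'
    "!Process methodsFor: 'private'!\r"
    "releaseCriticalSection: runnable\r"
    "\t^self! !\r"
)

if __name__ == "__main__":
    print(find_package_markers(sample, "Kernel"))
```

To try it on a real file, read the file's text first (the path and its encoding are yours to supply) and pass it in. Finding the marker only shows that a package load was logged; it does not by itself tell you which version was loaded.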
 
> I am going
> to update and save everything and keep moving forward normally to see
> it happens again.  The 201807260206 was flawless for me in the first
> big stress test, this time I'll be sure to save my updated trunk from
> the get go.

Cool.  Thanks.  And if there is a lock up, I'd love to look at the image with you.
 

> Thanks for that reminder about kill SIGUSR1, Levente.  I don't use
> post-it notes but that's so useful I probably should get one for
> that..   :)
>
> Best,
>   Chris

--
_,,,^..^,,,_
best, Eliot


Re: trunk image locked (was: The Inbox: Kernel-eem.1185.mcz)

Craig Latta

> If people think it's a good idea we could also log update beginnings
> and endings.  It would be nice to see
> "==========  Update completed:  18150 -> 18163 =========="!
> logged on the changes file.

     +1


-C

--
Craig Latta
Black Page Digital
Amsterdam :: San Francisco
[hidden email]
+31   6 2757 7177 (SMS ok)
+ 1 415  287 3547 (no SMS)