Pharo7, consistent image freeze/deadlock on snapshotAndQuit

Johan Brichau-2
Hi,

I ran into a freeze issue when trying to save and/or quit an image after loading Seaside3 and Zinc (not an infrequent combination :)

The cause seems to be that terminating processes in a `shutDown` handler (called on image shutdown / snapshot) leads to a deadlock (somewhere).
Both Comet and Zinc will send #terminate to a process when an image is quit or saved. If I disable one of those, the deadlock does not occur.

I managed to reconstruct the issue with a simple example, attached.

The attached Freeze class creates two processes (that loop with a Delay>>wait inside) in the `startUp` method (on image start).
These processes are terminated by the `shutDown` method (called on image quit/save).
However, when you try to quit and/or save the image, you will experience an image freeze.
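
For readers without the attachment, here is a minimal sketch of what such a class might look like. It is not the attached Freeze.st: the package name, process names, priority, delay length and the session registration in `initialize` are all assumptions. Only the class-variable names Process1 and Process2 are taken from the shutDown: code quoted in the reply. Methods are shown in the usual `Class >> selector` listing form, to be entered in the browser.

```smalltalk
"Hypothetical reconstruction, NOT the attached Freeze.st."
Object subclass: #Freeze
    instanceVariableNames: ''
    classVariableNames: 'Process1 Process2'
    package: 'Freeze'.

Freeze class >> initialize
    "Assumed registration, so that startUp:/shutDown: are sent on image start/save/quit."
    SessionManager default registerUserClassNamed: self name

Freeze class >> startUp: resuming
    "Fork two background loops that sleep inside Delay>>wait (priority is an assumption)."
    Process1 := [ [ (Delay forSeconds: 1) wait ] repeat ]
        forkAt: Processor userBackgroundPriority named: 'Freeze1'.
    Process2 := [ [ (Delay forSeconds: 1) wait ] repeat ]
        forkAt: Processor userBackgroundPriority named: 'Freeze2'

Freeze class >> shutDown: quitting
    "Terminate both loops; this is the step that freezes on the affected builds."
    Process1 ifNotNil: [ Process1 terminate ].
    Process2 ifNotNil: [ Process2 terminate ].
    Process1 := Process2 := nil
```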

I’m not familiar with the process scheduler, or anything else in this context… so help would be appreciated while I try to dive further into it :)

To reproduce: just file the attached file into a Pharo 7 image and try to quit.
For reference, System Reporter output for my configuration.

Any ideas?

Cheers
Johan




Image
-----
/Users/jbrichau/Pharo/images/Pharo 7.0 - 64bit (development version)/Pharo 7.0 - 64bit (development version).image
Pharo7.0alpha
Build information: Pharo-7.0+alpha.build.1131.sha.a46bfaef44ba209888c92fa0e3bd3a5e04a38c5d (64 Bit)
Unnamed

Virtual Machine
---------------
/Users/jbrichau/Pharo/vms/70-x64/Pharo.app/Contents/MacOS/Pharo
CoInterpreter VMMaker.oscog-eem.2401 uuid: 29232e0e-c9e3-41d8-ae75-519db862e02c Jun 28 2018
StackToRegisterMappingCogit VMMaker.oscog-eem.2401 uuid: 29232e0e-c9e3-41d8-ae75-519db862e02c Jun 28 2018
VM: 201806281256 https://github.com/OpenSmalltalk/opensmalltalk-vm.git Date: Thu Jun 28 14:56:30 2018 CommitHash: a8a1dc1 Plugins: 201806281256 https://github.com/OpenSmalltalk/opensmalltalk-vm.git

Mac OS X built on Jun 28 2018 13:07:33 UTC Compiler: 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)
VMMaker versionString VM: 201806281256 https://github.com/OpenSmalltalk/opensmalltalk-vm.git Date: Thu Jun 28 14:56:30 2018 CommitHash: a8a1dc1 Plugins: 201806281256 https://github.com/OpenSmalltalk/opensmalltalk-vm.git
CoInterpreter VMMaker.oscog-eem.2401 uuid: 29232e0e-c9e3-41d8-ae75-519db862e02c Jun 28 2018
StackToRegisterMappingCogit VMMaker.oscog-eem.2401 uuid: 29232e0e-c9e3-41d8-ae75-519db862e02c Jun 28 2018


Freeze.st (1K) Download Attachment

Re: Pharo7, consistent image freeze/deadlock on snapshotAndQuit

Ben Coman
I don't have a total answer, but can confirm the problem and help
characterize it.
First, great test-snippet. Really nice and concise.

On Windows it indeed freezes my Pharo 7 image when saving the image.

It doesn't freeze Pharo 6.

It doesn't freeze if the test-processes are not terminated.

It doesn't freeze if the shutdown code runs at a lower priority than
the test-processes.
i.e. it works with these added lines (marked "+")...
    shutDown: quitting
    "+"    | restorePriority |
    "+"    restorePriority := Processor activeProcess priority.
    "+"    Processor activeProcess priority: 15.
           Process1 ifNotNil: [ Process1 terminate ].
           Process2 ifNotNil: [ Process2 terminate ].
    "+"    Processor activeProcess priority: restorePriority.
           Process1 := Process2 := nil
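
One caveat with the workaround as written: the original priority is not restored if a #terminate signals an error. A sketch of the same idea wrapped in ensure:, assuming the same Process1/Process2 class variables and the same priority 15 as in the snippet above. It has not been tested against the freeze.

```smalltalk
shutDown: quitting
    | active restorePriority |
    active := Processor activeProcess.
    restorePriority := active priority.
    [ "Drop below the test-processes while terminating them."
      active priority: 15.
      Process1 ifNotNil: [ Process1 terminate ].
      Process2 ifNotNil: [ Process2 terminate ] ]
        ensure: [ "Runs on normal exit and on unwinding."
            active priority: restorePriority ].
    Process1 := Process2 := nil
```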


Does that help someone else identify the relevant changes between Pharo 6 & 7
that would explain the behaviour in more depth?

cheers -ben

On 26 July 2018 at 00:46, Johan Brichau <[hidden email]> wrote:

> [...]


Re: Pharo7, consistent image freeze/deadlock on snapshotAndQuit

Ben Coman
On 26 July 2018 at 00:46, Johan Brichau <[hidden email]> wrote:
> [...]

On Thu, 26 Jul 2018 at 19:06, Ben Coman <[hidden email]> wrote:
> [...]

From recent discussion with Vincent, I believe I've got my head fully around this...

IIUC part of the shutDown/startUp was being run at highestPriority. So your case boiled down to...

  |processA processB|
  Delay delaySchedulerClass: DelaySpinScheduler. "default < build 1273"
  processA := [ (Delay forSeconds: 10) wait ] forkAt: 20 named: 'processA'.
  processB := [ (Delay forSeconds: 10) wait ] forkAt: 20 named: 'processB'.
  1 second wait.
  [  
    processA terminate.
    processB terminate. "image locked here" 
  ] forkAt: Processor highestPriority.

The freeze is due to #terminate causing the curtailed block in Delay>>wait to run at highestPriority.
    Delay>>wait
        self schedule.
        [ delaySemaphore wait ] ifCurtailed: [ self unschedule ].

The first termination (Process A) fills the transfer variable /finishedDelay/ in DelaySpinScheduler>>unschedule: 
which is normally cleared by "timingSemaphore signal" waking up #handleTimerEvent: running at highestPriority. 
But with #unschedule: at highestPriority, co-operative scheduling within priorities instead continues execution 
with the second termination (Process B) which finds /finishedDelay/ still filled and spins forever with no chance
of #handleTimerEvent: running to clear the transfer variable.  
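
The scheduling rule this relies on (a #signal does not preempt the signaller when the waiter has the same priority) can be observed safely, without any delay scheduler involved. A minimal playground sketch: the variable names and log symbols are made up for illustration, the signaller stands in for #unschedule: and the waiter for #handleTimerEvent:.

```smalltalk
| sem done log |
sem := Semaphore new.
done := Semaphore new.
log := OrderedCollection new.
[ | priority |
  priority := Processor activePriority.
  "Waiter runs at the same priority as the signaller."
  [ sem wait. log add: #waiterRan ] forkAt: priority.
  Processor yield.   "let the waiter run until it blocks on sem"
  sem signal.        "waiter becomes runnable, but the signaller keeps the CPU"
  log add: #signallerContinued.
  Processor yield.   "explicitly hand the CPU to the waiter"
  log add: #signallerResumed.
  done signal ] forkAt: Processor userBackgroundPriority.
done wait.
log asArray
```

If the analysis above is right, #signallerContinued should normally be logged before #waiterRan. Replace the second yield with a busy loop and the waiter never gets to run, which is the situation the second termination ends up in.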

The following instrumentation helps observe this...

    DelayExperimentalSpinScheduler>>unschedule: aDelay
        finishedDelay == nil
            ifTrue: [
                finishedDelay := aDelay. "...and this assignment"
                timingSemaphore signal ]
            ifFalse: [ | context |
                Transcript crShow: Processor activeProcess name, ' finishedDelay not nil'.
                context := thisContext.
                [ context = nil ] whileFalse: [
                    Transcript crShow: '  <-- ', context printString.
                    context := context sender ] ].

Using "Delay delaySchedulerClass: DelayExperimentalSpinScheduler" for the boiled-down case, this then produces...
 
processB finishedDelay not nil
  <-- DelayExperimentalSpinScheduler>>unschedule:
  <-- Delay>>unschedule
  <-- [ self unschedule ] in Delay>>wait
  <-- Context>>resume:through:
  <-- BlockClosure>>ifCurtailed:
  <-- Delay>>wait
  <-- [ (Delay forSeconds: 10) wait ] in UndefinedObject>>DoIt
  <-- BlockClosure>>on:do:
  <-- BlockClosure>>ensure:
  <-- [ self value. Processor terminateActive ] in BlockClosure>>newProcess


# POSSIBLE FIXES

1. Adding a "Processor yield" like this...
    DelaySpinScheduler>>unschedule: aDelay
        [ finishedDelay == nil
            ifTrue: [
                finishedDelay := aDelay.
                timingSemaphore signal.
                finishedDelay ifNotNil: [ Processor yield ].
                ^ true ].
          true
        ] whileTrue.


2. Change the #signal semantics to immediately activate a signaled process when signalled/signalling priorities are the same. Maybe like this...
    StackInterpreter>>resume: aProcess preemptedYieldingIf: yieldImplicitly
-       "Make aProcess runnable and if its priority is higher than that of the current process, preempt the current process."
+       "Make aProcess runnable and if its priority is the same or higher than that of the current process, preempt the current process."
        | activeProc activePriority newPriority |
        <inline: false>
        activeProc := self activeProcess.
        activePriority := self quickFetchInteger: PriorityIndex ofObject: activeProc.
        newPriority := self quickFetchInteger: PriorityIndex ofObject: aProcess.
-       newPriority <= activePriority ifTrue:
+       newPriority <  activePriority ifTrue:
            [self putToSleep: aProcess yieldingIf: true.
             ^false].
        self putToSleep: activeProc yieldingIf: yieldImplicitly.
        self transferTo: aProcess.
        ^true


3. Switching to DelaySemaphoreScheduler which suspends the activeProcess when the transfer variable is not empty.  
This is default in the latest builds.
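
Option 3 can be tried against the boiled-down case by swapping the scheduler class on its first line. A sketch, assuming a build that ships DelaySemaphoreScheduler; everything else is the earlier snippet unchanged. Run it in a throwaway image, since on a build where the assumption does not hold it locks the image.

```smalltalk
| processA processB |
"Assumption: DelaySemaphoreScheduler exists in this build."
Delay delaySchedulerClass: DelaySemaphoreScheduler.
processA := [ (Delay forSeconds: 10) wait ] forkAt: 20 named: 'processA'.
processB := [ (Delay forSeconds: 10) wait ] forkAt: 20 named: 'processB'.
1 second wait.
[
    processA terminate.
    processB terminate. "locked here with DelaySpinScheduler"
] forkAt: Processor highestPriority.
```

If the image stays responsive after the last statement, both terminations completed without spinning.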


@Vincent, I'm revising what I said elsewhere, that the delay scheduler must be the only process running at highestPriority.
I've realised that was an assumption I'd held without detailed consideration.  Perhaps it is over-constrained.
Now that I better understand the cause of the freeze: it only affects the spin-scheduler, due to the busy-loop in methods I'd assumed only ran at user priority.
So other processes running at highestPriority are probably okay, as long as they have no busy loops.

cheers -ben