Re: [Pharo-dev] Anyone else seen crashes like these ?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-dev] Anyone else seen crashes like these ?

Eliot Miranda-2
 
Hi Sven,
On Mon, Nov 12, 2018 at 5:37 AM Sven Van Caekenberghe <[hidden email]> wrote:
Hi,

I run Pharo 7 64-bit on a macOS laptop, where the images are kept running across sleep/wake cycles.

For many weeks, it often happens that an image crashes before/after such a sleep/wakeup (not all the time, just regularly).

Here is a crash dump from today (fresh image/vm from WE, nothing special loaded).

Related to scheduling ? Event handling ?

The stack trace points to OpenGL rendering:

2   Pharo                               sigsegv + 155
3   libsystem_platform.dylib            _sigtramp + 29
4   ???                                 0x0 + 0
5   AppleIntelKBLGraphicsGLDriver       glrUpdateTexture + 8979
6   libGPUSupportMercury.dylib          gpusLoadCurrentTextures + 591
7   AppleIntelKBLGraphicsGLDriver       updateDispatch + 663
8   GLEngine                            gleDoDrawDispatchCore + 629
9   GLEngine                            glDrawArraysInstanced_STD_Exec + 264
10  GLEngine                            glDrawArrays_UnpackThread + 40
11  GLEngine                            gleCmdProcessor + 77
12  libdispatch.dylib                   _dispatch_client_callout + 8
13  libdispatch.dylib                   _dispatch_lane_barrier_sync_invoke_and_complete + 60
14  GLEngine                            glFlush_ExecThread + 15
15  Pharo                               -[sqSqueakOSXOpenGLView drawRect:flush:] + 301
16  Pharo                               -[sqSqueakOSXOpenGLView drawRect:] + 43
17  AppKit                              _NSViewDrawRect + 66
18  AppKit                              -[NSView _drawRect:clip:] + 1745
19  AppKit                              -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 1400
20  AppKit                              -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 2622
21  AppKit                              -[NSView _recursiveDisplayRectIfNeededIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:topView:] + 938
22  AppKit                              -[NSThemeFrame _recursiveDisplayRectIfNeededIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:topView:] + 65
23  AppKit                              -[NSView _oldDisplayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:] + 1887
24  AppKit                              -[NSView _displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:] + 253
25  AppKit                              -[NSView displayIfNeeded] + 1300
26  AppKit                              -[NSWindow displayIfNeeded] + 283
27  AppKit                              __NSWindowGetDisplayCycleObserverForDisplay_block_invoke + 722
28  AppKit                              NSDisplayCycleObserverInvoke + 170
29  AppKit                              NSDisplayCycleFlush +  1073
30  QuartzCore                          _ZN2CA11Transaction19run_commit_handlersE18CATransactionPhase + 49
31  QuartzCore                          _ZN2CA11Transaction6commitEv + 186
32  AppKit                              __65+[CATransaction(NSCATransaction) NS_setFlushesWithDisplayRefresh]_block_invoke + 274
33  CoreFoundation                      __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
34  CoreFoundation                      __CFRunLoopDoObservers + 452
35  CoreFoundation                      __CFRunLoopRun + 1166
36  CoreFoundation                      CFRunLoopRunSpecific + 463
37  Foundation                          -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 280
38  Pharo                               -[sqSqueakMainApplication(events) pumpRunLoop] + 103
39  Pharo                               -[sqSqueakOSXApplication(events) pumpRunLoop] + 43
40  Pharo                               vmIOProcessEvents + 255

and this is core VM code, so likely an pensmalltalk-vm macOS platform bug, hence best discussed on vm-dev.  Can you describe exactly the steps you take to reproduce the crash?
_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-dev] Anyone else seen crashes like these ?

Eliot Miranda-2
 
Hi Ben,
On Mon, Nov 12, 2018 at 8:51 AM Ben Coman <[hidden email]> wrote:


On Mon, 12 Nov 2018 at 21:37, Sven Van Caekenberghe <[hidden email]> wrote:
Hi,

I run Pharo 7 64-bit on a macOS laptop, where the images are kept running across sleep/wake cycles.

Just to confirm, you mean your laptop sleep/wake cycle.
On my Window 10 laptop I do the same quite often with no problems.
 
For many weeks, it often happens that an image crashes before/after such a sleep/wakeup (not all the time, just regularly).

Here is a crash dump from today (fresh image/vm from WE, nothing special loaded).

Related to scheduling ? Event handling ?

In trying to understand the last few moments, is the recent primitives list strongly ordered in time?...
  Most recent primitives
  signal
  nowTick
  primSignal:atUTCMicroseconds:
  wait
  millisecondClockValue
  @
  actualScreenSize

Yes, it's a log of the most recent 256 named primitives.  Note that they're irrelevant in this case.  The crash is always rooted in vmIOProcessEvents. i.e. the VM is responding to some input event, and it calls pumpRunLoop to do (Objective-C) [NSRunLoop mainRunLoop] runMode:NSDefaultRunLoopMode beforeDate:[NSDate distantPast], which in Smalltalk would be written NSRunLoop mainRunLoop runMode: NSDefaultRunLoopMode beforeDate: NSDate distantPast.  And somewhere within this a display update occurs which crashes, presumably because we're using stale data that should have been invalidated oil sleep and refreshed on wake.
 


Starting at #actualScreenSize, instrumenting the code by adding...
     Transcript crShow: thisContext sender printString.
indicates the following call chain...
  DisplayScreen-class>>actualScreenSize
  MorphicUIManager>>checkForNewDisplaySize
  DisplayScreen class>>checkForNewScreenSize
  WorldState>>doOneCycleNowFor:
  WorldState>>doOneCycleFor:
  WorldMorph>>doOneCycle
  WorldMorph class>>doOneCycle
  MorphicUIManager>>spawnNewProcess

In detail from a static analysis of the code...
           > WorldState>>doOneCycleNowFor:  
              > DisplayScreen class>>checkForNewScreenSize
                  > MorphicUIManager>>checkForNewDisplaySize 
                      > DisplayScreen-class>>actualScreenSize          PRIMITIVE 
                  < MorphicUIManager>>checkForNewDisplaySize   
               < DisplayScreen class>>checkForNewScreenSize 
            < WorldState>>doOneCycleNowFor
         < WorldState>>doOneCycleFor:  
      < WorldMorph>>doOneCycle  
   < WorldMorph class>>doOneCycle 
< MorphicUIManager>>spawnNewProcess  
   > WorldMorph class>>doOneCycle  
      > WorldMorph>>doOneCycle
         > WorldState>>doOneCycleNowFor:  
            > WorldState>>interCyclePause:
               > Time>>millisecondClockValue       PRIMITIVE
            < WorldState>>interCyclePause:
               > Delay>>schedule
                  > DelaySemaphoreScheduler>>schedule:
                      > Semaphore>>wait                            PRIMITIVE "readyToSchedule variable"
                      > Semaphore>>signal                         PRIMITIVE "timingSemaphore variable"   NOT RECORDED
                                                                                  maybe context change causes this to be recorded later in "Most recent primitives"
                             > DelayMicrosecondTicker>>waitForUserSignalled:orExpired:      waking up from Semaphore>>wait  
                                    the primSignal:atUTCMicroseconds:  PRIMITIVE  immediately before the #wait seems to have been logged now in "Most recent primitives"
                             > DelaySemaphoreScheduler>>scheduleAtTimingPriority
                                > Semaphore>>signal               PRIMITIVE  "readyToSchedule variable"  seems to have not been recorded in "Most recent primitives"
                                > DelayBasicScheduler>>scheduleAtTimingPriority
                             < DelaySemaphoreScheduler>>scheduleAtTimingPriority
                          < DelayBasicScheduler>>runBackendLoopAtTimingPriority
                             > DelayMicrosecondTicker>>nowTick         PRIMITIVE 
                             > Delay>>timingPrioritySignalExpired         
                                > Semaphore>>signal                PRIMITIVE "delaySemaphore variable"      CRASH

 Smalltalk stack dump:
    0x7ffee21d7138 M Delay>timingPrioritySignalExpired 0x113e3b138: a(n) Delay
    0x7ffee21d7170 M [] in DelaySemaphoreScheduler(DelayBasicScheduler)>runBackendLoopAtTimingPriority 0x11453b8c0: a(n) DelaySemaphoreScheduler
       0x1169f51c8 s BlockClosure>ensure:
       0x1169f5828 s DelaySemaphoreScheduler(DelayBasicScheduler)>runBackendLoopAtTimingPriority
       0x1169f5c38 s [] in DelaySemaphoreScheduler(DelayBasicScheduler)>startTimerEventLoopPriority:
       0x1169fae10 s [] in BlockClosure>newProcess

The only reason I can think that last semaphore call would be a problem is if variable delaySemaphore was not a Semaphore,
but the C stack backtrace doesn't look like it crashes in the Semaphore handling.
Its perhaps co-incidental with delay scheduling code is invoked fairly often, that its the current process when this crash occurs.


To hep isolate the failure point, I would considering copying #nowTIck to a multiple series like...
#nowTick1
#nowTick2
#nowTick3
etc so that all of the sender sites can be made distinct.
And maybe the same for #signal senders within the delay scheduling code.

One side-curiousity is I can't see where the primitive #@ is being called from.
Does it indicate some loop as its trying to determine 

To me the "vmIOProcessEvents" in the C backtrace with a fairly deep stack into the AppKit framework 
indicates its related to event handling.

cheers -ben


--
_,,,^..^,,,_
best, Eliot