Over the last few days I have been looking deeper into the image locking up when a process is suspended. It is an interesting rabbit hole [1] that leads to pondering the Delay machinery, which in turn leads to some VM questions.

When I press the interrupt key, it always seems to open the debugger with the following call stack:

    Semaphore>>critical:        'self wait'
    BlockClosure>>ensure:       'self valueNoContextSwitch'
    Semaphore>>critical:        'ensure: [ caught ifTrue: [self signal]]'
    Delay>>schedule             'AccessProtect critical: ['
    Delay>>wait                 'self schedule'
    WorldState>>interCyclePause:

I notice...

    Delay class >> initialize
        TimingSemaphore := (Smalltalk specialObjectsArray at: 30).

and...

    Delay class >> startTimerEventLoop
        TimingSemaphore := Semaphore new.

It seems incongruous that TimingSemaphore is set differently in the two places. So while I presume this critical stuff all works fine, just in an exotic way, my entropy-guarding-neuron would just like to confirm this is so.

--------------

In Delay class >> handleTimerEvent the comment says...

    "Handle a timer event....
        -a timer signal (not explicitly specified)"

...is that event perhaps a 'tick' generated periodically by the VM via that item from specialObjectsArray? Or is there some other mechanism?

--------------

[1] http://www.urbandictionary.com/define.php?term=Rabbit+Hole

cheers -ben

P.S. I've left the following for some initial context as I change the subject. btw Nicolai, I confirm that my proposed fixes only work on Windows, not Mavericks (and I haven't checked Linux).

Nicolai Hess wrote:
|
Hi Ben,
On Fri, Jul 25, 2014 at 7:56 AM, Ben Coman <[hidden email]> wrote: --
The TimingSemaphore gets installed in the specialObjectsArray via

    primSignal: aSemaphore atMilliseconds: aSmallInteger
        "Signal the semaphore when the millisecond clock reaches the value of the second argument. Fail if the first argument is neither a Semaphore nor nil. Essential. See Object documentation whatIsAPrimitive."
        <primitive: 136>
        ^self primitiveFailed

and from that the VM sets the nextWakeupUsecs:

    primitiveSignalAtMilliseconds
        "Cause the time semaphore, if one has been registered, to be signalled when the microsecond clock is greater than or equal to
         the given tick value. A tick value of zero turns off timer interrupts."
        | msecsObj msecs deltaMsecs sema |
        <var: #msecs type: #usqInt>
        msecsObj := self stackTop.
        sema := self stackValue: 1.
        msecs := self positive32BitValueOf: msecsObj.
        self successful ifTrue:
            [(objectMemory isSemaphoreOop: sema) ifTrue:
                [objectMemory splObj: TheTimerSemaphore put: sema.
                 deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask).
                 deltaMsecs < 0 ifTrue: [deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
                 nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
                 ^self pop: 2].
             sema = objectMemory nilObject ifTrue:
                [objectMemory
                    storePointer: TheTimerSemaphore
                    ofObject: objectMemory specialObjectsOop
                    withValue: objectMemory nilObject.
                 nextWakeupUsecs := 0.
                 ^self pop: 2]].
        self primitiveFailFor: PrimErrBadArgument
Every time the VM checks for interrupts, which in the Cog and Stack VMs is controlled by the heartbeat frequency (2 milliseconds by default), the VM checks whether the current time has reached or passed nextWakeupUsecs and, if so, signals the timer semaphore.
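To make the arithmetic above concrete, here is a small Playground sketch of the wrap-around handling and the "has the wake-up time arrived?" test. It is only a model, not VM code: the mask value 16r1FFFFFFF is an assumption about MillisecondClockMask, the clock readings are invented, and the timerDue block is a hypothetical stand-in for the check the VM makes on each heartbeat.

```smalltalk
"Stand-alone model of the delta computation in primitiveSignalAtMilliseconds.
 All numbers are made up for illustration; the mask value is an assumption."
| mask nowMsecs wakeMsecs deltaMsecs nowUsecs nextWakeupUsecs timerDue |
mask := 16r1FFFFFFF.		"assumed value of MillisecondClockMask"
nowMsecs := mask - 5.		"the millisecond clock is 5 ms away from wrapping"
wakeMsecs := 10.		"the requested wake-up lies just after the wrap"

"Same two steps as the primitive: subtract, then correct a negative delta."
deltaMsecs := wakeMsecs - nowMsecs.
deltaMsecs < 0 ifTrue: [deltaMsecs := deltaMsecs + mask + 1].

nowUsecs := 1000000.		"stand-in for the value of ioUTCMicroseconds"
nextWakeupUsecs := nowUsecs + (deltaMsecs * 1000).

"Hypothetical model of the heartbeat check: zero means no timer is armed."
timerDue := [:currentUsecs |
	nextWakeupUsecs ~= 0 and: [currentUsecs >= nextWakeupUsecs]].

Transcript
	show: 'deltaMsecs = ', deltaMsecs printString; cr;
	show: 'due after 2 ms? ', (timerDue value: nowUsecs + 2000) printString; cr;
	show: 'due after 16 ms? ', (timerDue value: nowUsecs + 16000) printString; cr.
```

With these inputs the delta works out to 16 ms (5 ms to reach the mask, 1 ms to wrap to zero, 10 ms more), so the check is false at the first heartbeat and true once 16 ms have passed.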
and the problem here is not in the VM. So climb out and breathe some fresh air ;-)
Aloha, Eliot
|
In reply to this post by Ben Coman
Hi Ben,

I am on Windows too :( So the fixes do not always work on Windows either. They make it less likely to occur, but it still happens.

The most distracting thing is that after the first UI lock, pressing alt+dot, closing the debuggers, pressing alt+dot .... and trying to close the very first debugger, it all works again. The UI is responsive again and suspending the process does not block the UI anymore.

It "looks like" suspending the process reactivates another process that blocks the UI. And as soon as I terminate this process (alt+dot, close debugger ...) all works. But I really don't know.

Nicolai

2014-07-25 16:56 GMT+02:00 Ben Coman <[hidden email]>:
|
Hi Nicolai, Hi Ben,
On Fri, Jul 25, 2014 at 10:55 AM, Nicolai Hess <[hidden email]> wrote: --
if you can run a unix machine (in a VM?) then remember that

    kill -USR1 pid

will cause the VM to print out a stack backtrace of all processes in the image. That can be very useful in debugging lockups like this.
HTH
Aloha, Eliot
|
In reply to this post by Eliot Miranda-2
Eliot Miranda wrote:
Thanks Eliot. Just so I'm clear... the signals to the TimingSemaphore from the VM depend entirely on the Delays scheduled by the Image? The VM never signals the TimingSemaphore independently?
Yep. Just looking to understand the interaction between VM and image.
Soon :) but for the moment it's a puzzle that's got hold of me, like a dog on a bone. This is a "hard" problem for me, and I like hard problems. It holds my attention and gives me an opportunity to dig deeper and learn stuff that I otherwise might not.

cheers -ben
On Sat, Jul 26, 2014 at 4:36 PM, Ben Coman <[hidden email]> wrote: --
Right, yes and yes.
Aloha, Eliot
|
In reply to this post by Eliot Miranda-2
2014-07-25 20:05 GMT+02:00 Eliot Miranda <[hidden email]>:
Ok, but I don't know if this helps, at least it does not look very helpful to me :)

SIGUSR1 Mon Jul 28 01:06:16 2014

pharo VM version: 3.9-7 #1 Tue May 6 08:30:23 UTC 2014 gcc 4.8.2 [Production ITHB VM]
Built from: NBCoInterpreter NativeBoost-CogPlugin-GuillermoPolito.19 uuid: acc98e51-2fba-4841-a965-2975997bba66 May 6 2014
With: NBCogit NativeBoost-CogPlugin-GuillermoPolito.19 uuid: acc98e51-2fba-4841-a965-2975997bba66 May 6 2014
Revision: https://github.com/pharo-project/pharo-vm.git Commit: ef5832e6f70e5b24e8b9b1f4b8509a62b6c88040 Date: 2014-01-26 15:34:28 +0100 By: Esteban Lorenzano <[hidden email]> Jenkins build #14794
Build host: Linux chindi08 2.6.24-32-xen #1 SMP Mon Dec 3 16:12:25 UTC 2012 i686 i686 i686 GNU/Linux
plugin path: /usr/lib/pharo-vm/ [default: /usr/lib/pharo-vm/]

C stack backtrace:
/usr/lib/pharo-vm/pharo-vm[0x809ad23]
/usr/lib/pharo-vm/pharo-vm[0x809af6e]
[0xf7784410]
[0xf7784425]
/lib/i386-linux-gnu/libc.so.6(__select+0x2d)[0xf763691d]
/usr/lib/pharo-vm/pharo-vm(aioPoll+0x13d)[0x809748d]
/usr/lib/pharo-vm/vm-display-X11.so(+0xdc85)[0xf71d2c85]
/usr/lib/pharo-vm/pharo-vm(ioRelinquishProcessorForMicroseconds+0x17)[0x8099b57]
/usr/lib/pharo-vm/pharo-vm[0x8070685]
[0xb6f8dbc3]
[0xb6f89700]
[0xb7a2650e]
[0xb6f895c0]

All Smalltalk process stacks (active first):

Process 0xb88376fc priority 10
0xff76c830 M ProcessorScheduler class>idleProcess 0xb7306b08: a(n) ProcessorScheduler class
0xff76c850 I [] in ProcessorScheduler class>startUp 0xb7306b08: a(n) ProcessorScheduler class
0xff76c870 I [] in BlockClosure>newProcess 0xb8837620: a(n) BlockClosure

Process 0xb8838c78 priority 50
0xff768830 I WeakArray class>finalizationProcess 0xb7306cd8: a(n) WeakArray class
0xff768850 I [] in WeakArray class>restartFinalizationProcess 0xb7306cd8: a(n) WeakArray class
0xff768870 I [] in BlockClosure>newProcess 0xb8838b9c: a(n) BlockClosure

Process 0xb9148c20 priority 40
0xff7907b8 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff7907d8 M BlockClosure>ensure: 0xb91502e0: a(n) BlockClosure
0xff7907f8 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff790814 M Delay>schedule 0xb91501e4: a(n) Delay
0xff79082c M Delay>wait 0xb91501e4: a(n) Delay
0xff790850 I [] in BackgroundWorkDisplayMorph>initialize 0xb91488b0: a(n) BackgroundWorkDisplayMorph
0xff790870 I [] in BlockClosure>newProcess 0xb9148b40: a(n) BlockClosure

Process 0xb7902630 priority 40
0xff764784 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff7647a4 M BlockClosure>ensure: 0xb916b7a4: a(n) BlockClosure
0xff7647c4 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff7647e0 M Delay>schedule 0xb916b6a8: a(n) Delay
0xff7647f8 M Delay>wait 0xb916b6a8: a(n) Delay
0xff764818 M WorldState>interCyclePause: 0xb75e8fd8: a(n) WorldState
0xff764834 M WorldState>doOneCycleFor: 0xb75e8fd8: a(n) WorldState
0xff764850 M WorldMorph>doOneCycle 0xb75e8fa4: a(n) WorldMorph
0xff764870 I [] in MorphicUIManager()>? 0xb770ac38: a(n) MorphicUIManager
0xb78cb554 s [] in BlockClosure()>?

Process 0xb82f9078 priority 80
0xff765858 M Delay class>handleTimerEvent 0xb8684d08: a(n) Delay class
0xff765870 M Delay class()>? 0xb8684d08: a(n) Delay class
0xb8623474 s [] in Delay class()>?
0xb82f9018 s [] in BlockClosure>newProcess

Process 0xb883735c priority 60
0xff76680c M InputEventFetcher>waitForInput 0xb72f059c: a(n) InputEventFetcher
0xff766830 M InputEventFetcher>eventLoop 0xb72f059c: a(n) InputEventFetcher
0xff766850 I [] in InputEventFetcher>installEventLoop 0xb72f059c: a(n) InputEventFetcher
0xff766870 I [] in BlockClosure>newProcess 0xb8837280: a(n) BlockClosure

Process 0xb8837534 priority 60
0xb8837568 s SmalltalkImage>lowSpaceWatcher
0xb9127478 s [] in SmalltalkImage>installLowSpaceWatcher
0xb88374d4 s [] in BlockClosure>newProcess

Most recent primitives
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
~ 200 times
|
On Sun, Jul 27, 2014 at 1:14 PM, Nicolai Hess <[hidden email]> wrote:
Then you're not reading it properly. It clearly shows you have a deadlock:

0xff7907b8 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff7907d8 M BlockClosure>ensure: 0xb91502e0: a(n) BlockClosure
0xff7907f8 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff790814 M Delay>schedule 0xb91501e4: a(n) Delay
0xff79082c M Delay>wait 0xb91501e4: a(n) Delay
0xff790850 I [] in BackgroundWorkDisplayMorph>initialize 0xb91488b0: a(n) BackgroundWorkDisplayMorph
0xff790870 I [] in BlockClosure>newProcess 0xb9148b40: a(n) BlockClosure

Process 0xb7902630 priority 40
0xff764784 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff7647a4 M BlockClosure>ensure: 0xb916b7a4: a(n) BlockClosure
0xff7647c4 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff7647e0 M Delay>schedule 0xb916b6a8: a(n) Delay
0xff7647f8 M Delay>wait 0xb916b6a8: a(n) Delay
0xff764818 M WorldState>interCyclePause: 0xb75e8fd8: a(n) WorldState
0xff764834 M WorldState>doOneCycleFor: 0xb75e8fd8: a(n) WorldState
0xff764850 M WorldMorph>doOneCycle 0xb75e8fa4: a(n) WorldMorph
0xff764870 I [] in MorphicUIManager()>? 0xb770ac38: a(n) MorphicUIManager
0xb78cb554 s [] in BlockClosure()>?
best, Eliot
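Both priority-40 processes in the dump are parked inside Semaphore>>critical: on the same semaphore (0xb82f8ef4). As a general illustration of why that stalls everything behind it, and not as a diagnosis of the Delay code, here is a Playground sketch in which one process is suspended while it holds a mutual-exclusion semaphore. The variable names are hypothetical, and the sketch assumes it is run from a process that can fork at one priority level above itself.

```smalltalk
"Illustration only: a mutex semaphore stays held while its owner is
 suspended inside critical:. This is not the Delay implementation."
| mutex holder entered |
mutex := Semaphore forMutualExclusion.
entered := false.

"Fork above the current priority so each process runs to its blocking point at once."
holder := [mutex critical: [Processor activeProcess suspend]]
	forkAt: Processor activePriority + 1.

"This second process has to wait in critical: until the mutex is signalled."
[mutex critical: [entered := true]]
	forkAt: Processor activePriority + 1.

Transcript show: 'entered while holder is suspended: ', entered printString; cr.

"Resuming the holder lets it leave critical:, which signals the mutex."
holder resume.
Transcript show: 'entered after holder is resumed: ', entered printString; cr.
```

The second process cannot enter its critical section until the holder is resumed, which is the same shape as the UI process waiting in WorldState>>interCyclePause: behind another Delay>>schedule.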
|
Ah, OK. So it is not my "misuse" of delays but a bug in Delay>>#schedule, as Ben already guessed?
Two processes in the same critical section should not happen, right?

Nicolai
Hi Nicolai,
Aloha, Eliot (phone)