From a message on the Squeak mailing list, by Andreas Raab:
> Some of you (mostly those who run heavy servers) may have noticed that > at times Squeak locks up in mysterious and unforeseen ways. One of those > lockups involves Delay's AccessProtect in an unsignaled state and > consequently the entire image locking up since Delay access is required > in many, many places. > > Today, David presented me an image that was locked up in such a state > but by sheer luck he managed to save it right before it happened which > allowed me to investigate the situation. The result can best be > explained by the little test case shown here: > > "Create mutex unsignaled so we can manually signal it" > mutex := Semaphore new. > "Create a process which will wait inside the mutex" > p := [mutex critical:[]] forkAt: Processor userBackgroundPriority. > "Wait until process has entered mutex" > [p suspendingList == mutex] > whileFalse:[(Delay forMilliseconds: 10) wait]. > "Signal mutex" > mutex signal. > "Kill process" > p terminate. > "and check to see if the mutex is signaled" > mutex isSignaled ifFalse:[self error: 'Mutex not signaled']. > > Note that despite the somewhat complex setup the basic idea is that a > low priority process waiting in a critical section receives a signal on > the semaphore it is waiting on but gets terminated by a higher priority > process inbetween receiving the signal and execution of the process itself. > > This situation (manually executed in the above to make it more easily > repeatable) can happen in many situations where processes get terminated > "from the outside" and it would cause particular grief in the timing > semaphore because it gets served by the highest priority process which > makes the unfortunate cause of events much more likely. > > All Squeak versions that I have access to expose this behavior. Looking at > Semaphore>>critical: which says [this is the code in GNU Smalltalk -pb] > > Semaphore>>critical: aBlock > self wait. > ^aBlock ensure: [self signal]. > > makes it seem as if moving the wait into the ensured block is the > correct answer, but that ain't necessarily so. When we move the wait > into the block we risk that the entering process is terminated after > entering the block but before entering the wait which would leave the > semaphore signaled twice, which is just as bad as not signaled at all. Smalltalk by wrapping the "self wait" in an exception handler. There may be a small problematic window between the end of the #wait and the beginning of ensure There are other problems with Delay, basically involving the Queue left in an half-updated state when the process manipulating it is terminated. The only way to fix this is to move all the processing of the Queue inside the high-priority process that handles timer events. I'll apply a patch to this end after converting the source code to the new syntax (I have the code, but it uses the new syntax and I don't want to have half-converted directories). Paolo 2007-10-01 Paolo Bonzini <[hidden email]> * kernel/RecursionLock.st: Signal waiting semaphore if a process is terminated inside its critical section. * kernel/Semaphore.st: Likewise. --- orig/kernel/RecursionLock.st +++ mod/kernel/RecursionLock.st @@ -99,7 +99,7 @@ critical: aBlock self isOwnerProcess ifTrue: [ ^aBlock value ]. "Look out for race conditions!" - self enter. + [ self enter ] on: ProcessBeingTerminated do: [ :ex | self exit. ex pass ]. ^aBlock ensure: [ self exit ] ! ! --- orig/kernel/Semaphore.st +++ mod/kernel/Semaphore.st @@ -67,7 +67,9 @@ forMutualExclusion critical: aBlock "Wait for the receiver to be free, execute aBlock and signal the receiver again. Return the result of evaluating aBlock." - self wait. + + "Look out for race conditions!" + [ self wait ] on: ProcessBeingTerminated do: [ :ex | self signal. ex pass ]. ^aBlock ensure: [ self signal ] ! ! _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
Free forum by Nabble | Edit this page |