[PATCH] Semaphore>>#critical: race condition

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Semaphore>>#critical: race condition

Paolo Bonzini
 From a message on the Squeak mailing list, by Andreas Raab:

> Some of you (mostly those who run heavy servers) may have noticed that
> at times Squeak locks up in mysterious and unforeseen ways. One of those
> lockups involves Delay's AccessProtect in an unsignaled state and
> consequently the entire image locking up since Delay access is required
> in many, many places.
>
> Today, David presented me an image that was locked up in such a state
> but by sheer luck he managed to save it right before it happened which
> allowed me to investigate the situation. The result can best be
> explained by the little test case shown here:
>
>    "Create mutex unsignaled so we can manually signal it"
>    mutex := Semaphore new.
>    "Create a process which will wait inside the mutex"
>    p := [mutex critical:[]] forkAt: Processor userBackgroundPriority.
>    "Wait until process has entered mutex"
>    [p suspendingList == mutex]
>        whileFalse:[(Delay forMilliseconds: 10) wait].
>    "Signal mutex"
>    mutex signal.
>    "Kill process"
>    p terminate.
>    "and check to see if the mutex is signaled"
>    mutex isSignaled ifFalse:[self error: 'Mutex not signaled'].
>
> Note that despite the somewhat complex setup the basic idea is that a
> low priority process waiting in a critical section receives a signal on
> the semaphore it is waiting on but gets terminated by a higher priority
> process inbetween receiving the signal and execution of the process itself.
>
> This situation (manually executed in the above to make it more easily
> repeatable) can happen in many situations where processes get terminated
> "from the outside" and it would cause particular grief in the timing
> semaphore because it gets served by the highest priority process which
> makes the unfortunate cause of events much more likely.
>
> All Squeak versions that I have access to expose this behavior. Looking at
> Semaphore>>critical: which says [this is the code in GNU Smalltalk -pb]
>
> Semaphore>>critical: aBlock
>    self wait.
>    ^aBlock ensure: [self signal].
>
> makes it seem as if moving the wait into the ensured block is the
> correct answer, but that ain't necessarily so. When we move the wait
> into the block we risk that the entering process is terminated after
> entering the block but before entering the wait which would leave the
> semaphore signaled twice, which is just as bad as not signaled at all.
This patch solves, or rather tries to solve :-( this problem for GNU
Smalltalk by wrapping the "self wait" in an exception handler.  There
may be a small problematic window between the end of the #wait and the
beginning of ensure

There are other problems with Delay, basically involving the Queue left
in an half-updated state when the process manipulating it is terminated.
  The only way to fix this is to move all the processing of the Queue
inside the high-priority process that handles timer events.  I'll apply
a patch to this end after converting the source code to the new syntax
(I have the code, but it uses the new syntax and I don't want to have
half-converted directories).

Paolo

2007-10-01  Paolo Bonzini  <[hidden email]>

        * kernel/RecursionLock.st: Signal waiting semaphore if a process is
        terminated inside its critical section.
        * kernel/Semaphore.st: Likewise.


--- orig/kernel/RecursionLock.st
+++ mod/kernel/RecursionLock.st
@@ -99,7 +99,7 @@ critical: aBlock
     self isOwnerProcess ifTrue: [ ^aBlock value ].
 
     "Look out for race conditions!"
-    self enter.
+    [ self enter ] on: ProcessBeingTerminated do: [ :ex | self exit. ex pass ].
     ^aBlock ensure: [ self exit ]
 ! !
 


--- orig/kernel/Semaphore.st
+++ mod/kernel/Semaphore.st
@@ -67,7 +67,9 @@ forMutualExclusion
 critical: aBlock
     "Wait for the receiver to be free, execute aBlock and signal the receiver
      again. Return the result of evaluating aBlock."
-    self wait.
+
+    "Look out for race conditions!"
+    [ self wait ] on: ProcessBeingTerminated do: [ :ex | self signal. ex pass ].
     ^aBlock ensure: [ self signal ]
 ! !
 




_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk