[BUG] Mysterious Delay lockups


Andreas.Raab
Hi Folks -

Some of you (mostly those who run heavy servers) may have noticed that
at times Squeak locks up in mysterious and unforeseen ways. One of those
lockups involves Delay's AccessProtect in an unsignaled state and
consequently the entire image locking up since Delay access is required
in many, many places.

Today, David showed me an image that had locked up in exactly this
state. By sheer luck he had managed to save it right before the lockup
happened, which allowed me to investigate the situation. The result can
best be explained by the little test case shown here:

   "Create mutex unsignaled so we can manually signal it"
   mutex := Semaphore new.
   "Create a process which will wait inside the mutex"
   p := [mutex critical:[]] forkAt: Processor userBackgroundPriority.
   "Wait until process has entered mutex"
   [p suspendingList == mutex]
       whileFalse:[(Delay forMilliseconds: 10) wait].
   "Signal mutex"
   mutex signal.
   "Kill process"
   p terminate.
   "and check to see if the mutex is signaled"
   mutex isSignaled ifFalse:[self error: 'Mutex not signaled'].

Note that despite the somewhat complex setup, the basic idea is simple:
a low priority process waiting to enter a critical section receives a
signal on the semaphore it is waiting on, but gets terminated by a
higher priority process in between receiving the signal and actually
getting to run.

This scenario (executed manually above to make it more easily
repeatable) can arise wherever processes get terminated "from the
outside". It causes particular grief with the timing semaphore, because
that one is served by the highest priority process, which makes the
unfortunate course of events much more likely.
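
To see why one lost signal is enough to hang everything that uses the
semaphore afterwards, the test case can be extended with a second
client. This is only a sketch: q is a name made up here, the 100 ms
pause is an arbitrary grace period, and the outcome described in the
comments assumes an image that exposes the bug.

```smalltalk
"Same setup as the test case: the signal is lost when p is terminated"
mutex := Semaphore new.
p := [mutex critical: []] forkAt: Processor userBackgroundPriority.
[p suspendingList == mutex]
    whileFalse: [(Delay forMilliseconds: 10) wait].
mutex signal.
p terminate.
"q is a hypothetical second client of the same mutex"
q := [mutex critical: []] forkAt: Processor userBackgroundPriority.
"Give q some time to run"
(Delay forMilliseconds: 100) wait.
"On an affected image q is still queued on the mutex and nobody is left to signal it"
q suspendingList == mutex
    ifTrue: [Transcript show: 'Second process is stuck on the mutex'; cr].
"Clean up so the stuck process does not linger"
q terminate.
```

Every later caller of critical: on that semaphore would queue up behind
q in the same way.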

All Squeak versions that I have access to expose this behavior. Looking
at Semaphore>>critical: which says

Semaphore>>critical: aBlock
   | blockValue |
   self wait.
   [blockValue := aBlock value] ensure: [self signal].
   ^blockValue

makes it seem as if moving the wait into the ensured block is the
correct answer, but that ain't necessarily so. When we move the wait
into the block, we risk that the entering process is terminated after
entering the block but before entering the wait. The ensure: block would
then signal a semaphore that was never waited on, leaving it signaled
twice, which is just as bad as not signaled at all.
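
To make that second failure mode concrete, here is a rough workspace
sketch (an illustration only, not a proposed fix). It inlines the
"wait inside the ensured block" variant and uses a helper semaphore,
gate, which is a name made up for this example, to park the process
after it has entered the ensured block but before it reaches the wait
on the mutex. It assumes that terminating a process runs its pending
ensure: blocks.

```smalltalk
"The mutex starts out free: exactly one excess signal"
mutex := Semaphore forMutualExclusion.
"Hypothetical helper that parks the process just before 'mutex wait'"
gate := Semaphore new.
"Inlined variant of critical: with the wait moved inside the ensured block"
p := [[gate wait. mutex wait. "critical work would go here"]
        ensure: [mutex signal]] forkAt: Processor userBackgroundPriority.
"Wait until p is inside the ensured block, parked at the gate"
[p suspendingList == gate]
    whileFalse: [(Delay forMilliseconds: 10) wait].
"Kill the process before it ever waits on the mutex"
p terminate.
"Consume the one signal a free mutex is supposed to have"
mutex wait.
"Anything left over came from the ensure: block"
mutex isSignaled ifTrue: [self error: 'Mutex signaled twice'].
```

With one signal too many, two processes can be inside the critical
section at the same time.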

Methinks a solution would involve Process>>terminate but I'm running out
of steam after trying to understand the problem in all its implications.
Any ideas would be greatly welcome.

Cheers,
   - Andreas


RE: [BUG] Mysterious Delay lockups

J J-6
Did anything ever happen with this? I didn't see anything else posted
here, and a Google search with site: pointed at the Squeak archives
didn't turn up anything either.

>From: Andreas Raab <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>, Squeak Virtual Machine
>Development Discussion<[hidden email]>
>Subject: [BUG] Mysterious Delay lockups
>Date: Tue, 17 Apr 2007 19:20:28 -0700
>
