Callback deadlock with Transcript usage

Callback deadlock with Transcript usage

Carsten Haerle
I use SessionManager>>trace: or Transcript show: in our code (for debugging
purposes) and also use background processes which gather information about
various objects. Now we get deadlocks quite often. One simple method to
reproduce this is to evaluate:

p := [1 to: 100000000000000 do: [:i | Transcript show: i printString; cr]] forkAt: 3.

Then try to clear the Transcript => Deadlock. (You can interrupt with
Ctrl-Break.)
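
After breaking out of the hang, the runaway process from the snippet above
can be cleaned up by terminating it (assuming the workspace variable p still
refers to the forked process):

p terminate.	"stop the background process started above"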

I have already spent some hours understanding the problem, and it seems that
there is an interlock of callbacks on the VM stack like this (time runs from
top to bottom):

1: background process: lock the transcript semaphore and SendMessage to
update the Transcript.
2: background process: callback from SendMessage to update the transcript.
3: main process: during the processing of the callback in the background
process in step 2, the main process comes to life because the VM signalled
the input semaphore and it has a higher priority than the background
process. The main process calls DispatchMessage with the wmCommand message
it received.
4: main process: callback from DispatchMessage to process the wmCommand
message. The processing tries to lock the transcript semaphore in order to
clear the transcript. Since the background process locked the mutex in
step 1 => MAIN PROCESS BLOCKED.
5: background process: the processing of the callback in step 2 finishes
(actually it was only default window processing). The process tries to
return but cannot, because callbacks must be returned in the same order as
they occurred, i.e. the callback from the main process has to return first,
but it cannot because it is blocked on the transcript semaphore => BACKGROUND
PROCESS BLOCKED.
6: DEADLOCK.
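
Distilled down, the hazardous pattern is a lock held across a synchronous
window update that re-enters the image via a callback. The following is only
an illustrative sketch (displayMutex, appendToWindow: and aString are made-up
names, not the actual Transcript code):

displayMutex critical: [
	"The update goes out via SendMessage and re-enters Smalltalk through a
	 callback while displayMutex is still held. If the main process is
	 switched in here and also evaluates displayMutex critical: [...],
	 both processes end up blocked as described in steps 4 and 5."
	self appendToWindow: aString]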

I don't know exactly whether this is really the correct explanation, because
actually the background process has two open callbacks and unfortunately it
is not possible to see the callback stack in the VM. I tried to look at the
cookie values they provide, but they do not seem to have any order (Blair?).

Also, in trying to understand the problem I tried to work out how the VM
processes callbacks and window messages, but I don't know whether I have the
right picture. For example:

1) How did the main process in step 3 come to life? Is it really the VM that
signals the input semaphore and does a hard process switch from the
background process to the main process?
2) What are the entry points from the VM into the Smalltalk system? Is it
only InputState>>wndProc:message:wParam:lParam:cookie:, or are there other
callbacks from the VM?
3) If the VM sets the input semaphore and watches the message queue for
input, why is there also the idle process?

Regards

Carsten



Re: Callback deadlock with Transcript usage

Blair McGlashan
"Carsten Haerle" <[hidden email]> wrote in message
news:burr1s$ojv$02$[hidden email]...
> I use SessionManager>>trace: or Transcript show: in our code (for debugging
> purposes) and also use background processes which gather information about
> various objects. Now we get deadlocks quite often. One simple method to
> reproduce this is to evaluate:
>
> p := [1 to: 100000000000000 do: [:i | Transcript show: i printString; cr]] forkAt: 3.
>
> Then try to clear the Transcript => Deadlock. (You can interrupt with
> Ctrl-Break.)
>
> I have already spent some hours understanding the problem, and it seems that
> there is an interlock of callbacks on the VM stack like this (time runs from
> top to bottom):
>

Hmmm yes, I can see how that might happen. The Transcript's current
implementation violates the rule against performing UI updates from a
background process. It should either not be sending Windows messages from
inside its mutex, or it should be using an unsubclassed control so that the
callbacks do not transit through Dolphin's message dispatching mechanism.
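
Until that is changed, a workaround at the application level is not to write
to the Transcript directly from a background process at all, but to hand the
write over to the main UI process. A sketch only, assuming the deferral hook
InputState>>queueDeferredAction: is available and the running InputState is
reached via SessionManager inputState:

SessionManager inputState queueDeferredAction: [
	"Runs later in the main UI process, so no background process ever
	 holds the transcript mutex while a window callback is in flight."
	Transcript show: someText; cr]

Here someText stands for whatever the background process wants logged.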

> 1: background process: lock the transcript semaphore and SendMessage to
> update the Transcript.
> 2: background process: callback from SendMessage to update the transcript.
> 3: main process: during the processing of the callback in the background
> process in step 2, the main process comes to life because the VM signalled
> the input semaphore and it has a higher priority than the background
> process. The main process calls DispatchMessage with the wmCommand message
> it received.

Probably, but it might help to know that the VM only signals the input
semaphore when the system appears to be performing CPU-intensive processing
such that the image has not checked the Windows message queue for a
pre-defined period. This period is defined in terms of the number of method
activations, and is known as the "sampling interval". See
InputState>>setSamplingInterval:. Sampling the Windows message queue is not
without cost, so the interval has to be reasonably large. However, if it is
too large, the system becomes sluggish in responding to UI activity while a
background computation is being performed. The sampling can also be turned
off altogether - read the comment in InputState>>primSampleInterval: for
further information.
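
For example, to make the UI more responsive during a long background
computation you might shrink the interval. The figure below is purely
illustrative, and it assumes the running InputState is reached via
SessionManager inputState:

"Illustrative value only: the argument is a number of method activations,
 not milliseconds; smaller means the message queue is sampled more often."
SessionManager inputState setSamplingInterval: 10000.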

> 4: main process: callback from DispatchMessage to process the wmCommand
> message. The processing tries to lock the transcript semaphore in order to
> clear the transcript. Since the background process locked the mutex in
> step 1 => MAIN PROCESS BLOCKED.
> 5: background process: the processing of the callback in step 2 finishes
> (actually it was only default window processing). The process tries to
> return but cannot, because callbacks must be returned in the same order as
> they occurred, i.e. the callback from the main process has to return first,
> but it cannot because it is blocked on the transcript semaphore => BACKGROUND
> PROCESS BLOCKED.
> 6: DEADLOCK.
>
> I don't know exactly whether this is really the correct explanation, because
> actually the background process has two open callbacks and unfortunately it
> is not possible to see the callback stack in the VM. I tried to look at the
> cookie values they provide, but they do not seem to have any order (Blair?).

It sounds like a reasonable explanation to me. The callbacks must be exited
in order, and the VM guarantees this by queueing up attempts by the image to
return out of order. This is the solution that Dolphin adopts for running
multiple green threads within a single native thread. A less satisfactory
alternative is to dequeue most incoming Windows messages and place them into
a separate queue managed either in the VM or in Smalltalk. From the point
of view of the host OS, this means that those messages are effectively
handled asynchronously, and so it is only appropriate for cases where this
doesn't upset either Windows or the controls and where the return value is
ignored. Other cases (i.e. those that must be handled synchronously) have to
be handled by disabling process switching while the message is handled. This
approach is unsatisfactory because: 1) there aren't many cases where the
Windows messages can really be handled asynchronously, and even where this
appears to be the case there is potential for timing-related issues to cause
unexpected behaviour or instability in either Windows or the controls; and
2) disabling process switching means that, in effect, it is not possible to
debug through many Windows calls, including any inbound COM calls or other
forms of synchronous callback.
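
The practical rule for application code is therefore the usual one for
re-entrant callbacks: don't hold a lock across a call that can call back into
the image. A sketch of the safer shape (mutex, buffer and updateWindowWith:
are illustrative names only):

| text |
mutex critical: [text := buffer contents].	"copy the shared state under the lock"
self updateWindowWith: text.	"the re-entrant call is made with the lock released"

That keeps the critical section free of anything which can transit back
through the message dispatching mechanism.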

>
> Also, in trying to understand the problem I tried to work out how the VM
> processes callbacks and window messages, but I don't know whether I have the
> right picture. For example:
>
> 1) How did the main process in step 3 come to life? Is it really the VM that
> signals the input semaphore and does a hard process switch from the
> background process to the main process?

Probably (as a result of the sampling mechanism I described above), although
more normally it is signalled from the idle process. There are also various
other places it can be signalled from, which you can easily find by browsing
from InputState.

> 2) What are the entry points from the VM into the Smalltalk system? Is it
> only InputState>>wndProc:message:wParam:lParam:cookie:, or are there other
> callbacks from the VM?

No, there are a few others. Browse the 'vm entry points' category to see
them all.

> 3) If the VM sets the input semaphore and watches the message queue for
> input, why is there also the idle process?

Normally it is the idle process which detects new input - it is responsible
for quiescing the system in a Windows-friendly manner (by calling
MsgWaitForMultipleObjects). This means the system is, by preference,
event-driven and sleeps except when processing input from the Windows message
queue. However, a mechanism to cheaply poll the queue is also needed for the
case where background processing is consuming CPU time, and this is the
sampling mechanism I described above. Unfortunately it is not possible for
one Win32 thread to wait for input on another thread's message queue; if it
were, that would be a better way for the VM to detect available input.

Regards

Blair