Smalltalk › Squeak › Squeak - Dev

Re: nil in SuspendedDelays???

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

28 messages Options

Cees De Groot

Re: nil in SuspendedDelays???

On 2/1/06, Cees De Groot <[hidden email]> wrote:
> If you look at it, there's one funny thing. I added an #on:do: handler
> around the actual addition to the "SuspendedDelays add:..." bit in
> Delay>>schedule:
>
Not funny at all. Just my stupidity - there are two calls to #add: in
that method, I protected only one.

On IRC, Jecel Assumpção remarks that the Semaphore in both cases is
'in rest' - excessSignals is 1, where 0 would be expected when the
critical section is active. And -1 when two critical sections are
active :-)

Cees De Groot

Re: nil in SuspendedDelays???

On 2/1/06, Cees De Groot <[hidden email]> wrote:
> On IRC, Jecel Assumpção remarks that the Semaphore in both cases is
> 'in rest' - excessSignals is 1, where 0 would be expected when the
> critical section is active. And -1 when two critical sections are
> active :-)
>
More debugging, it turns out that AccessProtect's excessSignals indeed
can become > 1 in the application. Whether that is due to the many
times that we terminate processes inside delays or not, I don't know.
Fact is that in my current image, the application starts up, does a
lot of stuff exercising Delay, and then hits this condition. It must
be some sort of race condition.

On IRC, Craig suggested I move over to the VM level for debugging.
However, I'm not sure how to proceed - I could presumably trap
excessSignals becoming '2' on all semaphores, but only some of them
are used as Mutex so that breakpoint would be called spuriously
(unless someone knows how to formulate a bp that only hits on
semaphores setup as mutex...). And then, the culprit could very well
be the first time the semaphore is signalled, but I can only set a
breakpoint on the second time. Manually checking all the times the BP
is hit is undoable, hundreds of Delays are scheduled before the bug
hits.

Any suggestions on how we could proceed?

Thanks,

Cees

timrowledge

Re: nil in SuspendedDelays???

Try making an Array to put them (the sems you know are going to be
used as mutex) in so GC keeps everything up to date and 'register'
this array. Easiest way to do that would be to stick it in the
specialOops array somewhere, there must be an empty slot. Hell,
nobody uses 40/41 so far as I know, abuse them for a bit. Once you
have this anchor you can follow the oops in the vm easily.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: RDL: Rotate Disk Left

johnmci

Re: nil in SuspendedDelays???

In reply to this post by Cees De Groot

Well if you recall a couple of weeks back I had a changeset that
turns on printing of the stack
in message send, so you can see the stack on each message send by
process.

Mind you'll need to build a VM for this, plus write some code to read
the log file and
determine how the semaphore signalling is done.

Or of course change the VM to note the excessive semaphore signalling
and print the stack when it
happens, and as Tim points out use the Special objects array to track
the semaphore of interest.

On 1-Feb-06, at 3:42 PM, Cees De Groot wrote:

> On 2/1/06, Cees De Groot <[hidden email]> wrote:
>> On IRC, Jecel Assumpção remarks that the Semaphore in both cases is
>> 'in rest' - excessSignals is 1, where 0 would be expected when the
>> critical section is active. And -1 when two critical sections are
>> active :-)
>
>
> Any suggestions on how we could proceed?
>
> Thanks,
>
> Cees
>

--
========================================================================
===
John M. McIntosh <[hidden email]> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
===

Cees De Groot

Re: nil in SuspendedDelays???

Thanks John & Tim, for these suggestions. I'll continue debugging
tonight (have to work in VisualAge today, work hard, earn money) and
post as I make progress (or not).

Regards,

Cees

On 2/2/06, John M McIntosh <[hidden email]> wrote:

> Well if you recall a couple of weeks back I had a changeset that
> turns on printing of the stack
> in message send, so you can see the stack on each message send by
> process.
>
> Mind you'll need to build a VM for this, plus write some code to read
> the log file and
> determine how the semaphore signalling is done.
>
> Or of course change the VM to note the excessive semaphore signalling
> and print the stack when it
> happens, and as Tim points out use the Special objects array to track
> the semaphore of interest.
>
>
>
> On 1-Feb-06, at 3:42 PM, Cees De Groot wrote:
>
> > On 2/1/06, Cees De Groot <[hidden email]> wrote:
> >> On IRC, Jecel Assumpção remarks that the Semaphore in both cases is
> >> 'in rest' - excessSignals is 1, where 0 would be expected when the
> >> critical section is active. And -1 when two critical sections are
> >> active :-)
> >
> >
> > Any suggestions on how we could proceed?
> >
> > Thanks,
> >
> > Cees
> >
>
> --
> ========================================================================
> ===
> John M. McIntosh <[hidden email]> 1-800-477-2659
> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
> ========================================================================
> ===
>
>
>

Andreas.Raab

Re: nil in SuspendedDelays???

BTW, you wouldn't kill a process *while* it is trying to execute
SortedCollection>>add: would you? Because this might get you a nil in
SuspendedDelays faster than you could say "oops". And I agree with your
analysis that something must be signaling the mutex twice but I will
point out that one of the processes was the wx event loop which may very
well be invoked via unlikely places (if I remember the discussion with
Rob then he had serious problems getting wx to work asynchronously).

Cheers,
- Andreas

Cees De Groot wrote:

> Thanks John & Tim, for these suggestions. I'll continue debugging
> tonight (have to work in VisualAge today, work hard, earn money) and
> post as I make progress (or not).
>
> Regards,
>
> Cees
>
> On 2/2/06, John M McIntosh <[hidden email]> wrote:
>
>>Well if you recall a couple of weeks back I had a changeset that
>>turns on printing of the stack
>>in message send, so you can see the stack on each message send by
>>process.
>>
>>Mind you'll need to build a VM for this, plus write some code to read
>>the log file and
>>determine how the semaphore signalling is done.
>>
>>Or of course change the VM to note the excessive semaphore signalling
>>and print the stack when it
>>happens, and as Tim points out use the Special objects array to track
>>the semaphore of interest.
>>
>>
>>
>>On 1-Feb-06, at 3:42 PM, Cees De Groot wrote:
>>
>>
>>>On 2/1/06, Cees De Groot <[hidden email]> wrote:
>>>
>>>>On IRC, Jecel Assumpção remarks that the Semaphore in both cases is
>>>>'in rest' - excessSignals is 1, where 0 would be expected when the
>>>>critical section is active. And -1 when two critical sections are
>>>>active :-)
>>>
>>>
>>>Any suggestions on how we could proceed?
>>>
>>>Thanks,
>>>
>>>Cees
>>>
>>
>>--
>>========================================================================
>>===
>>John M. McIntosh <[hidden email]> 1-800-477-2659
>>Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
>>========================================================================
>>===
>>
>>
>>
>
>
>

Cees De Groot

Re: nil in SuspendedDelays???

On 2/2/06, Andreas Raab <[hidden email]> wrote:
> BTW, you wouldn't kill a process *while* it is trying to execute
> SortedCollection>>add: would you?

I wouldn't know, of course, but yes, it could happen. Thanks for
pointing it out.

We do lots of network searches, which are sent into the network, and
then a process sleeps for a while and shuts down the search. If the
search gives results, the process is terminated. Conceivably, this
could happen while the process is being scheduled. Unlikely, but
possible.

Shouldn't this whole Delay>>schedule thing maybe be done in an
un-preemteable block? Or is this something we would need to solve on
our application level?

> And I agree with your
> analysis that something must be signaling the mutex twice but I will
> point out that one of the processes was the wx event loop which may very
> well be invoked via unlikely places (if I remember the discussion with
> Rob then he had serious problems getting wx to work asynchronously).
>
Yup. I'm still not 100% convinced that we can't have a situation like
two native threads going into the VM at the same time, but that is
probably mostly ignorance.

Chris Muller

Re: nil in SuspendedDelays???

In reply to this post by Cees De Groot

That such a low-level operation appears to be failing
like this is very disturbing..

> Message: 4
> Date: Wed, 1 Feb 2006 15:06:33 +0100
> From: Cees De Groot <[hidden email]>
> Subject: Re: nil in SuspendedDelays???
> To: The general-purpose Squeak developers list
> <[hidden email]>
> Message-ID:
>
>
<[hidden email]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Well, I have "proof" now:
>
http://www.cdegroot.com/blog/wp-content/uploads/screenshot.png

> shows a
> screenshot with two debuggers that seem to be inside
> the same critical
> section...
>
> The VM's dump stack results in:
>
> SortedCollection>>add:
> [] in Delay>>schedule {[beingWaitedOn := true.
> resumptionTime := Time
> millisecondClockValue + dela...]}
> [] in Semaphore>>critical: {[[self wait]
> ifCurtailed: [self signal].
> recover := true. result := mutu...]}
> BlockContext>>ifCurtailed:
> Semaphore>>critical:
> Delay>>schedule
> Delay>>wait
> WorldState>>interCyclePause:
> WorldState>>doOneCycleFor:
> PasteUpMorph>>doOneCycle
> [] in Project class>>spawnNewProcess {[[World
> doOneCycle. Processor
> yield. false] whileFalse. nil]}
> [] in BlockContext>>newProcess {[self value.
> Processor terminateActive]}
>
> It *seems* that this only happens in wxSqueak. I
> have SysInternals'
> process explorer running, and the Squeak VM is
> running a bunch of
> threads but I'm not sure what these threads do. But
> apart from the
> guess that something like this could happen when
> multiple native
> threads start executing Squeak code, I'm at a loss
> what other cause
> could be behind this...
>
> HEEELP!!! ;-)