[squeak-dev] DebuggerUnwindBug>>testUnwindDebuggerWithStep

NorbertHartl

[squeak-dev] DebuggerUnwindBug>>testUnwindDebuggerWithStep

This test case started failing at some point. It succeeds in 3.9.
I narrowed the problem down to update 7071 (Kernel-sd.151) in 3.9.1,
which introduced it.

The piece of code that triggers it is:

Process>>terminate
    ...
    suspendedContext ifNotNil: [
        "Figure out if we are terminating the process while waiting in
        Semaphore>>critical: In this case, pop the suspendedContext so that
        we leave the ensure: block inside Semaphore>>critical: without
        signaling the semaphore."
        (inSema == true and: [
            suspendedContext method == (
                Semaphore compiledMethodAt: #critical:) ]) ifTrue: [
            suspendedContext := suspendedContext home.
        ].
    ...

I don't really understand the rationale behind doing this but it
seems that it conflicts with the test assumption:

DebuggerUnwindBug>>testUnwindDebuggerWithStep
    ...
    debugger doStep.
    "close debugger"
    top delete.

    "and see if unwind protection worked"
    self assert: sema isSignaled.

As I don't really understand what happens and what should happen I would
be glad to hear some words of advice.

Norbert

Andreas.Raab

[squeak-dev] Re: DebuggerUnwindBug>>testUnwindDebuggerWithStep

Hi Norbert -

Good find. The test is interesting since it illustrates behavior that
can *only* happen during simulation. Unless simulated, the return from
the preceding wait and the activation of the block are atomic so you'd
never be able to get into the spot that this particular test uses. The
test itself was intended to show a somewhat different problem but it
should be made to work since I don't see why one shouldn't be able to
terminate the debugger in this spot and expect things to work. I'll have
to ponder this one a little since there are many implications there.

Cheers,
- Andreas

Andreas.Raab

[squeak-dev] Re: DebuggerUnwindBug>>testUnwindDebuggerWithStep

So having looked more closely into this issue I am by now convinced that
this is a bug in the simulation machinery. The point is that we're
simulating stepping out of a semaphore wait, e.g.,

sema := Semaphore new.
process := [sema wait] forkAt: Processor activePriority + 1.
ctx := process completeStep: process suspendedContext.

At this point we're out of the semaphore wait and consequently the
suspendingList of the process should be nil but it ain't. I've attached
two tests which illustrate the problem.
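
A minimal sketch of what such a test might look like. This is not the content of the attached SimulationBugs.st, which is not reproduced in this thread; the selector name is hypothetical, and only the three setup lines come from the snippet above.

```smalltalk
testSuspendingListAfterSteppingOutOfWait
	"Hypothetical SUnit test method, for illustration only."
	| sema process ctx |
	sema := Semaphore new.
	"Fork at a higher priority so the new process runs immediately
	 and blocks in #wait before we continue."
	process := [sema wait] forkAt: Processor activePriority + 1.
	"Simulate stepping the blocked process out of the wait."
	ctx := process completeStep: process suspendedContext.
	"The claim under discussion: once we are out of the wait, the
	 process should no longer be on the semaphore's list."
	self assert: process suspendingList isNil
```

If the report above is right, the final assertion is the one that fails in the affected images.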

Cheers,
- Andreas

Attachment: SimulationBugs.st (1K)

stephane ducasse

Re: [squeak-dev] Re: DebuggerUnwindBug>>testUnwindDebuggerWithStep

Andreas,

When you say simulation, do you mean execution in the debugger?

Stef

On Jun 22, 2008, at 1:33 AM, Andreas Raab wrote:

> So having looked more closely into this issue I am by now convinced
> that this is a bug in the simulation machinery. The point is that
> we're simulating stepping out of a semaphore wait, e.g.,
>
> sema := Semaphore new.
> process := [sema wait] forkAt: Processor activePriority + 1.
> ctx := process completeStep: process suspendedContext.
>
> At this point we're out of the semaphore wait and consequently the
> suspendingList of the process should be nil but it ain't. I've
> attached two tests which illustrate the problem.
>
> Cheers,
> - Andreas
>

NorbertHartl

Re: [squeak-dev] Re: DebuggerUnwindBug>>testUnwindDebuggerWithStep

On Sat, 2008-06-21 at 16:33 -0700, Andreas Raab wrote:

> So having looked more closely into this issue I am by now convinced that
> this is a bug in the simulation machinery. The point is that we're
> simulating stepping out of a semaphore wait, e.g.,
>
> sema := Semaphore new.
> process := [sema wait] forkAt: Processor activePriority + 1.
> ctx := process completeStep: process suspendedContext.
>
> At this point we're out of the semaphore wait and consequently the
> suspendingList of the process should be nil but it ain't. I've attached
> two tests which illustrate the problem.
>

After completeStep: the suspendingList gets into the
BlockContext>>newProcess context which is created at forkAt: time.
What does this

<primitive: 19> "Simulation guard"

do? Hmmm, looks quite confusing to me as the suspendingList is
only one element in size the whole time. Hmmm...

Anyway my conclusion is that the test in my first post can't work.

Any suggestions which side needs a change?

Norbert

Andreas.Raab

[squeak-dev] Re: DebuggerUnwindBug>>testUnwindDebuggerWithStep

Norbert Hartl wrote:
> What does this
>
> <primitive: 19> "Simulation guard"
>
> do?

It does nothing. It only indicates that the system will crash if that
code ever gets simulated (usually due to atomicity violations).
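
For illustration, a guarded method has roughly the shape below. The selectors guardedOperation and performAtomicWork are hypothetical. As far as I know, primitive 19 is defined to always fail in the VM, so normal execution simply falls through to the Smalltalk code, which matches "it does nothing"; the marker only matters to the simulator.

```smalltalk
guardedOperation
	"Hypothetical method, for illustration only. When run by the VM
	 the primitive below fails and execution continues with the
	 statements after it. Under simulation the primitive index marks
	 the method as one that must not be simulated."
	<primitive: 19> "Simulation guard"
	^self performAtomicWork
```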

> Hmmm, looks quite confusing to me as the suspendingList is
> only one element in size the whole time. Hmmm...

Yeah, indeed. That is interesting. I don't have the time to look at this
right now but it may actually be the solution to the problem. I think
that a strategically placed #suspend in completeStep: may solve this
problem. I'll have to think about this more ...

> Anyway my conclusion is that the test in my first post can't work.

Not sure how you come to this conclusion. The test *doesn't* work but
that indicates that a piece of the system is broken.

> Any suggestions which side needs a change?

Simulating "out of" a semaphore wait is broken. The debugger test is
still valid btw, since it illustrates the behavior in a practical manner.

Cheers,
- Andreas

NorbertHartl

Re: [squeak-dev] Re: DebuggerUnwindBug>>testUnwindDebuggerWithStep

On Mon, 2008-06-30 at 16:10 -0700, Andreas Raab wrote:

> Norbert Hartl wrote:
> > What does this
> >
> > <primitive: 19> "Simulation guard"
> >
> > do?
>
> It does nothing. It only indicates that the system will crash if that
> code ever gets simulated (usually due to atomicity violations).
>
> > Hmmm, looks quite confusing to me as the suspendingList is
> > only one element in size the whole time. Hmmm...
>
> Yeah, indeed. That is interesting. I don't have the time to look at this
> right now but it may actually be the solution to the problem. I think
> that a strategically placed #suspend in completeStep: may solve this
> problem. I'll have to think about this more ...
>

I had another look and I don't understand why the suspendingList has
to be nil after the completeStep:. It is just a step over the wait.
No signal happens nor any other cleanup. Or do you think completeStep:
should somehow detect that it steps over a wait?

> > Anyway my conclusion is that the test in my first post can't work.
>
> Not sure how you come to this conclusion. The test *doesn't* work but
> that indicates that a piece of the system is broken.
>
The test sets up a context with "sema wait". One step further it enters
the critical: context (stepping over sema wait). The next instruction
"top delete" sends terminate to the process. And Process>>terminate
does a "suspendedContext home" which prevents the sema from being
signalled (as the comment says). So the assumption of the test that
the sema has to be signalled at that time is wrong.
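
The sequence described above can be sketched in a workspace without opening a debugger. This is an assumed reconstruction, not the elided test body: the block passed to forkAt:, the use of Semaphore forMutualExclusion, and the use of completeStep: as a stand-in for "debugger doStep" are all assumptions.

```smalltalk
"Workspace sketch under the assumptions stated above."
| sema process |
sema := Semaphore forMutualExclusion.
"The forked process enters critical: and then blocks in the inner wait."
process := [sema critical: [sema wait]] forkAt: Processor activePriority + 1.
"One simulated step, standing in for 'debugger doStep'."
process completeStep: process suspendedContext.
"Closing the debugger ends up sending terminate to the process."
process terminate.
"The test asserts that this answers true; whether it should is
 the point in dispute in this thread."
sema isSignaled
```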

Norbert

Stan Shepherd

Re: [squeak-dev] DebuggerUnwindBug>>testUnwindDebuggerWithStep

Hi, I just ran Squeak3.10.2-7179-basic 'out of the box' and it passed 2253 tests out of 2254. This is the one it fell over on. It seems a shame not to have all tests green. I can't find a Mantis report for this.

There was a mailing list thread about it in June.
http://www.nabble.com/-squeak-dev--DebuggerUnwindBug%3E%3EtestUnwindDebuggerWithStep-td18028890.html#a18210635

Is there a fix? If not I'll log a bug.

...Stan
