This test case started failing at some point. It succeeds in 3.9.
I narrowed the problem down to update 7071 (Kernel-sd.151) in 3.9.1, which introduced it.

The piece of code that triggers it is:

Process>>terminate
    ...
    suspendedContext ifNotNil: [
        "Figure out if we are terminating the process while waiting in
        Semaphore>>critical: In this case, pop the suspendedContext so that
        we leave the ensure: block inside Semaphore>>critical: without
        signaling the semaphore."
        (inSema == true and: [
            suspendedContext method == (
                Semaphore compiledMethodAt: #critical:) ]) ifTrue: [
            suspendedContext := suspendedContext home.
        ].
    ...

I don't really understand the rationale behind doing this, but it seems to conflict with the test's assumption:

DebuggerUnwindBug>>testUnwindDebuggerWithStep
    ...
    debugger doStep.
    "close debugger"
    top delete.

    "and see if unwind protection worked"
    self assert: sema isSignaled.

Since I don't really understand what happens and what should happen, I would be glad to hear some words of advice.

Norbert
Hi Norbert -
Good find. The test is interesting since it illustrates behavior that can *only* happen during simulation. Unless simulated, the return from the preceding wait and the activation of the block are atomic, so you'd never be able to get into the spot that this particular test uses. The test itself was intended to show a somewhat different problem, but it should be made to work since I don't see why one shouldn't be able to terminate the debugger in this spot and expect things to work. I'll have to ponder this one a little since there are many implications there.

Cheers,
  - Andreas
So having looked more closely into this issue I am by now convinced that this is a bug in the simulation machinery. The point is that we're simulating stepping out of a semaphore wait, e.g.,

    sema := Semaphore new.
    process := [sema wait] forkAt: Processor activePriority + 1.
    ctx := process completeStep: process suspendedContext.

At this point we're out of the semaphore wait and consequently the suspendingList of the process should be nil but it ain't. I've attached two tests which illustrate the problem.

Cheers,
  - Andreas

Attachment: SimulationBugs.st (1K)
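For readers without the attachment, the expectation described in the message above could be written as an SUnit test along the following lines. This is only a sketch and not the contents of SimulationBugs.st: the selector testStepOutOfSemaphoreWait is made up, and the sketch assumes that Process>>suspendingList answers the semaphore a process is waiting on and that Process>>completeStep: can be sent as in the snippet above.

```smalltalk
testStepOutOfSemaphoreWait
    "Hypothetical test; the selector is invented for illustration."
    | sema process |
    sema := Semaphore new.
    "The forked process has higher priority, so it runs at once and
    blocks in the wait before this method continues."
    process := [sema wait] forkAt: Processor activePriority + 1.
    "Assumed: a process waiting on a semaphore lists it as its suspendingList."
    self assert: process suspendingList == sema.
    "Simulate the process stepping out of the wait."
    process completeStep: process suspendedContext.
    "Expected according to the message above; reported there as not
    holding at the time of writing."
    self assert: process suspendingList isNil
```

If the second assertion fails, the process still looks like it is waiting on the semaphore even though the simulated step has already left the wait, which is the inconsistency the message describes.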
Andreas,

when you say simulation, do you mean execution in the debugger?

Stef
In reply to this post by Andreas.Raab
On Sat, 2008-06-21 at 16:33 -0700, Andreas Raab wrote:
> So having looked more closely into this issue I am by now convinced that
> this is a bug in the simulation machinery. The point is that we're
> simulating stepping out of a semaphore wait, e.g.,
>
>     sema := Semaphore new.
>     process := [sema wait] forkAt: Processor activePriority + 1.
>     ctx := process completeStep: process suspendedContext.
>
> At this point we're out of the semaphore wait and consequently the
> suspendingList of the process should be nil but it ain't. I've attached
> two tests which illustrate the problem.

[...] BlockContext>>newProcess context which is created at forkAt: time.

What does this

    <primitive: 19> "Simulation guard"

do?

Hmmm, looks quite confusing to me as the suspendingList is only one element in size the whole time. Hmmm...

Anyway, my conclusion is that the test in my first post can't work. Any suggestions which side needs a change?

Norbert
Norbert Hartl wrote:
> What does this
>
>     <primitive: 19> "Simulation guard"
>
> do?

It does nothing. It only indicates that the system will crash if that code ever gets simulated (usually due to atomicity violations).

> Hmmm, looks quite confusing to me as the suspendingList is
> only one element in size the whole time. Hmmm...

Yeah, indeed. That is interesting. I don't have the time to look at this right now but it may actually be the solution to the problem. I think that a strategically placed #suspend in completeStep: may solve this problem. I'll have to think about this more ...

> Anyway my conclusion is that the test in my first post can't work.

Not sure how you come to this conclusion. The test *doesn't* work but that indicates that a piece of the system is broken.

> Any suggestions which side needs a change?

Simulating "out of" a semaphore wait is broken. The debugger test is still valid btw, since it illustrates the behavior in a practical manner.

Cheers,
  - Andreas
On Mon, 2008-06-30 at 16:10 -0700, Andreas Raab wrote:
> > Hmmm, looks quite confusing to me as the suspendingList is
> > only one element in size the whole time. Hmmm...
>
> Yeah, indeed. That is interesting. I don't have the time to look at this
> right now but it may actually be the solution to the problem. I think
> that a strategically placed #suspend in completeStep: may solve this
> problem. I'll have to think about this more ...

[...] to be nil after the completeStep:. It is just a step over the wait. No signal happens nor any other cleanup. Or do you think completeStep: should somehow detect that it steps over a wait?

> > Anyway my conclusion is that the test in my first post can't work.
>
> Not sure how you come to this conclusion. The test *doesn't* work but
> that indicates that a piece of the system is broken.

The test sets up a context with "sema wait". One step further it enters the critical: context (stepping over sema wait). The next instruction, "top delete", sends terminate to the process. And Process>>terminate does a "suspendedContext home", which prevents the sema from being signalled (as the comment says). So the assumption of the test that the sema has to be signalled at that time is wrong.

Norbert
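The two situations that Process>>terminate has to tell apart can be sketched as a pair of SUnit methods. These are hypothetical illustrations, not tests from the image or from the attachment: the selectors are invented, the outcomes in the comments are what the comment in Process>>terminate and the unwind protection in Semaphore>>critical: lead one to expect, and the code has not been run against 3.9.1 or 3.10.

```smalltalk
"Hypothetical SUnit methods; both selectors are invented for illustration."

testTerminateWhileWaitingToEnter
    "The process blocks in the wait inside critical: and never owns the
    semaphore, so terminating it should not produce a signal."
    | sema waiter |
    sema := Semaphore new.    "no excess signals, so critical: has to wait"
    waiter := [sema critical: []] forkAt: Processor activePriority + 1.
    self assert: waiter suspendingList == sema.
    waiter terminate.
    self deny: sema isSignaled

testTerminateInsideCriticalSection
    "The process owns the semaphore and blocks on something else inside
    the protected block, so unwinding should signal the semaphore again."
    | sema blocker owner |
    sema := Semaphore forMutualExclusion.
    blocker := Semaphore new.
    owner := [sema critical: [blocker wait]] forkAt: Processor activePriority + 1.
    self deny: sema isSignaled.
    owner terminate.
    self assert: sema isSignaled
```

The debugger test discussed in this thread sits between the two cases: the simulated step has already returned from the wait, so the process owns the semaphore, yet according to Andreas its suspendingList still names the semaphore, and terminate appears to treat it like the first case.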
In reply to this post by NorbertHartl
Hi, I just ran Squeak3.10.2-7179-basic 'out of the box' and it passed 2253 tests out of 2254. This is the one it fell over on. It seems a shame not to have all tests green. I can't find a Mantis report for this.
There was a mailing list thread about it in June:
http://www.nabble.com/-squeak-dev--DebuggerUnwindBug%3E%3EtestUnwindDebuggerWithStep-td18028890.html#a18210635

Is there a fix? If not I'll log a bug.

...Stan