Hello
Some time ago, I stumbled upon a challenging bug in Pharo. I
have tried a few things, but this bug still eludes me. Maybe
someone here has an idea?
The bug is that the value of `Processor activeProcess` is wrong
inside a process being stepped by a forked process.
In other words, let's say process D is the (suspended) process I
am debugging, and its code stores the active process into a
variable with `p := Processor activeProcess`:
- If I step process D normally (with `D step`), then p is correct:
it is process D.
- If I fork a process F that steps process D, then p is
incorrect: it is process F.
You will find below the code of the two tests I am using to show
the bug, as well as a condensed version of my findings so far. If
you have any idea or lead as to where this bug could come from, I
would be very grateful.
Thomas Dupriez
-----
Here is the code of the failing test, where process F steps
process D:
```
testActiveProcessInProcessSteppedInForkedProcess
	| s p D done |
	s := Semaphore new.
	done := false.
	"Create the debugged process D (#newProcess creates it suspended)"
	D := [ p := Processor activeProcess. done := true ] newProcess
		name: 'D';
		yourself.
	"Until the execution of D is over, fork a new process F
	 that performs one step of D, and wait for it to finish"
	[ done ] whileFalse: [
		[ D step. s signal ] forkNamed: 'F'.
		s wait ].
	self assert: D identicalTo: p
```
And here is the passing test, where we step process D directly:
```
testActiveProcessInProcessDirectlyStepped
	| s p D done |
	s := Semaphore new.
	done := false.
	"Create the debugged process D (#newProcess creates it suspended)"
	D := [ p := Processor activeProcess. done := true ] newProcess
		name: 'D';
		yourself.
	"Until the execution of D is over, step it directly
	 from the test process"
	[ done ] whileFalse: [ D step ].
	self assert: D identicalTo: p
```
-----
Here are my findings so far:
The call chain of Process>>step is:
- Process>>step
- which calls Process>>evaluate:onBehalfOf:
- which calls BlockClosure>>ensure:
- which calls BlockClosure>>valueNoContextSwitch
1) Replacing the call to BlockClosure>>valueNoContextSwitch
with a call to BlockClosure>>value does not change the test
results.
2) Since #valueNoContextSwitch is a primitive, it cannot easily be
instrumented. Instead, I instrumented BlockClosure>>ensure: right
before and right after the call, to check the value of the active
process. Neither point shows a wrong value, so the problem appears
during the execution of #valueNoContextSwitch, and it disappears
before that call returns.
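For reference, the instrumentation described in 2) could look like the following sketch. It is not the exact code used for the findings above: `#ActiveProcessLog` is a hypothetical global used as an on/off switch (set it to an `OrderedCollection` to start logging, to nil to stop), and the rest of the method is the `BlockClosure>>ensure:` found in recent Pharo images as far as I know, so compare it with your image before recompiling. Since every `ensure:` in the image goes through this method, it is best tried in a throwaway image.

```smalltalk
ensure: aBlock
	"Instrumented copy of BlockClosure>>ensure: (sketch).
	 Keep complete as the first temp: the unwinding code accesses it by position."
	| complete returnValue log |
	<primitive: 198>
	"#ActiveProcessLog is a hypothetical global; nil or absent means no logging"
	log := Smalltalk at: #ActiveProcessLog ifAbsent: [ nil ].
	log ifNotNil: [ log add: #before -> Processor activeProcess ].
	returnValue := self valueNoContextSwitch.
	log ifNotNil: [ log add: #after -> Processor activeProcess ].
	complete ifNil: [
		complete := true.
		aBlock value ].
	^ returnValue
```

Storing the process objects themselves, rather than their names, avoids string concatenation inside such a hot method. Note that `OrderedCollection` is not thread-safe, which is acceptable for a quick experiment but not beyond.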
3) The block evaluated by #valueNoContextSwitch contains a call
to Context>>step, which ultimately calls
InstructionStream>>interpretNextV3PlusClosureInstructionFor:
(the method that reads the next bytecode and applies it to the
execution being stepped). I instrumented this method to log the
name of the active process and the context being stepped during
the execution of both tests. The logs show a difference between
the passing and the failing test:
- Passing test: the active process is D for a long time, then
"Test execution watch dog" for a bit, and finally "Morphic UI
Process". So everything looks in order: the active process is D
until the test ends and the UI process takes control back.
- Failing test: the logged active process alternates between F
and D, and looks like this:
F D F D F D.....F D F F D F F F D F D F D...F D D D D D r M M M M...
"M" is the Morphic UI Process, and "r" is a seemingly random
process whose name is "1006977792" in the log. I also logged the
AST nodes being stepped, but I don't really know how to exploit
that information.
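The logging in 3) could be set up along the lines of this sketch, which is an assumption and not the instrumentation actually used. The first statement would be prepended to the body of `InstructionStream>>interpretNextV3PlusClosureInstructionFor:`, where `self` is the context being stepped; `#ActiveProcessLog` is a hypothetical global acting as an on/off switch. The remaining statements are meant to be evaluated one at a time in a playground to collapse the log into the one-letter pattern shown above.

```smalltalk
"1. Prepended to InstructionStream>>interpretNextV3PlusClosureInstructionFor:
 (sketch): record the active process together with the context being stepped."
(Smalltalk at: #ActiveProcessLog ifAbsent: [ nil ])
	ifNotNil: [ :log | log add: Processor activeProcess -> self ].

"2. In a playground: turn logging on..."
Smalltalk at: #ActiveProcessLog put: OrderedCollection new.
"...run one of the two tests (e.g. from the Test Runner), then turn logging off..."
log := Smalltalk at: #ActiveProcessLog.
Smalltalk at: #ActiveProcessLog put: nil.
"...and print the first letter of each logged process name (F, D, M, ...)."
String streamContents: [ :stream |
	log do: [ :each | stream nextPut: each key name first; space ] ]
```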
4) I did some experiments by tweaking the tests to change which
process creates D and which process steps it, and got surprising
results:
4-1) Original failing test
In the original failing test, the test process creates the
debugged process and a fork, and the fork steps the debugged
process. This test fails.
4-2) Original passing test
In the original passing test, the test process creates the
debugged process and steps it. This test passes.
4-3) Forked process creates AND steps the debugged process
If the forked process is the one to create the debugged process,
the test passes!
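A sketch of how variant 4-3 might be written, assuming a single forked process F both creates D and performs every step (the original failing test forks a new F for each step instead). The selector is made up for illustration; this is not necessarily the exact test behind the result above.

```smalltalk
testActiveProcessInProcessCreatedAndSteppedByForkedProcess
	"Variant 4-3 (sketch): one forked process F creates D, then steps it to completion"
	| s p D done |
	s := Semaphore new.
	done := false.
	[ D := [ p := Processor activeProcess. done := true ] newProcess
		name: 'D';
		yourself.
	  [ done ] whileFalse: [ D step ].
	  "Tell the test process that D has finished"
	  s signal ] forkNamed: 'F'.
	s wait.
	self assert: D identicalTo: p
```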
4-4) Forked process creates the debugged process, and the test
process steps it
This test passes as well.
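Variant 4-4 could be sketched as follows, assuming the forked process only creates D and signals the test process, which then steps D directly as in the passing test. The selector is hypothetical.

```smalltalk
testActiveProcessInProcessCreatedByForkedProcessSteppedByTestProcess
	"Variant 4-4 (sketch): forked process F creates D, the test process steps it"
	| s p D done |
	s := Semaphore new.
	done := false.
	[ D := [ p := Processor activeProcess. done := true ] newProcess
		name: 'D';
		yourself.
	  s signal ] forkNamed: 'F'.
	"Wait until F has created D"
	s wait.
	[ done ] whileFalse: [ D step ].
	self assert: D identicalTo: p
```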
So maybe the test passes whenever the debugged process is a
descendant of the process stepping it? No, 4-5) shows that it is
not necessary.
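4-5) A forked process creates the debugged process. Another
forked process steps the debugged process
This test passes too, even though the debugged process is not a
descendant of the process stepping it.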
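Variant 4-5 might look like this sketch, assuming the second forked process performs all the steps of D. The selector and the process names 'F1' and 'F2' are made up for illustration; the same semaphore is reused, once per forked process.

```smalltalk
testActiveProcessInProcessCreatedAndSteppedByTwoForkedProcesses
	"Variant 4-5 (sketch): forked process F1 creates D, forked process F2 steps it"
	| s p D done |
	s := Semaphore new.
	done := false.
	[ D := [ p := Processor activeProcess. done := true ] newProcess
		name: 'D';
		yourself.
	  s signal ] forkNamed: 'F1'.
	"Wait until F1 has created D"
	s wait.
	[ [ done ] whileFalse: [ D step ].
	  s signal ] forkNamed: 'F2'.
	"Wait until F2 has stepped D to completion"
	s wait.
	self assert: D identicalTo: p
```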