Processor>>activeProcess is wrong when a process is stepped by a forked process


Thomas Dupriez-2

Hello

Some time ago, I stumbled upon a challenging bug in Pharo. I tried a few things, but the bug still eludes me. Maybe someone here has an idea?

The bug is that the value of `Processor activeProcess` is wrong inside a process being stepped by a forked process.

In other words, let's say process D is the (frozen) process I am debugging, and its code is to store the active process into some variable with `p := Processor activeProcess`:
- If I step process D normally (with `D step`), then p is correct: it is process D
- If I fork to create a process F that steps process D, then p is incorrect: it is process F

You will find below the code of the two tests I am using to show the bug, as well as a condensed version of my findings so far. If you have any idea or lead as to where this bug could come from, I would be very grateful.

Thomas Dupriez

-----

Here is the code of the failing test, where process F steps process D:

```
testActiveProcessInProcessSteppedInForkedProcess
    | s p debuggedProcess done |
    s := Semaphore new. done := false.
    "Create debugged process (named 'D')"
    debuggedProcess := [p := Processor activeProcess. done := true] newProcess name: 'D'; yourself.
    "Until the execution of the debugged process is over, create a forked process to step it"
    [done]
        whileFalse: [
            [debuggedProcess step. s signal] forkNamed: 'F'.
            s wait.
        ].
    self assert: debuggedProcess identicalTo: p
```

And here is the passing test, where we step process D directly:

```
testActiveProcessInProcessDirectlyStepped
    | s p debuggedProcess done |
    s := Semaphore new. done := false.
    "Create debugged process (named 'D')"
    debuggedProcess := [p := Processor activeProcess. done := true] newProcess name: 'D'; yourself.
    "Until the execution of the debugged process is over, step it directly"
    [done]
        whileFalse: [
            debuggedProcess step.
        ].
    self assert: debuggedProcess identicalTo: p
```

-----

Here are my findings so far:

The call chain of Process>>step is:
- Process>>step
- which calls Process>>evaluate:onBehalfOf:
- which calls BlockClosure>>ensure:
- which calls BlockClosure>>valueNoContextSwitch

1) Replacing the call to BlockClosure>>valueNoContextSwitch with a call to BlockClosure>>value does not affect the results of the test

2) Since #valueNoContextSwitch is a primitive, it cannot be instrumented easily. I instrumented BlockClosure>>ensure: right before and right after the call to check the value of the active process. The value is correct at both points, so the problem appears during the execution of #valueNoContextSwitch and disappears before that call returns.

3) The block evaluated by #valueNoContextSwitch contains a call to Context>>step, which ultimately calls InstructionStream>>interpretNextV3PlusClosureInstructionFor: (the method that reads the next bytecode and applies it to the execution being stepped). I instrumented this method to log the name of the active process and the context being stepped during the execution of both tests. The logs show a difference between the passing and the failing test:
- Passing test: the active process is D for a long time, then "Test execution watch dog" for a bit, and finally "Morphic UI Process". So everything looks in order: the active process is D until the test ends and the UI process takes control back
- Failing test: the logged active process alternates between F and D, and looks like this: F D F D F D.....F D F F D F F F D F D F D...F D D D D D r M M M M...
"M" is the Morphic UI Process, and "r" is a seemingly random process whose name is "1006977792" in the log. I also logged the AST nodes being stepped, but I don't really know how to make use of them.
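
A long sequence like this is easier to compare between runs once it is collapsed into runs of identical names. Here is a small Playground sketch of that idea. The `log` array is a made-up sample, not the real log, and the variable names are only for illustration.

```smalltalk
| log runs summary |
"Hypothetical sample: one logged active-process name per step."
log := #('F' 'D' 'F' 'D' 'F' 'F' 'D' 'D' 'D' 'M' 'M').
runs := OrderedCollection new.
"Collapse consecutive identical names into name -> count associations."
log do: [ :name |
	(runs notEmpty and: [ runs last key = name ])
		ifTrue: [ runs last value: runs last value + 1 ]
		ifFalse: [ runs add: name -> 1 ] ].
"Render the runs as a compact, readable string."
summary := String streamContents: [ :out |
	runs
		do: [ :each | out << each key << ' x' << each value printString ]
		separatedBy: [ out space ] ].
summary "'F x1 D x1 F x1 D x1 F x2 D x3 M x2'"
```

Runs longer than one (such as `F x2` above) point at the steps where the strict F D alternation breaks.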

4) I experimented by tweaking the tests, changing which process creates D and which process steps it, and got surprising results:
4-1) Original failing test
In the original failing test, the test process creates the debugged process and a fork, and the fork steps the debugged process (blue arrow in the diagram). This test fails.

4-2) Original passing test
In the original passing test, the test process creates the debugged process and steps it. This test passes.

4-3) Forked process creates AND steps the debugged process
If the forked process is the one to create the debugged process, the test passes!

4-4) Forked process creates the debugged process, and TestProcess steps it
So maybe the test passes whenever the debugged process is a descendant of the process stepping it? No, 4-5) shows that it is not necessary.

4-5) A forked process creates the debugged process. Another forked process steps the debugged process
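
Since the diagrams for these variants are not reproduced here, this is a rough Playground sketch of how variant 4-3 could look in code. It is one possible reading of "forked process creates AND steps the debugged process", not the exact test that was run. In particular, using a single fork that contains the whole stepping loop is an assumption.

```smalltalk
| s p debuggedProcess done |
s := Semaphore new.
done := false.
"Assumed reading of 4-3: F both creates D and steps it until D is done."
[ debuggedProcess := [ p := Processor activeProcess. done := true ] newProcess.
  debuggedProcess name: 'D'.
  [ done ] whileFalse: [ debuggedProcess step ].
  s signal ] forkNamed: 'F'.
"Wait for F to finish before comparing."
s wait.
debuggedProcess == p
```

The other variants can be written the same way by moving the creation of `debuggedProcess` and the stepping loop between the test process and one or two forked processes.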



Re: Processor>>activeProcess is wrong when a process is stepped by a forked process

Denis Kudriashov
Hi Thomas.

I think the problem is with the concurrent code in your test. I do not see what is going wrong, but the test passes when the full stepping loop is extracted into a single process:

    s := Semaphore new. done := false.
    "Create debugged process"
    debuggedProcess := [p := Processor activeProcess. done := true] newProcess name: 'D'; yourself.
    "Create a single forked process that steps the debugged process until its execution is over"
    [
        [done]
            whileFalse: [
                debuggedProcess step.
            ].
        s signal
    ] forkNamed: 'F'.
    s wait.
    debuggedProcess == p

On Tue, 21 Jan 2020 at 15:19, Thomas Dupriez <[hidden email]> wrote:
