Re: [Vm-dev] Difficult to debug VM crash with full blocks and Sista V1

Phil B
It sounds like simulating a terrible (dropping packets) or painfully slow (kbps speeds) network connection might be a more controlled way to reproduce the issue:  https://apple.stackexchange.com/questions/24066/how-to-simulate-slow-internet-connections-on-the-mac
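If a system-wide traffic shaper is more than is needed, a small user-space relay can approximate a "cold" server for a single connection. The following is only a sketch under stated assumptions: the function name `start_delay_proxy`, the chunk size and the fixed per-chunk delay are all invented for illustration, and it is an assumption (not something established in this thread) that added latency alone is enough to trigger the crash. It uses only the Python standard library.

```python
import socket
import threading
import time


def _pump(src, dst, delay_s):
    """Copy bytes from src to dst, sleeping delay_s before forwarding each chunk."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            time.sleep(delay_s)  # crude stand-in for a slow or "cold" server
            dst.sendall(data)
    except OSError:
        pass  # the peer went away; stop relaying in this direction
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass end-of-stream on to the other side
        except OSError:
            pass


def _handle(client, target, delay_s):
    """Relay one client connection to the target, delaying both directions."""
    try:
        upstream = socket.create_connection(target)
    except OSError:
        client.close()
        return
    back = threading.Thread(target=_pump, args=(upstream, client, delay_s), daemon=True)
    back.start()
    _pump(client, upstream, delay_s)
    back.join()
    client.close()
    upstream.close()


def start_delay_proxy(target_host, target_port, delay_s, listen_port=0):
    """Listen on 127.0.0.1 and relay to the target. Returns (listening socket, port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", listen_port))
    server.listen()
    port = server.getsockname()[1]

    def accept_loop():
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                break  # listening socket was closed
            threading.Thread(
                target=_handle,
                args=(client, (target_host, target_port), delay_s),
                daemon=True,
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, port


# Hypothetical usage: relay local port 8080 to the repository host, 5 s per chunk.
#   server, port = start_delay_proxy("source.squeak.org", 80, 5.0, listen_port=8080)
#   threading.Event().wait()  # keep the process alive while testing
```

With something like this running, the image could be pointed at a local URL on the chosen port instead of the real server. One caveat: the HTTP Host header will then name 127.0.0.1 rather than the real host, so a server that relies on virtual hosting may respond differently than it would to a direct request.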

On Fri, Sep 13, 2019 at 11:16 PM Eliot Miranda <[hidden email]> wrote:
 
Hi All,

    there is a VM bug in 64-bit Spur with the Sista V1 bytecode set and full blocks.  The symptom is that when waiting for a remote Monticello repository to update and/or deliver a package version the system crashes in JITTED code after what appears to be some kind of wait.

This is a reliably occurring bug but maddeningly difficult to reproduce.  The bug reliably occurs when interacting with a remote repository (e.g. http://source.squeak.org/VMMaker) when the server is "cold", and hence makes the image wait.  Every time I have tried to repeat the failing sequence the crash has not occurred, I think because the server is now "hot" and serves up the version quickly.  Today I even tried shutting down my machine for over an hour and rebooting.  But I could not get the crash to occur even though it seems to me that every time I try it for the first time in the day it does crash.

This is an important bug to fix.  If it cannot be fixed then full blocks and Sista V1 are not ready for use in the upcoming Squeak release.  I am looking for help in debugging this.

- is anyone else using the 64-bit VM with full blocks and Sista V1 who sees hard VM crashes?  If so, under what circumstances?

- is it possible to flush caches in the http://source.squeak.org/VMMaker server, or could people tolerate me rebooting the server?

- is there a way of introducing network delays in Mac OS that might help me induce the bug?

- can anyone think of any other strategies I might take to try and reproduce this?

I may have to try and reproduce the bug in the simulator to have a chance of identifying it.  Does anyone have a good enough mental model of the Monticello server interaction, and the energy, to help me figure this one out?

Here is some information from the last crash I did see in the debugger (alas it is incomplete; there are a number of additional pieces of info I could have collected).

(lldb) thr b
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x000000010de5700a
    frame #1: 0x000000010dd7b174
    frame #2: 0x000000010dd45f1c
    frame #3: 0x000000010dd44534
    frame #4: 0x000000010dd44c60

(lldb) x/10i 0x000000010de5700a

(lldb) call printStackCallStackOf($rbp)
    0x7ffeefbdfc30 M Heap>upHeap: 0x11273ca90: a(n) Heap
    0x7ffeefbdfc68 M Heap>add: 0x11273ca90: a(n) Heap
    0x7ffeefbdfca0 M Delay class>scheduleDelay:from: 0x1123ebfb8: a(n) Delay class
    0x7ffeefbdfcf0 M Delay class>handleTimerEvent 0x1123ebfb8: a(n) Delay class
    0x7ffeefbdfd20 M Delay class>runTimerEventLoop 0x1123ebfb8: a(n) Delay class

(lldb) x/10i 0x000000010dd7b174
    0x10dd7b174: 48 8b 55 10  movq   0x10(%rbp), %rdx
    0x10dd7b178: 48 89 ec     movq   %rbp, %rsp
    0x10dd7b17b: 5d           popq   %rbp
    0x10dd7b17c: c2 10 00     retq   $0x10
    0x10dd7b17f: cc           int3
    0x10dd7b180: cc           int3
    0x10dd7b181: cc           int3
    0x10dd7b182: cc           int3
    0x10dd7b183: cc           int3
    0x10dd7b184: cc           int3

(lldb) print whereIs(0x000000010dd7b174)
(char *) $0 = 0x00000001000f83ff " is in generated methods"

(lldb) call printCogMethodFor((void *)0x000000010dd7b174)
       0x10dd7afc0 <->        0x10dd7b198: method:        0x112f23c10 selector:        0x112232c20 add:

(lldb) print whereIs(0x000000010de5700a)
(char *) $1 = 0x00000001000f83ff " is in generated methods"

(lldb) call printCogMethodFor((void *)0x000000010de5700a)
       0x10de56ba0 <->        0x10de57078: method:        0x1126ec218 prim 23856 selector:     0x7ffeefbf3d20


This method ends up being the jitted version of Delay class>>startTimerEventLoop
_,,,^..^,,,_
best, Eliot


Nicola Mingotti

That GUI tool is very likely a wrapper around 'dummynet'. In any case, it will certainly be faster to deploy.

n.


Nicola Mingotti

I gave the answers a very quick, superficial read. The GUI tool should still exist; I can search for it.

The first answer was edited in 2017; the rest, as you say, are very old. For example, they talk about 'ipfw', which is one of the FreeBSD firewalls and the main entry point to 'dummynet', but if I run 'apropos ipfw' on my macOS machine I find nothing.

If the GUI works well enough, use that. dummynet is powerful but requires several hours of study and experiments.

bye
n.



On 9/13/19 10:02 PM, Phil B wrote:
I'm sure you're right (assuming the GUI tool still exists... the top answer is quite old.)  Mainly I linked to the answer since it seems to cover a spectrum of approaches which Eliot might find useful.
