UnixProcess class>>forkSqueak is no longer working. The forked child process VM crashes with segmentation fault.

Testing with VMs from bintray shows that version 5.0-202009300634 works, and any version 5.0-202010192227 or later fails.

Stack dump sometimes (but not always) shows failure in aioPoll() for example:

*/usr/local/bin/../lib/squeak/5.0-202101160259/squeak(aioPoll+0x12e)[0x4bc0fe]

I am not able to catch the failure in gdb because it happens in the child process.

My initial guess is that it may be related to the epoll enhancements added in this time frame, because forking the VM requires initializing things like this in the new child VM process.
On Sun, 2021-01-24 at 18:59 -0800, David T Lewis wrote:
> */usr/local/bin/../lib/squeak/5.0-202101160259/squeak(aioPoll+0x12e)[0x4bc0fe]
>
> I am not able to catch the failure in gdb because it happens in the
> child process.

GDB can follow fork(), see

(gdb) help set follow-fork-mode
Set debugger response to a program call of fork or vfork.
A fork or vfork creates a new process. follow-fork-mode can be:
  parent - the original process is debugged after a fork
  child - the new process is debugged after a fork
The unfollowed process will continue to run.
By default, the debugger will follow the parent process.

HTH, Jan
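To try follow-fork-mode outside the VM, a small stand-in program can help. The sketch below is not VM code: child_crash_signal() is a hypothetical name, and the child crashes on purpose so that there is something for gdb to stop on after `set follow-fork-mode child`.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the forked VM: the child dies with SIGSEGV and
 * the parent reports which signal ended it (0 if it exited normally,
 * -1 on error). */
int child_crash_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: make sure SIGSEGV has its default action, then crash.
         * With follow-fork-mode set to child, gdb stops here. */
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
        _exit(0); /* only reached if the signal did not terminate us */
    }

    /* Parent: wait for the child and decode how it ended. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling child_crash_signal() from a main() and running the program under gdb with follow-fork-mode set to child should leave the debugger in the child at the raise() call, where bt works as usual.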
In reply to this post by David T Lewis
Thank you Jan!
The segfault happens in the child process that was forked by the forkSqueak prim. It occurs in the new epoll code. I don't yet see the cause (there is no obvious null pointer issue) but the gdb backtrace is:

(gdb) bt
The problem is that the file descriptors and structures are shared between parent and child after fork. However, after the fork, the epoll structures point to data that belongs to the parent. At line 405 the child process tries to access that data, and I think that causes the segfault.

The child should close the inherited epoll file descriptor and recreate it along with the necessary data structures. This can be done by a handler registered with pthread_atfork().
See it explained at https://copyconstruct.medium.com/the-method-to-epolls-madness-d9d2d6378642

On Sat, Jan 30, 2021 at 23:38, smalltalking <[hidden email]> wrote:
> The problem is that the file descriptors and structures are shared between
> parent and child after fork. However, after the fork, the epoll structures
> point to data that belongs to the parent. At line 405 the child process
> tries to access that data, and I think that causes the segfault.
> The child should close the inherited epoll file descriptor and recreate it
> along with the necessary data structures. This can be done by a handler
> registered with pthread_atfork().
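The pthread_atfork() suggestion quoted above could look roughly like the following. This is only a sketch under assumptions: epoll_fd, reset_epoll_in_child(), poller_init() and demo_child_gets_private_epoll() are hypothetical names, not the VM's actual aio code, and a real handler would also have to rebuild the VM's own bookkeeping and re-register whatever descriptors the child still needs.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical module state; the VM keeps its own equivalent. */
static int epoll_fd = -1;

/* Runs in the child right after fork(): drop the inherited descriptor
 * (this only releases the child's reference, the parent's epoll instance
 * is untouched) and create a private, empty epoll instance. */
static void reset_epoll_in_child(void)
{
    if (epoll_fd >= 0)
        close(epoll_fd);
    epoll_fd = epoll_create1(EPOLL_CLOEXEC);
}

/* Create the epoll instance and register the child-side fork handler.
 * Call once: each call would register the handler again. */
int poller_init(void)
{
    epoll_fd = epoll_create1(EPOLL_CLOEXEC);
    if (epoll_fd < 0)
        return -1;
    return pthread_atfork(NULL, NULL, reset_epoll_in_child) == 0 ? 0 : -1;
}

/* Returns 1 if, after fork(), the child sees an empty private epoll set
 * while the parent's registration still reports the readable pipe. */
int demo_child_gets_private_epoll(void)
{
    int p[2];
    struct epoll_event ev = { .events = EPOLLIN }, out;

    if (poller_init() != 0 || pipe(p) != 0)
        return 0;
    ev.data.fd = p[0];
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, p[0], &ev) != 0)
        return 0;
    if (write(p[1], "x", 1) != 1) /* make the read end readable */
        return 0;

    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        /* Child: the atfork handler has already replaced epoll_fd, so
         * nothing is registered and a zero-timeout wait finds no events. */
        _exit(epoll_wait(epoll_fd, &out, 1, 0) == 0 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    int parent_ok = epoll_wait(epoll_fd, &out, 1, 0) == 1;
    return child_ok && parent_ok;
}
```

The handler runs only in the child, so the parent keeps polling its original epoll instance unchanged. Anything the child still wants to watch has to be added to the new instance explicitly.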
Update - The actual forkSqueak is working fine, but we get failures associated with aio handling for the socket connection to the X11 server. The child closes the socket and calls aioDisable for the socket fd to unregister it. When using epoll rather than generic aio event handling, this apparently affects the Linux kernel epoll registration for the socket fd (I am not sure if I understand this correctly, but this appears to be the case). The result seems to be failures in either the child or parent VM process, or both.

The problem goes away if I #ifdef the call to aioDisable() in the forgetXDisplay() function. I am not sure if this is a proper fix or just a workaround kludge, but it does work.
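The effect described in the update can be reproduced outside the VM. The sketch below uses a pipe in place of the X11 socket and a plain EPOLL_CTL_DEL in place of aioDisable(); both substitutions, and the function name parent_events_after_child_unregisters(), are assumptions made for illustration. Because an epoll descriptor inherited across fork() refers to the same kernel epoll instance, the child's unregistration also removes the parent's registration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of events the parent sees after the child has
 * unregistered the descriptor through the inherited epoll fd,
 * or -1 if the setup fails. */
int parent_events_after_child_unregisters(void)
{
    int p[2];
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN }, out;

    if (epfd < 0 || pipe(p) != 0)
        return -1;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) != 0)
        return -1;
    if (write(p[1], "x", 1) != 1) /* make the read end readable */
        return -1;

    /* Sanity check: before the fork the parent sees the pending event. */
    if (epoll_wait(epfd, &out, 1, 0) != 1)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: unregister through the inherited epoll descriptor. */
        _exit(epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL) == 0 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* The byte is still unread, yet the parent is no longer notified,
     * because the registration was removed from the shared instance. */
    return epoll_wait(epfd, &out, 1, 0);
}
```

The pipe still holds unread data when the parent polls again, so a nonzero result would be expected if the registration had survived the child's call.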
The workaround (fix?) for forkSqueak is in pull request #550
I opened a different PR to address the issue as recommended above: