Anyone here interested in this crash? Is there a newer VM I should test with?

Thanks,
-Martin

-------- Forwarded Message --------
On the current Linux32 Pharo 7.0 (https://get.pharo.org/70+vm as of 30 minutes ago), the VM crashes with the message

XIO: fatal IO error 14 (Bad address) on X server ":0"

whenever I try to enlarge the Pharo window by dragging the corner, and get to a size of about 2Mpixels.

This is with a virgin image with no windows open inside the main window. A workaround seems to be to open a System Browser before resizing the main window.

This does not reproduce on Pharo 6.1 (https://get.pharo.org/). This *does* reproduce when running the Pharo7 image on the Pharo 6.1 VM, so it may not be entirely a VM problem.

Most times, no crash dump is created, and nothing is logged to PharoDebug.log. Occasionally, especially when rapidly changing the size of the window, it segfaults and logs a dump. This may or may not be the same problem.

This is less annoying since I found the workaround, but still seems to be something that should be fixed. Does anyone want further information from me in order to fix it?

Regards,
-Martin
Hi Martin,
On Thu, Feb 22, 2018 at 10:45 AM, Martin McClure <[hidden email]> wrote:
Certainly try the most up-to-date VM you can find. But this looks like some Linux-specific, 32-bit-specific bug, because no one else is reporting crashes like this. So what I would recommend is that you run from the command line under gdb so that you can get a stack trace, and maybe even dig a little further. Such a crash should be due to something obvious, a null pointer, or a buffer overrun. And running under gdb should allow you to narrow in on the bug quite quickly.

If the pharo vm is compiled with symbols then use the vm itself, otherwise build your own; you'll need symbols. If and when the bug is easy to reproduce you can switch to the debug vm (again you'll have to build it yourself; but builds these days are easy; checkout, cd, run a build script) and get more information.

HTH

_,,,^..^,,,_
best, Eliot
On 02/24/2018 01:15 PM, Eliot Miranda wrote:
Hi Martin,

Thanks for the hints.

I can reproduce the problem on the latest VM (pharo.cog.spur_linux32x86_201802232356.tar.gz).

The readily-reproducible problem isn't caught by GDB, the VM just exits out from under GDB after printing

XIO: fatal IO error 14 (Bad address) on X server ":0"

But once I did get a segv instead of the usual error and exit. Stack is below. I don't know when I'll have time to build a debug VM, and don't know whether it would help given that the reproducible problem isn't caught by GDB.

Any more hints on how to diagnose?

Regards,
-Martin

Reading symbols from ./pharo...done.
On 02/25/2018 05:11 PM, Martin McClure wrote:
>
> I can reproduce the problem on the latest VM
> (pharo.cog.spur_linux32x86_201802232356.tar.gz).
>
> The readily-reproducible problem isn't caught by GDB, the VM just
> exits out from under GDB after printing
>
> XIO: fatal IO error 14 (Bad address) on X server ":0"
>
> But once I did get a segv instead of the usual error and exit. Stack
> is below. I don't know when I'll have time to build a debug VM, and
> don't know whether it would help given that the reproducible problem
> isn't caught by GDB.
>
> Any more hints on how to diagnose?
>

I built a debug VM, and as expected running under GDB produced no new info, the process just prints the error and exits.

Also as conjectured, I can not reproduce the problem on a 64-bit VM.

Regards,
-Martin
> On Feb 27, 2018, at 8:25 PM, Martin McClure <[hidden email]> wrote:
>
>> On 02/25/2018 05:11 PM, Martin McClure wrote:
>>
>> I can reproduce the problem on the latest VM
>> (pharo.cog.spur_linux32x86_201802232356.tar.gz).
>>
>> The readily-reproducible problem isn't caught by GDB, the VM just
>> exits out from under GDB after printing
>>
>> XIO: fatal IO error 14 (Bad address) on X server ":0"
>>
>> But once I did get a segv instead of the usual error and exit. Stack
>> is below. I don't know when I'll have time to build a debug VM, and
>> don't know whether it would help given that the reproducible problem
>> isn't caught by GDB.
>>
>> Any more hints on how to diagnose?
>>
> I built a debug VM, and as expected running under GDB produced no new
> info, the process just prints the error and exits.

That's strange. Can you put a breakpoint in write or exit so that gdb does stop rather than exit?

Martin, if I were trying to debug this I would be trying to get the error to occur within gdb so that I could poke around. I don't know any better way of solving problems like this than by first being able to examine the exception in situ. I get that it's frustrating but there's no magic bullet. One has to keep trying until one can find out what caused the crash.

>
> Also as conjectured, I can not reproduce the problem on a 64-bit VM.
>
> Regards,
>
> -Martin
>
On 02/28/2018 07:08 AM, Eliot Miranda wrote:
>> I built a debug VM, and as expected running under GDB produced no new
>> info, the process just prints the error and exits.
>
> That's strange. Can you put a breakpoint in write or exit so that
> gdb does stop rather than exit? Martin, if I were trying to debug
> this I would be trying to get the error to occur within gdb so that I
> could poke around. I don't know any better way of solving problems
> like this than by first being able to examine the exception in
> situ. I get that it's frustrating but there's no magic
> bullet. One has to keep trying until one can find out what caused
> the crash.

By putting a breakpoint in exit I was able to get the stack below. I hope this gives you a clue as to where to look next. Once again, what I'm doing at the point of failure is dragging the corner of the X window to resize it larger.

Regards,
-Martin

(gdb) break exit
Hi Martin,

I'm sorry, I have no specific ideas as I don't know squeak specifics.

But generally speaking, when debugging X11, I usually do the following:

1) run the X client in "synchronous mode", i.e., XSynchronize(True)
2) trace and log requests/responses to/from an X server, I usually use `xtrace`.

Then you should be able to pinpoint the exact request that generated the error. Once you know which request it is, you can make an educated guess what XLib function may have generated such a request. Then put a breakpoint in XLib and collect both C and smalltalk backtrace. This makes a good start for the debugging.

Laborious indeed. Worked for me a couple of times.

HTH, Jan

P.S.: Are you running by chance under XWayland? If so, watch out especially for XGetImage() which does not work under XWayland. But I doubt this is the problem here.

On Thu, 2018-03-01 at 22:05 -0800, Martin McClure wrote:
> On 02/28/2018 07:08 AM, Eliot Miranda wrote:
> > > I built a debug VM, and as expected running under GDB produced no
> > > new info, the process just prints the error and exits.
> >
> > That's strange. Can you put a breakpoint in write or exit so that
> > gdb does stop rather than exit? Martin, if I were trying to debug
> > this I would be trying to get the error to occur within gdb so that I
> > could poke around. I don't know any better way of solving problems
> > like this than by first being able to examine the exception in
> > situ. I get that it's frustrating but there's no magic
> > bullet. One has to keep trying until one can find out what caused
> > the crash.
> >
>
> By putting a breakpoint in exit I was able to get the stack below. I
> hope this gives you a clue as to where to look next. Once again, what
> I'm doing at the point of failure is dragging the corner of the X
> window to resize it larger.
>
> Regards,
>
> -Martin
>
> (gdb) break exit
> Breakpoint 1 at 0x1c2d0
> (gdb) run ~/apps/Pharo7Builds/2018-02-26-32bit/scratch.image
> Starting program: /home/martin/Repositories/opensmalltalk-vm/build.linux32x86/pharo.cog.spur/build.debug/squeak ~/apps/Pharo7Builds/2018-02-26-32bit/scratch.image
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0xf7833b40 (LWP 26198)]
> XIO: fatal IO error 14 (Bad address) on X server ":0"
>       after 2906 requests (2872 known processed) with 0 events remaining.
>
> Thread 1 "squeak" hit Breakpoint 1, 0xf7d4b470 in exit () from /lib32/libc.so.6
> (gdb) where
> #0 0xf7d4b470 in exit () from /lib32/libc.so.6
> #1 0xf7950688 in _XDefaultIOError () from /usr/lib32/libX11.so.6
> #2 0xf79508ed in _XIOError () from /usr/lib32/libX11.so.6
> #3 0xf794df16 in _XEventsQueued () from /usr/lib32/libX11.so.6
> #4 0xf793f652 in XPending () from /usr/lib32/libX11.so.6
> #5 0xf7fc0743 in handleEvents () at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:3952
> #6 0xf7fc077c in xHandler (fd=0x3, data=0x0, flags=0x2) at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:3964
> #7 0x5663f51c in aioPoll (microSeconds=0x0) at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm/aio.c:292
> #8 0x5657271d in ioProcessEvents () at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:652
> #9 0x565e9d7f in checkForEventsMayContextSwitch (mayContextSwitch=0x1) at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:60739
> #10 0x565f0836 in handleStackOverflowOrEventAllowContextSwitch (mayContextSwitch=0x1) at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:63988
> #11 0x56591a1c in activateCoggedNewMethod (inInterpreter=0x0) at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:14059
> #12 0x56598fc4 in executeNewMethod () at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:17329
> #13 0x56597216 in ceSendsupertonumArgs (selector=0x5758a480, superNormalBar=0x1, rcvr=0x57b7e788, numArgs=0x0) at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:16371
> #14 0x5680034a in ?? ()
> #15 0x5657789d in interpret () at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:2706
> #16 0x56576175 in main (argc=0x2, argv=0xffffc8c4, envp=0xffffc8d0) at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:2099
>
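
A side note on reading the message itself: as far as I can tell, the number in "fatal IO error 14 (Bad address)" is the errno value that Xlib's default I/O error handler (frame #1, _XDefaultIOError) reports just before it calls exit, and on Linux errno 14 is EFAULT. Here is a minimal Python sketch for decoding such a number, assuming a Linux host; the helper name describe_errno is made up for illustration and is not part of the VM or of Xlib.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric values to names such as 'EFAULT';
    # fall back to a placeholder for values this platform does not define.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message text for the code.
    return name, os.strerror(code)


if __name__ == "__main__":
    # 14 is the value printed in the XIO message from the trace.
    name, message = describe_errno(14)
    print(f"errno 14 = {name}: {message}")
```

EFAULT means a system call was handed an address outside the process's accessible address space. That would be consistent with the earlier guess about a bad pointer or buffer overrun, but it is only a reading of the error code, not a diagnosis.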
On 03/02/2018 08:23 AM, Eliot Miranda wrote:
>>> That's strange. Can you put a breakpoint in write or exit so that
>>> gdb does stop rather than exit? Martin, if I were trying to debug
>>> this I would be trying to get the error to occur within gdb so that I
>>> could poke around. I don't know any better way of solving problems
>>> like this than by first being able to examine the exception in situ.
>>> I get that it's frustrating but there's no magic bullet. One has to
>>> keep trying until one can find out what caused the crash.
>>>
>> By putting a breakpoint in exit I was able to get the stack below. I
>> hope this gives you a clue as to where to look next. Once again, what
>> I'm doing at the point of failure is dragging the corner of the X
>> window to resize it larger.
>>
>
> Great. It's a problem in the X server, not in the VM (even if it's
> the VM's fault). So any X11 experts want to weigh in on how to
> proceed? I hate to say it but I would at least try restarting the X
> server.

I'm afraid that the problem does reproduce on a different system, configured similarly but with a slightly newer X server version (xorg-server-1.19.5-r1) that has been rebooted much more recently.

-Martin
On 03/02/2018 12:49 AM, Jan Vrany wrote:
> Hi Martin,
>
> I'm sorry, I have no specific ideas as I don't know squeak specifics.
>
> But generally speaking, when debugging X11, I usually do the
> following:
>
> 1) run the X client in "synchronous mode", i.e., XSynchronize(True)
> 2) trace and log requests/responses to/from an X server, I usually
> use `xtrace`.
>
> Then you should be able to pinpoint the exact request that generated
> the error. Once you know which request it is, you can make an educated
> guess what XLib function may have generated such a request. Then put a
> breakpoint in XLib and collect both C and smalltalk backtrace.
> This makes a good start for the debugging.
>
> Laborious indeed. Worked for me a couple of times.
>
> HTH, Jan
>
> P.S.: Are you running by chance under XWayland? If so, watch out
> especially for XGetImage() which does not work under XWayland.
> But I doubt this is the problem here.

Thanks for the hints, Jan. I'm not sure when I'll have time to dig in that deeply, but I'll try what you suggest if/when I do.

I probably *am* running under Wayland. It's a Gentoo KDE system, and it does seem to have the package kde-plasma/kwayland-integration installed, along with some other Wayland-related packages, so it seems entirely likely that the window manager, which would be the entity that I'm interacting with in dragging the corner of the outer Pharo window, is now written to Wayland.

-Martin