I am seeing Squeak crash (segmentation fault) every four and a half days (actually 4.586041667 days, 110.065 hours, or 396234 seconds, give or take a few seconds). It doesn't matter whether the image is idle or busy (mostly SMTP and Web serving). This has been happening for many months, fully repeatably, on two different machines.

I've basically lived with it (and its somewhat dampening effect on my confidence in Squeak for mission-critical apps) because I haven't the slightest idea why it's happening. After my last build, however, the interval has gone from 4.58 days to only 1.74 days (albeit based on only one sample so far).

The image is 3.7 vintage, 24M, given -memory 30M, with a recent VM (exact versions or any other details of interest on request), running on Debian on a XEN or UML virtual node. (I'm currently trying it on a real box to see if virtualization could somehow be implicated.)

I can't begin to fathom what event could possibly be happening in the VM that would be tied to this particular time interval, or anything special about the value itself. (I imagine that GC is the code that's running when it crashes.) Any thoughts?
[hidden email] wrote:
> I can't begin to fathom what event could possibly be happening in the
> VM that would be tied to this particular time interval, or anything
> special about the value itself. (I imagine that GC is the code that's
> running when it crashes). Any thoughts?

The main thought is that you shouldn't start pointing fingers unless you have at least _some_ evidence supporting your claims. It's easy to claim that it's caused by GC, or the network subsystem, or the timer code, or the OS signal handling, or any number of random reasons if you have no idea what is going on. Things to do:

- Tell us more about what you are actually running. Is this a stock 3.7 VM and image? If not, what packages have you loaded? Do you have specific dependencies on external (non-Squeak) packages? Dependencies on external C libraries that could cause memory corruption?

- Run the VM under gdb and let it crash. Try to investigate from there, in particular try to print the call stacks (I don't remember what the magic invocation is). The VM implements both printCallStack(), which prints the active call stack, and printAllStacks(), which prints all the call stacks (but I'm not sure which of those is supported in 3.7).

Cheers,
  - Andreas
In reply to this post by squeakdev
> - Run the VM under gdb and let it crash. Try to investigate from there,
> in particular try to print the call stacks (don't remember what the
> magic invocation is). The VM implements both, printCallStack() to print
> the active call stack and printAllStacks() which prints all the call
> stacks (but I'm not sure which of those is supported in 3.7).

It crashed in gdb, but after only ~7 hours (so it's not clear this is the same issue). But here is the stack:

    Program received signal SIGPIPE, Broken pipe.
    0xb7e8f2ce in __write_nocancel () from /lib/tls/libc.so.6

    gdb>backtrace

    #0  0xb7e8f2ce in __write_nocancel () from /lib/tls/libc.so.6
    #1  0x080e8936 in sqSocketSendDataBufCount (s=0xb5aaa624, buf=0xb5b3c3f4 "Content-Type: image/gif;\r\n\tname=\"orprxh.gif\"\r\nContent-ID: <orprxh.gif@2D967531.ECA1AA47>\r\nContent-Transfer-Encoding: base64\r\n\r\nR0lGODlh+gHYAbMAAAAADoQAAACCAAAAfNNhDXyOxvj56/ALCgcN///2/w", 'A' <repeats 15 times>..., bufSize=2077)
        at /home/ajr/Squeak-3.9-7/platforms/unix/plugins/SocketPlugin/sqUnixSocket.c:1067
    #2  0x080e661c in primitiveSocketSendDataBufCount ()
        at /home/ajr/Squeak-3.9-7/platforms/unix/src/vm/intplugins/SocketPlugin/SocketPlugin.c:1046
    #3  0x0805bb77 in dispatchFunctionPointer (aFunctionPointer=0x80e64a0)
        at /home/ajr/Squeak-3.9-7/platforms/unix/src/vm/interp.c:3949
    #4  0x08064305 in primitiveExternalCall ()
        at /home/ajr/Squeak-3.9-7/platforms/unix/src/vm/interp.c:14208
    #5  0x0805bb77 in dispatchFunctionPointer (aFunctionPointer=0x8064230)
        at /home/ajr/Squeak-3.9-7/platforms/unix/src/vm/interp.c:3949
    #6  0x0806d796 in interpret ()
        at /home/ajr/Squeak-3.9-7/platforms/unix/src/vm/interp.c:7756
    #7  0x0805a5c9 in main (argc=1953394499, argv=0x0, envp=0x65707954)
        at /home/ajr/Squeak-3.9-7/platforms/unix/vm/sqUnixMain.c:1388
OK, assuming the C code is the 3.7 VM code below, I'll note that the backtrace shows some interesting memory addresses:

> (s=0xb5aaa624,
> buf=0xb5b3c3f4 "Content-Type:
> image/gif;\r\n\tname=\"orprxh.gif\"\r\nContent-ID:
> <orprxh.gif@2D967531.ECA1AA47>\r\nContent-Transfer-Encoding:
> base64\r\n\r\nR0lGODlh+gHYAbMAAAAADoQAAACCAAAAfNNhDXyOxvj56/
> ALCgcN///2/w",
> 'A' <repeats 15 times>..., bufSize=2077)

which implies that buf is 0xb5b3c3f4. However, with a 3.7 VM, once object memory goes above 0x80000000 you are hosed because of signed-versus-unsigned arithmetic issues within the VM. In fact, I'd bet this is currently still an issue even with a 3.8 VM, because I can't say I've seen any proof of a systematic effort to ensure that memory addresses don't accidentally become signed integers somewhere in the VM or the platform-specific files.

This might be the reason for your failures. If the VM loads object memory above 0x80000000, I would have thought the VM would crash immediately; if it loads below that and then grows past the limit, the crash would occur later.

On the other hand, I'm not sure why you would get the SIGPIPE failure and why that would cause the VM to crash in libc. That sounds like an operating system problem you should google.
int sqSocketSendDataBufCount(SocketPtr s, int buf, int bufSize)
{
  int nsent= 0;

  if (!socketValid(s))
    return -1;

  if (UDPSocketType == s->socketType)
    {
      /* --- UDP --- */
      FPRINTF((stderr, "UDP sendData(%d, %d)\n", SOCKET(s), bufSize));
      if ((nsent= sendto(SOCKET(s), (void *)buf, bufSize, 0,
                         (struct sockaddr *)&SOCKETPEER(s),
                         sizeof(SOCKETPEER(s)))) <= 0)
        {
          if (errno == EWOULDBLOCK)  /* asynchronous write in progress */
            return 0;
          FPRINTF((stderr, "UDP send failed\n"));
          SOCKETERROR(s)= errno;
          return 0;
        }
    }
  else
    {
      /* --- TCP --- */
      FPRINTF((stderr, "TCP sendData(%d, %d)\n", SOCKET(s), bufSize));
      if ((nsent= write(SOCKET(s), (char *)buf, bufSize)) <= 0)
        {
          if ((nsent == -1) && (errno == EWOULDBLOCK))
            {
              FPRINTF((stderr, "TCP sendData(%d, %d) -> %d [blocked]",
                       SOCKET(s), bufSize, nsent));
              return 0;
            }
          else
            {
              /* error: most likely "connection closed by peer" */
              SOCKETSTATE(s)= OtherEndClosed;
              SOCKETERROR(s)= errno;
              FPRINTF((stderr, "TCP write failed -> %d", errno));
              return 0;
            }
        }
    }
  /* write completed synchronously */
  FPRINTF((stderr, "sendData(%d) done = %d\n", SOCKET(s), nsent));
  return nsent;
}

On 8-Nov-06, at 5:04 AM, [hidden email] wrote:

>> - Run the VM under gdb and let it crash. Try to investigate from
>> there,
>> in particular try to print the call stacks (don't remember what the
>> magic invocation is). The VM implements both, printCallStack() to
>> the active call stack and printAllStacks() which prints all the call
>> stacks (but I'm not sure which of those is supported in 3.7).
>
> It crashed in gdb but after only ~7 hours (so it's not clear this is
> the same issue). But here is the stack:
>
> Program received signal SIGPIPE, Broken pipe.
--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com