I'm having some problems getting OSProcess 4.3 to run reliably. I'm
using a VM with AioPlugin 2.0 and OSProcessPlugin 4.0.1. These are the
latest versions available via SqueakMap.

Most of my testing was done in my standard development image built on
the Squeak 3.9 developer image using an Exupery VM. Exupery was loaded
but not running. I'm trying to get Exupery's stress test to pass, which
runs almost all the tests in the image 3 times. Up until now the
OSProcess tests have been useful for flushing out context switching
bugs.

Any idea how to build a VM with OSProcess that will run 3.9 images
reliably?


With both AioPlugin and OSProcessPlugin installed, running:

  OSPipeTestCase buildSuite run

may lock up the image consuming 100% CPU. This only happens when
AioPlugin has been built and installed. It doesn't always happen.

Cannot find new threads: generic error
ioFindExternalFunctionIn(primitiveGetThreadID, 0x80a4e88):
  /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/UnixOSProcessPlugin: undefined symbol: primitiveGetThreadID
ioFindExternalFunctionIn(primitiveTestEndOfFileFlag, 0x80a4e88):
  /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/UnixOSProcessPlugin: undefined symbol: primitiveTestEndOfFileFlag
ioFindExternalFunctionIn(primitiveTestEndOfFileFlag, 0x80a4e88):
  /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/UnixOSProcessPlugin: undefined symbol: primitiveTestEndOfFileFlag

Hitting Alt-. sometimes locks up the image after bringing up a
notifier.

When the image has locked up, interrupting with gdb shows:

(gdb) p printCallStack ()
1715565776 >idleProcess
1715534844 [] in >startUp
1715534936 [] in BlockContext>newProcess
$5 = 10
(gdb) where
#0  0xffffe405 in __kernel_vsyscall ()
#1  0xf7f0d08d in select () from /lib/libc.so.6
#2  0x0806ab6c in aioPoll (microSeconds=96000)
    at /home/bryce/squeak/exuperyNew/platforms/unix/vm/aio.c:226
#3  0xf7e51743 in display_ioRelinquishProcessorForMicroseconds (
    microSeconds=96000)
    at /home/bryce/squeak/exuperyNew/platforms/unix/vm-display-X11/sqUnixX11.c:2304
#4  0x080523fc in ioRelinquishProcessorForMicroseconds (us=1000)
    at /home/bryce/squeak/exuperyNew/platforms/unix/vm/sqUnixMain.c:477
#5  0x0805be14 in primitiveRelinquishProcessor () at gnu-interp.c:19142
#6  0x080535f7 in dispatchFunctionPointer (aFunctionPointer=0x805bde0)
    at gnu-interp.c:4093
#7  0x08066c57 in interpret () at gnu-interp.c:9080
#8  0x08052019 in main (argc=256, argv=0x0, envp=0x0)
    at /home/bryce/squeak/exuperyNew/platforms/unix/vm/sqUnixMain.c:1390


Running:

  300 timesRepeat: [CommandShellTestCase run: #testPipeline73]

causes the image to lock up consuming 0% CPU. This happens even when
AioPlugin isn't loaded, and I've reproduced it in a clean 3.9 image
with just OSProcess loaded running the latest Linux VM from
squeak.org. My Squeak images normally idle consuming a few percent CPU
due to polling.

Cannot find new threads: generic error
ioFindExternalFunctionIn(primitiveGetThreadID, 0x80a4e88):
  /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/UnixOSProcessPlugin: undefined symbol: primitiveGetThreadID

0xffffe405 in __kernel_vsyscall ()
(gdb) where
#0  0xffffe405 in __kernel_vsyscall ()
#1  0xf7e39143 in read () from /lib/libc.so.6
#2  0xf7deb658 in _IO_file_read () from /lib/libc.so.6
#3  0xf7dec83a in _IO_file_underflow () from /lib/libc.so.6
#4  0xf7dee3cd in __underflow () from /lib/libc.so.6
#5  0xf7deb15b in _IO_file_seek () from /lib/libc.so.6
#6  0xf7decfa8 in _IO_sgetn () from /lib/libc.so.6
#7  0xf7de19f0 in fread () from /lib/libc.so.6
#8  0xf7bb2787 in sqFileReadIntoAt (f=0x66412590, count=1,
    byteArrayIndex=0x66412814 "", startIndex=0)
    at /home/bryce/squeak/exuperyNew/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c:247
#9  0xf7bb12a4 in primitiveFileRead ()
    at /home/bryce/squeak/exuperyNew/src/plugins/FilePlugin/FilePlugin.c:641
#10 0x080535f7 in dispatchFunctionPointer (aFunctionPointer=0xf7bb1160)
    at gnu-interp.c:4093
#11 0x0805c315 in primitiveExternalCall () at gnu-interp.c:15540
#12 0x080535f7 in dispatchFunctionPointer (aFunctionPointer=0x805c240)
    at gnu-interp.c:4093
#13 0x08066c57 in interpret () at gnu-interp.c:9080
#14 0x08052019 in main (argc=0, argv=0x0, envp=0x0)
    at /home/bryce/squeak/exuperyNew/platforms/unix/vm/sqUnixMain.c:1390
(gdb) p printCallStack ()
1715621928 StandardFileStream>basicNext
1715621836 StandardFileStream>next
1715622132 [] in OSPipe>next
1715621628 BlockContext>on:do:
1715621536 OSPipe>next
1715622040 [] in OSPipe>next:
1715621444 Interval>do:
1715621352 OSPipe>next:
1715621720 [] in OSPipe>upToEnd
1715621236 BlockContext>repeat
1715621144 OSPipe>upToEnd
1715620776 [] in PipeableEvaluator>?
1715620636 PipeableEvaluator>blockValue
1715620868 [] in PipeableEvaluator>value
1715620960 [] in PipeableEvaluator>handleRuntimeErrorFor:
1715620524 BlockContext>on:do:
1715620432 PipeableEvaluator>handleRuntimeErrorFor:
1715565264 PipeableEvaluator>value
1715550668 CommandShell>pipeProxy:toCommandList:
1715423904 CommandShell>pipeline:
1715423788 CommandShellTestCase>testPipeline73

Here it looks like the primitive is blocking, waiting for input that
doesn't come, even though it looks to me like it's blocking on a
non-blocking file.

Bryce
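One way to check the last observation is to ask the kernel whether the descriptor really has O_NONBLOCK set. With the flag set, read() on an empty pipe fails with EAGAIN instead of blocking. The following is a minimal C sketch assuming a POSIX system; the helper names (fd_is_nonblocking, fd_set_nonblocking, errno_from_empty_nonblocking_read) are hypothetical and are not part of the VM or of OSProcess.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: answer 1 if fd has O_NONBLOCK set, 0 if it does not,
   and -1 if the flags cannot be read. */
int fd_is_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}

/* Hypothetical helper: set O_NONBLOCK on fd, keeping the other status flags.
   Answers 0 on success, -1 on error. */
int fd_set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Create an empty pipe, make its read end non-blocking, and try to read one
   byte. Answers the errno reported by the failing call, or 0 if nothing
   failed. */
int errno_from_empty_nonblocking_read(void)
{
    int fds[2];
    char c;
    int saved = 0;

    if (pipe(fds) == -1)
        return errno;
    if (fd_set_nonblocking(fds[0]) == -1)
        saved = errno;
    else if (read(fds[0], &c, 1) == -1)
        saved = errno;          /* expected: EAGAIN, the read did not block */
    close(fds[0]);
    close(fds[1]);
    return saved;
}
```

If a check like fd_is_nonblocking() answers 0 for the pipe's descriptor while the VM is wedged, the read is blocking because the flag is not set at that moment, whatever the image believes.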
Hi Bryce,
I'm not sure what the problem is, but here are some tips that may help.

The undefined references to primitiveGetThreadID and primitiveTestEndOfFileFlag
are references to primitives that exist in a new version of OSPP that has
not been released on SqueakMap. I do not think they are directly related
to your problem. You can find new versions of OSPP on SqueakSource
http://kilana.unibe.ch:8888/OSProcessPlugin, but you should treat this
as experimental because it includes my attempt to implement signal
handling properly in a pthread environment. I do *not* know if I've
gotten this right (I'm testing on Linux with a single threaded VM).
On the other hand, if you find any reason to think that your problem
is related to pthreads, you may want to try the newer version
and see if it helps.

That said, the actual lockup appears to be happening in AioPlugin,
which invokes the aio functions in the VM. The aio functions depend
on select(), and it looks like this is where you are hanging up.
I certainly can't think of anything in the 3.9 image (versus 3.8)
that could be triggering it, so it seems more likely to be something
in the VM, either the AIO plugin or the underlying aio functions.

On the other hand, maybe the VM is legitimately hanging up on
a blocking read. There are blocking reads in my Aio unit tests
for pipes, so this is certainly possible. Also, the unit tests
in OSProcess 4.3 have changed to do more testing of pipes, so
this may be where the problem lies.

If a Squeak VM hangs up on a blocking read, it will be solidly
wedged until it gets some input. On a Linux system, I can break
this condition loose by finding the pid of the Squeak process,
going to /proc/<pid>/fd, and force-feeding data into the file
descriptor files until something breaks loose. If your system
(which OS is it?) has a similar proc file system, you may be
able to do this also.
If that is the source of the problem,
you will find the Squeak resumes right where it left off, and
you may be able to determine which unit test method is running
and causing the problem.

I'm short of time until the weekend, but I'll see if I can
reproduce the problem on my end.

Dave
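The /proc/<pid>/fd technique described in the reply above can also be done from a small C program instead of a shell. This is only a sketch, assuming a Linux-style /proc file system and a descriptor that refers to a pipe; the helper names proc_fd_path and feed_fd_text are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format "/proc/<pid>/fd/<fd>" into buf.
   Answers 0 on success, -1 if buf is too small. */
int proc_fd_path(char *buf, size_t size, pid_t pid, int fd)
{
    int n = snprintf(buf, size, "/proc/%ld/fd/%d", (long)pid, fd);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: write a NUL-terminated string into descriptor fd of
   process pid by opening its /proc entry. Answers the number of bytes
   written, or -1 on error. Assumes the entry can be opened for writing,
   which is typically the case for a pipe owned by the same user. */
ssize_t feed_fd_text(pid_t pid, int fd, const char *text)
{
    char path[64];
    int out;
    ssize_t written;

    assert(text != NULL);
    if (proc_fd_path(path, sizeof path, pid, fd) == -1)
        return -1;
    /* O_NONBLOCK keeps the open itself from hanging. */
    out = open(path, O_WRONLY | O_NONBLOCK);
    if (out == -1)
        return -1;
    written = write(out, text, strlen(text));
    close(out);
    return written;
}
```

A call such as feed_fd_text(pid, 7, "\n") would push one newline at descriptor 7 of the wedged process; the pid and the descriptor number have to be found by hand, for example by listing /proc/<pid>/fd.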
On Jan 30, 2007, at 4:27 AM, David T. Lewis wrote:

> That said, that actual lockup appears to be happening in AioPlugin,
> which invokes the aio functions in the VM. The aio functions depend
> on select(), and it looks like this is where you are hanging up.
> I certainly can't think of anything in the 3.9 image (versus 3.8)
> that could be triggering it, so it seems more likely to be something
> in the VM, either the AIO plugin or the underlying aio functions.

I noted that the 3.9 unix source code has HAVE_NANOSLEEP defined, and
when the VM sleeps we call nanosleep() then aioPoll() twice with the
value of zero. But then aioPoll() returns early since the value is
zero and it never invokes select().

Perhaps some unix folks can comment if there are any interesting side
effects of this behaviour, such as never calling select().

I'll note issues with MC servers based on 3.9 where they hang, and
a swirl of a mouse on a VNC display of their X-11 display restores
functionality.


int aioSleep(int microSeconds)
{
#if defined(HAVE_NANOSLEEP)
  if (microSeconds < (1000000/60))      /* < 1 timeslice? */
    {
      if (!aioPoll(0))
        {
          struct timespec rqtp= { 0, microSeconds * 1000 };
          struct timespec rmtp;
          nanosleep(&rqtp, &rmtp);      /* EINTR here, but likely we want to wake up? */
          microSeconds= 0;              /* poll but don't block */
        }
    }
#endif
  return aioPoll(microSeconds);
}

/* answer whether i/o becomes possible within the given number of microSeconds */

int aioPoll(int microSeconds)
{
  int    fd, ms;
  fd_set rd, wr, ex;

  FPRINTF((stderr, "aioPoll(%d)\n", microSeconds));
  DO_TICK();

  /* get out early if there is no pending i/o and no need to relinquish cpu */

  if ((maxFd == 0) && (microSeconds == 0))
    return 0;

  rd= rdMask;
  wr= wrMask;
  ex= exMask;
  ms= ioMSecs();

  for (;;)
    {
      struct timeval tv;
      int n, now;
      tv.tv_sec=  microSeconds / 1000000;
      tv.tv_usec= microSeconds % 1000000;
      n= select(maxFd, &rd, &wr, &ex, &tv);

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
In reply to this post by David T. Lewis
2007/1/30, David T. Lewis <[hidden email]>:
> Hi Bryce,
>
> I'm not sure what the problem is but here are some tips that may help.
>
> The undefined references to primitiveGetThreadID and primitiveTestEndOfFileFlag
> are references to primitives that exist in a new version of OSPP that has
> not been released on SqueakMap. I do not think they are directly related
> to your problem. You can find new versions of OSPP on SqueakSource
> http://kilana.unibe.ch:8888/OSProcessPlugin,

Please stop using kilana. SqueakSource is no longer hosted on kilana;
only a proxy remains. Please use the host-independent
http://www.squeaksource.com/OSProcessPlugin
instead.

Thanks
Philippe
In reply to this post by johnmci
On Tue, Jan 30, 2007 at 10:57:17AM -0800, John M McIntosh wrote:
>
> On Jan 30, 2007, at 4:27 AM, David T. Lewis wrote:
>
> >That said, that actual lockup appears to be happening in AioPlugin,
> >which invokes the aio functions in the VM. The aio functions depend
> >on select(), and it looks like this is where you are hanging up.
> >I certainly can't think of anything in the 3.9 image (versus 3.8)
> >that could be triggering it, so it seems more likely to be something
> >in the VM, either the AIO plugin or the underlying aio functions.
>
> I noted in the 3.9 unix source code has HAVE_NANOSLEEP defined and
> when the VM sleeps we
> call nanosleep() then aioPoll twice with the value of zero. But then
> aioPoll() returns early since the value is
> zero and it never invokes select().
>
> Perhaps some unix folks can comment if there are any interesting side
> effects of this behaviour, such as never calling select()

It looks OK to me, aioPoll() only bypasses the select() if there are no
file descriptors being watched.

  /* get out early if there is no pending i/o and no need to relinquish cpu */
  if ((maxFd == 0) && (microSeconds == 0))
    return 0;

>
> I'll note issues with MC servers based on 3.9 where they hang, and
> a swirl of a mouse on a VNC display of their X-11 display restores
> functionality.

hmmm....

Dave
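The early-exit condition quoted from aioPoll() can be written out as a small predicate to make the && explicit. This is an illustrative C sketch only; the function name aio_poll_skips_select is hypothetical and does not exist in aio.c.

```c
#include <assert.h>

/* Mirrors the quoted test: select() is bypassed only when no descriptors
   are registered (maxFd == 0) AND the caller asked for no wait at all
   (microSeconds == 0). Either condition alone is not enough. */
int aio_poll_skips_select(int maxFd, int microSeconds)
{
    assert(maxFd >= 0);
    return (maxFd == 0) && (microSeconds == 0);
}
```

Under this reading, a zero timeout with at least one watched descriptor still reaches select(), and so does a non-zero timeout with nothing watched.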
On Jan 30, 2007, at 7:18 PM, David T. Lewis wrote:

> It looks OK to me, aioPoll() only bypasses the select() if there are no
> file descriptors being watched.
>
>   /* get out early if there is no pending i/o and no need to relinquish cpu */
>   if ((maxFd == 0) && (microSeconds == 0))
>     return 0;

mmmm && must mean something heh? sigh

However I do know that the select() does terminate on EINTR, but
don't understand the implications of that...

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
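On the EINTR point: select() returns -1 with errno set to EINTR when a signal handler runs while it is waiting, and the caller then decides whether to retry. A minimal C sketch of a retrying wait on a single descriptor follows, assuming a POSIX system; wait_readable is a hypothetical name, and this is not how the VM's aioPoll() is written.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to microSeconds for fd to become readable,
   retrying when a signal interrupts select(). Answers 1 if readable,
   0 on timeout, -1 on any error other than EINTR. For simplicity each
   retry restarts the full timeout. */
int wait_readable(int fd, long microSeconds)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    for (;;) {
        fd_set rd;
        struct timeval tv;
        int n;

        /* select() may modify both the set and the timeout, so rebuild
           them on every pass. */
        FD_ZERO(&rd);
        FD_SET(fd, &rd);
        tv.tv_sec = microSeconds / 1000000;
        tv.tv_usec = microSeconds % 1000000;

        n = select(fd + 1, &rd, NULL, NULL, &tv);
        if (n >= 0)
            return n > 0;
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal: loop and wait again. */
    }
}
```

Whether a VM should retry or treat the interruption as a wake-up is a design choice; the comment in the quoted aioSleep() suggests that waking up on EINTR may be intended there.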
In reply to this post by Bryce Kampjes
Hi Bryce,
I have been trying to isolate the problem (problems?) that you found here.
I have not identified it yet, but I do have some information to pass along.

The problem is intermittent but reproducible. You reported that this fails:

  "(1 to: 300) do: [:i | Transcript show: i; cr. CommandShellTestCase run: #testPipeline73]"

which I adapted as follows for testing:

  "OSProcess debugMessage: 'start 300 iterations'.
  (1 to: 300) do: [:i |
    OSProcess debugMessage: i asString.
    CommandShell new pipeline: 'ls /no/such/file | stdout nextPutAll: stdin upToEnd !']"

The failure mode is that the image hangs completely, and the VM appears
to be blocked on a low level file read. This is probably actually a file
descriptor for an OS pipe (pipe readers are set non-blocking by OSProcess,
but possibly there is a race condition that I have not identified).

I tested image versions and OSProcess versions, and found:
 - The problem is not related to image version 3.8 versus 3.9 (fails on both)
 - The problem is not related to OSProcess version 4.3 versus 4.0.1 (fails on both)

I tested VM and OSPP plugin versions and found:
 - The combination of recent VM and OSPP fails:
     Squeak3.8 of ''5 May 2005'' [latest update: #6665]
     UnixOSProcessPlugin 4 July 2004 (e) version 3.3
 - An older combination of VM and OSPP does not fail:
     Squeak3.7beta of ''1 April 2004'' [latest update: #5923]
     UnixOSProcessPlugin 4 July 2004 (e) version 3.3

So far this suggests a problem associated with some change in the VM
and/or plugins. I have not yet successfully built a combination of
newer VM with the older OSPP (due to various annoyances in the build,
and I'm out of spare time for now). Hopefully if I can do this, the
problem can be narrowed down to either the VM or the plugins (but
note that this would not necessarily mean that the VM or plugin
has a fault; it could still be a race condition in the image that
is just aggravated by some change in the VM).

There was also some discussion in this thread of possible aio problems.
This particular issue does not seem to be related to aio in the VM or
to the AIO plugin.

Hopefully more to follow later,

Dave
Hi Dave,

Thanks. Knowing for sure that the testPipeline bug is a real bug, and not due to something strange in my VMs (including the stock Linux one from squeak.org), is useful. It means that for my testing of Exupery it's OK to work around it. I was running a test for 20 minutes that should have been pointless but was causing lockups; removing it from the 2 hour stress test with a clear conscience is great.

I'd previously told the stress test to ignore these classes: UnixProcessTestCase, AioEventHandlerTestCase, and UnixProcessFileLockingTestCase. So it's possible that the bug has been there for a while, especially if UnixProcessUnixFileLockingTestCase was formerly UnixProcessFileLockingTestCase.

It's also worth noting that Exupery's stress test first runs each test class while profiling. If there's a timing issue then it may be affected by the profiler.

To work around the problems I've modified Exupery's stress test to ignore UnixProcessUnixFileLockingTestCase, AbstractUnixProcessFileLockingTestCase, and ExuperyStoryTests, as well as the eight classes that it has always ignored.

The AIO issues only came up when I installed that module while trying to get the right configuration to sort out the testPipeline bug. I haven't investigated them to the same extent as testPipeline.

From where I stand, it's probably better to just fix the bug in the current development version, if it'll be released in the next few months, than to worry about fixing it in the current versions.

Bryce

David T. Lewis writes:
> Hi Bryce,
>
> I have been trying to isolate the problem (problems?) that you found here.
> I have not identified it yet, but I do have some information to pass along.
>
> The problem is intermittent but reproducible. You reported that this fails:
>
> "(1 to: 300) do: [:i | Transcript show: i; cr. CommandShellTestCase run: #testPipeline73]"
>
> which I adapted as follows for testing:
>
> "OSProcess debugMessage: 'start 300 iterations'.
> (1 to: 300) do: [:i |
> OSProcess debugMessage: i asString.
> CommandShell new pipeline: 'ls /no/such/file | stdout nextPutAll: stdin upToEnd !']"
>
> The failure mode is that the image hangs completely, and the VM appears
> to be blocked on a low level file read. This is probably actually a file
> descriptor for an OS pipe (pipe readers are set non-blocking by OSProcess,
> but possibly there is a race condition that I have not identified).
>
> I tested image versions and OSProcess versions, and found:
> - The problem is not related to image version 3.8 versus 3.9 (fails on both)
> - The problem is not related to OSProcess version 4.3 versus 4.0.1 (fails on both)
>
> I tested VM and OSPP plugin versions and found:
> - The combination of recent VM and OSPP fails:
>     Squeak3.8 of ''5 May 2005'' [latest update: #6665]
>     UnixOSProcessPlugin 4 July 2004 (e) version 3.3
> - An older combination of VM and OSPP does not fail:
>     Squeak3.7beta of ''1 April 2004'' [latest update: #5923]
>     UnixOSProcessPlugin 4 July 2004 (e) version 3.3
>
> So far this suggests a problem associated with some change in the VM
> and/or plugins. I have not yet successfully built a combination of newer
> VM with the older OSPP (due to various annoyances in the build, and I'm
> out of spare time for now). Hopefully if I can do this, the problem can
> be narrowed down to either the VM or the plugins (but note that this
> would not necessarily mean that the VM or plugin has a fault; it could
> still be a race condition in the image that is just aggravated by some
> change in the VM).
>
> There was also some discussion in this thread of possible aio problems.
> This particular issue does not seem to be related to aio in the VM or
> to the AIO plugin.
>
> Hopefully more to follow later,
>
> Dave
>
> On Mon, Jan 29, 2007 at 10:45:57PM +0000, [hidden email] wrote:
> >
> > I'm having some problems getting OSProcess 4.3 to run reliably, I'm
> > using a VM with AioPlugin 2.0 and OSProcessPlugin 4.0.1. These are the
> > latest versions available via SqueakMap.
> >
> > Most of my testing was done in my standard development image built on
> > the Squeak 3.9 developer image using an Exupery VM. Exupery was loaded
> > but not running. I'm trying to get Exupery's stress test to pass which
> > runs almost all the tests in the image 3 times. Up until now the
> > OSProcess tests have been useful for flushing out context switching
> > bugs.
> >
> > Any idea how to build a VM with OSProcess that will run 3.9 images
> > reliably?
> >
> >
> > With both AioPlugin and OSProcessPlugin installed running:
> >
> > OSPipeTestCase buildSuite run
> >
> > May lock up the image consuming 100% CPU. This only happens when
> > AioPlugin has been built and installed. It doesn't always happen.
> >
> > Cannot find new threads: generic error
> > ioFindExternalFunctionIn(primitiveGetThreadID, 0x80a4e88):
> > /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/UnixOSProcessPlugin: undefined symbol: primitiveGetThreadID
> > ioFindExternalFunctionIn(primitiveTestEndOfFileFlag, 0x80a4e88):
> > /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/UnixOSProcessPlugin: undefined symbol: primitiveTestEndOfFileFlag
> > ioFindExternalFunctionIn(primitiveTestEndOfFileFlag, 0x80a4e88):
> > /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/UnixOSProcessPlugin: undefined symbol: primitiveTestEndOfFileFlag
> >
> > Hitting Alt-. sometimes locks up the image after bringing up a
> > notifier.
> >
> > When the image has locked up interrupting with gdb shows:
> > (gdb) p printCallStack ()
> > 1715565776 >idleProcess
> > 1715534844 [] in >startUp
> > 1715534936 [] in BlockContext>newProcess
> > $5 = 10
> > (gdb) where
> > #0 0xffffe405 in __kernel_vsyscall ()
> > #1 0xf7f0d08d in select () from /lib/libc.so.6
> > #2 0x0806ab6c in aioPoll (microSeconds=96000)
> >     at /home/bryce/squeak/exuperyNew/platforms/unix/vm/aio.c:226
> > #3 0xf7e51743 in display_ioRelinquishProcessorForMicroseconds (
> >     microSeconds=96000)
> >     at /home/bryce/squeak/exuperyNew/platforms/unix/vm-display-X11/sqUnixX11.c:2304
> > #4 0x080523fc in ioRelinquishProcessorForMicroseconds (us=1000)
> >     at /home/bryce/squeak/exuperyNew/platforms/unix/vm/sqUnixMain.c:477
> > #5 0x0805be14 in primitiveRelinquishProcessor () at gnu-interp.c:19142
> > #6 0x080535f7 in dispatchFunctionPointer (aFunctionPointer=0x805bde0)
> >     at gnu-interp.c:4093
> > #7 0x08066c57 in interpret () at gnu-interp.c:9080
> > #8 0x08052019 in main (argc=256, argv=0x0, envp=0x0)
> >     at /home/bryce/squeak/exuperyNew/platforms/unix/vm/sqUnixMain.c:1390
> >
> >
> > Running:
> >
> > 300 timesRepeat: [CommandShellTestCase run: #testPipeline73]
> >
> > Causes the image to lock up consuming 0% CPU. This happens even when
> > AioPlugin isn't loaded and I've reproduced it in a clean 3.9 image
> > with just OSProcess loaded running the latest Linux VM from
> > squeak.org. My Squeak images normally idle consuming a few percent CPU
> > due to polling.
> >
> > Cannot find new threads: generic error
> > ioFindExternalFunctionIn(primitiveGetThreadID, 0x80a4e88):
> > /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/UnixOSProcessPlugin: undefined symbol: primitiveGetThreadID
> >
> > 0xffffe405 in __kernel_vsyscall ()
> > (gdb) where
> > #0 0xffffe405 in __kernel_vsyscall ()
> > #1 0xf7e39143 in read () from /lib/libc.so.6
> > #2 0xf7deb658 in _IO_file_read () from /lib/libc.so.6
> > #3 0xf7dec83a in _IO_file_underflow () from /lib/libc.so.6
> > #4 0xf7dee3cd in __underflow () from /lib/libc.so.6
> > #5 0xf7deb15b in _IO_file_seek () from /lib/libc.so.6
> > #6 0xf7decfa8 in _IO_sgetn () from /lib/libc.so.6
> > #7 0xf7de19f0 in fread () from /lib/libc.so.6
> > #8 0xf7bb2787 in sqFileReadIntoAt (f=0x66412590, count=1,
> >     byteArrayIndex=0x66412814 "", startIndex=0)
> >     at /home/bryce/squeak/exuperyNew/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c:247
> > #9 0xf7bb12a4 in primitiveFileRead ()
> >     at /home/bryce/squeak/exuperyNew/src/plugins/FilePlugin/FilePlugin.c:641
> > #10 0x080535f7 in dispatchFunctionPointer (aFunctionPointer=0xf7bb1160)
> >     at gnu-interp.c:4093
> > #11 0x0805c315 in primitiveExternalCall () at gnu-interp.c:15540
> > #12 0x080535f7 in dispatchFunctionPointer (aFunctionPointer=0x805c240)
> >     at gnu-interp.c:4093
> > #13 0x08066c57 in interpret () at gnu-interp.c:9080
> > #14 0x08052019 in main (argc=0, argv=0x0, envp=0x0)
> >     at /home/bryce/squeak/exuperyNew/platforms/unix/vm/sqUnixMain.c:1390
> > (gdb) p printCallStack ()
> > 1715621928 StandardFileStream>basicNext
> > 1715621836 StandardFileStream>next
> > 1715622132 [] in OSPipe>next
> > 1715621628 BlockContext>on:do:
> > 1715621536 OSPipe>next
> > 1715622040 [] in OSPipe>next:
> > 1715621444 Interval>do:
> > 1715621352 OSPipe>next:
> > 1715621720 [] in OSPipe>upToEnd
> > 1715621236 BlockContext>repeat
> > 1715621144 OSPipe>upToEnd
> > 1715620776 [] in PipeableEvaluator>?
> > 1715620636 PipeableEvaluator>blockValue
> > 1715620868 [] in PipeableEvaluator>value
> > 1715620960 [] in PipeableEvaluator>handleRuntimeErrorFor:
> > 1715620524 BlockContext>on:do:
> > 1715620432 PipeableEvaluator>handleRuntimeErrorFor:
> > 1715565264 PipeableEvaluator>value
> > 1715550668 CommandShell>pipeProxy:toCommandList:
> > 1715423904 CommandShell>pipeline:
> > 1715423788 CommandShellTestCase>testPipeline73
> >
> > Here it looks like the primitive is blocking waiting for input that
> > doesn't come even though it looks to me like it's blocking on a
> > non-blocking file.
> >
> > Bryce
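
For anyone following the "blocking on a non-blocking file" part of this thread, here is a minimal sketch in Python of what a read on a non-blocking pipe is expected to do on a POSIX system. It is not OSProcess or VM code, and it says nothing about where the hang actually comes from. The helper names (set_nonblocking, is_nonblocking, read_available, demo) are made up for illustration. Note also that Python's os.read calls read(2) directly, whereas the gdb trace above goes through fread(), so this only shows the behaviour of the descriptor itself.

```python
import errno
import fcntl
import os
import select


def set_nonblocking(fd):
    """Set O_NONBLOCK on fd while preserving its other status flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def is_nonblocking(fd):
    """Report whether O_NONBLOCK is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def read_available(fd, timeout=0.1, size=4096):
    """Return bytes read, b'' at end of file, or None if nothing is ready."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None
    try:
        return os.read(fd, size)
    except BlockingIOError:
        # Reported readable, but the data was gone by the time we read.
        return None


def demo():
    """Exercise a pipe reader before and after it is made non-blocking."""
    reader, writer = os.pipe()
    results = {"blocking_by_default": not is_nonblocking(reader)}

    set_nonblocking(reader)
    results["nonblocking_after_fcntl"] = is_nonblocking(reader)

    # Empty pipe with the writer still open: the read fails at once
    # instead of waiting for data.
    try:
        os.read(reader, 1)
        results["empty_read_would_block"] = False
    except BlockingIOError as err:
        results["empty_read_would_block"] = err.errno in (
            errno.EAGAIN, errno.EWOULDBLOCK)

    os.write(writer, b"hello")
    results["after_write"] = read_available(reader)

    # Once every writer is closed, a read returns b'' (end of file).
    os.close(writer)
    results["after_close"] = read_available(reader)
    os.close(reader)
    return results
```

Checking the flag with F_GETFL, as is_nonblocking does, is one way to confirm whether a descriptor that appears stuck in read() really had O_NONBLOCK set at that moment.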