I'm getting errors trying to file OSProcess (the st file from
OSProcess-dtl.56.mcz) into Cuis. I am trying to do this so I can run magma tests. I don't know if there's any chance of this working. In the image Smalltalk listBuiltinModules #('ExuperyPlugin 27 February 2009 (i)') Smalltalk listLoadedModules. #('UnixOSProcessPlugin 27 February 2009 (e)' 'SoundPlugin 27 February 2009 (e)' 'BitBltPlugin 8 April 2009 (e)' 'LargeIntegers v1.5 27 February 2009 (e)' 'MiscPrimitivePlugin 27 February 2009 (e)' 'SecurityPlugin 27 February 2009 (e)' 'FilePlugin 27 February 2009 (e)') So it looks as if the necessary plugin is there (Smalltalk vmVersion 'Pharo0.1 of 16 May 2008 [latest update: #10074]') I think there's a small bug in OSProcessAccessor class>>forThisOSProcess. I changed the last 2 lines to be oldAccessor ifNotNil: [oldAccessor release; finalize.]. ^ ThisOSProcessAccessor := self concreteClass basicNew initialize] I added the ifNotNil: test; without it release fails because it is sent to an undefined object. The ifNotNil: test probably belongs earlier in the code, but this was enough to get it to run. However, the final initialize fails. It calls UnixOSProcessAccessor>>grimReaperProcess, which fails because grimReaper is nil. I notice the very last OSProcess changeset touched this method. I'm not sure if the problem is in OSProcess or in Cuis missing something that is expected. Or maybe I should try a different VM? Anybody know what to do? I'm running on Linux. Ross P.S. http://wiki.squeak.org/squeak/708 on OSProcess includes a seemingly dated reference to a sar file. It might need an update. |
On Thu, 2010-09-02 at 14:20 -0700, Ross Boylan wrote:
> I'm getting errors trying to file OSProcess (the st file from > OSProcess-dtl.56.mcz) into Cuis. I am trying to do this so I can run > magma tests. > > I don't know if there's any chance of this working. In the image > Smalltalk listBuiltinModules #('ExuperyPlugin 27 February 2009 (i)') > Smalltalk listLoadedModules. #('UnixOSProcessPlugin 27 February 2009 (e)' 'SoundPlugin 27 February 2009 (e)' 'BitBltPlugin 8 April 2009 (e)' 'LargeIntegers v1.5 27 February 2009 (e)' 'MiscPrimitivePlugin 27 February 2009 (e)' 'SecurityPlugin 27 February 2009 (e)' 'FilePlugin 27 February 2009 (e)') > So it looks as if the necessary plugin is there (Smalltalk vmVersion > 'Pharo0.1 of 16 May 2008 [latest update: #10074]') > > I think there's a small bug in OSProcessAccessor > class>>forThisOSProcess. I changed the last 2 lines to be > oldAccessor ifNotNil: > [oldAccessor release; finalize.]. > ^ ThisOSProcessAccessor := self concreteClass basicNew initialize] > I added the ifNotNil: test; without it release fails because it is sent > to an undefined object. The ifNotNil: test probably belongs earlier in > the code, but this was enough to get it to run. > > However, the final initialize fails. It calls > UnixOSProcessAccessor>>grimReaperProcess, which fails because grimReaper > is nil. I notice the very last OSProcess changeset touched this method. > I think they should be grimReaperProcess "This is a process which waits for the death of a child OSProcess, and informs any dependents of the change. Use SIGCHLD events if possible, otherwise a Delay to poll for exiting child processes." | event processSynchronizationDelay | ^ self canAccessSystem ifTrue: [event := (self canAccessSystem and: [self canForwardExternalSignals]) ifTrue: [self sigChldSemaphore] ifFalse: [Delay forMilliseconds: 200]. processSynchronizationDelay := Delay forMilliseconds: 20. grimReaper ifNil: [grimReaper := [[event wait. processSynchronizationDelay wait. "Avoids lost signals in heavy process switching" self changed: #childProcessStatus] repeat] fork.]. "name selected to look reasonable in the process browser" grimReaper name: ((ReadStream on: grimReaper hash asString) next: 5) , ': the child OSProcess watcher'] ifFalse: [nil] I moved a parentheses from after 'OSProcess watcher' on the 3rd to last line to after the fork. I also added "." after the ]. The second problem is that fork in Cuis does not return an object. This is perhaps related to the bug that inspired the latest OSProcess release. There's a very informative comment on the Cuis fork method: fork "Create and schedule a Process running the code in the receiver." "jmv - Do NOT answer the new process. See http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-February/124960.html Most times, this methods returns before resuming the new process (if priority of new process is less or equal than current). But it might return afterwards. This means it is very dangerous to use the returned process in code that stores it in some variable and checks for nil to start a new one. If this methods happens to return after the new process is forked, chances are the code that starts all this runs again, that variable is nil, and a second process is forked, perhaps breaking some shared state. This kind of bug is hard to spot and debug. Callers wanting the new process object, should call #newProcess, store the answer, and then #resume. A way to ensure this bug will not ever happen again is just to answer nil" self newProcess resume. ^nil I think the fix is to do that (newProcess and resume), but perhaps this means some of the syncronization code in the current grimReaperProcess can go. Ross P.S. The original error arose when grimReaper name: failed. |
I did this
grimReaperProcess "This is a process which waits for the death of a child OSProcess, and informs any dependents of the change. Use SIGCHLD events if possible, otherwise a Delay to poll for exiting child processes." | event processSynchronizationDelay | ^ self canAccessSystem ifTrue: [event := (self canAccessSystem and: [self canForwardExternalSignals]) ifTrue: [self sigChldSemaphore] ifFalse: [Delay forMilliseconds: 200]. processSynchronizationDelay := Delay forMilliseconds: 20. grimReaper ifNil: [grimReaper := [[event wait. processSynchronizationDelay wait. "Avoids lost signals in heavy process switching" self changed: #childProcessStatus] repeat] newProcess. grimReaper resume.]. "name selected to look reasonable in the process browser" grimReaper name: ((ReadStream on: grimReaper hash asString) next: 5) , ': the child OSProcess watcher'] ifFalse: [nil] I got a "code simulation error" every time the debugger tried to execute newProcess, but by restarting the calling method and hitting proceed I seemed to complete the filein. If I understand the comment on fork in Cuis, this should work, and be safer, in all flavors of squeak. Ross |
I started Cuis's Sunit tester and selected the Unix test classes with
OSProcess in them. The image crashed immediately when I ran the tests. There was no debug log, but the terminal showed XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" after 94458 requests (94404 known processed) with 0 events remaining. squeak: ../../src/xcb_io.c:182: process_responses: Assertion `((int) (((dpy->last_request_read)) - ((dpy->request))) <= 0)' failed. Ross |
On Thu, Sep 02, 2010 at 03:05:24PM -0700, Ross Boylan wrote:
> I started Cuis's Sunit tester and selected the Unix test classes with > OSProcess in them. The image crashed immediately when I ran the tests. > There was no debug log, but the terminal showed > XIO: fatal IO error 11 (Resource temporarily unavailable) on X server > ":0.0" > after 94458 requests (94404 known processed) with 0 events > remaining. > squeak: ../../src/xcb_io.c:182: process_responses: Assertion `((int) > (((dpy->last_request_read)) - ((dpy->request))) <= 0)' failed. Ross, Thanks for your work on this. I'll add your bug fixes (from previous posts) to OSProcess this weekend. With respect to the failure above, what VM are you running? I suspect that the timer changes in Cog may not sit well with the #forkSqueak in OSProcess (I'm afraid I have not checked yet, sorry). The reason I suspect this is that #forkSqueak has to shut off all timer signals while doing the fork, and reenable them afterwards in both the parent and child Squeak. The underlying timer notifications work differently in Cog, so it would not surprise me to see problems. You might try running some of the tests individually and avoid tests that are doing #forkSqueak. If my guess is right here, the rest of OSProcess will probably work fine as long as you don't try to fork a running copy of Squeak with #forkSqueak. Dave |
On Thu, 2010-09-02 at 18:23 -0400, David T. Lewis wrote:
> On Thu, Sep 02, 2010 at 03:05:24PM -0700, Ross Boylan wrote: > > I started Cuis's Sunit tester and selected the Unix test classes with > > OSProcess in them. The image crashed immediately when I ran the tests. > > There was no debug log, but the terminal showed > > XIO: fatal IO error 11 (Resource temporarily unavailable) on X server > > ":0.0" > > after 94458 requests (94404 known processed) with 0 events > > remaining. > > squeak: ../../src/xcb_io.c:182: process_responses: Assertion `((int) > > (((dpy->last_request_read)) - ((dpy->request))) <= 0)' failed. > > Ross, > > Thanks for your work on this. I'll add your bug fixes (from previous > posts) to OSProcess this weekend. > > With respect to the failure above, what VM are you running? Smalltalk vmVersion 'Pharo0.1 of 16 May 2008 [latest update: #10074]' It's an earlyish closure enabled VM, I think identical to the squeak development VM around the same time. If I understand your remarks (below), you are saying some of OSProcess may not work with Cog; since I'm not using Cog, something else must be going on. I tried running the same tests in squeak 4.1 (same VM, though) and it also crashed. Ther terminal showed a dump of my /etc/hosts file and then squeak: ../../src/xcb_io.c:182: process_responses: Assertion `((int) (((dpy->last_request_read)) - ((dpy->request))) <= 0)' failed. i.e., the same error as in Cuis. It's possible that the tests or the order they were run in differed between Cuis and squeak, since the unit test GUI is a bit different and I might have selected slightly different tests (probably more in squeak 4.1, for which I filtered on *Process* and then removed the test class with win32 in the name). I'm wondering if the fact that I'm using a random binary VM, rather than one built specifically for my system, might be causing trouble. I tried with squeak4.1 and vmVersion 'Squeak3.10.2 of ''5 June 2008'' [latest update: #7179]', which probably was built specifically for Debian. All 131 tests passed. Unfortunately, I don't think that VM is closure enabled. Ross > I suspect > that the timer changes in Cog may not sit well with the #forkSqueak > in OSProcess (I'm afraid I have not checked yet, sorry). The reason > I suspect this is that #forkSqueak has to shut off all timer signals > while doing the fork, and reenable them afterwards in both the parent > and child Squeak. The underlying timer notifications work differently > in Cog, so it would not surprise me to see problems. You might try > running some of the tests individually and avoid tests that are > doing #forkSqueak. If my guess is right here, the rest of OSProcess > will probably work fine as long as you don't try to fork a running > copy of Squeak with #forkSqueak. > > Dave |
On Thu, 2010-09-02 at 15:45 -0700, Ross Boylan wrote:
> I tried with squeak4.1 and vmVersion 'Squeak3.10.2 of ''5 June > 2008'' [latest update: #7179]', which probably was built specifically > for Debian. All 131 tests passed. Unfortunately, I don't think that > VM > is closure enabled. I believe the vmVersion string is misleading. The Debian package version is 1:4.0.3.2202-2, i.e. 4.0. I also see $ /usr/lib/squeak/4.0.3-2202/squeakvm -version 4.0.3-2202 #1 XShm Sat Apr 17 18:57:13 UTC 2010 gcc 4.4.3 Linux biber 2.6.26-2-amd64 #1 SMP Tue Mar 9 22:29:32 UTC 2010 i686 GNU/Linux plugin path: /usr/lib/squeak/4.0.3-2202/ [default: /usr/lib/squeak/4.0.3-2202/] so I think the version string just wasn't updated. Also, I think images that require (good) closures won't even start without the right VM. I ran again with this VM and Cuis2.6. Here's an updated version of one of the problem methods: forThisOSProcess "Answer a single instance corresponding to the OS process in which this Smalltalk image is executing." "OSProcessAccessor forThisOSProcess" (ThisOSProcessAccessor notNil and: [ThisOSProcessAccessor isResponsibleForThisPlatform]) ifTrue: [^ ThisOSProcessAccessor]. ThisOSProcessAccessor ifNotNil: [ ThisOSProcessAccessor changed: #invalidProcessAccessor. ThisOSProcessAccessor release; finalize.]. "We are running on a different platform, so start a new accessor" ^ ThisOSProcessAccessor := self concreteClass basicNew initialize Unless there is some subtlety that requires assigning ThisOSProcessAccessor, I think this is better than my earlier attempt. I discovered one more problem when I tried to save the image. ifNotNilDo: is not understood in Cuis. I hope ifNotNil: is portable, since I used it above. Here's a further rewrite: ThisOSProcess class>>startUp: startUp: resuming "Initialize my singleton instance, and the singleton instance of my OSProcessAccessor. On Unix, set the signal handler in my process accessor to respond to externally generated sigchld signals. This must be done after each image restart in order to call a primitive which informs the VM of the identity of the semaphore to signal. When not running on a Unix system, the primitive fails and the method has no effect. Notify dependents of the singleton instance if the image has restarted in a different OS process (this is not the case when #startUp is called after a simple image save). The notification is done in the initialization of my OSProcessAccessor." (Smalltalk at: #AioEventHandler) ifNotNil: [ :aio | aio startUp: resuming ]. self initializeThisOSProcess After all that, I can run the tests (again, excluding UnixProcessWin32FileLockingTestCase). 43 of 127 fail. However, every one of the handful I check failed because of ifNotNilDo: When I added an implementation of that, I got 1 failure and 12 errors. Many of the errors failed because tearDown | d | OSProcessAccessor emulateWin32FileLocking: self initialCompatibilitySetting. d := Delay forMilliseconds: 50. self fileStream close. self remoteProcess ifNotNilDo: [:p | p terminate. [p isComplete] whileFalse: [d wait]. self remoteProcess: nil] has p nil and p terminate failed. Almost certainly some setup code assumed fork returned a process when it didn't. The failure is in UnixProcessUnixFileLockingTestCase(AbstractUnixProcessFileLockingTestCase)>>testCooperatingProcesses04 with the whole doRemote: and block highlighted in the debugger. This is probably from the same root cause. Ross |
In reply to this post by Ross Boylan-2
On 02/09/2010 06:55 p.m., Ross Boylan wrote:
> > I got a "code simulation error" every time the debugger tried to execute > newProcess, but by restarting the calling method and hitting proceed I > seemed to complete the filein. That just means that you can not run that code in the debugger. You can halt later. Cheers, Juan Vuletich |
In reply to this post by Ross Boylan-2
On 02/09/2010 06:48 p.m., Ross Boylan wrote:
> On Thu, 2010-09-02 at 14:20 -0700, Ross Boylan wrote: >> I'm getting errors trying to file OSProcess (the st file from >> OSProcess-dtl.56.mcz) into Cuis. I am trying to do this so I can run >> magma tests. >> >> I don't know if there's any chance of this working. In the image >> Smalltalk listBuiltinModules #('ExuperyPlugin 27 February 2009 (i)') >> Smalltalk listLoadedModules. #('UnixOSProcessPlugin 27 February 2009 (e)' 'SoundPlugin 27 February 2009 (e)' 'BitBltPlugin 8 April 2009 (e)' 'LargeIntegers v1.5 27 February 2009 (e)' 'MiscPrimitivePlugin 27 February 2009 (e)' 'SecurityPlugin 27 February 2009 (e)' 'FilePlugin 27 February 2009 (e)') >> So it looks as if the necessary plugin is there (Smalltalk vmVersion >> 'Pharo0.1 of 16 May 2008 [latest update: #10074]') >> >> I think there's a small bug in OSProcessAccessor >> class>>forThisOSProcess. I changed the last 2 lines to be >> oldAccessor ifNotNil: >> [oldAccessor release; finalize.]. >> ^ ThisOSProcessAccessor := self concreteClass basicNew initialize] >> I added the ifNotNil: test; without it release fails because it is sent >> to an undefined object. The ifNotNil: test probably belongs earlier in >> the code, but this was enough to get it to run. >> >> However, the final initialize fails. It calls >> UnixOSProcessAccessor>>grimReaperProcess, which fails because grimReaper >> is nil. I notice the very last OSProcess changeset touched this method. >> > I think there are 2 separate problems. First, the parentheses are off. > I think they should be > grimReaperProcess > "This is a process which waits for the death of a child OSProcess, and > informs any dependents of the change. Use SIGCHLD events if possible, > otherwise a Delay to poll for exiting child processes." > > | event processSynchronizationDelay | > ^ self canAccessSystem > ifTrue: > [event := (self canAccessSystem and: [self canForwardExternalSignals]) > ifTrue: [self sigChldSemaphore] > ifFalse: [Delay forMilliseconds: 200]. > processSynchronizationDelay := Delay forMilliseconds: 20. > grimReaper ifNil: > [grimReaper := > [[event wait. > processSynchronizationDelay wait. "Avoids lost signals in heavy process switching" > self changed: #childProcessStatus] repeat] fork.]. > "name selected to look reasonable in the process browser" > grimReaper name: ((ReadStream on: grimReaper hash asString) next: 5) > , ': the child OSProcess watcher'] > ifFalse: > [nil] > I moved a parentheses from after 'OSProcess watcher' on the 3rd to last > line to after the fork. I also added "." after the ]. > > The second problem is that fork in Cuis does not return an object. This > is perhaps related to the bug that inspired the latest OSProcess > release. There's a very informative comment on the Cuis fork method: > fork > "Create and schedule a Process running the code in the receiver." > > "jmv - Do NOT answer the new process. > > See http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-February/124960.html > > Most times, this methods returns before resuming the new process (if priority of new process is less > or equal than current). But it might return afterwards. > > This means it is very dangerous to use the returned process in code that stores it in some variable > and checks for nil to start a new one. If this methods happens to return after the new process is forked, > chances are the code that starts all this runs again, that variable is nil, and a second process is forked, > perhaps breaking some shared state. This kind of bug is hard to spot and debug. > > Callers wanting the new process object, should call #newProcess, store the answer, and then #resume. > > A way to ensure this bug will not ever happen again is just to answer nil" > > self newProcess resume. > ^nil > > I think the fix is to do that (newProcess and resume), but perhaps this > means some of the syncronization code in the current grimReaperProcess > can go. > > Ross > > P.S. The original error arose when grimReaper name: failed. I believe it would be better to have #fork and friends return nil also in Squeak. It is much easier to fix senders... Another alternative, even more radical is to have #fork and friends do self error: 'Never call this method. Use #newProcess and #resume instead' to make senders easier to spot. You needed to figure yourself that the answer of #fork was nil. The argument against this is that there are "legitimate" senders of #fork, i.e. code that does not use the returned value... Cheers, Juan Vuletich |
> I believe it would be better to have #fork and friends return nil also
> in Squeak. -1 I do use the return value of #fork ... Stef |
On 03/09/2010 12:18 p.m., Stéphane Rollandin wrote:
>> I believe it would be better to have #fork and friends return nil also >> in Squeak. > > -1 > > I do use the return value of #fork ... > > Stef Why not change senders to the simple and safer #newProcess / #resume sequence? It only took me about 15 minutes to change all of them in Cuis... Cheers, Juan Vuletich |
In reply to this post by David T. Lewis
ExternalWindowsOSProcess>>value is
value "Start the external process" | procInfo mainThread | self isNotYetRunning ifTrue: [procInfo := OSProcess accessor primCommand: self commandLine. procInfo isNil ifTrue: [self initialStdErr nextPutAll: 'cannot execute ', self commandLine; cr. self exitStatus: #cannotExecuteCommandLine. "FIXME: Close the OSPipes now, otherwise the image will block on a read" self closeStreams. [self complete] fork "defer execution so OSPipes stay in place for now"] ifFalse: [self pid: (procInfo at: 3). self handle: (procInfo at: 1). mainThread := WindowsThread threadID: (procInfo at: 4) handle: (procInfo at: 2) running: true. self threads add: mainThread. self running. OSProcess thisOSProcess registerChildProcess: self. "FIXME: Close the initial pipe handles. For now, I have not implemented passing these to the child, and there is no support yet for nonblocking Windows OS pipes. Once those are available, this method needs to change to support." self closeStreams]]. Because of the possible race in fork (and Cuis's changing it to return nil to avoid the race) I reworked part of the code to procInfo isNil ifTrue: [|proc| self initialStdErr nextPutAll: 'cannot execute ', self commandLine; cr. self exitStatus: #cannotExecuteCommandLine. "FIXME: Close the OSPipes now, otherwise the image will block on a read" self closeStreams. proc :=[self complete] newProcess. proc resume. proc "defer execution so OSPipes stay in place for now"] However, I don't understand what the "defer execution" comment is about. The execution doesn't look deferred in the original, which appears to start the process right away (if one is unaware of the race condition). The comment looks even stranger in the new code. BTW this is Win32 code, and so it's not going to run when I do tests on Linux. The result of the original "[self complete] fork" is a possible return value of the method. I don't know that anything uses that return value, but it seemed safest to assure it was a process and not momentarily nil. Ross |
On Fri, Sep 03, 2010 at 11:50:10AM -0700, Ross Boylan wrote:
> ExternalWindowsOSProcess>>value is > value > "Start the external process" > > | procInfo mainThread | > self isNotYetRunning ifTrue: > [procInfo := OSProcess accessor primCommand: self commandLine. > procInfo isNil > ifTrue: > [self initialStdErr nextPutAll: 'cannot execute ', self commandLine; cr. > self exitStatus: #cannotExecuteCommandLine. > "FIXME: Close the OSPipes now, otherwise the image will block on a read" > self closeStreams. > [self complete] fork "defer execution so OSPipes stay in place for now"] > ifFalse: > [self pid: (procInfo at: 3). > self handle: (procInfo at: 1). > mainThread := WindowsThread > threadID: (procInfo at: 4) > handle: (procInfo at: 2) > running: true. > self threads add: mainThread. > self running. > OSProcess thisOSProcess registerChildProcess: self. > "FIXME: Close the initial pipe handles. For now, I have not implemented > passing these to the child, and there is no support yet for nonblocking > Windows OS pipes. Once those are available, this method needs to change > to support." > self closeStreams]]. > Because of the possible race in fork (and Cuis's changing it to return > nil to avoid the race) I reworked part of the code to > procInfo isNil > ifTrue: > [|proc| > self initialStdErr nextPutAll: 'cannot execute ', self commandLine; cr. > self exitStatus: #cannotExecuteCommandLine. > "FIXME: Close the OSPipes now, otherwise the image will block on a read" > self closeStreams. > proc :=[self complete] newProcess. > proc resume. > proc "defer execution so OSPipes stay in place for now"] > However, I don't understand what the "defer execution" comment is about. > The execution doesn't look deferred in the original, which appears to > start the process right away (if one is unaware of the race condition). > The comment looks even stranger in the new code. I don't recall why I needed to put the #complete in a background process, although it probably had something to do with my incomplete implementation of pipes on Windows. In any case, the use of #fork should not be a concern here, as we are not making use of the return value. > > BTW this is Win32 code, and so it's not going to run when I do tests on > Linux. > > The result of the original "[self complete] fork" is a possible return > value of the method. I don't know that anything uses that return value, > but it seemed safest to assure it was a process and not momentarily nil. > > Ross > |
Free forum by Nabble | Edit this page |