OSProcess and Cuis: successes and failures, cumulative changes

Ross Boylan-2

OSProcess and Cuis: successes and failures, cumulative changes

I've attached all the changes I've made to OSProcess. This includes new
work that replaces #fork with #newProcess and #resume for those
situations in which the return value might be used.
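
A minimal sketch of that replacement, assuming the concern is that the object answered by #fork cannot be relied on in every image; the variable name and the block body are hypothetical, not taken from the changeset:

```smalltalk
| worker |
"Hypothetical example, not from OSProcess itself.
 #newProcess answers the new, still suspended Process, so the
 reference is captured before the process starts running."
worker := [(Delay forMilliseconds: 50) wait] newProcess.
worker resume.
"worker can now be used later, e.g. to send it #terminate."
```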

It mostly does not have changes from #ifNotNilDo: to #ifNotNil:. So to
run it in Cuis requires implementing ifNotNilDo:, or making the
remaining changes from ifNotNilDo: to ifNotNil:. I think the
changeset's one change to ifNotNil: is in a method that's activated
during filein.

Despite the work on fork, I'm still getting about the same number of
test failures in Cuis 2.6 (recall that unaltered OSProcess passed all tests
in Squeak 4.1). Adding in UnixProcessWin32FileLockingTestCase, there
are 167 tests with 2 failures and 24 errors. All problems are in
UnixProcess{Unix,Win32}FileLockingTestCase.

I'm also starting to suspect that the apparent errors in which the process p
was nil were artifacts of the debugging process. Typically the error is in
tearDown

    | d |
    OSProcessAccessor emulateWin32FileLocking: self initialCompatibilitySetting.
    d := Delay forMilliseconds: 50.
    self fileStream close.
    self remoteProcess ifNotNilDo:
        [:p | p terminate.
        [p isComplete] whileFalse: [d wait].
        self remoteProcess: nil]

In p terminate, p is undefined and terminate is not understood. This
really doesn't make sense, since the block is only executed ifNotNilDo:
and when I trace into it things seem OK. The final step sets the remote
process to nil; perhaps by the time I bring up the debugger the old
remote process is gone.

Here's the code I put in ProtoObject:

ifNotNilDo: ifNotNilBlock
    "RB adds for back compatibility"
    "Evaluate the block, unless I'm == nil (q.v.)"

    ^ ifNotNilBlock valueWithPossibleArgs: {self}

Attachment: OSProcess-rb.1.cs (6K)

David T. Lewis

Re: OSProcess and Cuis: successes and failures, cumulative changes

Ross,

Thanks very much for this. I just started looking at your updates and
the change set will help a lot. FYI, it turns out that #ifNotNil: with
a block parameter is not portable to Squeak 3.6 so I will probably recode
a few things so that both Cuis and Squeak 3.6 will still be happy.

Thanks a lot,

Dave

On Fri, Sep 03, 2010 at 01:10:42PM -0700, Ross Boylan wrote:

> I've attached all the changes I've made to OSProcess. This includes new
> work that replaces #fork with #newProcess and #resume for those
> situations in which the return value might be used.

Ross Boylan-2

Re: OSProcess and Cuis: successes and failures, cumulative changes

On Fri, 2010-09-03 at 16:22 -0400, David T. Lewis wrote:
> > Here's the code I put in ProtoObject
> > ifNotNilDo: ifNotNilBlock
> > "RB adds for back compatibility"
> > "Evaluate the block, unless I'm == nil (q.v.)"
> >
> > ^ ifNotNilBlock valueWithPossibleArgs: {self}
>
Duh. I also need, in UndefinedObject in Cuis:

ifNotNilDo: aBlock
    "RB back compatibility"
    ^ self

Of course the right way is to change the OSProcess code to use something
more portable than ifNotNilDo:.
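
A sketch of what such a rewrite could look like for the tearDown method quoted earlier in the thread, assuming a plain nil test is acceptable; it uses only #isNil and #ifFalse: in addition to the selectors already in that method, and is not necessarily what ends up in OSProcess:

```smalltalk
tearDown
    "Sketch only: same steps as the original, without #ifNotNilDo:."
    | d p |
    OSProcessAccessor emulateWin32FileLocking: self initialCompatibilitySetting.
    d := Delay forMilliseconds: 50.
    self fileStream close.
    p := self remoteProcess.
    p isNil ifFalse:
        [p terminate.
        [p isComplete] whileFalse: [d wait].
        self remoteProcess: nil]
```

Holding the process in a temporary also means the block no longer depends on how a given image passes the receiver into the block.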

With that change, and still more tests added, I have 3 failures out of
190 tests run. They are in
UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses0[12]
and the comparable 02 (but not 01) method for the Win32 version.

Unfortunately, the debugger one can bring up on the failures isn't very
informative about the exact source of the failure.

Ross

David T. Lewis

Re: OSProcess and Cuis: successes and failures, cumulative changes

On Fri, Sep 03, 2010 at 02:35:53PM -0700, Ross Boylan wrote:

> On Fri, 2010-09-03 at 16:22 -0400, David T. Lewis wrote:
> > > Here's the code I put in ProtoObject
> > > ifNotNilDo: ifNotNilBlock
> > > "RB adds for back compatibility"
> > > "Evaluate the block, unless I'm == nil (q.v.)"
> > >
> > > ^ ifNotNilBlock valueWithPossibleArgs: {self}
> >
> Duh. I also need, in UndefinedObject in Cuis
> ifNotNilDo: aBlock
> "RB back compatibility"
> ^ self
>
> Of course the right way is to change the OSProcess code to use something
> more portable than ifNotNilDo:.

Hi Ross,

I have updated both OSProcess and CommandShell on SqueakSource to address
all of the #ifNotNilDo: issues for Cuis, as well as the #fork issues that
you previously addressed. I think that I've captured all of your fixes.
Thank you *very* much for this.

I did not actually load into Cuis to check, but a lot of the issues
were in the unit tests so I expect they will run properly now (knock
wood).

And I have to give a grudging nod of appreciation to Juan for forcing
me to address the #ifNotNilDo: concerns. At first I was annoyed that I
had to rewrite a lot of code, but I have to admit that by the time I
was done I had cleaned out a lot of ugly cruft from OSProcess, so
I guess once again Juan has got it right :)

Dave

Ross Boylan-2

Re: OSProcess and Cuis: successes and failures, cumulative changes

On Fri, 2010-09-03 at 19:55 -0400, David T. Lewis wrote:

> Hi Ross,
>
> I have updated both OSProcess and CommandShell on SqueakSource to address
> all of the #ifNotNilDo: issues for Cuis, as well as the #fork issues that
> you previously addressed. I think that I've captured all of your fixes.
> Thank you *very* much for this.
>
> I did not actually load into Cuis to check, but a lot of the issues
> were in the unit tests so I expect they will run properly now (knock
> wood).
>
> And I have to give a grudging nod of appreciation to Juan for forcing
> me to address the #ifNotNilDo: concerns. At first I was annoyed that I
> had to rewrite a lot of code, but I have to admit that by the time I
> was done I had cleaned out a lot of ugly cruft from OSProcess, so
> I guess once again Juan has got it right :)
>
> Dave

The latest changeset files in cleanly to a new Cuis2.7 image (in
particular, I did not add any hacks for ifNotNilDo:). All 190 tests
passed the first 2 or 3 times I ran the tests, but, after running the
2.6 tests below, I got various failures in 2.7.
UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses02 failed
once, and then
UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses04 the next
time.

The 2.6 and 2.7 tests were in different directories. In the initial 2.7
tests only that image was up, while in all later cases both were up.
The 2.7 image seems to do somewhat better than 2.6. When I start from a
fresh Cuis2.6 image, I get non-deterministic failures. The first test
suite run hung up, from memory at UnixProcessTestCase>>testCatAFile.
The last two runs failed in
UnixProcessWin32FileLockingTestCase>>testCooperatingProcesses04.

I shut down the 2.7 image and tried running just the 2.6. There was one
failure, in
UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses04.

Juan, do you have any ideas what changes between 2.6 and 2.7 might be
relevant, or what might be causing these failure patterns?

Ross

David T. Lewis

Re: OSProcess and Cuis: successes and failures, cumulative changes

On Fri, Sep 03, 2010 at 07:42:51PM -0700, Ross Boylan wrote:

> The latest changeset files in cleanly to a new Cuis2.7 image (in
> particular, I did not add any hacks for ifNotNilDo:). All 190 tests
> passed the first 2 or 3 times I ran the tests, but, after running the
> 2.6 tests below, I got various failures in 2.7.
> UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses02 failed
> once, and then
> UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses04 the next
> time.
>
> The 2.6 and 2.7 tests were in different directories. In the initial 2.7
> tests only that image was up, while in all later cases both were up.
> The 2.7 image seems to do somewhat better than 2.6. When I start from a
> fresh Cuis2.6 image, I get non-deterministic failures. The first test
> suite run hung up, from memory at UnixProcessTestCase>>testCatAFile.
> The last two runs failed in
> UnixProcessWin32FileLockingTestCase>>testCooperatingProcesses04.
>
> I shut down the 2.7 image and tried running just the 2.6. There was one
> failure, in
> UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses04.
>
> Juan, do you have any ideas what changes between 2.6 and 2.7 might be
> relevant, or what might be causing these failure patterns?

It is very possible that the problems are either in my unit tests, or
in OSProcess/CommandShell. Many of the tests involve cooperating OS
processes and are timing dependent. It would be good to understand what
has changed between 2.6 and 2.7, but my working assumption would be
that any changes in Cuis might make bugs in OSProcess/CommandShell
more probable, but are not necessarily the root cause. The file locking
tests are an example of tests that rely on cooperating OS processes,
and that could fail due to timing issues of one sort or another.

I note that CommandShellTestCase in particular is a real torture
test for process scheduling, and Pharo images have tended to choke
on these tests. Again, I don't assume that Pharo is the root cause,
just that the behavior seems to be different.

Dave

Ross Boylan-2

Re: OSProcess: successes and failures

On Fri, 2010-09-03 at 23:11 -0400, David T. Lewis wrote:

> > Juan, do you have any ideas what changes between 2.6 and 2.7 might be
> > relevant, or what might be causing these failure patterns?
>
> It is very possible that the problems are either in my unit tests, or
> in OSProcess/CommandShell. Many of the tests involve cooperating OS
> processes and are timing dependent. It would be good to understand what
> has changed between 2.6 and 2.7, but my working assumption would be
> that any changes in Cuis might make bugs in OSProcess/CommandShell
> more probable, but are not necessarily the root cause. The file locking
> tests are an example of tests that rely on cooperating OS processes,
> and that could fail due to timing issues of one sort or another.
>
> I note that CommandShellTestCase in particular is a real torture
> test for process scheduling, and Pharo images have tended to choke
> on these tests. Again, I don't assume that Pharo is the root cause,
> just that the behavior seems to be different.

I ran only the OSProcess tests; I did not even file the CommandShell
stuff into the image.

I didn't mean to imply that I thought Cuis was necessarily the problem.
There does seem to be a difference between 2.6 and 2.7, since 2.7 is
sometimes all green but 2.6 never has been. The changes in Cuis might
be a clue to where the problem lies.

In fact, I get failures in a stock Squeak 4.1 image also:
try 1: UnixProcessAccessorTestCase>>testIsWritableForUserInGroup (1 error)
try 2: UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses04 (1 failure)
try 3: same results as try 2.

All tests use the same Squeak 4.0 VM.

Tests that fail (as opposed to "error") don't seem to produce very
useful trace information in Cuis; when I click on the failed test I get
a stack trace that doesn't seem to have the error in the stack. Can
anyone provide any insight into what's going on, or how to debug this
further?

I'm not sure how to retrieve the failure stack in squeak. After some
fiddling I got a debugger to come up; it appears this may have been
triggered by a fresh run of the test. The failure occurred near the end
of testCooperatingProcesses04 at

    self assert: result = 'THIS IS A TEST 44'

The actual value of result was 'THIS IS 22TEST 44'.

Because the tests and the problem are timing sensitive, simply stepping
in with the debugger doesn't seem to work well. I got a lot of errors
when I did so, I assume because various timeouts expired.
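
One way to observe the failure rate without stepping through the code, as a sketch: run the single test repeatedly from a workspace and count the passes. It assumes the image's SUnit provides TestCase class>>selector:, TestCase>>run and TestResult>>hasPassed, as Squeak's does; the run count of 20 is arbitrary.

```smalltalk
| passes runs result |
passes := 0.
runs := 20.  "arbitrary number of repetitions"
runs timesRepeat:
    ["Build a fresh test case each time so setUp and tearDown run again."
    result := (UnixProcessUnixFileLockingTestCase
        selector: #testCooperatingProcesses04) run.
    result hasPassed ifTrue: [passes := passes + 1]].
Transcript show: passes printString, ' of ', runs printString, ' runs passed'; cr
```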

Ross

Juan Vuletich-4

Re: OSProcess and Cuis: successes and failures, cumulative changes

Ross Boylan wrote:

> The latest changeset files in cleanly to a new Cuis2.7 image (in
> particular, I did not add any hacks for ifNotNilDo:). All 190 tests
> passed the first 2 or 3 times I ran the tests, but, after running the
> 2.6 tests below, I got various failures in 2.7.
> UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses02 failed
> once, and then
> UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses04 the next
> time.
>
> The 2.6 and 2.7 tests were in different directories. In the initial 2.7
> tests only that image was up, while in all later cases both were up.
> The 2.7 image seems to do somewhat better than 2.6. When I start from a
> fresh Cuis2.6 image, I get non-deterministic failures. The first test
> suite run hung up, from memory at UnixProcessTestCase>>testCatAFile.
> The last two runs failed in
> UnixProcessWin32FileLockingTestCase>>testCooperatingProcesses04.
>
> I shut down the 2.7 image and tried running just the 2.6. There was one
> failure, in
> UnixProcessUnixFileLockingTestCase>>testCooperatingProcesses04.
>
> Juan, do you have any ideas what changes between 2.6 and 2.7 might be
> relevant, or what might be causing these failure patterns?
>
> Ross

I don't think that any difference between Cuis 2.6 and 2.7 could be
related to this. Most changes were in UI. There are two sets of kernel
changes. One is in numbers, where Cuis 2.7 is much closer to Squeak
(lots of cool stuff from Nicolas), and all the tests pass. The other is
Compiler / Decompiler / Debugger, and again, what I did, essentially, is
to update Cuis to latest Squeak's. I think it is quite unlikely that
there is any relation with OSProcess. You can check the tests to see if
they use the Compiler or Decompiler, though.

I'd say that the OS scheduling of OS processes could be the cause of
non-deterministic failures. Especially if you're running 2 images at the
same time. I'd try to isolate possible causes, and make the process
reproducible. I.e., set up all the images to test (various Squeak,
Pharo, Cuis, etc), use the same vm, do a reboot of the os, do the tests
on one image, writing on paper each step so it can be reproduced exactly
(noting even other running applications on the machine, which should be
kept to an absolute minimum). Write down the results. Then reboot the
machine and do exactly the same on the next image. And so on. I know,
this is terribly tedious. But proper testing requires a lot of discipline.

The alternative is not testing, but debugging. Then you need to
understand many internals of OSProcess and the tests being run. It will
be quite some work too. But you'll end up understanding it all, and fixing
any issues.

Good luck!

Cheers,
Juan Vuletich