[vwnc] ExternalProcess problems

cdavidshaffer
Platform: VW7.6, Gentoo linux 2.6.27-r8

I'm completely befuddled by this but I've managed to reproduce it in a
very simple way.  There are three processes involved:

Parent (visualworks)
  Child (visualworks)
     Grandchild (netcat -- but anything that can listen on a socket is
probably fine)

Child is spawned by Parent using ExternalProcess class>>shOne:.  Child
spawns Grandchild via ExternalProcess
class>>execute:arguments:do:errorStreamDo:.  Grandchild opens a TCP
server socket and waits for a connection.  Meanwhile Child exits via
ObjectMemory quit.  The fact that Child has exited can be verified by
grepping through 'ps aux'.  It is gone, beyond all doubt.  At this point
Parent should (in my opinion which seems to differ from reality) return
from shOne:.  In reality, however, it sits waiting for Grandchild to
exit.  Killing Grandchild frees up Parent (shOne: returns).

This is very strange behavior and is causing me lockups galore (in my
case Grandchild is firefox launched to run SeasideTesting tests in an
automated test environment, Child is the image-under-test and Parent is
an image building toolset).

Attached is a complete set of scripts to reproduce the problem.  You may
have to touch up the paths a bit for your system.  Run the 'run'
script.  You'll note (in headless-transcript.log) that the child has
exited (verify with ps if you like) but Parent is waiting.  Killing the
nc instance frees up the Parent.

I'm really clueless on how to fix this (or who the culprit is).  My
guess, after looking through some strace output, was that the problem is
in the use of clone().  In my case I can't modify the Grandchild so I'm
stuck trying to work around these problems in VW or somehow wrapping the
Grandchild (to insulate Child and Parent from its behavior) but I
haven't hit on a winning wrapper yet.  At the moment I'm working around
it by having Child kill Grandchild before it exits but this is
suboptimal (I end up killing firefox for no good reason).  Has anyone
else hit this problem?

David


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Attachment: ext.tar.gz (780 bytes)

Re: [vwnc] ExternalProcess problems

thomas.hawker
This sounds like Unix behavior.  Have you checked the descriptions of how fork() works?  Have you tried setting the disposition of SIGCHLD to SIG_IGN, if VW will let you?  I'm assuming that you don't care if/when the child exits...

Cheers!
 
Tom Hawker

Re: [vwnc] ExternalProcess problems

Alan Knight-2
In reply to this post by cdavidshaffer
I can't say that I understand the implications of everything, but why not just try using the lower-level #execute:arguments:do:errorStreamDo:, which lets you give it blocks that deal with the input and output streams, rather than blocking the calling process. It might just leave you with blocks stuck waiting on things that haven't closed yet, but that might be better than blocking your calling process.

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk


Re: [vwnc] ExternalProcess problems

cdavidshaffer
In reply to this post by thomas.hawker
[hidden email] wrote:
> This sounds like Unix behavior.  Have you checked the descriptions of how fork() works?  Have you tried setting the disposition of SIGCHLD to SIG_IGN, if VW will let you?  I'm assuming that you don't care if/when the child exits...
>
> Cheers!
>  
> Tom Hawker
>  
Sorry for sounding dense but could you be more specific about Unix behavior
and fork()?  I've used fork() and clone() pretty extensively in the C
world and never hit this particular problem.  It seems connected to the
network layer in a subtle way (although I could be wrong about that!).  I
can't ignore the child exit in Parent since I want its execution to be
synchronous with Child.  Bash, for example, doesn't seem to have this problem:

trap 'echo saw exit' SIGCHLD && bash -c 'nc -l -p 7755 &'

produces a similar situation but the echo /is/ invoked by the parent
shell even though nc is left running in the background.  I don't see the VW
behavior as typical at all.

David


Re: [vwnc] ExternalProcess problems

Eliot Miranda-2
In reply to this post by cdavidshaffer
David,

    can you try and localise the problem further?  If you e.g. kick VW's reaper process in Parent (see ExternalProcess class>>startReaper) by signalling the status-changed semaphore does VW see the child's exit?  If so, either the SIGCHLD is getting lost somehow or the reaper process isn't getting to run (unlikely; it's quite high priority).  

Perhaps more than one process is waiting on the child.  I see that in UnixProcess>>releaseHandle the exit semaphore is only signalled once.  IMO it should read something like

releaseHandle
    "Break the reference to the (presumably non-existent) process,
    awaken any waiters."

    | s |
    super releaseHandle.
    s := exitSemaphore.
    exitSemaphore := nil.
    "Signal repeatedly until no process is left waiting on the semaphore."
    s == nil ifFalse:
        [[s signal. s isEmpty] whileFalse]



Re: [vwnc] ExternalProcess problems

cdavidshaffer
Eliot Miranda wrote:
> David,
>
>     can you try and localise the problem further?  If you e.g. kick
> VW's reaper process in Parent (see ExternalProcess class>>startReaper)
> by signalling the status-changed semaphore does VW see the child's
> exit?  If so, either the SIGCHLD is getting lost somehow or the reaper
> process isn't getting to run (unlikely; it's quite high priority).
1) As you suggested, I stored the reaper's semaphore in a global and
signaled it after a delay.  No progress.  Attached is the patched
startReaper.  I added UnixProcess startReaper to my parent startup
script, which now looks like:

Transcript cr; show: 'Launching child...'; cr.
UnixProcess startReaper.
"Kick the reaper's semaphore five seconds after the child is launched."
[(Delay forSeconds: 5) wait.
 Transcript show: 'Kicking semaphore.'; cr.
 (Smalltalk at: #JackTheReaper) signal] fork.
"shOne: blocks here until the command's output stream is closed."
ExternalProcess shOne: 'visual /usr/local/vw7.6nc/image/visualnc.im -headless -fileIn UnixProcess-releaseHandle.st -fileIn child.st'.
Transcript show: 'Child returned'; cr.
ObjectMemory quit.


I see the "Kicking semaphore" message but the image remains hung.


2) An strace of parent and offspring is here:

http://cdshaffer.com/david/strace.log

I believe Parent pid = 3878, Child pid = 3879 and Grandchild pid = 3880.

At around line 3672 I see Parent receive SIGCHLD and call waitpid on it.

>
> Perhaps more than one process is waiting on the child.  I see that in
> UnixProcess>>releaseHandle the exit semaphore is only signalled once.
>  IMO it should read something like
>
> releaseHandle
> "Break the reference to the (presumably non-existant) process,
> awaken any waiters."
>
> | s |
> super releaseHandle.
> s := exitSemaphore.
> exitSemaphore := nil.
> s == nil ifFalse:
>             [[s signal. s isEmpty] whileFalse]
>
For good measure I applied this as a patch to Parent and Child
(attached) during the above tests.

David


<?xml version="1.0"?>

<st-source>
<time-stamp>From VisualWorks® NonCommercial, 7.6 of March 3, 2008 on September 22, 2009 at 4:27:36 pm</time-stamp>


<methods>
<class-id>OS.ExternalProcess class</class-id> <category>process reaping</category>

<body package="OS-ExternalProcess" selector="startReaper">startReaper
        "Start the child-reap process."
        "ExternalProcess defaultClass startReaper"

        | sem |
        self stopReaper.
        sem := Semaphore new.
        Transcript show: 'Storing reaper'; cr.
        Smalltalk at: #JackTheReaper put: sem.
        self setStatusChangedSemaphore: sem.
        Reaper :=
                        [[sem wait.
                        self reapSome] repeat] forkAt: Processor lowIOPriority.
        Reaper setIsSystemProcess.
        Reaper name: 'ExternalProcessReaper'.
</body>
</methods>

</st-source>

<?xml version="1.0"?>

<st-source>
<time-stamp>From VisualWorks® NonCommercial, 7.6 of March 3, 2008 on September 22, 2009 at 4:19:05 pm</time-stamp>


<methods>
<class-id>OS.UnixProcess</class-id> <category>private-initialize/release</category>

<body package="OS-ExternalProcess" selector="releaseHandle">releaseHandle
"Break the reference to the (presumably non-existant) process,
awaken any waiters."

| s |
super releaseHandle.
s := exitSemaphore.
exitSemaphore := nil.
s == nil ifFalse:
            [[s signal. s isEmpty] whileFalse]</body>
</methods>

</st-source>


Re: [vwnc] ExternalProcess problems

cdavidshaffer
Let me add that it seems to have nothing to do with network I/O (sorry
for the misdirection, my test case for eliminating general processes as
a problem was flawed so I thought it was network connected).  So, if you
modify child to use:

ExternalProcess
  execute: 'sleep'
  arguments: #('500')
  do: [:in :out | in close. out close]
  errorStreamDo: [:err | err close].

you will have the same problem.

David


Re: [vwnc] ExternalProcess problems

Eliot Miranda-2
In reply to this post by cdavidshaffer


On Tue, Sep 22, 2009 at 4:43 PM, C. David Shaffer <[hidden email]> wrote:
> Eliot Miranda wrote:
> > David,
> >
> >     can you try and localise the problem further?  If you e.g. kick
> > VW's reaper process in Parent (see ExternalProcess class>>startReaper)
> > by signalling the status-changed semaphore does VW see the child's
> > exit?  If so, either the SIGCHLD is getting lost somehow or the reaper
> > process isn't getting to run (unlikely; it's quite high priority).
> 1) As you suggested, I stored the reapers semaphore in a global and
> signaled it after a delay.  No progress.  Attached is the patched
> startReaper.  I added UnixProcess startReaper to my parent startup
> script which now looks like:
>
> Transcript cr; show: 'Launching child...'; cr.
> UnixProcess startReaper.
> [(Delay forSeconds: 5) wait.
> Transcript show: 'Kicking semaphore.'; cr.
> (Smalltalk at: #JackTheReaper) signal] fork.
> ExternalProcess shOne: 'visual /usr/local/vw7.6nc/image/visualnc.im
> -headless -fileIn UnixProcess-releaseHandle.st -fileIn child.st'.
> Transcript show: 'Child returned'; cr.
> ObjectMemory quit.

You need to kick  JackTheReaper _after_ spawning the child.  Kicking it before there is a process to reap won't achieve anything.


> I see the "Kicking semaphore" message but the image remains hung.
>
>
> 2) An strace of parent and offspring is here:
>
> http://cdshaffer.com/david/strace.log
>
> I believe Parent pid = 3878, Child pid = 3879 and Grandchild pid = 3880.
>
> At around line 3672 I see Parent receive SIGCHLD and call waitpid on it.

Then verify whether the semaphore is signalled or not.  If it is not then either a) there is some problem in the VM which causes  the signal not to be translated into a signal of the reaper semaphore or b) an image bug where the wrong semaphore is being registered with the VM.

Re: [vwnc] ExternalProcess problems

cdavidshaffer
Eliot Miranda wrote:

>
>
>     Transcript cr; show: 'Launching child...'; cr.
>     UnixProcess startReaper.
>     [(Delay forSeconds: 5) wait.
>     Transcript show: 'Kicking semaphore.'; cr.
>     (Smalltalk at: #JackTheReaper) signal] fork.
>     ExternalProcess shOne: 'visual
>     /usr/local/vw7.6nc/image/visualnc.im
>     -headless -fileIn UnixProcess-releaseHandle.st -fileIn child.st'.
>     Transcript show: 'Child returned'; cr.
>     ObjectMemory quit.
>
>
> You need to kick  JackTheReaper _after_ spawning the child.  Kicking
> it before there is a process to reap won't achieve anything.
Maybe you didn't see the Delay in the forked block?  It should be long
enough for the ExternalProcess to be launched.  I can't actually fork
the block after I call shOne: since that call never returns.
>
> Then verify whether the semaphore is signalled or not.  If it is not
> then either a) there is some problem in the VM which causes  the
> signal not to be translated into a signal of the reaper semaphore or
> b) an image bug where the wrong semaphore is being registered with the VM.
>
Thanks for walking me through it.  Yes, the signal is being delivered to
the image.  The attached patch prints the proper "process done"
message as a result of the signal.

As Alan suggested, it looks like the image is waiting for a SIGIO that
is never delivered.  Killing the Grandchild causes it to be delivered.
Seems like an odd connection with Linux async IO and clone()???

David


Re: [vwnc] ExternalProcess problems

cdavidshaffer
C. David Shaffer wrote:
>
> The attached patch prints the proper "process done"
> message as a result of the signal.
>  
gosh darn it.

-attached.


<?xml version="1.0"?>

<st-source>
<time-stamp>From VisualWorks® NonCommercial, 7.6 of March 3, 2008 on September 22, 2009 at 7:10:00 pm</time-stamp>


<methods>
<class-id>OS.UnixProcess</class-id> <category>private-initialize/release</category>

<body package="OS-ExternalProcess" selector="done:with:">done: status with: usig
        "Handle a process which has exited."
        "Record the status (non-zero usig means terminated due to signal)
        and cut the process loose (doesn't need watching anymore)."
        Transcript show: 'Process done ' , self key printString; cr.
        exitStatus := usig = 0
                                ifTrue: [status]
                                ifFalse: [usig negated].
        self releaseHandle</body>
</methods>

</st-source>


Re: [vwnc] ExternalProcess problems

jas
In reply to this post by cdavidshaffer
At 02:26 PM 9/22/2009, C. David Shaffer wrote:

>Platform: VW7.6, Gentoo linux 2.6.27-r8
>
>I'm completely befuddled by this but I've managed to reproduce it in a
>very simple way.  There are three processes involved:
>
>Parent (visualworks)
>  Child (visualworks)
>     Grandchild (netcat -- but anything that can listen on a socket is
>probably fine)
>
>Child is spawned by Parent using ExternalProcess class>>shOne:.


And so waits for offspring to finish.
So far so good.


>Child spawns Grandchild via
>ExternalProcess class>>execute:arguments:do:errorStreamDo:.


I'm a little rusty here, but this sounds like a fork?


>Grandchild opens a TCP server socket and waits for a connection.
>Meanwhile Child exits via ObjectMemory quit.


Ok, so it must be a fork.


>The fact that Child has exited can be verified by
>grepping through 'ps aux'.  It is gone, beyond all doubt.


Ok so far.


>At this point Parent should
>(in my opinion which seems to differ from reality)
>return from shOne:.


Hmmm.  I'd expect parent
                (which is waiting
                 for children
                  -- ALL children --
                 to exit
                )
would continue waiting until grandchild completes.
Unless/until the grandchild is emancipated,
which is *not* the usual thing.

Your later example, using bash as child
and nc as grandchild, is different.

IIRC, you've got bash doing a 'background fork' of nc,
(the & at the end), which emancipates the grandchild,
leaving the parent waiting on bash and only bash,
whereas the vw case does a normal fork, and when
your child exits, the parent reacquires responsibility
for all the child's unemancipated offspring.

Since the parent is waiting for all offspring
to finish, and the offspring haven't finished,
the parent continues waiting.

A simple fix would be to <bash> your example
into working the way you intend:

        Parent spawns child via #cshOne:.
        Child spawns grandchild via something like

                cshOne: 'bash -c ''originalGrandchild &'''

Child waits on bash, which does a background
fork of the grandchild and exits, allowing
child to exit, allowing parent to proceed.
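
The wrapper idea above could be sketched in shell roughly as follows. `launch_detached` is a hypothetical name. The redirections to /dev/null and the optional `setsid` call are assumptions added here and are not part of the suggestion above; whether they are needed, or sufficient, for the VisualWorks case is not established in this thread. Descriptors above 2 that the command may have inherited are not touched.

```shell
#!/bin/sh
# Hypothetical helper: start a command in the background, detached from the
# caller's standard streams.
launch_detached() {
    # Point stdin, stdout and stderr at /dev/null so the new process does
    # not keep any of the caller's standard descriptors open.
    if command -v setsid >/dev/null 2>&1; then
        # setsid (util-linux) also runs the command in its own session.
        setsid "$@" </dev/null >/dev/null 2>&1 &
    else
        "$@" </dev/null >/dev/null 2>&1 &
    fi
}

# Usage: the caller continues at once while the command keeps running.
launch_detached sleep 2
echo "launched"
```

A caller that captures the wrapper's output, for example with `$(launch_detached sleep 2; echo ok)`, should get its result without waiting for the sleep to finish.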

Or, you could look up process groups,
and methods for escaping therefrom,
and find a way to convey emancipation
after a fork, from within VW.


CAVEAT - I don't know if your OS variant
actually does anything about process groups.


Regards,


-cstb


> In reality, however, it sits waiting for Grandchild to
>exit.  Killing Grandchild frees up Parent (shOne: returns).


Makes sense to me.
But then, so do I.  ;-)
YMMV...


Re: [vwnc] ExternalProcess problems

Reinout Heeck-2
cstb wrote:
>
> Or, you could look up process groups,
> and methods for escaping therefrom,
> and find a way to convey emancipation
> after a fork, from within VW.
>  
I toyed with that ages ago (vw3), see #setsid here:
 
http://web.archive.org/web/20041010012021/wiki.cs.uiuc.edu/VisualWorks/Running+in+the+background+on+Unix



R
-



Re: [vwnc] ExternalProcess problems

cdavidshaffer
In reply to this post by jas



> Makes sense to me.
> But then, so do I.  ;-)
> YMMV...

  
:-)

Sorry for posting in HTML...need a fixed width font.

For further clarification here's the output of ps axjf for this cluster of processes.  I'm running in strace as you can see.

 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
 5282  5317  5317  5317 pts/5     5694 Ss    1000   0:00  \_ bash
 5317  5694  5694  5317 pts/5     5694 S+    1000   0:00      \_ /bin/sh ./run
 5694  5695  5694  5317 pts/5     5694 S+    1000   0:00          \_ strace -f visual /usr/local/vw7.6nc/image/visualnc.im -run
 5695  5696  5694  5317 pts/5     5694 S+    1000   0:00              \_ visual /usr/local/vw7.6nc/image/visualnc.im -runtime -
    1  5701  5701  5701 ?           -1 Ss    1000   0:00 sleep 500


Note the sleep line.
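
A possible next diagnostic step, sketched here as an assumption rather than something that was run in this thread: on Linux the descriptors a process still holds can be read from /proc. `list_open_fds` is a hypothetical helper name. Entries of the form `pipe:[N]` or `socket:[N]` that show the same number N in two processes indicate that both hold the same pipe or socket open.

```shell
#!/bin/sh
# Sketch (Linux only): print each open descriptor of a process and what it
# refers to, by reading the symlinks under /proc/<pid>/fd.
list_open_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Skip the unexpanded glob when the process is gone or unreadable.
        [ -L "$fd" ] || continue
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

# Usage: inspect a background sleep whose output was redirected.
sleep 2 >/dev/null 2>&1 &
target=$!
sleep 1   # give the background process time to apply its redirections
list_open_fds "$target"
```

Running the same function against the pid of the stray sleep and against the Parent image would show whether they have any descriptor in common.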

David



Re: [vwnc] ExternalProcess problems

Eliot Miranda-2
In reply to this post by cdavidshaffer


On Tue, Sep 22, 2009 at 7:20 PM, C. David Shaffer <[hidden email]> wrote:
> Eliot Miranda wrote:
> >
> >     Transcript cr; show: 'Launching child...'; cr.
> >     UnixProcess startReaper.
> >     [(Delay forSeconds: 5) wait.
> >     Transcript show: 'Kicking semaphore.'; cr.
> >     (Smalltalk at: #JackTheReaper) signal] fork.
> >     ExternalProcess shOne: 'visual
> >     /usr/local/vw7.6nc/image/visualnc.im
> >     -headless -fileIn UnixProcess-releaseHandle.st -fileIn child.st'.
> >     Transcript show: 'Child returned'; cr.
> >     ObjectMemory quit.
> >
> > You need to kick  JackTheReaper _after_ spawning the child.  Kicking
> > it before there is a process to reap won't achieve anything.
> Maybe you didn't see the Delay in the forked block?

Doh!  I find increasingly I scan emails without properly reading them and post really stupid replies as a result.  Must try to do better.  Sorry.

 

Re: [vwnc] ExternalProcess problems

cdavidshaffer
In reply to this post by cdavidshaffer


[hidden email] wrote:
> FWIW, isn't clone() the same as BSD's vfork() - a virtually efficient fork() call that doesn't copy memory in the expectation that it will be immediately followed by an exec*() call?
>
> Cheers!
>  
I don't know vfork() but my guess is "maybe."  clone() is like fork()
but, depending on the flags passed, the resulting process can share a
good bit with the parent (including memory and signal handlers).  The
reason I /thought/ the distinction might be important is that a clone()
process does not necessarily produce SIGCHLD to its parent when it
exits.  It can produce, in fact, any signal or no signal at all (the
signal is one of the arguments to clone()).  I thought this might be
related to the problem I was having but in going back and forth with
Eliot in this thread I now see that it isn't.

The problem I'm having seems to be connected to I/O handles in some
way.  Process groups/sessions were suggested as possibly connected but I
still don't see that one.  What seems to happen is that VW creates what
it calls a "pipe" between parent and child (I put it in quotes because
it doesn't seem to be a UNIX pipe...have to dig deeper on this one).  In
the VW VM's usual style, async I/O is used on this pipe.  When Child
exits, Linux never delivers SIGIO (or even SIGPIPE) to Parent.  This is
why Parent gets stuck.  I have verified that Parent thinks Child has
exited (UnixProcess>>isActive produces false) but it sits blocked in the
read.  I think I can reproduce this without async I/O in a simple
clone()/exec C program but I haven't hit it yet.
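
The blocked read described above resembles a general Unix property that can be shown outside VisualWorks: a reader sees end-of-file only after every process holding the write end of the channel has closed it, and this holds for pipes and socket pairs alike. The sketch below assumes this is the mechanism at work, which the thread has not confirmed. `elapsed_capture` is a hypothetical helper name, and a plain shell command substitution stands in for the Parent's read.

```shell
#!/bin/sh
# Sketch: the reader of a pipe is released when the LAST holder of the
# write end closes it, not when the direct child exits.

# Print how many whole seconds a command substitution blocks for.
elapsed_capture() {
    start=$(date +%s)
    # "$@" runs with its stdout connected to a pipe read by this shell.
    captured=$("$@")
    end=$(date +%s)
    echo $((end - start))
}

# The child exits at once, but its background "grandchild" inherits the pipe.
inherited=$(elapsed_capture sh -c 'sleep 3 & echo child-exiting')

# Same child, but the grandchild's output is pointed away from the pipe.
redirected=$(elapsed_capture sh -c 'sleep 3 >/dev/null 2>&1 & echo child-exiting')

echo "grandchild holds the pipe:   blocked ${inherited}s"
echo "grandchild output redirected: blocked ${redirected}s"
```

The first case should block for about as long as the sleep runs, while the second should return almost immediately, even though the direct child exits right away in both.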

This might seem like a silly little corner case to a lot of people but
I can think of lots of server arrangements for which this would cause
problems.  This connection to the Grandchild is subtle, undocumented and
probably platform-dependent behavior.  For example, a lot of people
assumed Parent, Child and Grandchild were in the same session when,
in fact, the VW VM calls setsid() right after clone().  I'm hoping to
correct it by making the parent-child arrangement more explicit (and
hence probably more platform dependent but definitely less subtle).

Reinout's sample code has made me less afraid to throw some manual C
callouts into the mix to try to patch things up once I understand the
problem better.

Thanks for everyone's suggestions!  Please keep them coming as you think
of possibilities...

David


