Re: [Vm-dev] [OSProcess] forking and file descriptors


Re: [Vm-dev] [OSProcess] forking and file descriptors

Max Leske
(Resending with proper subject…)

On 08 Jan 2015, at 20:48, [hidden email] wrote:

Date: Thu, 8 Jan 2015 16:56:30 +0100
From: Henrik Johansen <[hidden email]>
Subject: [squeak-dev] Re: [Vm-dev] [OSProcess] forking and file
descriptors


On 08 Jan 2015, at 11:37 , Max Leske <[hidden email]> wrote:


Hi

We currently use ImageSegment to create snapshots of our object graphs. To ensure consistency (and for performance reasons) we create a fork of the image and then run the segment creation in the fork. We’ve always had minor issues with TCP sockets but they are pretty rare and have never corrupted any data (we close the TCP connections in the child).

Recently however, we created a new application which also makes heavy use of a database and now it seems that forking creates a real problem. In anticipation of possible problems I opted to destroy all sockets (with Socket>>destroy) in the fork, thinking that, since all file descriptors are copies of the ones in the parent process, the sockets in the parent process should be unaffected [1], [2].
With that mechanism in place however, we are seeing very weird things, such as multiple sockets in the parent (!) having the same file handle (which leads to the wrong data being read from the database and, in turn, corrupt objects).

AFAICT, the OSProcess plugin doesn’t offer any way of dealing with such problems so I was wondering if anybody has had any experience with these kinds of issues and whether there is some kind of best practice.

I am aware that the simplest option is to close the sockets in the parent before forking, but that would mean we have to wait for all database connections to finish executing and then block them to prevent new connections to the database. Depending on the time a query takes (which may well be a couple of seconds in our case), clients would need to wait for quite a long time before their request can be answered (and this scenario of course assumes that we only close the database sockets and leave the TCP sockets open…).

So under the condition that I need to fork that image, what is the best way to deal with open file descriptors?


Thanks for your time.
Max

[1] http://man7.org/linux/man-pages/man2/fork.2.html
[2] http://man7.org/linux/man-pages/man2/clone.2.html

Well...
If I understand the source correctly (at least on Unix, https://github.com/pharo-project/pharo-vm/blob/master/platforms/unix/plugins/SocketPlugin/sqUnixSocket.c), the socketHandle in a Socket instance is a pointer to a private (platform-specific) struct.
That struct in turn holds a handle to the native socket, which I assume is what gets copied when you fork a process?

Socket >> primDestroySocket frees the memory pointed to by socketHandle.

Hm… that gives me an idea: assume that everything works as advertised and that the child process gets copies of the socket descriptors (which can be closed safely without interfering with the parent). If I’m right, the Socket instances in the image hold on to the address of the *parent* handle (in an inst var). So now, when I close a socket with #primSocketDestroy:, the handle passed to the plugin will be the handle of the parent socket (although it sounds strange that the child should be able to close a file descriptor of its parent…).

That would mean that I must not close any sockets in the child. One option, it seems to me, is to suspend all processes that use sockets. Terminating them might pose another problem: if socket destruction is part of an unwind block in one of the processes (e.g. TCP connections in Seaside), then sockets will be destroyed during termination.

Another option: set all the socket handles to nil, then terminate the processes (yes ugly, but it might just work…).
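
To make that second option a bit more concrete, here is roughly what I have in mind (only a sketch, run in the child right after the fork; it assumes Squeak's Socket keeps its native handle in the instance variable 'socketHandle', and 'socketProcesses' is a stand-in for whatever collection of processes touches sockets in our application):

        "In the child: detach the Socket instances from their native handles
        without calling close(), then terminate the processes that use them."
        Socket allInstances do: [ :each |
                each instVarNamed: 'socketHandle' put: nil ].
        socketProcesses do: [ :each | each terminate ]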

So, are you using clone or fork to create a fork of the image? 

OSProcess uses plain fork() (in forkSqueak()). That’s what I use from the image.

If their memory is shared (clone) instead of copied (fork), you might be kicking the feet out from under the parent image as well, so to speak…

From my understanding the file descriptors should be copied (fork). So that shouldn’t happen (but see above…).


Thanks Henry.


Cheers,
Henry




Re: [Vm-dev] [OSProcess] forking and file descriptors

Max Leske
(Resending with proper subject…)

On 08 Jan 2015, at 20:48, [hidden email] wrote:

From: "David T. Lewis" <[hidden email]>
Subject: [squeak-dev] Re: [Vm-dev] [OSProcess] forking and file
descriptors

On Thu, Jan 08, 2015 at 11:37:48AM +0100, Max Leske wrote:

Hi

We currently use ImageSegment to create snapshots of our object graphs. To ensure consistency (and for performance reasons) we create a fork of the image and then run the segment creation in the fork. We’ve always had minor issues with TCP sockets but they are pretty rare and have never corrupted any data (we close the TCP connections in the child).

Recently however, we created a new application which also makes heavy use of a database and now it seems that forking creates a real problem. In anticipation of possible problems I opted to destroy all sockets (with Socket>>destroy) in the fork, thinking that, since all file descriptors are copies of the ones in the parent process, the sockets in the parent process should be unaffected [1], [2].
With that mechanism in place however, we are seeing very weird things, such as multiple sockets in the parent (!) having the same file handle (which leads to the wrong data being read from the database and, in turn, corrupt objects).

AFAICT, the OSProcess plugin doesn’t offer any way of dealing with such problems so I was wondering if anybody has had any experience with these kinds of issues and whether there is some kind of best practice.

I am aware that the simplest option is to close the sockets in the parent before forking, but that would mean we have to wait for all database connections to finish executing and then block them to prevent new connections to the database. Depending on the time a query takes (which may well be a couple of seconds in our case), clients would need to wait for quite a long time before their request can be answered (and this scenario of course assumes that we only close the database sockets and leave the TCP sockets open…).

So under the condition that I need to fork that image, what is the best way to deal with open file descriptors?


Thanks for your time.
Max

[1] http://man7.org/linux/man-pages/man2/fork.2.html
[2] http://man7.org/linux/man-pages/man2/clone.2.html

The only file descriptor (socket or file) that is directly controlled by
forkSqueak is the socket connection to the X11 display. That is done via
the XDisplayControlPlugin. Everything else needs to be handled by the
image (including the changes file BTW).

True, I’d never thought of that…


For connections such as those to a database, I think that you would want to
maintain complete control of this in your image, such that you would ensure
that you have one and only one of the images interacting with any given socket.

You can probably do this either before or after forking, whichever might
make more sense to you. If you handle it after the forkSqueak, use the
result of the forkSqueak to determine which image is the child and which
is the parent.
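
For example, something along these lines (just a sketch; #createSnapshotSegment is a stand-in for whatever the child is supposed to do, and I am assuming a forkSqueak call that answers the child pid in the parent and zero in the child, which is what the primitive does):

        | pid |
        pid := self forkSqueak.
        pid = 0
                ifTrue: [
                        "child: build the image segment, then quit without saving the image"
                        self createSnapshotSegment.
                        Smalltalk snapshot: false andQuit: true ]
                ifFalse: [
                        "parent: pid is the child's process id; keep serving requests" ]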

That’s what I’m trying to do. The problem I face is that the database queries run in a separate process (Smalltalk process in the VM). Once I’m in the child process, there’s always a small window in which the scheduler might run that process before I can terminate it or do something else.

My understanding from the C code (from reading man fork(2) and man clone(2)) is that sending Socket>>destroy (which does a close()) shouldn’t have any effect on the socket in the parent. My guess is that things get hairy when the process with the database queries gets processor time before I can destroy the sockets. When that happens, I suddenly have two processes reading from AND writing to the same socket (which can’t be good…).

I just realized there are BlockClosure>>valueUnpreemptively and BlockClosure>>valueUninterruptably. Maybe I can wrap the code in the fork into such a block somehow to prevent the other processes from running?
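
Roughly like this, I imagine (only a sketch; #destroyChildSockets is a hypothetical helper of mine, and I’m assuming #valueUnpreemptively works as in Squeak, i.e. it raises the active process to the highest priority for the duration of the block, so no other process can be scheduled in between):

        [ | pid |
        pid := self forkSqueak.
        pid = 0 ifTrue: [
                "child: clean up the copied descriptors before any other process runs"
                self destroyChildSockets ] ] valueUnpreemptively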

Thanks Dave.


I don't know if this helps; if not, please keep asking questions.

Dave




Re: [Vm-dev] [OSProcess] forking and file descriptors

Henrik Sperre Johansen
In reply to this post by Max Leske

> On 09 Jan 2015, at 9:32 , Max Leske <[hidden email]> wrote:
>
>

> That would mean that I must not close any sockets in the child. One option, it seems to me, is to suspend all processes that use sockets. Terminating them might pose another problem: if socket destruction is part of an unwind block in one of the processes (e.g. TCP connections in Seaside), then sockets will be destroyed during termination.
>
> Another option: set all the socket handles to nil, then terminate the processes (yes ugly, but it might just work…).

Just beware you might run into the issue that resuming a process waiting on a semaphore makes it proceed as if the semaphore had been signalled.
I can't tell offhand whether that would actually be a problem in this case, or whether the affected processes would promptly go back to waiting after socket reads/writes are initiated with no data.

Another option (probably non-portable, which would be painful if the behaviour is not consistent across platforms):
Forget about image-side handling, and alter the SocketPlugin to set FD_CLOEXEC, if available, when opening sockets. (It's in http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/fcntl.h.html , but that's rather new.)
On newer Linuxen you also have SOCK_CLOEXEC for socket(), which opens the socket and sets the flag in one atomic operation, but the race condition avoided by that is hardly relevant in our case.

Cheers,
Henry

Re: [Vm-dev] [OSProcess] forking and file descriptors

Max Leske
In reply to this post by Max Leske

On 09 Jan 2015, at 10:25, [hidden email] wrote:

Date: Fri, 9 Jan 2015 10:24:42 +0100
From: Henrik Johansen <[hidden email]>
Subject: Re: [squeak-dev] [Vm-dev] [OSProcess] forking and file
descriptors


On 09 Jan 2015, at 9:32 , Max Leske <[hidden email]> wrote:



That would mean that I must not close any sockets in the child. One option, it seems to me, is to suspend all processes that use sockets. Terminating them might pose another problem: if socket destruction is part of an unwind block in one of the processes (e.g. TCP connections in Seaside), then sockets will be destroyed during termination.

Another option: set all the socket handles to nil, then terminate the processes (yes ugly, but it might just work…).

Just beware you might run into the issue that resuming a process waiting on a semaphore makes it proceed as if the semaphore had been signalled.
I can't tell offhand whether that would actually be a problem in this case, or whether the affected processes would promptly go back to waiting after socket reads/writes are initiated with no data.

I wouldn’t resume any of those processes if I can help it. After creating the segment the child kills itself. Thanks for pointing that out though.


Another option (probably non-portable, which would be painful if the behaviour is not consistent across platforms):
Forget about image-side handling, and alter the SocketPlugin to set FD_CLOEXEC, if available, when opening sockets. (It's in http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/fcntl.h.html , but that's rather new.)
On newer Linuxen you also have SOCK_CLOEXEC for socket(), which opens the socket and sets the flag in one atomic operation, but the race condition avoided by that is hardly relevant in our case.

True. But I really don’t want to maintain a branch of OSProcess :)


Cheers,
Henry 




Re: [Vm-dev] [OSProcess] forking and file descriptors

Henrik Sperre Johansen

On 09 Jan 2015, at 10:44 , Max Leske <[hidden email]> wrote:


On 09 Jan 2015, at 10:25, [hidden email] wrote:

Date: Fri, 9 Jan 2015 10:24:42 +0100
From: Henrik Johansen <[hidden email]>
Subject: Re: [squeak-dev] [Vm-dev] [OSProcess] forking and file
descriptors


On 09 Jan 2015, at 9:32 , Max Leske <[hidden email]> wrote:



That would mean that I must not close any sockets in the child. One option, it seems to me, is to suspend all processes that use sockets. Terminating them might pose another problem: if socket destruction is part of an unwind block in one of the processes (e.g. TCP connections in Seaside), then sockets will be destroyed during termination.

Another option: set all the socket handles to nil, then terminate the processes (yes ugly, but it might just work…).

Just beware you might run into the issue that resuming a process waiting on a semaphore makes it proceed as if the semaphore had been signalled.
I can't tell offhand whether that would actually be a problem in this case, or whether the affected processes would promptly go back to waiting after socket reads/writes are initiated with no data.

I wouldn’t resume any of those processes if I can help it. After creating the segment the child kills itself. Thanks for pointing that out though.

I was thinking you'd need to suspend the processes in the parent before forking off a child (to ensure there is no possibility of a process running before it is suspended in the child), then resume them after forking?
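
I.e. roughly this (sketch only; 'socketProcesses' is a placeholder for whatever processes touch sockets, and forkSqueak is assumed to answer the child pid in the parent and 0 in the child):

        | pid |
        socketProcesses do: [ :p | p suspend ].
        pid := self forkSqueak.
        pid = 0
                ifTrue: [ "child: leave them suspended" ]
                ifFalse: [
                        "parent: resume them (keeping in mind the semaphore caveat above)"
                        socketProcesses do: [ :p | p resume ] ]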



Another option (probably non-portable, which would be painful if the behaviour is not consistent across platforms):
Forget about image-side handling, and alter the SocketPlugin to set FD_CLOEXEC, if available, when opening sockets. (It's in http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/fcntl.h.html , but that's rather new.)
On newer Linuxen you also have SOCK_CLOEXEC for socket(), which opens the socket and sets the flag in one atomic operation, but the race condition avoided by that is hardly relevant in our case.

True. But I really don’t want to maintain a branch of OSProcess :)
It wouldn't be a branch of OSProcess, but an update to the platform SocketPlugin sources.

I was probably wrong and it's not so new (wouldn't it be nice if all APIs had the same structure as Postgres, where checking previous versions is *really* easy?); the 2004 version of the standard also includes it. A cursory check indicates at least OSX/*BSD/*Solaris would support it as well.
Doing the equivalent on Windows seems to have certain caveats though... 

Cheers,
Henry




Re: [Vm-dev] [OSProcess] forking and file descriptors

Max Leske
In reply to this post by Max Leske

On 09 Jan 2015, at 10:55, [hidden email] wrote:

Date: Fri, 9 Jan 2015 10:55:22 +0100
From: Henrik Johansen <[hidden email]>
Subject: Re: [squeak-dev] Re: [Vm-dev] [OSProcess] forking and file
descriptors

On 09 Jan 2015, at 10:44 , Max Leske <[hidden email]> wrote:


On 09 Jan 2015, at 10:25, [hidden email] wrote:

Date: Fri, 9 Jan 2015 10:24:42 +0100
From: Henrik Johansen <[hidden email]>
Subject: Re: [squeak-dev] [Vm-dev] [OSProcess] forking and file
descriptors


On 09 Jan 2015, at 9:32 , Max Leske <[hidden email]> wrote:



That would mean that I must not close any sockets in the child. One option, it seems to me, is to suspend all processes that use sockets. Terminating them might pose another problem: if socket destruction is part of an unwind block in one of the processes (e.g. TCP connections in Seaside), then sockets will be destroyed during termination.

Another option: set all the socket handles to nil, then terminate the processes (yes ugly, but it might just work…).

Just beware you might run into the issue that resuming a process waiting on a semaphore makes it proceed as if the semaphore had been signalled.
I can't tell offhand whether that would actually be a problem in this case, or whether the affected processes would promptly go back to waiting after socket reads/writes are initiated with no data.

I wouldn’t resume any of those processes if I can help it. After creating the segment the child kills itself. Thanks for pointing that out though.

I was thinking you'd need to suspend the processes in the parent before forking off a child (to ensure there is no possibility of a process running before it is suspended in the child), then resume them after forking?

I’m hoping that I can get around that by using #valueUninterruptibly (or similar). It would be much nicer to leave the parent as it is and do everything related to the snapshot in the child. But maybe I’ll have to suspend the processes in the parent in the end.




Another option (probably non-portable, which would be painful if the behaviour is not consistent across platforms):
Forget about image-side handling, and alter the SocketPlugin to set FD_CLOEXEC, if available, when opening sockets. (It's in http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/fcntl.h.html , but that's rather new.)
On newer Linuxen you also have SOCK_CLOEXEC for socket(), which opens the socket and sets the flag in one atomic operation, but the race condition avoided by that is hardly relevant in our case.

True. But I really don’t want to maintain a branch of OSProcess :)
It wouldn't be a branch of OSProcess, but an update to the platform SocketPlugin sources.

I was probably wrong and it's not so new (wouldn't it be nice if all APIs had the same structure as Postgres, where checking previous versions is *really* easy?); the 2004 version of the standard also includes it. A cursory check indicates at least OSX/*BSD/*Solaris would support it as well.
Doing the equivalent on Windows seems to have certain caveats though... 
http://stackoverflow.com/questions/12058911/can-tcp-socket-handles-be-set-not-inheritable

I’m not entirely sure that what you are suggesting will work, at least not with FD_CLOEXEC. man fcntl says:

File descriptor flags

       The following commands manipulate the flags associated with a file
       descriptor.  Currently, only one such flag is defined: FD_CLOEXEC,
       the close-on-exec flag.  If the FD_CLOEXEC bit is 0, the file
       descriptor will remain open across an execve(2), otherwise it will
       be closed.

IIRC fork doesn’t use exec, so this flag doesn’t change anything for my scenario. [1] seems to confirm this.

Cheers,
Max



Cheers,
Henry




Re: [Vm-dev] [OSProcess] forking and file descriptors

David T. Lewis
In reply to this post by Max Leske
On Fri, Jan 09, 2015 at 09:32:53AM +0100, Max Leske wrote:
>
> OSProcess uses plain fork() (in forkSqueak()). That’s what I use from the image.
>
> > If their memory is shared (clone) instead of copied (fork), you might be kicking the feet out from under the parent image as well, so to speak…
>
> From my understanding the file descriptors should be copied (fork). So that shouldn’t happen (but see above…).
>

The method comment in UnixOSProcessPlugin>>forkSqueak may be helpful, so I
will copy it here:


forkSqueak
        "Fork a child process, and continue running squeak in the child process.
        Answer the result of the fork() call, either the child pid or zero.

        After calling fork(), two OS processes exist, one of which is the child of the other. On
        systems which implement copy-on-write memory management, and which support the
        fork() system call, both processes will be running Smalltalk images, and will be sharing
        the same memory space. In the original OS process, the resulting value of pid is the
        process id of the child process (a non-zero integer). In the child process, the value of
        pid is zero.

        The child recreates sufficient external resources to continue running. This is done by
        attaching to a new X session. The child is otherwise a copy of the parent process, and
        will continue executing the Smalltalk image at the same point as its parent. The return
        value of this primitive may be used by the two running Smalltalk images to determine
        which is the parent and which is the child.

        The child should not depend on using existing connections to external resources. For
        example, the child may lose its connections to stdin, stdout, and stderr after its parent
        exits.

        The new child image does not start itself from the image in the file system; rather it is
        a clone of the parent image as it existed at the time of primitiveForkSqueak. For this
        reason, the parent and child should agree in advance as to who is allowed to save the
        image to the file system, otherwise one Smalltalk may overwrite the image of the other.

        This is a simple call to fork(), rather than the more common idiom of vfork() followed
        by exec(). The vfork() call cannot be used here because it is designed to be followed by
        an exec(), and its semantics require the parent process to wait for the child to exit. See
        the BSD programmers documentation for details."

        | pid intervalTimer saveIntervalTimer |
        <export: true>
        <returnTypeC: 'pid_t'>
        <var: 'pid' type: 'pid_t'>
        <var: 'intervalTimer' type: 'struct itimerval'>
        <var: 'saveIntervalTimer' type: 'struct itimerval'>

        "Turn off the interval timer. If this is not done, then the program which we exec in
        the child process will receive a timer interrupt, and will not know how to handle it."
        self cCode: 'intervalTimer.it_interval.tv_sec = 0'.
        self cCode: 'intervalTimer.it_interval.tv_usec = 0'.
        self cCode: 'intervalTimer.it_value.tv_sec = 0'.
        self cCode: 'intervalTimer.it_value.tv_usec = 0'.
        self cCode: 'setitimer (ITIMER_REAL, &intervalTimer, &saveIntervalTimer)'.
        pid := self fork.

        "Enable the timer again before resuming Smalltalk."
        self cCode: 'setitimer (ITIMER_REAL, &saveIntervalTimer, 0L)'.
        ^ pid