Looking at the log, the "controlSocket read error" happens occasionally
(transient network errors), and the logsender normally recovers from it.
What is different in this case is the "NetSAccept failed" error, and the
logsender does not recover from that.
The first question is why NetSAccept fails, and whether that is a bug in
the logsender, or a resource problem on the server itself.
The second question is why the whole logsender dies, and why it does not
retry and re-enter the main server loop after some delay.
The third question is how one would monitor a logsender and keep it
running. Because the logsender process is not a descendant of the
startlogsender process, tools like Daemontools, Upstart, or Systemd
cannot be used to turn the startlogsender invocation into a process that
is monitored in the traditional way.
Would it be OK to just re-run startlogsender every now and then, so that
if logsender is still running the call simply fails, and if logsender has
died it starts a new one?
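As a rough sketch of that idea, something like the following could be run
periodically (for example from cron). It does not rely on startlogsender
failing cleanly when a logsender already exists, since that is exactly what
is unknown here; it checks for a running process first. The function names
is_running and ensure_running and the 'gem logsender' match pattern are my
own assumptions, and the startlogsender arguments are deliberately left out.

```shell
#!/bin/sh
# Hypothetical watchdog sketch, not a documented GemStone procedure.

# is_running PATTERN
# Succeeds if some process command line matches PATTERN.
# Assumption: the logsender can be recognised by its command line.
is_running() {
    pgrep -f "$1" >/dev/null 2>&1
}

# ensure_running PATTERN COMMAND [ARGS...]
# Runs COMMAND only when no process matching PATTERN is found.
ensure_running() {
    pattern=$1
    shift
    if is_running "$pattern"; then
        echo "already running"
        return 0
    fi
    echo "not running, starting"
    "$@"
}

# Example call (substitute the arguments you normally pass):
# ensure_running 'gem logsender' startlogsender <your usual arguments>
```

This only narrows the gap between a crash and a restart to the polling
interval; it does not explain why the logsender died in the first place.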
Re: Logsender dies - how would one monitor and restart it?
On the topic of running a logsender and logreceiver under Unix process
supervisors such as Daemontools:
It seems that startlogsender is just a thin wrapper that forks, backgrounds
itself, and then executes $GEMSTONE/bin/gem logsender with the remaining
arguments.
That kind of backgrounding is problematic when running under process
supervisors, since they tend to assume that the service they manage has
died and needs restarting when the process they spawned quits.
One way around that is just to run "$GEMSTONE/bin/gem logsender ..."
directly under the process supervisor, but that would be an undocumented
usage of GemStone binaries.
Am I correct in my understanding that this is all startlogsender does, or
is there crucial logic that is sidestepped when running "$GEMSTONE/bin/gem
logsender ..." directly?
All my questions apply in the same way to startlogreceiver as well.
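To make the direct-invocation idea concrete, here is a minimal sketch of what
a supervisor run script could look like. The point is the use of exec: the
shell is replaced by the service, so the process the supervisor spawned is the
service itself and nothing goes into the background. The helper name
run_foreground is hypothetical, the logsender arguments are left out, and
whether calling the gem binary this way is supported is still the open
question above.

```shell
#!/bin/sh
# Hypothetical run-script sketch for a daemontools-style supervisor.

# run_foreground COMMAND [ARGS...]
# Sends stderr to stdout (so one logger sees both) and then replaces the
# current shell with COMMAND. No fork happens, so the PID the supervisor
# is watching becomes the PID of COMMAND.
run_foreground() {
    exec 2>&1
    exec "$@"
}

# In an actual run script the last line would be something like this,
# with your own arguments substituted:
# run_foreground "$GEMSTONE/bin/gem" logsender <your usual arguments>
```

The same shape would apply to the logreceiver, under the same caveat.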