Logsender dies - how would one monitor and restart it?

Gemstone/S mailing list
I'm trying to debug a hot standby site that is not staying up to date, on
an Ubuntu Linux system.

The logsender just dies after a few hours of running, with log output like
the following:

--- 26/01/2015 02:39:55.125 SAST ---
client 0 CMD_GET_STATUS, file 21377 limit 21377.2603
controlSocket read error,
joining on dataTransmitThread
RDbfReadLogRecords file 21377 EOF
joining on fileReadThread
exited clientCommandReadMain
--- 26/01/2015 02:40:55.127 SAST ---
NetSAccept failed
acceptClient failed
main thread exiting because acceptThread shutdown


Looking at the log, the "controlSocket read error" happens occasionally
(transient network errors), and the logsender normally recovers from it.

What is different in this case is the "NetSAccept failed" error, from
which the logsender does not recover.

The first question is why NetSAccept fails, and whether that is a bug in
the logsender, or a resource problem on the server itself.

The second question is why the whole logsender dies, and why it does not
retry and re-enter the main server loop after some delay.
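
To make the second question concrete, here is a minimal sketch of the
behaviour I would have expected: an accept loop that logs the failure, waits,
and tries again instead of exiting. This is not how the logsender is
implemented (I have not seen its source); the function name and the
parameters retry_delay and max_failures are made up for illustration.

```python
import time


def serve_with_retry(listener, handle, retry_delay=5.0,
                     max_failures=None, sleep=time.sleep):
    """Accept clients in a loop; on an accept error, wait and try again.

    listener is any object with an accept() method that returns
    (connection, address), such as a listening socket.socket.
    The loop only ends when max_failures consecutive accept errors
    have occurred; with max_failures=None it never gives up.
    Returns the number of consecutive failures at exit.
    """
    failures = 0
    while max_failures is None or failures < max_failures:
        try:
            conn, addr = listener.accept()
        except OSError as exc:
            failures += 1
            print(f"accept failed ({exc}); retrying in {retry_delay}s")
            sleep(retry_delay)
            continue
        failures = 0  # a successful accept resets the failure count
        with conn:
            handle(conn, addr)
    return failures
```

Whether retrying is actually safe depends on why the accept failed. If the
cause is an exhausted resource on the server, a delay may let it clear; if the
listening socket itself is broken, it would have to be reopened first.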

The third question is how one would monitor a logsender and keep it
running. Because the logsender process is not a descendant of the
startlogsender process, tools like Daemontools, Upstart, or Systemd cannot
simply wrap the startlogsender invocation and monitor the resulting
process in the traditional way.

Would it be OK to just re-run startlogsender every now and then, on the
assumption that it fails harmlessly if a logsender is already running and
starts a new one if the logsender has died?
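
As a variation on that idea, a periodic job (run from cron, say) could check
for a live logsender first and only then start one. A rough sketch follows.
The name ensure_running is hypothetical, and both commands are passed in by
the caller: check_cmd is assumed to be something that exits 0 when the
logsender is alive (a pgrep invocation, for example), and start_cmd would be
the usual startlogsender command line with its normal arguments.

```python
import subprocess


def ensure_running(check_cmd, start_cmd):
    """Run start_cmd only if check_cmd reports that the service is down.

    check_cmd must exit 0 when the service is alive, non-zero otherwise.
    Returns True if a start was attempted, False if nothing was done.
    """
    check = subprocess.run(check_cmd,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
    if check.returncode == 0:
        return False  # already running, leave it alone
    subprocess.run(start_cmd, check=False)
    return True
```

This avoids relying on what startlogsender does when a logsender is already
running, but there is still a gap between the check and the start, and the
logsender stays down until the next scheduled run.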


_______________________________________________
GemStone-Smalltalk mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk

Re: Logsender dies - how would one monitor and restart it?

Gemstone/S mailing list
On the topic of running a logsender and logreceiver under Unix process
supervisors such as Daemontools:

It seems that startlogsender is just a thin wrapper that forks, backgrounds
itself, and then executes $GEMSTONE/bin/gem logsender with the remaining
command-line arguments.

That kind of backgrounding is problematic when running under process
supervisors, since they tend to assume that the service they manage has
died and needs restarting when the process they spawned quits.

One way around that is just to run "$GEMSTONE/bin/gem logsender ..."
directly under the process supervisor, but that would be an undocumented
usage of GemStone binaries.
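
For illustration, this is roughly what a process supervisor does, reduced to
a few lines. It is a sketch only, not a replacement for Daemontools or
Systemd; the name supervise and its parameters are made up, and cmd is
whatever foreground command line is being supervised.

```python
import subprocess
import time


def supervise(cmd, restart_delay=5.0, max_restarts=None, sleep=time.sleep):
    """Run cmd in the foreground and restart it whenever it exits.

    With max_restarts=None this loops forever. Otherwise it returns the
    list of exit codes after the initial run plus max_restarts restarts.
    """
    exit_codes = []
    while True:
        child = subprocess.Popen(cmd)
        # wait() blocks until the process we spawned exits. If cmd forks
        # and backgrounds itself, wait() returns almost immediately even
        # though the real service is still running.
        exit_codes.append(child.wait())
        print(f"{cmd[0]} exited with status {exit_codes[-1]}")
        if max_restarts is not None and len(exit_codes) > max_restarts:
            return exit_codes
        sleep(restart_delay)
```

The comment on wait() is the crux: with a self-backgrounding wrapper as cmd,
the loop would see an exit right away and try to start a second copy, which is
why the supervised command has to stay in the foreground.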

Am I correct in my understanding that this is all startlogsender does, or
is there crucial logic that is sidestepped when running "$GEMSTONE/bin/gem
logsender ..." directly?

All my questions apply in the same way to startlogreceiver as well.
