I'm trying to debug a hot standby site, running on Ubuntu Linux, that is
not staying up to date.
The logsender just dies after a few hours of running, with log output like
the following:
--- 26/01/2015 02:39:55.125 SAST ---
client 0 CMD_GET_STATUS, file 21377 limit 21377.2603
controlSocket read error,
joining on dataTransmitThread
RDbfReadLogRecords file 21377 EOF
joining on fileReadThread
exited clientCommandReadMain
--- 26/01/2015 02:40:55.127 SAST ---
NetSAccept failed
acceptClient failed
main thread exiting because acceptThread shutdown
Looking at the log, the "controlSocket read error" happens occasionally
(transient network errors), and the logsender normally recovers from that.
What is different in this case is the "NetSAccept failed" error, and the
logsender does not recover from that.
The first question is why NetSAccept fails, and whether that is a bug in
the logsender, or a resource problem on the server itself.
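One way to test the resource-problem theory: if NetSAccept is a thin wrapper around the accept() system call (an assumption, not checked against the GemStone source), a common reason for it to fail on Linux is EMFILE, meaning the process has run out of file descriptors. That could happen if sockets from earlier "controlSocket read error" episodes are never closed. The log does not show the errno, so this is only a hypothesis. A minimal Linux-only sketch that compares a process's open descriptors with its soft "Max open files" limit, using /proc; the PID you pass in is something you would look up yourself (for example with ps):

```python
import os
from typing import Optional, Tuple


def open_fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (open descriptors, soft 'Max open files' limit) for a Linux process.

    The limit is None when it is 'unlimited'. Reading another user's
    process normally requires running as that user or as root.
    """
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit: Optional[int] = None
    # /proc/<pid>/limits has columns: name, soft limit, hard limit, units.
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line[len("Max open files"):].split()[0]
                soft_limit = None if soft == "unlimited" else int(soft)
                break
    return open_fds, soft_limit
```

Sampling this every few minutes during the hours before a crash would show whether the count creeps up toward the limit; a steady climb would point to a descriptor leak rather than a network problem.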
The second question is why the whole logsender dies, and why it does not
retry and re-enter the main server loop after some delay.
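For comparison, the behaviour one might expect instead is sketched below in Python. It illustrates the general pattern only and is not how the logsender is actually written. accept() errors that usually indicate a temporary condition (descriptor or memory exhaustion, a client that disconnected before the accept completed) are retried with a capped exponential backoff, while anything else, such as a closed listening socket, is still raised:

```python
import errno
import logging
import time
from typing import Optional

# accept() errors that usually mean "try again later" rather than
# "the listening socket is broken".
TRANSIENT_ACCEPT_ERRNOS = {
    errno.EMFILE,        # per-process file descriptor limit reached
    errno.ENFILE,        # system-wide file table full
    errno.ENOBUFS,       # kernel out of buffer space
    errno.ENOMEM,        # out of memory
    errno.ECONNABORTED,  # client went away before accept() returned
}


def accept_with_retry(sock, retry_delay: float = 1.0, max_delay: float = 60.0,
                      max_attempts: Optional[int] = None):
    """Call sock.accept(), retrying transient failures with capped backoff.

    Non-transient errors are re-raised immediately. With max_attempts=None
    the function keeps retrying transient errors indefinitely.
    """
    delay = retry_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            return sock.accept()
        except OSError as exc:
            if exc.errno not in TRANSIENT_ACCEPT_ERRNOS:
                raise
            if max_attempts is not None and attempt >= max_attempts:
                raise
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            logging.warning("accept failed (%s), retrying in %.1fs", name, delay)
            time.sleep(delay)
            # Double the delay each time, but never wait longer than max_delay.
            delay = min(delay * 2, max_delay)
```

The backoff cap keeps a persistent resource problem from turning into a busy loop, while still letting the server recover on its own once the condition clears.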
The third question is how one would monitor a logsender and keep it
running. Because the logsender process is not a descendant of the
startlogsender process, tools like Daemontools, Upstart, or systemd cannot
be used to turn the startlogsender invocation into a process that is
supervised in the traditional way.
Would it be OK to just re-run startlogsender every now and then, expecting
that it fails if a logsender is already running, and starts a new one if
the logsender has died?
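If relying on startlogsender to fail cleanly turns out to be unsafe, a wrapper run from cron every few minutes could check for a running logsender first and only call startlogsender when none is found. A minimal Linux-only sketch in Python; the match pattern and the start command are placeholders you would fill in, and matching on command lines is a heuristic (for example, a pattern of "logsender" would also match "startlogsender"):

```python
import os
import subprocess
from typing import List


def find_processes(pattern: str) -> List[int]:
    """Return PIDs whose command line contains `pattern` (scans /proc)."""
    me = os.getpid()
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == me:
            continue  # skip non-process entries and this script itself
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                # Arguments are NUL-separated; join them with spaces.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited meanwhile, or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return pids


def ensure_running(pattern: str, start_cmd: List[str]) -> bool:
    """Run start_cmd if no process matches pattern.

    Returns True if a start was attempted, False if a match was found.
    """
    if find_processes(pattern):
        return False
    subprocess.run(start_cmd, check=False)
    return True
```

You would call ensure_running with a pattern that identifies only the logsender process and with startlogsender plus whatever arguments you normally pass it.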
_______________________________________________
GemStone-Smalltalk mailing list
http://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk