On 07/23/2009 01:21 AM, Stefan Schmiedl wrote:
> On Thu, 23 Jul 2009 00:48:10 +0200
> Paolo Bonzini <[hidden email]> wrote:
>
>> Please send the script so I can try to reproduce this on your image. :-)

Actually it's enough to do this:

Open firefox http://localhost:4080/
Close firefox
netstat | grep 4080 -> CLOSE_WAIT
Open firefox http://localhost:4080/
Close firefox
netstat | grep 4080 -> CLOSE_WAIT * 2

It's indeed a very serious bug. I think I have a fix (in Swazoo only), but I want to check it against upstream Swazoo and send it to Janko.

And I had a Sport part too in the fix, but now it's not in my tree and the test seems to run smoothly. It's time we both go to bed now, at least I can't think straight.

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
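For readers unfamiliar with the state name: a TCP socket sits in CLOSE_WAIT once the peer has closed its end while the local program still holds its own descriptor open, so one extra CLOSE_WAIT entry per browser visit suggests the server side never closes the accepted socket. The following is only a minimal Python sketch of that mechanism, not the Swazoo fix mentioned above. It assumes Linux (the state is read through the TCP_INFO socket option), and the names tcp_state and leaked_connection_state are made up for illustration.

```python
import socket

TCP_CLOSE_WAIT = 8  # numeric value from the Linux kernel's TCP state enum


def tcp_state(sock):
    """Return the kernel's TCP state number for a socket (Linux only)."""
    # struct tcp_info begins with a one-byte state field (tcpi_state).
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 8)
    return info[0]


def leaked_connection_state():
    """Accept one connection, let the client close, and report the
    server-side socket state before the server closes its own end."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    try:
        client.close()               # the "browser" goes away
        if conn.recv(1) != b"":      # blocks until the peer's FIN arrives
            raise RuntimeError("expected end-of-stream from the client")
        return tcp_state(conn)       # server has not closed its end yet
    finally:
        conn.close()                 # closing here is what releases the socket
        listener.close()


if __name__ == "__main__":
    print(leaked_connection_state() == TCP_CLOSE_WAIT)
```

In the sketch the server does close the socket in the finally clause; a server that skips that step keeps one descriptor per visit in CLOSE_WAIT, which matches the growing netstat count described above.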
On Thu, 16 Jul 2009 15:33:16 +0200
Stefan Schmiedl <[hidden email]> wrote:

> You might recall my building a tool for online testing
> with Iliad and GNU Smalltalk. Today was "the" day, and
> here is what happened.

Today I updated Iliad from svn, gst from git and added some code to mech.rb to also load the images linked into the pages.

First of all: The socket situation is drastically improved, especially when changing
net.ipv4.tcp_fin_timeout = 60
to
net.ipv4.tcp_fin_timeout = 5
Now I only have <n> client sockets chilling out after I run
ruby mech.rb <n>

The bad news is that I still get the
"Socket accept error: Error while trying to accept a socket connection"

The neutral news is that I need to run
ruby mech.rb 54
to provoke it, whereas in the original setting 25 clients managed to raise it. I'm not sure if the fact that I'm running it on my development box instead of the server is the cause of this.

The good news is that I get that error *every* single time, towards the end of the run.

The *really interesting* news is watching the output of
while true; do netstat | grep tcp | wc -l ; sleep 2 ; done
while running
ruby mech.rb 54

...
5     "background" sockets, pop, nntp etc.
113   = 5 + 54*2, makes sense
113
113
113
113
113
113
113
113
113
112   ah, the first mech agent has finished
110
107
105
101
97
94
89
86
76
73    * around here the error message from gst appears, why now?
73
73
73
73
73
73
73
73
73
73
73
73
73
73
73
73
73
65
64
65
52
52
52
52
5     about 60s after the last request (30 lines) the client sockets are closed

Oh btw, if you happen to have atop on your computer, and I don't see a reason why you should not, run the following line
atop -m -a -P PRM 2 | grep gst-remote
in another window. It shows the following (slightly redacted) output for this run. I've annotated virtual and resident memory size and growth, as well as the number of minor and major page faults.
                                      VSIZE RSIZE      VGR   RGR   mpf MPF
13:09:04 2 22013 (gst-remote) R 4096 831484 28680 10  3916  9224  8753   0
13:09:06 2 22013 (gst-remote) R 4096 830636 27664 10  -848 -1016 14302   0
13:09:08 2 22013 (gst-remote) R 4096 830540 27720 10   -96    56 11908   0
13:09:10 2 22013 (gst-remote) R 4096 830636 27760 10    96    40 12788   0
13:09:12 2 22013 (gst-remote) R 4096 831220 28352 10   584   592 11529   0
13:09:14 2 22013 (gst-remote) R 4096 830572 27716 10  -648  -636 11298   0
13:09:16 2 22013 (gst-remote) R 4096 831612 30900 10  1040  3184 12433   0
13:09:18 2 22013 (gst-remote) R 4096 830184 33600 10 -1428  2700 10581   0
13:09:20 2 22013 (gst-remote) R 4096 830444 38176 10   260  4576  9844   0
13:09:22 2 22013 (gst-remote) R 4096 830184 41492 10  -260  3316  8749   0
13:09:24 2 22013 (gst-remote) R 4096 830184 45408 10     0  3916  8953   0
13:09:26 2 22013 (gst-remote) R 4096 830184 51176 10     0  5768  9888   0
13:09:28 2 22013 (gst-remote) R 4096 830768 57456 10   584  6280  9401   0
13:09:30 2 22013 (gst-remote) R 4096 830184 60196 10  -584  2740 10673   0
13:09:32 2 22013 (gst-remote) R 4096 831936 65156 10  1752  4960 10783   0
13:09:34 2 22013 (gst-remote) R 4096 830768 71640 10 -1168  6484  8509   0
13:09:36 2 22013 (gst-remote) R 4096 830184 75320 10  -584  3680 10599   0
13:09:38 2 22013 (gst-remote) R 4096 830184 75684 10     0   364 10829   0
13:09:40 2 22013 (gst-remote) R 4096 830184 76112 10     0   428  8875   0
13:09:42 2 22013 (gst-remote) R 4096 831352 87520 10  1168 11408  8732   0
13:09:44 2 22013 (gst-remote) R 4096 830768 87236 10  -584  -284 11247   0
13:09:46 2 22013 (gst-remote) S 4096 830184 86824 10  -584  -412  7097   0
13:09:48 2 22013 (gst-remote) S 4096 830184 86856 10     0    32    51   0
13:09:50 2 22013 (gst-remote) E 4096      0     0  0     0     0    23   0

Paolo and Nico: I'll follow up with details on the testbed offlist.

s.
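The watch loop in this message counts every netstat line containing "tcp", which does not show which states the lingering sockets are in (CLOSE_WAIT or FIN_WAIT2, for instance). As a rough sketch of how a per-state breakdown could be obtained, the Python function below tallies the State column of netstat-style text. It assumes the six-column layout printed by `netstat -tn` on Linux (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name tally_tcp_states and the SAMPLE text are invented for illustration and are not data from this run.

```python
from collections import Counter

# Invented sample in the assumed `netstat -tn` layout, not output from the thread.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:4080          127.0.0.1:36375         CLOSE_WAIT
tcp        0      0 127.0.0.1:36375         127.0.0.1:4080          FIN_WAIT2
tcp        0      0 127.0.0.1:4080          127.0.0.1:36376         ESTABLISHED
tcp        0      0 192.168.1.5:22          192.168.1.9:51000       ESTABLISHED
"""


def tally_tcp_states(netstat_text, port=None):
    """Count TCP sockets per state, optionally only those touching `port`."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a tcp/tcp6 socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if port is not None:
            suffix = ":%d" % port
            if not (local.endswith(suffix) or remote.endswith(suffix)):
                continue
        counts[state] += 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(tally_tcp_states(SAMPLE, port=4080).items()):
        print(state, n)
```

Feeding it real output, for example via subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout, every two seconds would give the same timeline as the loop above, split by state.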
Just because I'm lazy and busy, can you try running gst under "strace -e signal=none -e accept" to see *what* is the "error while trying to accept a socket connection"?

> 5 about 60s after the last request (30 lines) the client sockets are closed

And the error keeps appearing in a loop?

Paolo
On Fri, 24 Jul 2009 14:02:22 +0200
Paolo Bonzini <[hidden email]> wrote:

> Just because I'm lazy and busy,

a deadly combination, especially during summer :-)

> can you try running gst under "strace -e signal=none -e accept"
> to see *what* is the "error while trying to accept a socket
> connection"?

heh ... note to self: when using strace, run gst-remote with --server instead of --daemon.

gst-remote server started.
gst-remote --port=12345 --eval="Iliad.FileHandler filePath: 'public'"
accept(4, {sa_family=AF_INET, sin_port=htons(36375), sin_addr=inet_addr("192.168.1.5")}, [47167330844688]) = 5
gst-remote --port=12345 --eval="OnlineTester.OTTest fileIn: 'doc/NuT-Inf.st'"
accept(4, {sa_family=AF_INET, sin_port=htons(36376), sin_addr=inet_addr("192.168.1.5")}, [16]) = 5
gst-remote --port=12345 --eval="Iliad.SwazooIliad startOn: 4080"
accept(4, {sa_family=AF_INET, sin_port=htons(36377), sin_addr=inet_addr("192.168.1.5")}, [16]) = 5
starting
accept(6, ..., [47167330844688]) = 5
accept(6, ..., [16]) = 7
accept(6, ..., [16]) = 8
accept(6, ..., [47167330844688]) = 9
accept(6, ..., [4137119598535245840]) = 10
accept(6, ..., [16]) = 11
waiting
accept(6, ..., [7954880229397233680]) = 13
accept(6, ..., [16]) = 17
accept(6, ..., [16]) = 22
accept(6, ..., [2316947471063842832]) = 27
accept(6, ..., [47167330844688]) = 32
accept(6, ..., [16]) = 38
accept(6, ..., [47167330844688]) = 8
accept(6, ..., [47167330844688]) = 49
... 92 similar lines snipped
accept(6, ..., [16]) = 918
accept(6, ..., [16]) = 903
accept(6, ..., [47167330844688]) = 889
accept(6, ..., [47167330844688]) = 866
accept(6, 0x2ae8ad65b080, [13572577691697280]) = -1 EMFILE (Too many open files)
Socket accept error: Error while trying to accept a socket connection.

s.
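EMFILE means the process has reached its per-process limit on open file descriptors, which fits the descriptor numbers in the trace climbing into the 900s before accept() fails (1024 is a common default soft limit on Linux, though the limit in effect on this machine is not stated in the thread). The Python sketch below reproduces the failure mode in isolation under that assumption: it lowers the soft RLIMIT_NOFILE of its own process and opens sockets without closing them until the kernel refuses. The function name sockets_until_emfile and the limit of 64 are illustrative choices, not taken from gst.

```python
import errno
import resource
import socket


def sockets_until_emfile(soft_limit=64):
    """Leak sockets until the OS refuses; return (sockets opened, errno)."""
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never try to raise the limit, only lower it for the experiment.
    if old_soft != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    leaked = []
    try:
        while True:
            # Each socket consumes one descriptor, just like each accept().
            leaked.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as exc:
        return len(leaked), exc.errno
    finally:
        for sock in leaked:
            sock.close()             # closing is what frees the descriptors
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    opened, err = sockets_until_emfile()
    print(opened, errno.errorcode[err])
```

Raising the limit (for example with ulimit -n) would only postpone the error if descriptors are never released; the count keeps growing until accepted sockets are actually closed.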