the test test - an experience report

Re: the test test - an experience report

Paolo Bonzini-3
On 07/23/2009 01:21 AM, Stefan Schmiedl wrote:
> On Thu, 23 Jul 2009 00:48:10 +0200
> Paolo Bonzini<[hidden email]>  wrote:
>
>> Please send the script so I can try to reproduce this on your image. :-)

Actually it's enough to do this:

   Open firefox
   http://localhost:4080/
   Close firefox
   netstat | grep 4080 -> CLOSE_WAIT
   Open firefox
   http://localhost:4080/
   Close firefox
   netstat | grep 4080 -> CLOSE_WAIT * 2
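
The manual check above can be automated roughly as follows. This is only a sketch: `count_state` and `SAMPLE` are names made up for illustration, and the sample rows are invented to resemble typical Linux `netstat` output, not captured from the run described here.

```python
def count_state(netstat_output, port, state="CLOSE_WAIT"):
    """Count TCP rows in netstat-style text that involve `port` and are in `state`."""
    suffix = ":" + str(port)
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, row_state = fields[3], fields[4], fields[5]
        if row_state == state and (local.endswith(suffix) or foreign.endswith(suffix)):
            count += 1
    return count


# Invented sample text, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost:4080          localhost:51234         CLOSE_WAIT
tcp        0      0 localhost:51234         localhost:4080          FIN_WAIT2
tcp        1      0 localhost:4080          localhost:51240         CLOSE_WAIT
tcp        0      0 localhost:pop3          localhost:40000         ESTABLISHED
"""

print(count_state(SAMPLE, 4080))
```

A socket stays in CLOSE_WAIT until the local application closes it, so a count that grows by one per browser session points at the server side not closing its end.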

It's indeed a very serious bug.  I think I have a fix (in Swazoo only)
but I want to check it against upstream Swazoo and send it to Janko.
And I had a Sport part too in the fix, but now it's not in my tree and
the test seems to run smoothly.

It's time we both go to bed now, at least I can't think straight.

Paolo


_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
Re: the test test - more data

Stefan Schmiedl
In reply to this post by Stefan Schmiedl
On Thu, 16 Jul 2009 15:33:16 +0200
Stefan Schmiedl <[hidden email]> wrote:

> You might recall my building a tool for online testing
> with Iliad and GNU Smalltalk. Today was "the" day, and
> here is what happened.

Today I updated Iliad from svn, gst from git and added some
code to mech.rb to also load the images linked into the pages.

First of all: the socket situation is drastically improved,
especially after changing
        net.ipv4.tcp_fin_timeout = 60
to
        net.ipv4.tcp_fin_timeout = 5

Now I only have <n> client sockets chilling out after I run
        ruby mech.rb <n>

The bad news is that I still get the

   "Socket accept error: Error while trying to accept a socket connection"

The neutral news is that I need to run
        ruby mech.rb 54
to provoke it, whereas in the original setting 25 clients
managed to raise it. I'm not sure whether running it on my
development box instead of the server is the cause of this.

The good news is that I get that error *every* single time,
towards the end of the run.

The *really interesting* news is watching the output of
        while true; do netstat | grep tcp | wc -l ; sleep 2 ; done
while running ruby mech.rb 54 ...

5       "background" sockets, pop, nntp etc.
113     = 5 + 54*2, makes sense
113
113
113
113
113
113
113
113
113
112     ah, the first mech agent has finished
110
107
105
101
97
94
89
86
76
73      * around here the error message from gst appears, why now?
73
73
73
73
73
73
73
73
73
73
73
73
73
73
73
73
73
65
64
65
52
52
52
52
5       about 60s after the last request (30 lines) the client sockets are closed
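
The arithmetic behind the annotations can be written down as a small sketch. It assumes that client and server run on the same machine, so each connection shows up as two netstat rows (one per end); the function names are hypothetical.

```python
def expected_netstat_rows(background, clients, ends_per_connection=2):
    """Rows expected while all clients are connected (assumes both ends are local)."""
    return background + clients * ends_per_connection


def connections_still_listed(observed_rows, background, ends_per_connection=2):
    """Estimate how many connections account for the rows above the background."""
    return (observed_rows - background) / ends_per_connection


print(expected_netstat_rows(5, 54))      # 5 background sockets, 54 clients
print(connections_still_listed(73, 5))   # the plateau at 73
print(connections_still_listed(52, 5))   # the plateau at 52
```

A fractional result, as for the plateau at 52, means that the rows above the background cannot all be complete two-ended pairs: at least one connection is listed with only one end.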


Oh btw, if you happen to have atop on your computer, and I don't
see a reason why you should not, run the following line
        atop -m -a -P PRM 2 | grep gst-remote
in another window. It shows the following (slightly redacted) output
for this run. I've annotated virtual and resident memory size
and growth, as well as the number of minor and major page faults.

                                     VSIZE  RSIZE    VGR RGR mpf MPF
13:09:04 2 22013 (gst-remote) R 4096 831484 28680 10 3916 9224 8753 0
13:09:06 2 22013 (gst-remote) R 4096 830636 27664 10 -848 -1016 14302 0
13:09:08 2 22013 (gst-remote) R 4096 830540 27720 10 -96 56 11908 0
13:09:10 2 22013 (gst-remote) R 4096 830636 27760 10 96 40 12788 0
13:09:12 2 22013 (gst-remote) R 4096 831220 28352 10 584 592 11529 0
13:09:14 2 22013 (gst-remote) R 4096 830572 27716 10 -648 -636 11298 0
13:09:16 2 22013 (gst-remote) R 4096 831612 30900 10 1040 3184 12433 0
13:09:18 2 22013 (gst-remote) R 4096 830184 33600 10 -1428 2700 10581 0
13:09:20 2 22013 (gst-remote) R 4096 830444 38176 10 260 4576 9844 0
13:09:22 2 22013 (gst-remote) R 4096 830184 41492 10 -260 3316 8749 0
13:09:24 2 22013 (gst-remote) R 4096 830184 45408 10 0 3916 8953 0
13:09:26 2 22013 (gst-remote) R 4096 830184 51176 10 0 5768 9888 0
13:09:28 2 22013 (gst-remote) R 4096 830768 57456 10 584 6280 9401 0
13:09:30 2 22013 (gst-remote) R 4096 830184 60196 10 -584 2740 10673 0
13:09:32 2 22013 (gst-remote) R 4096 831936 65156 10 1752 4960 10783 0
13:09:34 2 22013 (gst-remote) R 4096 830768 71640 10 -1168 6484 8509 0
13:09:36 2 22013 (gst-remote) R 4096 830184 75320 10 -584 3680 10599 0
13:09:38 2 22013 (gst-remote) R 4096 830184 75684 10 0 364 10829 0
13:09:40 2 22013 (gst-remote) R 4096 830184 76112 10 0 428 8875 0
13:09:42 2 22013 (gst-remote) R 4096 831352 87520 10 1168 11408 8732 0
13:09:44 2 22013 (gst-remote) R 4096 830768 87236 10 -584 -284 11247 0
13:09:46 2 22013 (gst-remote) S 4096 830184 86824 10 -584 -412 7097 0
13:09:48 2 22013 (gst-remote) S 4096 830184 86856 10 0 32 51 0
13:09:50 2 22013 (gst-remote) E 4096 0 0 0 0 0 23 0


Paolo and Nico: I'll follow up with details on the testbed offlist.

s.


Re: the test test - more data

Paolo Bonzini-2
Just because I'm lazy and busy, can you try running gst under "strace
-e signal=none -e accept" to see *what* is the "error while trying to
accept a socket connection"?

> 5       about 60s after the last request (30 lines) the client sockets are closed

And the error keeps appearing in a loop?

Paolo


Re: the test test - more data

Stefan Schmiedl
On Fri, 24 Jul 2009 14:02:22 +0200
Paolo Bonzini <[hidden email]> wrote:

> Just because I'm lazy and busy,

a deadly combination, especially during summer :-)

> can you try running gst under "strace -e signal=none -e accept"
> to see *what* is the "error while trying to accept a socket
> connection"?

heh ... note to self: when using strace, run gst-remote
with --server instead of --daemon.


gst-remote server started.
gst-remote --port=12345 --eval="Iliad.FileHandler filePath: 'public'"
accept(4, {sa_family=AF_INET, sin_port=htons(36375), sin_addr=inet_addr("192.168.1.5")}, [47167330844688]) = 5
gst-remote --port=12345 --eval="OnlineTester.OTTest fileIn: 'doc/NuT-Inf.st'"
accept(4, {sa_family=AF_INET, sin_port=htons(36376), sin_addr=inet_addr("192.168.1.5")}, [16]) = 5
gst-remote --port=12345 --eval="Iliad.SwazooIliad startOn: 4080"
accept(4, {sa_family=AF_INET, sin_port=htons(36377), sin_addr=inet_addr("192.168.1.5")}, [16]) = 5
starting
accept(6, ..., [47167330844688]) = 5
accept(6, ..., [16]) = 7
accept(6, ..., [16]) = 8
accept(6, ..., [47167330844688]) = 9
accept(6, ..., [4137119598535245840]) = 10
accept(6, ..., [16]) = 11
waiting
accept(6, ..., [7954880229397233680]) = 13
accept(6, ..., [16]) = 17
accept(6, ..., [16]) = 22
accept(6, ..., [2316947471063842832]) = 27
accept(6, ..., [47167330844688]) = 32
accept(6, ..., [16]) = 38
accept(6, ..., [47167330844688]) = 8
accept(6, ..., [47167330844688]) = 49
... 92 similar lines snipped
accept(6, ..., [16]) = 918
accept(6, ..., [16]) = 903
accept(6, ..., [47167330844688]) = 889
accept(6, ..., [47167330844688]) = 866
accept(6, 0x2ae8ad65b080, [13572577691697280]) = -1 EMFILE (Too many open files)
Socket accept error: Error while trying to accept a socket connection.
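
EMFILE means the process has reached its per-process limit on open file descriptors, and every socket that is never closed keeps one descriptor occupied. The following is a minimal sketch of that effect on a Unix-like system, not a reproduction of what gst does: `sockets_until_emfile` is a hypothetical helper that temporarily lowers the soft limit and leaks sockets until the operating system refuses.

```python
import errno
import resource
import socket


def sockets_until_emfile(soft_limit=64):
    """Open sockets without closing them until the OS refuses.

    Returns (sockets_opened, errno_name). The original limit is restored
    and all sockets are closed before returning.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Lower only the soft limit; the hard limit stays as it is.
    if old_hard == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, old_hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, old_hard))
    leaked = []
    try:
        while True:
            # Each unclosed socket keeps one descriptor occupied.
            leaked.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as exc:
        return len(leaked), errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        for sock in leaked:
            sock.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


opened, error_name = sockets_until_emfile(64)
print(opened, error_name)
```

In the trace above the descriptors returned by accept climb into the 900s before the failure, which would fit a soft limit of 1024 (an assumption, the limit is not shown in the thread). Raising the limit would only delay the error if descriptors are being leaked.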

s.

