Re: [Pharo-project] Networking change in Pharo 1.2?

Eliot Miranda-2
 
Hi Henrik,

On Mon, Apr 18, 2011 at 1:23 AM, Henrik Sperre Johansen <[hidden email]> wrote:
On 17.04.2011 22:48, Chris Muller wrote:
I was able to work on getting Magma 1.2 going in Pharo.  It was quite
easy to get the code loaded and functioning in Pharo 1.1.1, Pharo 1.2,
and Pharo 1.3.

But something seems to have changed in Pharo's networking from 1.1.1
to 1.2.  All Magma functionality seems to work fine for low-volume
activity.  However, when the test-suite gets to the HA test cases (at
the end), one of the images performing heavy networking activity
consistently gets very slow and bogged down for some reason, causing
the clients to time out and disrupting the test suite.  Fortunately, it
happens in the same place in the test-suite every time.

The UI of the image in question becomes VERY sluggish, but
MessageTally spyAllOn: didn't reveal anything useful.  What is it
doing?  I did verify that the Magma server in that image is still
functioning; clients were committing, but I had to increase their
timeouts from 10 to 45 seconds to keep them from timing out.
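
To be concrete, that was a workspace expression along the lines of the
following; the 30-second window is just an illustrative choice:

    MessageTally spyAllOn: [ (Delay forSeconds: 30) wait ].
        "profiles every process in the image while the block runs"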

Unfortunately, after two days of wrangling in Pharo (because I'm an old
Squeak dog) I could not nail the problem down, but I have one
suspect.  A couple of times, I caught a process seemingly hung up in
NetworkNameResolver, trying to resolve an IP from 'localhost'.
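
For what it's worth, the lookup in question is essentially the plain
name-to-address resolution below (assuming the class involved is what
Squeak ships as NetNameResolver):

    NetNameResolver addressForName: 'localhost'.
        "normally answers the loopback address more or less instantly"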

This exact set of Magma packages is rock-solid on Pharo 1.1.1 and
Squeak, but that doesn't mean the problem necessarily lies in Pharo 1.2;
maybe a networking bug in 1.1.1 is letting Magma "misuse" the
network and get away with it, and Pharo 1.2 is now more strict?  I
don't know; I would just like to ask the experts here, who know what
went into Pharo 1.2, for help so that hopefully we can get to the
bottom of it.

Thanks,
  Chris

Which VM did you run these tests on?
IIRC, Cog has a hard limit on how many external semaphores are available, and each Socket consumes 3 of those.

Not so.  The limit is soft.  It can be accessed using Smalltalk vmParameterAt: 49.  It defaults to 256 entries.  It maxes out at 64k entries because the value set via vmParameterAt: 49 put: X persists in a short in the image header.  I expect 20k sockets (64k entries divided by 3 semaphores per Socket) to be sufficient for a while, right?
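
For example, in a workspace (the 1024 below is just an arbitrary
illustrative value, not a recommendation):

    Smalltalk vmParameterAt: 49.            "current table size; defaults to 256"
    Smalltalk vmParameterAt: 49 put: 1024.  "grow the table; persists in the image header"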

So if you are running on Cog, the problem under heavy load may be that there simply aren't enough free external semaphores to create enough sockets...

Cheers,
Henry