Networking change in Pharo 1.2?

Chris Muller-4
I was able to work on getting Magma 1.2 going in Pharo.  It was quite
easy to get the code loaded and functioning in Pharo 1.1.1, Pharo 1.2,
and Pharo 1.3.

But something seems to have changed in Pharo's networking from 1.1.1
to 1.2.  All Magma functionality seems to work fine for low-volume
activity.  However, when the test-suite gets to the HA test cases (at
the end), one of the images performing heavy networking activity
consistently gets very slow and bogged down for some reason, causing
the clients to time out and disrupting the test suite.  Fortunately, it
happens in the same place in the test-suite every time.

The UI of the image in question becomes VERY sluggish, but
MessageTally spyAllOn: didn't reveal anything useful.  What is it
doing?  I did verify that the Magma server in that image is still
functioning; clients were committing, but I had to increase their
timeouts from 10 to 45 seconds to avoid timeouts.

Unfortunately, after two days of wrangling in Pharo (because I'm an old
Squeak dog) I could not nail the problem down, but I have one
suspect.  A couple of times, I caught a process seemingly hung up in
NetNameResolver, trying to resolve an IP from 'localhost'.

This exact set of Magma packages is rock-solid on Pharo 1.1.1 and
Squeak, but that doesn't mean the problem for sure lies in Pharo 1.2;
maybe a networking bug in 1.1.1 is allowing Magma to "misuse" the
network and get away with it, and Pharo 1.2 is now more strict?  I
don't know; I would just like to ask the experts here, who know
what went into Pharo 1.2, for help so hopefully we can get to the
bottom of it.

Thanks,
  Chris


Re: Networking change in Pharo 1.2?

Schwab,Wilhelm K
Two thoughts: (1) Gary recently mentioned a delay fix that IIRC was in Squeak but had not yet made it into Pharo.  It might be central to the problem??

(2) Not to say that the test should take so long to run, but the network layer should not be timing out at all.  Decisions of when to give up should be left to the application and indirectly the user - assuming the machine is attended (the app will "know" that, which the network layer cannot).  Attempts to connect or do I/O should do just that until they are told to stop.  Servers should listen and accept connections until they are stopped.

Bill




Re: Networking change in Pharo 1.2?

Elliot Finley
In reply to this post by Chris Muller-4
On Sun, Apr 17, 2011 at 2:48 PM, Chris Muller <[hidden email]> wrote:
> Unfortunately, after two days of wrangling in Pharo (because I'm an old
> Squeak dog) I could not nail the problem down, but I have one
> suspect.  A couple of times, I caught a process seemingly hung up in
> NetNameResolver, trying to resolve an IP from 'localhost'.

probably related to this:
http://lists.squeakfoundation.org/pipermail/magma/2010-September/001594.html


Re: Networking change in Pharo 1.2?

Marcus Denker-4
In reply to this post by Chris Muller-4

On Apr 17, 2011, at 10:48 PM, Chris Muller wrote:

> The UI of the image in question becomes VERY sluggish, but
> MessageTally spyAllOn: didn't reveal anything useful.  What is it
> doing?  I did verify that the Magma server in that image is still
> functioning; clients were committing, but I had to increase their
> timeouts from 10 to 45 seconds to avoid timeouts..
>
Oh... this could be related to finalization / weak references in some way?

> Unfortunately, after two days of wrangling in Pharo (because I'm an old
> Squeak dog) I could not nail the problem down, but I have one
> suspect.  A couple of times, I caught a process seemingly hung up in
> NetNameResolver, trying to resolve an IP from 'localhost'.
>
The only change to NetNameResolver was this:

        http://code.google.com/p/pharo/issues/detail?id=1853

Socket in general did not see many changes:

        http://code.google.com/p/pharo/issues/list?can=1&q=milestone%3D1.2+Socket



--
Marcus Denker  -- http://www.marcusdenker.de
INRIA Lille -- Nord Europe. Team RMoD.



Re: Networking change in Pharo 1.2?

Marcus Denker-4
In reply to this post by Chris Muller-4

On Apr 17, 2011, at 11:01 PM, Schwab,Wilhelm K wrote:

> Two thoughts: (1) Gary recently mentioned a delay fix that IIRC was in Squeak but had not yet made it into Pharo.  It might be central to the problem??
>
Gary's fix was not in Squeak...
It is now in 1.2.2a and 1.3 for testing.

        Marcus


Re: Networking change in Pharo 1.2?

Henrik Sperre Johansen
In reply to this post by Chris Muller-4
Which VM did you run these tests on?
IIRC, Cog has a hard limit on how many external semaphores are
available, and each Socket consumes 3 of those.
So if you are running on Cog, the problem under heavy load may be
that there simply aren't enough free external semaphores to create enough
sockets...

Cheers,
Henry


Re: Networking change in Pharo 1.2?

Chris Muller-3
This is the VM I used:

3.9-7 #1 Sun Feb  6 18:58:21 PST 2011 gcc 4.1.2
Croquet Closure Cog VM [CoInterpreter VMMaker-oscog.47]
Linux mcqfes 2.6.18-128.el5 #1 SMP Wed Jan 21 10:44:23 EST 2009 i686
i686 i386 GNU/Linux
plugin path: /opt/4dst/thirdparty/squeak/lib/squeak/3.9-7/ [default:
/opt/4dst/thirdparty/squeak/lib/squeak/3.9-7/]

However, I use this same VM when I run the test in Pharo 1.1.1 and it's solid.

 - Chris



Re: Networking change in Pharo 1.2?

Chris Muller-3
In reply to this post by Marcus Denker-4
> The only change to NetNameResolver was this:
>
>        http://code.google.com/p/pharo/issues/detail?id=1853

Reverting this change fixed it.


Re: Networking change in Pharo 1.2?

Marcus Denker-4
In reply to this post by Marcus Denker-4

On Apr 18, 2011, at 7:14 PM, Chris Muller wrote:

>> The only change to NetNameResolver was this:
>>
>>        http://code.google.com/p/pharo/issues/detail?id=1853
>
> Reverting this change fixed it.
>


Thanks! I have reopened the issue for 1.2.2 and 1.3

        Marcus



Re: Networking change in Pharo 1.2?

Chris Muller-3
Thank you too; it was a bruising problem and I'm glad to have it identified.


Re: Networking change in Pharo 1.2?

Stéphane Ducasse
Yes.
Now, do you have a description of how we can reproduce your problem? The fix was fixing something,
and it would be good to understand what the deeper problem is.
Stef


Re: Networking change in Pharo 1.2?

Schwab,Wilhelm K
Stef,

What was it fixing?  There might be a better solution.  I found myself trying to swim in Linux, Pharo and machines with multiple interfaces (wired and wireless) almost simultaneously.  

I still haven't really figured it out, but *if* #localHostAddress makes sense at all (#localHostAddresses might be a more meaningful message), it should probably raise an error if there is not a unique result.  #localHostAddress:ifNone:ifMany: would put the sender in control.  For your problem, #localHostOrLoopBackAddress would be another option; at least the sender would be knowingly accepting the "risk" of getting the loopback address.

Bill



Re: Networking change in Pharo 1.2?

Chris Muller-3
In reply to this post by Stéphane Ducasse
Just bench it:

Before change:

  [ NetNameResolver localHostAddress ] bench   " '34,000 per second.' "

After change:

  [ NetNameResolver localHostAddress ] bench   " '31 per second.' "

In just looking at the reason given for making the change, it says
this is to satisfy an _exceptional_ case; e.g., the case where "no
network connection is available."

Then I look at the new code called by #localHostAddress and it becomes obvious why:

isConnected
        "Dirty, but avoids fixing the plugin bug"
        [NetNameResolver addressForName: 'www.esug.org'.]
                on: NameLookupFailure do: [:ex| ^false].
        ^true

A hard-coded nslookup to 'www.esug.org' wrapped in an exception-handler?  Wow!
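
The cost of that lookup can be measured on its own, separately from the rest of #localHostAddress. What follows is only a minimal workspace sketch, assuming a Pharo 1.2 image with the change from issue 1853 loaded; the repetition count of 20 is arbitrary and the timings will depend on the local resolver and network:

```smalltalk
"Time only the DNS lookup that isConnected performs on every call."
| lookupMs totalMs |
lookupMs := Time millisecondsToRun: [
	20 timesRepeat: [
		[NetNameResolver addressForName: 'www.esug.org']
			on: NameLookupFailure
			do: [:ex | ex return: nil]]].

"Time the full #localHostAddress call for comparison."
totalMs := Time millisecondsToRun: [
	20 timesRepeat: [NetNameResolver localHostAddress]].

Transcript
	show: 'lookup only: ', lookupMs printString, ' ms'; cr;
	show: 'localHostAddress: ', totalMs printString, ' ms'; cr.
```

If the two figures come out close to each other, the lookup accounts for nearly all of the time spent in #localHostAddress.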

If this isn't enough, you can run the Magma test-suite to see the
effect on a real-world networking application.

I recommend the Pharo crew revert this change and consider a different approach.

 - Chris




Re: Networking change in Pharo 1.2?

Marcus Denker-4
In reply to this post by Stéphane Ducasse

On Apr 19, 2011, at 5:16 PM, Chris Muller wrote:

>
> Then I look at the new code called by #localHostAddress and becomes obvious why:
>
> isConnected
> "Dirty, but avoids fixing the plugin bug"
> [NetNameResolver addressForName: 'www.esug.org'.] on:
> NameLookupFailure do: [:ex| ^false].
> ^true
>
> A hard-coded nslookup to 'www.esug.org' wrapped in an exception-handler?  Wow!

Oops... (shamefully looking somewhere else, as I harvested the change...)

We will fix it.

        Marcus


Re: Networking change in Pharo 1.2?

Eliot Miranda-2
In reply to this post by Henrik Sperre Johansen
Hi Henrik,

On Mon, Apr 18, 2011 at 1:23 AM, Henrik Sperre Johansen <[hidden email]> wrote:
> Which VM did you run these tests on?
> IIRC, Cog has a hard limit on how many external semaphores are available, and each Socket consumes 3 of those.

Not so.  The limit is soft.  It can be accessed using Smalltalk vmParameterAt: 49.  It defaults to 256 entries.  It maxes out at 64k entries because the value set via vmParameterAt: 49 put: X persists in a short in the image header.  I expect 20k sockets to be sufficient for a while, right?
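
For anyone wanting to check this in their own image, here is a minimal workspace sketch using only the two calls mentioned above. It assumes a Cog VM; the value 1024 is an arbitrary example rather than a recommendation, and the "3 per Socket" figure is taken from the question being answered, not verified here:

```smalltalk
"Read the current size of the external semaphore table (VM parameter 49)."
| current |
current := Smalltalk vmParameterAt: 49.
Transcript show: 'external semaphores: ', current printString; cr.

"Assuming each Socket uses 3 entries, estimate how many sockets fit."
Transcript show: 'approx. max sockets: ', (current // 3) printString; cr.

"Raise the limit; the value is stored in the image header (at most 64k)."
Smalltalk vmParameterAt: 49 put: 1024.
```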

> So if you are running on Cog, the problem when under heavy load may be that there simply aren't enough free external semaphores to create enough sockets...
>
> Cheers,
> Henry



Re: Networking change in Pharo 1.2?

Stéphane Ducasse
In reply to this post by Chris Muller-3
OK, thanks for the analysis.
We should really be able to collect such information and build regression tests. Right now we have tests checking
for simple behavior, but I would like to capture the regression you spotted.

I do not know what the "avoids fixing the plugin bug" comment refers to, but I would like to know that too.
Stef


Re: Networking change in Pharo 1.2?

abergel
In reply to this post by Marcus Denker-4
> Oops... (shamefully looking somewhere else, as I harvested the change...)


No problem Marcus. I think it would be difficult to do a better job than what you do.

Cheers,
Alexandre

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: Networking change in Pharo 1.2?

Stéphane Ducasse
+1

but we should write some specific tests.


Re: Networking change in Pharo 1.2?

Henrik Sperre Johansen
In reply to this post by Eliot Miranda-2
On 19.04.2011 20:19, Eliot Miranda wrote:
>> Which VM did you run these tests on?
>> IIRC, Cog has a hard limit on how many external semaphores are available, and each Socket consumes 3 of those.
>
> Not so.  The limit is soft.  It can be accessed using Smalltalk vmParameterAt: 49.  It defaults to 256 entries.  It maxes out at 64k entries because the value set via vmParameterAt: 49 put: X persists in a short in the image header.  I expect 20k sockets to be sufficient for a while, right?
Ah, absolutely :D

That's what I get for skimming READMEs; I think it'd be good to update the comment, though?
It makes no specific mention that the limit can be changed after startup (although that is frowned upon), or how to do so. Currently it reads:
"Another significant change is in the external semaphore table support code. 
This is now lock-free at the cost of having to specify a maximum number of
external semaphores at start-up (default 256)."

I guess having it accessible from the image is one interpretation of that line; personally I thought it meant that you could pass some parameter when launching the executable :)

Also, it's currently possible to register more than this limit in current images (Smalltalk registerExternalObject:) without an error.
Am I correct in my reading of the code that when this happens, they will never be signaled?
If so, we'd probably want to do some changes to ExternalSemaphoreTable :)
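
One way to check whether an image has already run past the limit is to register a throwaway semaphore and compare its index with the VM's table size. This is only a sketch: it assumes a Cog VM where parameter 49 is the table size, and that #registerExternalObject: answers the table index, as it does in the images I have looked at:

```smalltalk
"Register a fresh semaphore and compare its table index with the VM limit."
| limit sem index |
limit := Smalltalk vmParameterAt: 49.
sem := Semaphore new.
index := Smalltalk registerExternalObject: sem.
index > limit
	ifTrue: [Transcript show: 'index ', index printString, ' exceeds VM limit ', limit printString; cr]
	ifFalse: [Transcript show: 'index ', index printString, ' is within VM limit ', limit printString; cr].
"Release the slot again so the check leaves the table unchanged."
Smalltalk unregisterExternalObject: sem.
```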

Cheers,
Henry







Re: Networking change in Pharo 1.2?

Henrik Sperre Johansen
On 20.04.2011 01:13, Henrik Sperre Johansen wrote:

> Also, it's currently possible to register more than this limit in current images (Smalltalk registerExternalObject:) without an error.
> Am I correct in my reading of the code that when this happens, they will never be signaled?
> If so, we'd probably want to do some changes to ExternalSemaphoreTable :)
>
> Cheers,
> Henry

Eheh, I guess so.
I did it as part of testing and forgot about it, then when I wanted to publish the slice, I got stuck on a Socket semaphore which never signaled, and had to wait the entire timeout period before it proceeded. :)

Cheers,
Henry