Sockets don't close, 100% CPU

Janko Mivšek
Hi guys,

If I recall correctly, others have also had problems with sockets in
Pharo, so here is my current image, which:

  - has 105 Sockets open, mostly waiting
  - also has more than 105 processes open
  - consumes 100% CPU
  - fails with a primitive failure on aSocket close
  - is still workable: I can browse etc.

  - Pharo1.2.2a on Linux

Sockets are waiting on Socket>>waitForConnectionFor:ifTimeOut:

If I put a breakpoint in this method, most processes waiting on a socket
raise an exception and I can manually terminate them. CPU usage is then
back to normal.

But 48 sockets and their processes are still open.

When I then debug one such process from the Process Monitor, another
batch of self halt exceptions is immediately raised. This time I managed
to manually close all the sockets and their processes.

Does anyone have an idea how to keep these sockets from deadlocking
while waiting? It seems that they time out but soon come back. They
seem to time out quite fast; could this be the reason for the 100% CPU?

Why can't I close them? Instead, close fails with a primitive failure.

Best regards
Janko



--
Janko Mivšek
Aida/Web
Smalltalk Web Application Server
http://www.aidaweb.si

Re: Sockets don't close, 100% CPU

Sven Van Caekenberghe

On 06 Jul 2011, at 10:40, Janko Mivšek wrote:

> If I recall correctly, others have also had problems with sockets in
> Pharo, so here is my current image, which:
>
>  - has 105 Sockets open, mostly waiting
>  - also has more than 105 processes open
>  - consumes 100% CPU
>  - fails with a primitive failure on aSocket close
>  - is still workable: I can browse etc.
>
>  - Pharo1.2.2a on Linux
>
> Sockets are waiting on Socket>>waitForConnectionFor:ifTimeOut:

Is that number of open sockets and processes by design or not? If they keep coming back, you must be creating them. Your application has to keep these (expensive) resources under control to begin with.

Long running Seaside images seem to be fine and they are using sockets and processes a lot.

Sockets have #close and #destroy; there might be a difference.
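
A quick way to see the difference in a Pharo 1.2 workspace. This is a sketch based on my reading of the Squeak-derived Socket class, so check it against your own image: #close only asks for a graceful TCP disconnect, while #destroy releases the VM socket handle and unregisters the socket's external semaphores; #closeAndDestroy does both.

```smalltalk
"Sketch: #destroy (or #closeAndDestroy) is what frees the VM handle
 and the semaphore table slots; #close alone does not."
| socket |
socket := Socket newTCP.
socket isValid.      "true: VM handle allocated, semaphores registered"
socket destroy.
socket isValid.      "false: handle released, semaphore slots freed"
```

In application code the usual pattern is [ ... ] ensure: [ socket closeAndDestroy ], so the slots are freed even when a connection attempt times out.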

I think that part of the cleanup of sockets is delayed (using finalization) and needs some GC work to complete. GC policies might be involved, as well as weak data structures.

I am also interested in knowing the practical upper bound on the number of open sockets and processes.

Sven


Re: Sockets don't close, 100% CPU

Henrik Sperre Johansen

On Jul 6, 2011, at 10:54:33 AM, Sven Van Caekenberghe wrote:

> Is that number of open sockets and processes by design or not? If they keep coming back, you must be creating them. Your application has to keep these (expensive) resources under control to begin with.
>
> Long running Seaside images seem to be fine and they are using sockets and processes a lot.
>
> Sockets have #close and #destroy; there might be a difference.
>
> I think that part of the cleanup of sockets is delayed (using finalization) and needs some GC work to complete. GC policies might be involved, as well as weak data structures.
>
> I am also interested in knowing the practical upper bound on the number of open sockets and processes.
>
> Sven

http://forum.world.st/Networking-change-in-Pharo-1-2-tp3456097p3461723.html
should explain it.

TLDR; Changes in the image are required to work with Cog without silently failing to signal semaphores when too many Sockets are open.

Cheers,
Henry

Re: Sockets don't close, 100% CPU

Janko Mivšek
In reply to this post by Sven Van Caekenberghe
Sven Van Caekenberghe wrote:

> Is that number of open sockets and processes by design or not? If they keep coming back, you must be creating them. Your application has to keep these (expensive) resources under control to begin with.

This is Swazoo accepting connections, probably from web spiders. But it
also closes them promptly, because there are no instances of
HTTPConnection hanging around.

> Long running Seaside images seem to be fine and they are using sockets and processes a lot.
>
> Sockets have #close and #destroy; there might be a difference.

Swazoo uses the Sport portability layer, and its SpSocket actually calls
#closeAndDestroy on the underlying Socket. Evidently this method
manages neither to close nor to destroy the socket.

> I think that part of the cleanup of sockets is delayed (using finalization) and needs some GC work to complete. GC policies might be involved, as well as weak data structures.

This could be a way to proceed. How can I diagnose that?
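
One way to check whether delayed finalization is the culprit, sketched for a Pharo 1.2 workspace: force a full GC, then count Socket instances and how many of them still hold a VM handle. Finalization runs in its own process after the GC, so the numbers may lag for a moment.

```smalltalk
"Sketch: force a full GC so pending finalization can be triggered,
 then count Socket instances and those still holding a VM handle."
| all live |
Smalltalk garbageCollect.
all := Socket allInstances.
live := all select: [:each | each isValid].
Transcript
    show: 'Socket instances: ', all size printString; cr;
    show: 'Still holding a VM handle: ', live size printString; cr.
```

A large "still holding a VM handle" count that survives repeated GCs would point to sockets that are never destroyed rather than to slow finalization.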

> I am also interested in knowing the practical upper bound on the number of open sockets and processes.

From Henrik's pointer it seems that the Pharo 1.2 one-click image allows
up to 512 external semaphores by default. Every socket needs three, so
at most 170 sockets (512 / 3, rounded down) can be open at once. The
semaphore limit can be increased, say to 1024*3 for 1024 sockets, with:

        Smalltalk vmParameterAt: 49 put: 1024*3

Another limit on Linux is the default limit of 1024 open file
descriptors (and therefore sockets) per process. This one can also be
increased, e.g. with ulimit -n.
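
The semaphore arithmetic above as printIt expressions, assuming three external semaphores per Socket as described and a Cog VM for parameter 49. Other external objects share the same table, so the real socket limit is somewhat lower.

```smalltalk
"Three external semaphores per Socket (main, read, write)."
512 // 3.                              "170 sockets fit in 512 slots"
1024 * 3.                              "3072 slots needed for 1024 sockets"
(Smalltalk vmParameterAt: 49) // 3.    "sockets that fit in the current table"
```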

Best regards
Janko

Re: Sockets don't close, 100% CPU

Sven Van Caekenberghe
In reply to this post by Henrik Sperre Johansen

On 06 Jul 2011, at 11:19, Henrik Johansen wrote:

> http://forum.world.st/Networking-change-in-Pharo-1-2-tp3456097p3461723.html
> should explain it.
>
> TLDR; Changes in the image are required to work with Cog without silently failing to signal semaphores when too many Sockets are open.

OK, I think I understand more or less. Thanks, Henrik.

Where does it fail silently? IMHO that should not happen: the failure should be hard, as in 'I am running out of semaphores' or whatever. That would help in debugging situations like this.

Sven


Re: Sockets don't close, 100% CPU

Henrik Sperre Johansen

On Jul 6, 2011, at 11:52:29 AM, Sven Van Caekenberghe wrote:

>
> On 06 Jul 2011, at 11:19, Henrik Johansen wrote:
>
>> http://forum.world.st/Networking-change-in-Pharo-1-2-tp3456097p3461723.html
>> should explain it.
>>
>> TLDR; Changes in the image are required to work with Cog without silently failing to signal semaphores when too many Sockets are open.
>
> OK, I think I understand more or less. Thanks, Henrik.
>
> Where does it fail silently?

IIRC, it should be explained in the thread :)

> IMHO that should not happen: the failure should be hard, as in 'I am running out of semaphores' or whatever. That would help in debugging situations like this.
>
> Sven

There are two good options:
1) Increase the external table size if possible, and log a warning that some signals may have been lost and you really should increase the size at startup.
2) Raise an error.

I think you'd want 1 in a production environment, and 2 in development.
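
A rough workspace sketch of the two policies, applied as an up-front check rather than inside #registerExternalObject: itself. productionMode and the required size are hypothetical; parameter 49 is the Cog external semaphore table size used earlier in the thread.

```smalltalk
"Hypothetical sketch: productionMode and required are illustrative."
| productionMode required available |
productionMode := true.
required := 1024 * 3.
available := Smalltalk vmParameterAt: 49.
required > available ifTrue: [
    productionMode
        ifTrue: [
            "Option 1: grow the table and warn"
            Smalltalk vmParameterAt: 49 put: required.
            Transcript
                show: 'Warning: external semaphore table was too small; signals may have been lost. Set its size at startup.';
                cr ]
        ifFalse: [
            "Option 2: fail loudly"
            self error: 'External semaphore table too small: ',
                available printString, ' < ', required printString ] ].
```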

Cheers,
Henry



Re: Sockets don't close, 100% CPU

Sven Van Caekenberghe

On 06 Jul 2011, at 12:12, Henrik Johansen wrote:

> IIRC, it should be explained in the thread :)

I didn't open all messages ;-)

So it's #registerExternalObject: and/or #safelyRegisterExternalObject:, then.
Is there already an issue for this? Should we have one? I think so.

Sven


Re: Sockets don't close, 100% CPU

Henrik Sperre Johansen

On Jul 6, 2011, at 12:35:17 PM, Sven Van Caekenberghe wrote:

>
> On 06 Jul 2011, at 12:12, Henrik Johansen wrote:
>
>> IIRC, it should be explained in the thread :)
>
> I didn't open all messages ;-)
>
> So it's #registerExternalObject: and/or #safelyRegisterExternalObject:, then.
yes
> Is there already an issue for this?
no
> Should we have one? I think so.

yes :)

Cheers,
Henry

Re: Sockets don't close, 100% CPU

Sven Van Caekenberghe
http://code.google.com/p/pharo/issues/detail?id=4505

Feel free to comment or contribute.

Re: Sockets don't close, 100% CPU

Henrik Sperre Johansen

On Jul 6, 2011, at 1:03:56 PM, Sven Van Caekenberghe wrote:

> http://code.google.com/p/pharo/issues/detail?id=4505
>
> Feel free to comment or contribute.
My holiday starts next week; if the weather is bad, I will hopefully have time for more than yes/no answers ;)

Cheers,
Henry

Re: Sockets don't close, 100% CPU

Sven Van Caekenberghe

On 06 Jul 2011, at 13:09, Henrik Johansen wrote:

> My holiday starts next week; if the weather is bad, I will hopefully have time for more than yes/no answers ;)
>
> Cheers,
> Henry

OK, no sweat!

Enjoy your holidays.

Sven


Re: Sockets don't close, 100% CPU

Janko Mivšek
In reply to this post by Henrik Sperre Johansen
Henrik Johansen wrote:

>> http://code.google.com/p/pharo/issues/detail?id=4505
>>
>> Feel free to comment or contribute.

> My holiday starts next week; if the weather is bad, I will hopefully have time for more than yes/no answers ;)

Nice to hear that :) And thanks in advance!

Janko


Re: Sockets don't close, 100% CPU

Stéphane Ducasse
In reply to this post by Henrik Sperre Johansen
I do not understand. We fixed the problem reported by Chris.
So is there something else?

Stef

On Jul 6, 2011, at 11:19 AM, Henrik Johansen wrote:

> http://forum.world.st/Networking-change-in-Pharo-1-2-tp3456097p3461723.html
> should explain it.
>
> TLDR; Changes in the image are required to work with Cog without silently failing to signal semaphores when too many Sockets are open.
>
> Cheers,
> Henry


Re: Sockets don't close, 100% CPU

Henrik Sperre Johansen

On Jul 6, 2011, at 3:16:39 PM, Stéphane Ducasse wrote:

> I do not understand. We fixed the problem reported by Chris.
> So is there something else?
>
> Stef

Read the last two posts in the thread: the issue is not related to what turned out to be the solution for Chris' problem, but to my wrong speculation about what it might be. ;)

Alternatively, to quickly see what I mean, evaluate this in a workspace using a Cog VM:

        overflowingSockets := (1 to: Smalltalk vm maxExternalSemaphores / 2) collect: [:e | Socket newTCP]

And then try to connect to a Monticello repository :)
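
Why this overflows: each Socket newTCP registers three external semaphores, so maxExternalSemaphores / 2 sockets ask for about 1.5 times the table size. To give the slots back afterwards, a sketch to evaluate in the same workspace (so that overflowingSockets still refers to the collection created by the expression above):

```smalltalk
"Destroy the sockets from the experiment so their semaphore
 slots are unregistered again, then drop the references."
overflowingSockets do: [:each | each destroy].
overflowingSockets := nil.
Smalltalk garbageCollect.
```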

Cheers,
Henry

Re: Sockets don't close, 100% CPU

Sven Van Caekenberghe

On 06 Jul 2011, at 15:48, Henrik Johansen wrote:

> Alternatively, to see what I mean fast, evaluate in a workspace using a Cog VM:
> overflowingSockets := (1 to: Smalltalk vm maxExternalSemaphores / 2) collect: [:e | Socket newTCP]

I didn't know about #maxExternalSemaphores and #maxExternalSemaphores:.

Nice to see DomainError being used ;-)

Sven


Re: Sockets don't close, 100% CPU

Stéphane Ducasse
In reply to this post by Henrik Sperre Johansen
Thanks.
I read it afterwards and realized that.
