tODE being disconnected on long "do-its"

GLASS mailing list
Hi guys,

This has been happening to me for a long time. I am using tODE on my local machine to connect to remote stones on a server. I do this via SSH and everything seems to work. However, when I evaluate something from a workspace that takes several minutes, my tODE image stops responding. The gem does not continue either, and I must kill the image. In contrast, when I run a small do-it from a workspace, I have no problem.

Maybe it is related to SSH keepalive or something? It seems related to the fact that I do nothing with tODE in the meantime (I can't, because the tODE gem is busy running whatever I evaluated in the workspace).

From the gemnetobject log all I can see is:


*****************************************************
****** Abnormal Shutdown at 01/31/2017 13:25:22 EST
*****************************************************
-----------------------------------------------------
GemStone: Error         Fatal
Network error - text follows:
socket read EOF
Error Category: 231169 [GemStone] Number: 4137  Arg Count: 0 Context : 20 exception : 20

In the pcmon log I see:


[01/31/17 13:25:22.396 EST]: Client died: Slot    8, PID   28431, LostOtFlags    0, sessionId 5 Name Gem15
[01/31/17 13:25:22.396 EST]: Starting crashed client recovery: Slot 8, PID 28431, Name Gem15
    Cleaned up locked/pinned frame 101760 for crashed process 28431 (pte.lockedPageFrameId)
    Cleaned up locked/pinned frame 50206 for crashed process 28431 (pte.otCachePinnedFrames)
    Cleaned up locked/pinned frame 26906 for crashed process 28431 (pte.otCachePinnedFrames)
    Cleaned up locked/pinned frame 76553 for crashed process 28431 (pte.otCachePinnedFrames)
    Disposing free frame cache from slot 8:  9 of 9 frames recovered.
    Disposing free PCE cache from slot 8:  9 of  9 PCEs recovered.
[01/31/17 13:25:24.474 EST]: Finished crashed client recovery: Slot 8, PID 28431, Name Gem15


And on my client machine console I see:


 ❯ channel 12: open failed: connect failed: Connection refused                                                             


Does this happen to you too? Any ideas or workarounds?

Thanks

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Re: tODE being disconnected on long "do-its"

GLASS mailing list

Mariano,

It does look like the socket got closed for some reason (`socket read EOF`) ... so you are using an ssh tunnel to connect to the server?

It would be good to figure out who is doing the closing ... the server-side doesn't close and the client-side doesn't have any GemStone related timeouts that would close a connection ... so perhaps your guess about keepalive is a good one ...

I'll see if anyone here has any insights.

Dale


On 01/31/2017 11:54 AM, Mariano Martinez Peck via Glass wrote:
> [...]

Re: tODE being disconnected on long "do-its"

GLASS mailing list


On Tue, Jan 31, 2017 at 5:38 PM, Dale Henrichs <[hidden email]> wrote:

> Mariano,
>
> It does look like the socket got closed for some reason (`socket read EOF`) ...


Exactly!
 

> so you are using an ssh tunnel to connect to server?


Yes. I use something like this:

❯ cat bin/netldiPortForwarding.sh
#!/bin/sh

#First port is local port, second port is HOST port

lsof -ti:40100 | xargs kill -9
lsof -ti:40200 | xargs kill -9
lsof -ti:40300 | xargs kill -9
lsof -ti:40400 | xargs kill -9
lsof -ti:40500 | xargs kill -9
lsof -ti:40600 | xargs kill -9
lsof -ti:40650 | xargs kill -9

 ssh -c arcfour,blowfish-cbc -XC -N \
        -L 40100:localhost:40100  \
        -L 40200:localhost:40200  \
        -L 40300:localhost:40300  \
        -L 40500:localhost:40500  \
        -L 40600:localhost:40600  \
        -L 40650:localhost:40650  \
        xxserver &

 

> It would be good to figure out who is doing the closing ... the server-side doesn't close and the client-side doesn't have any GemStone related timeouts that would close a connection ... so perhaps your guess about keepalive is a good one ...
>
> I'll see if anyone here has any insights.


Does it not happen to you? For me, for example, doing an allInstances (with a 50GB stone) from a workspace triggers this issue.

Maybe something with my sshd config? Hmm.
 

> Dale


Re: tODE being disconnected on long "do-its"

GLASS mailing list



On 01/31/2017 12:54 PM, Mariano Martinez Peck wrote:


> Does it not happen to you? For me, for example, doing an allInstances (with a 50GB stone) from a workspace triggers this issue.
>
> Maybe something with my sshd config? Hmm.
 
Sorry, I don't use the ssh tunnel that often, we have a VPN and I can just connect directly ...

Dale

Re: tODE being disconnected on long "do-its"

GLASS mailing list
OK, I may have solved it by adding:

 ❯ cat .ssh/config
Host *
    ServerAliveInterval 300
    ServerAliveCountMax 2
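
The same two settings can also be passed per invocation with `-o`, which may be handy for trying them on the tunnel alone before changing `~/.ssh/config` for every host. The following is a minimal sketch under assumptions: `keepalive_opts` is a hypothetical helper name, and the host and port are simply the ones from the forwarding script earlier in the thread. It prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: builds the ssh keepalive options as command-line flags.
# keepalive_opts is a hypothetical helper, not part of tODE or GemStone.
keepalive_opts() {
    interval=$1   # seconds without data from the server before ssh sends a keepalive
    count=$2      # unanswered keepalives tolerated before ssh disconnects
    printf '%s' "-o ServerAliveInterval=$interval -o ServerAliveCountMax=$count"
}

# Print the resulting tunnel command instead of running it, so it can be reviewed first.
# Host name and port are taken from the forwarding script earlier in the thread.
echo "ssh $(keepalive_opts 300 2) -N -L 40100:localhost:40100 xxserver"
```

With an interval of 300 seconds the tunnel carries traffic at least every five minutes while idle, which only helps if whatever is dropping the connection has an idle timeout longer than that.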


I will continue testing tomorrow and let you know.

Cheers,


On Tue, Jan 31, 2017 at 6:20 PM, Dale Henrichs <[hidden email]> wrote:



> [...]

Re: tODE being disconnected on long "do-its"

GLASS mailing list
On 01/31/2017 01:25 PM, Mariano Martinez Peck via Glass wrote:

> OK, I may have solved it by adding:
>
>  ❯ cat .ssh/config
> Host *
>     ServerAliveInterval 300
>     ServerAliveCountMax 2
>
>
> Will continue tomorrow with the testing and let you know.
>
I suspect you're on the right track.

Without more detailed information, the most likely culprit is a stateful
firewall somewhere between your client and your server. Or something
doing NAT. Both of these things track connections, and have timeouts. If
they see no traffic, after some time they'll assume the connection has
been dropped and stop routing packets between the endpoints. Then when
you *do* try to use that connection, the packets don't get through, and
when the TCP connection isn't ACKed after a certain time the OS will
report a socket error to the gem or the client library.

By default, TCP does not send any packets on a connection unless there
is data to exchange. GemStone does enable SO_KEEPALIVE on its sockets.
This tells the OS to do a packet exchange every once in a while on an
idle socket. The OS controls the frequency of this. Default is usually
two hours, so if the NAT router or firewall has a shorter timeout, this
will not help.
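
To check what the OS is actually configured to do, the kernel settings can be read directly. This is a rough sketch that assumes a Linux host (the thread does not say which OS is in use); `keepalive_detect_secs` is a hypothetical helper introduced only for illustration. On Linux the usual defaults are 7200 seconds of idle time, then probes every 75 seconds, up to 9 of them.

```shell
#!/bin/sh
# Sketch only, Linux-specific: reads the kernel's TCP keepalive settings from /proc.
# keepalive_detect_secs is a hypothetical helper introduced for illustration.

# Seconds until a dead peer is detected on an idle socket:
# idle time before the first probe + (probe interval * number of probes).
keepalive_detect_secs() {
    echo $(( $1 + $2 * $3 ))
}

dir=/proc/sys/net/ipv4
if [ -r "$dir/tcp_keepalive_time" ]; then
    idle=$(cat "$dir/tcp_keepalive_time")
    intvl=$(cat "$dir/tcp_keepalive_intvl")
    probes=$(cat "$dir/tcp_keepalive_probes")
    echo "first probe after ${idle}s idle; dead peer detected after $(keepalive_detect_secs "$idle" "$intvl" "$probes")s"
else
    echo "no Linux keepalive settings found under $dir"
fi
```

For the firewall or NAT case, the number that matters is the idle time before the first probe: it has to be shorter than the middlebox's idle timeout for SO_KEEPALIVE to keep the connection tracked.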

Regards,

-Martin

Re: tODE being disconnected on long "do-its"

GLASS mailing list
Thanks Martin.

Yes, it was that. And I confirm the gsDevKit documentation was already correct:


Cheers,

On Tue, Jan 31, 2017 at 7:57 PM, Martin McClure <[hidden email]> wrote:
> [...]