[Glass] Swazoo server hangs


[Glass] Swazoo server hangs

otto
Hi,

We are running GS 2.4.4.4 with Seaside30 3.0.7 and Swazoo2 2.2.0.4.

We run four Swazoo servers, reverse proxied behind nginx. The problem is
that a Swazoo server hangs: there is still a socket listening on the
known port and the process is idle, but it does not respond to
requests, and connecting to the port times out.

Sending kill -USR1 <pid> gives us the output below. We have a
monitoring process that detects this condition and kills the process,
but the site is unresponsive until the server has started up again.
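A monitor like this has to detect a hung server, not just a dead one. Below is a minimal Python sketch of such a probe (not Otto's actual monitor; the host, port and timeout are assumptions). It sends a plain HTTP request, the same kind of check one would do with curl, and reports the server as unhealthy if no status line arrives in time. Checking only that the port accepts connections is not enough: the kernel completes the TCP handshake for a listening socket even while the server's accept loop is stuck, until the listen backlog fills up.

```python
import socket


def http_probe(host, port, timeout=5.0):
    """Return True if host:port answers a plain HTTP request with a status
    line within `timeout` seconds, False if it refuses, resets or stays silent."""
    request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    try:
        # create_connection applies `timeout` to the connect and to later reads
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            first = sock.recv(16)  # a healthy server starts its reply with "HTTP/"
    except OSError:  # refused, reset, or timed out (socket.timeout is an OSError)
        return False
    return first.startswith(b"HTTP/")
```

A monitor would call this for each backend (for example `http_probe("127.0.0.1", 8003)`); acting only after a few consecutive failures is probably wise, so that a server that is merely busy is not killed.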

Do you have any suggestion how to solve this? Have you seen this
problem on your applications? Will an upgrade to GS 3 help?

Thanks
Otto

GemStone signal handler: signal 10 (SIGUSR1), received from process 28660 userId 1000
  si_code: 0, SI_USER, signal from kill(2), sigsend(2), raise(3C) or abort(3C)

Begin attempt to print C-level stack at: Wed Oct 30 11:28:01 SAST 2013


End of C-level stack:

----------- Lock not acquired - retrying LOG ENTRY: Session lock denied: 2075-----------

Printing Smalltalk stack for memory usage diagnosis:
Smalltalk stack: printing to topaz .out file at [10/30/2013 11:28:01 AM.783 SAST]
    iS->ARStackPtr = 0x7f6298ea70a0, offset from base = 20
1 = TOP OF STACK,   stackDepth = 10

1  ProcessorScheduler >> _reapEvents: @IP 132  [GsMethod 498429953]
   16: 0x7f6265153ff0 (cls:66817 Array) size:0)
   15: 10 (SmallInteger 1)
   14: 10 (SmallInteger 1) <--framePtr=0x7f6298ea7090 AR[18]
  VC at 0x7f6265153f60   VC.unwindBlock= 20 (OOP_NIL)  VC.serialNum= 3005013794875392082 (SmallInteger 375626724359424010)
   13: 3005013794875392082 (SmallInteger 375626724359424010)
   12: 20 (OOP_NIL)
   11: 20 (OOP_NIL)
   10: 0x7f6265153ff0 (cls:66817 Array) size:0)
   9: 0x7f6265153ff0 (cls:66817 Array) size:0)
   8: 268 (OOP_TRUE)
   7: 2 (SmallInteger 0)
   6: 26 (SmallInteger 3)
   5: 80002 (SmallInteger 10000)
   4: 10 (SmallInteger 1)
   3: 11065002175282 (SmallInteger 1383125271910)
   2: 0x7f627f5ea790 (cls:92929 SortedCollection) size:3)
   1: 268 (OOP_TRUE)
rcvr: 0x7f627f49d758 (cls:116481 ProcessorScheduler) size:11)
[framePtr=0x7f6298ea7090 AR[18]]

2  ProcessorScheduler >> _findReadyProcess @IP 13  [GsMethod 498434561]
   1: 20 (OOP_NIL)
rcvr: 0x7f627f49d758 (cls:116481 ProcessorScheduler) size:11)
<--framePtr=0x7f6298ea7080 AR[16]

3  ProcessorScheduler >> _reschedule @IP 13  [GsMethod 498439425]
   2: 20 (OOP_NIL)
   1: 0x7f627f5ec4a8 oid:289184257 (cls:99841 GsProcess) size:22)
rcvr: 0x7f627f49d758 (cls:116481 ProcessorScheduler) size:11)
<--framePtr=0x7f6298ea7068 AR[13]

4  GsProcess >> _wait @IP 13  [GsMethod 260628481]
rcvr: 0x7f627f5ec4a8 oid:289184257 (cls:99841 GsProcess) size:22)
<--framePtr=0x7f6298ea7060 AR[12]

5  Delay >> wait @IP 54  [GsMethod 498471937]
rcvr: 0x7f6265153f10 (cls:115969 Delay) size:3)
<--framePtr=0x7f6298ea7058 AR[11]

6  WAGsSwazooAdaptor >> start @IP 20  [GsMethod 3440353793]
rcvr: 0x7f627f50e858 oid:20135281153 (cls:42137601 FinWorksGsSwazooAdaptor) size:5) <--framePtr=0x7f6298ea7050 AR[10]

7  WAServerAdaptor (C)  >> startOn: @IP 27  [GsMethod 3494994433]
   2: 0x7f627f50e858 oid:20135281153 (cls:42137601 FinWorksGsSwazooAdaptor) size:5)
   1: 64026 (SmallInteger 8003)
rcvr: 0x7f6291984060 oid:42137601 (cls:42128385 FinWorksGsSwazooAdaptor (C) ) size:19) <--framePtr=0x7f6298ea7038 AR[7]

8  WAGemStoneRunSeasideGems >> startOn: @IP 13  [GsMethod 3494978817]
   1: 64026 (SmallInteger 8003)
rcvr: 0x7f627f5f99e8 (cls:1470662657 WAGemStoneRunSeasideGems) size:3)
<--framePtr=0x7f6298ea7028 AR[5]

9  WAGemStoneRunSeasideGems (C)  >> startGemServerOn: @IP 21  [GsMethod 3494925057]
   2: 0x7f627f50e948 (cls:1470662657 WAGemStoneRunSeasideGems) size:3)
   1: 64026 (SmallInteger 8003)
rcvr: 0x7f62919839b0 oid:1470662657 (cls:1470665729 WAGemStoneRunSeasideGems (C) ) size:19) <--framePtr=0x7f6298ea7010 AR[2]

10  (Executed Code) @IP 71  [GsMethod 0x7f6294094058]
rcvr: 20 (OOP_NIL) <--framePtr=0x7f6298ea7008 AR[1]
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

[Glass] Fwd: Swazoo server hangs

otto
Hi,

OK, so this is not an obvious one: no responses so far.

We are getting desperate as it is happening constantly now. Any ideas
are welcome, please.

We thought that we could pin this problem down with the stack trace
below, but it appears to be less useful than we hoped because it only
shows the stack of the one active thread. Other threads could be in an
unhappy state, but we can't see them. Do you know if SIGUSR1 will dump
more in GS 3?

Is the Zinc HTTP server perhaps a better option (instead of Swazoo)?

Thanks
Otto

---------- Forwarded message ----------
From: Otto Behrens <[hidden email]>
Date: Wed, Oct 30, 2013 at 11:50 AM
Subject: Swazoo server hangs
To: "[hidden email]" <[hidden email]>



Re: [Glass] Fwd: Swazoo server hangs

otto
Thanks, Paul.

> Does your monitor process kill & restart all the Swazoo servers at once or one at a time as it detects the misbehaving?  I know you can setup daemontools to kill one at a time if you're not already e.g.

We kill one at a time, from a cron job. Daemontools starts it up again
when it sees that the process is dead.
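The kill-one-at-a-time policy can be stated as a small pure function that a cron job or supervisor script could call after probing each server. This is a hypothetical sketch; the port numbers and the `max_at_once` parameter are illustrative, not part of Otto's setup.

```python
def pick_restarts(health, max_at_once=1):
    """Given {port: is_healthy}, return the ports to restart in this pass.

    At most `max_at_once` servers are restarted per pass, so the rest of the
    pool keeps serving while the supervisor brings the restarted ones back.
    """
    unhealthy = sorted(port for port, ok in health.items() if not ok)
    return unhealthy[:max_at_once]


if __name__ == "__main__":
    # Hypothetical probe results for four backends.
    print(pick_restarts({8001: True, 8002: False, 8003: False, 8004: True}))  # [8002]
```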

> http://stackoverflow.com/questions/10650686/how-to-supervise-a-webserver-with-daemontools/10663912#10663912

This looks like a good solution, perhaps even better than a cron job,
but the result should be the same.

> Can you use the FastCGI server which I think was officially supported for 2.4?  As far as I know Zinc works on 2.4 and people use it in production, but others would be better able to address its suitability. As a test you could add a Zinc and/or FastCGI server or two to your pool of 4 Swazoo servers and see if things change for the better.

We initially used FastCGI but then switched to Hyper and then Swazoo,
simply because it was easier for us to debug: we can use something
like curl to do a GET and see what the server sends back. I suppose that
if FastCGI had some nice tools to talk to the GS FastCGI server, it
would be as good. But don't you like the simplicity of talking HTTP
all the way through? Why must FastCGI be better?

Re: [Glass] Swazoo server hangs

Tobias Pape
On 01.11.2013, at 15:44, Otto Behrens <[hidden email]> wrote:

> Thanks, Paul.
>
>> Does your monitor process kill & restart all the Swazoo servers at once or one at a time as it detects the misbehaving?  I know you can setup daemontools to kill one at a time if you're not already e.g.
>
> We kill one at a time. A cron job. Daemontools start it up again when
> it sees the process is dead.
>
>> http://stackoverflow.com/questions/10650686/how-to-supervise-a-webserver-with-daemontools/10663912#10663912
>
> This looks like a good solution - perhaps even better than a cron job.
> But the result should be the same though.
See https://code.google.com/p/glassdb/wiki/GLASSDaemonTools and
https://github.com/Monty/GemStone_daemontools_setup for an application of this.

>
>> Can you use the FastCGI server which I think was officially supported for 2.4?  As far as I know Zinc works on 2.4 and people use it in production, but others would be better able to address its suitability. As a test you could add a Zinc and/or FastCGI server or two to your pool of 4 Swazoo servers and see if things change for the better.
>
> We initially used FastCGI but then switched to Hyper and then Swazoo,
> simply because it was easier for us to debug as we can use something
> like curl to do a get and see what the server pops up. I suppose that
> if fast cgi has some nice tools to talk to the GS FastCGI server it
> would be as good. But don't you like the simplicity of talking HTTP
> all the way through? Why must FastCGI be better?

FastCGI eliminates much of HTTP's verboseness: it is a binary protocol
specifically aimed at (reverse-proxy)<->(app-server) setups.
I can confirm that this setup works _really_ well for GLASS.
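To make the contrast with HTTP's text headers concrete, here is a minimal Python sketch of how a web server frames a request for a FastCGI backend, following the FastCGI 1.0 specification: every record starts with a fixed 8-byte header, and CGI-style parameters are sent as length-prefixed name/value pairs rather than text header lines. It only encodes the web-server side of a request; it is not the GLASS FastCGI adaptor, and the function names are illustrative.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_RESPONDER = 1
FCGI_KEEP_CONN = 1


def encode_length(n):
    """Name/value lengths below 128 take one byte; longer ones take four,
    with the high bit of the first byte set."""
    return bytes([n]) if n < 128 else struct.pack(">I", n | 0x80000000)


def encode_params(params):
    out = bytearray()
    for name, value in params.items():
        n, v = name.encode("utf-8"), value.encode("utf-8")
        out += encode_length(len(n)) + encode_length(len(v)) + n + v
    return bytes(out)


def record(rec_type, request_id, content):
    """One record: 8-byte header (version, type, requestId, contentLength,
    paddingLength, reserved), then content, then padding to a multiple of 8."""
    if len(content) > 0xFFFF:
        raise ValueError("content longer than 65535 bytes must be split across records")
    padding = -len(content) % 8
    header = struct.pack(">BBHHBB", FCGI_VERSION_1, rec_type, request_id,
                         len(content), padding, 0)
    return header + content + bytes(padding)


def encode_request(request_id, params, stdin=b""):
    begin = struct.pack(">HB5x", FCGI_RESPONDER, FCGI_KEEP_CONN)  # role, flags, 5 reserved bytes
    out = record(FCGI_BEGIN_REQUEST, request_id, begin)
    out += record(FCGI_PARAMS, request_id, encode_params(params))
    out += record(FCGI_PARAMS, request_id, b"")  # empty record ends the params stream
    if stdin:
        out += record(FCGI_STDIN, request_id, stdin)
    out += record(FCGI_STDIN, request_id, b"")   # empty record ends stdin
    return out
```

Because every field sits at a fixed position, the backend never scans for line endings or parses header syntax. The flip side is the one Otto raises: curl does not speak FastCGI, so you cannot poke at the backend directly the way you can with an HTTP server.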

Best
        -Tobias


Re: [Glass] Swazoo server hangs

Dale Henrichs-3
In reply to this post by otto
Sorry Otto, I have been in Argentina this last week and haven't been able to focus on "hard problems" ... Swazoo has always been a bit dicey on GemStone, which is one of the main reasons to prefer FastCGI. I think that Zinc is more stable: Johan uses Zinc in production (and perhaps others do too) ... Also Zinc has support for running client-side HTTP, which is very convenient ...

I will have to dig in a bit more ... but I just seem to recall that I didn't feel comfortable with the level of bugfixing that needed to go on ...

Dale
----- Original Message -----
| From: "Otto Behrens" <[hidden email]>
| To: [hidden email]
| Sent: Wednesday, October 30, 2013 2:50:23 AM
| Subject: [Glass] Swazoo server hangs
|
| Hi,
|
| We are running GS 2.4.4.4 with Seaside30 3.0.7 and Swazoo2 2.2.0.4.
|
| We run 4 swazoo servers reverse proxied behind nginx. The problem is
| that our Swazoo server hangs up. There is a socket listening on the
| known port. The process is idle, but it does not respond to requests
| -
| connecting to the port times out.
|
| Sending kill -USR1 <pid> gives us the output below. We have a
| monitoring process that picks up this condition and kills the
| process.
| But this causes the site to be unresponsive until it starts up again.
|
| Do you have any suggestion how to solve this? Have you seen this
| problem on your applications? Will an upgrade to GS 3 help?
|
| Thanks
| Otto
|
| GemStone signal handler: signal 10 (SIGUSR1), received from process
| 28660 userId 1000
|   si_code: 0, SI_USER, signal from kill(2), sigsend(2), raise(3C) or
|   abort(3C)
|
| Begin attempt to print C-level stack at: Wed Oct 30 11:28:01 SAST
| 2013
|
|
| End of C-level stack:
|
| ----------- Lock not acquired - retrying LOG ENTRY: Session lock
| denied: 2075-----------
|
| Printing Smalltalk stack for memory usage diagnosis:
| Smalltalk stack: printing to topaz .out file at [10/30/2013 11:28:01
| AM.783 SAST]
|     iS->ARStackPtr = 0x7f6298ea70a0, offset from base = 20
| 1 = TOP OF STACK,   stackDepth = 10
|
| 1  ProcessorScheduler >> _reapEvents: @IP 132  [GsMethod 498429953]
|    16: 0x7f6265153ff0 (cls:66817 Array) size:0)
|    15: 10 (SmallInteger 1)
|    14: 10 (SmallInteger 1) <--framePtr=0x7f6298ea7090 AR[18]
|   VC at 0x7f6265153f60   VC.unwindBlock= 20 (OOP_NIL)  VC.serialNum=
| 3005013794875392082 (SmallInteger 375626724359424010)
|    13: 3005013794875392082 (SmallInteger 375626724359424010)
|    12: 20 (OOP_NIL)
|    11: 20 (OOP_NIL)
|    10: 0x7f6265153ff0 (cls:66817 Array) size:0)
|    9: 0x7f6265153ff0 (cls:66817 Array) size:0)
|    8: 268 (OOP_TRUE)
|    7: 2 (SmallInteger 0)
|    6: 26 (SmallInteger 3)
|    5: 80002 (SmallInteger 10000)
|    4: 10 (SmallInteger 1)
|    3: 11065002175282 (SmallInteger 1383125271910)
|    2: 0x7f627f5ea790 (cls:92929 SortedCollection) size:3)
|    1: 268 (OOP_TRUE)
| rcvr: 0x7f627f49d758 (cls:116481 ProcessorScheduler) size:11)
| [framePtr=0x7f6298ea7090 AR[18]]
|
| 2  ProcessorScheduler >> _findReadyProcess @IP 13  [GsMethod
| 498434561]
|    1: 20 (OOP_NIL)
| rcvr: 0x7f627f49d758 (cls:116481 ProcessorScheduler) size:11)
| <--framePtr=0x7f6298ea7080 AR[16]
|
| 3  ProcessorScheduler >> _reschedule @IP 13  [GsMethod 498439425]
|    2: 20 (OOP_NIL)
|    1: 0x7f627f5ec4a8 oid:289184257 (cls:99841 GsProcess) size:22)
| rcvr: 0x7f627f49d758 (cls:116481 ProcessorScheduler) size:11)
| <--framePtr=0x7f6298ea7068 AR[13]
|
| 4  GsProcess >> _wait @IP 13  [GsMethod 260628481]
| rcvr: 0x7f627f5ec4a8 oid:289184257 (cls:99841 GsProcess) size:22)
| <--framePtr=0x7f6298ea7060 AR[12]
|
| 5  Delay >> wait @IP 54  [GsMethod 498471937]
| rcvr: 0x7f6265153f10 (cls:115969 Delay) size:3)
| <--framePtr=0x7f6298ea7058 AR[11]
|
| 6  WAGsSwazooAdaptor >> start @IP 20  [GsMethod 3440353793]
| rcvr: 0x7f627f50e858 oid:20135281153 (cls:42137601
| FinWorksGsSwazooAdaptor) size:5) <--framePtr=0x7f6298ea7050 AR[10]
|
| 7  WAServerAdaptor (C)  >> startOn: @IP 27  [GsMethod 3494994433]
|    2: 0x7f627f50e858 oid:20135281153 (cls:42137601
| FinWorksGsSwazooAdaptor) size:5)
|    1: 64026 (SmallInteger 8003)
| rcvr: 0x7f6291984060 oid:42137601 (cls:42128385
| FinWorksGsSwazooAdaptor (C) ) size:19) <--framePtr=0x7f6298ea7038
| AR[7]
|
| 8  WAGemStoneRunSeasideGems >> startOn: @IP 13  [GsMethod 3494978817]
|    1: 64026 (SmallInteger 8003)
| rcvr: 0x7f627f5f99e8 (cls:1470662657 WAGemStoneRunSeasideGems)
| size:3)
| <--framePtr=0x7f6298ea7028 AR[5]
|
| 9  WAGemStoneRunSeasideGems (C)  >> startGemServerOn: @IP 21
| [GsMethod 3494925057]
|    2: 0x7f627f50e948 (cls:1470662657 WAGemStoneRunSeasideGems)
|    size:3)
|    1: 64026 (SmallInteger 8003)
| rcvr: 0x7f62919839b0 oid:1470662657 (cls:1470665729
| WAGemStoneRunSeasideGems (C) ) size:19) <--framePtr=0x7f6298ea7010
| AR[2]
|
| 10  (Executed Code) @IP 71  [GsMethod 0x7f6294094058]
| rcvr: 20 (OOP_NIL) <--framePtr=0x7f6298ea7008 AR[1]
| _______________________________________________
| Glass mailing list
| [hidden email]
| http://lists.gemtalksystems.com/mailman/listinfo/glass
|
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Swazoo server hangs

otto
> Sorry Otto, I have been in Argentina this last week and haven't been able to focus on "hard problems" ... Swazoo has always been a bit dicey on GemStone, which is one of the main reasons to prefer FastCGI. I think that Zinc is more stable - Johan uses Zinc in production (and perhaps others) ... Also Zinc has support for running client-side HTTP which is very convenient ...

No worries, Dale, thanks. Yes, we use the Zinc HTTP client a lot,
so we thought that using the Zinc server as well would reduce the
number of components.

> I will have to dig in a bit more ... but I just seem to recall that I didn't feel comfortable with the level of bugfixing that needed to go on ...

Another avenue seems to be a better option.

Johan, are you running Zinc servers behind the reverse proxy instead
of FastCGI? If so, what's your experience? Is it stable? On GS 3
or GS 2?

Re: [Glass] Swazoo server hangs

Dale Henrichs-3
Otto,

Johan is running on GemStone 2.x, but I can't say for sure whether or not he might be using a version of Zinc with some private bugfixes ...

Dale

----- Original Message -----
| From: "Otto Behrens" <[hidden email]>
| To: "Dale K. Henrichs" <[hidden email]>, "Johan Brichau" <[hidden email]>
| Cc: [hidden email]
| Sent: Tuesday, November 5, 2013 9:25:39 AM
| Subject: Re: [Glass] Swazoo server hangs

Re: [Glass] Swazoo server hangs

Johan Brichau-3
Hey guys,

We are using the Zinc client (version 1.7) in production, not the server.
We use the FastCGI adaptors behind an nginx server.

This works perfectly well, so we have had no reason to try using the Zinc server.
We did try Swazoo in the early days, but responses were noticeably slower, so we never pursued it further.

Oh, and we are using both GS 2.4.x and GS 3.1.x setups in production.

I would be interested in trying the Zinc server sometime, but I think there is still some work to do [1].

cheers,
Johan

[1] https://github.com/glassdb/zinc/issues?state=open

On 05 Nov 2013, at 19:08, Dale K. Henrichs <[hidden email]> wrote:

> Otto,
>
> Johan is running on GemStone 2.x, but I can't say for sure whether or not he might be using version of Zinc with some private bugfixes ...
>
> Dale

Re: [Glass] Swazoo server hangs

Johan Brichau-3
In reply to this post by otto
Otto,

I just scanned through the mail thread (I seem to have missed it before).

First off: given the number of problems you have with Swazoo, and given that the Zinc server has not been battle tested in GemStone (and there are open issues nobody has really looked at), I definitely recommend switching (back) to FastCGI. It is stable and fast. But, of course, it would be great if you could flesh out the remaining issues with the Zinc server on GemStone ;-)

Second, are the lock-ups occurring frequently? Are they irregular, or is there a pattern?
I am asking because we have a similar problem that occurs (rather infrequently) with the FastCGI adaptors for Seaside [1]:
A Seaside gem becomes unresponsive after some time. I have already found out that the gateSemaphore of a quiet system could still be less than 10 (i.e. some processes got locked and never signaled the semaphore) and that it might have something to do with the front-end server dropping connections. I'm not sure if these problems are related, though.
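The symptom described here is a leaked permit: on an idle gem every permit should have been returned. A minimal Python sketch of a gate that makes such a leak observable; the `Gate` class and its default of 10 permits are hypothetical stand-ins for the Seaside gem's gateSemaphore, not its actual implementation.

```python
import threading


class Gate:
    """Counting gate with a fixed number of permits that also tracks how many
    are outstanding. Hypothetical stand-in for a Seaside gem's gateSemaphore."""

    def __init__(self, permits=10):
        self.permits = permits
        self._sem = threading.Semaphore(permits)
        self._lock = threading.Lock()
        self._in_use = 0

    def acquire(self):
        self._sem.acquire()
        with self._lock:
            self._in_use += 1

    def release(self):
        with self._lock:
            self._in_use -= 1
        self._sem.release()

    def leaked_when_idle(self):
        """Call only when no requests are in flight: every permit should be
        back, so any outstanding count is a leak."""
        with self._lock:
            return self._in_use
```

A periodic check on a quiet gem could log or alert whenever `leaked_when_idle()` is non-zero, which narrows down when the leak happened.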

Johan

[1] https://code.google.com/p/glassdb/issues/detail?id=341

On 01 Nov 2013, at 15:44, Otto Behrens <[hidden email]> wrote:

>> Can you use the FastCGI server which I think was officially supported for 2.4?  As far as I know Zinc works on 2.4 and people use it in production, but others would be better able to address its suitability. As a test you could add a Zinc and/or FastCGI server or two to your pool of 4 Swazoo servers and see if things change for the better.
>
> We initially used FastCGI but then switched to Hyper and then Swazoo,
> simply because it was easier for us to debug as we can use something
> like curl to do a get and see what the server pops up. I suppose that
> if fast cgi has some nice tools to talk to the GS FastCGI server it
> would be as good. But don't you like the simplicity of talking HTTP
> all the way through? Why must FastCGI be better?


Re: [Glass] Swazoo server hangs

otto
Thanks for the input, Johan.

> First off: given the number of problems you have using Swazoo and that Zinc server has not been battle tested in Gemstone (and there are open issues nobody really looked at), I definitely recommend to switch (back) to FastCGI. It is stable and fast. But, of course, it would be great if you can flesh out the remaining issues with Zinc server on Gemstone ;-)

Thanks, really can't work on Zinc now, pressure => battle tested FastCGI.

> Second, are you seeing the lock ups occurring frequently? Are they irregular or is there a pattern?

Yes, there's a pattern.

Someone else looked at the problem, but here's my layman's
interpretation. We picked up the pattern when we had the same Ajax
call on the onblur and onchange events of the same HTML element. This
caused virtually simultaneous calls to two different Swazoo servers with
the same session (and action) id. This caused a commit conflict, and one
process retried. When it retries, it tries to read the request from the
socket again, but that data was already consumed on the first try (hey,
there's no two-phase commit on reading from sockets?). So, something
like that. In principle, whenever we read from or write to external
systems and a commit fails in GS, we have a problem.
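The failure mode described above can be reproduced in miniature. In this Python sketch (all names hypothetical; `CommitConflict` stands in for a failed GemStone commit and an in-memory stream stands in for the socket), the fragile handler re-reads the request inside its retry loop, so the retry sees an already-drained stream. On a real socket that second read would block waiting for bytes the client has already sent, which would look like a hang. The safer pattern reads the request once and retries only the transactional work.

```python
import io


class CommitConflict(Exception):
    """Hypothetical stand-in for a failed GemStone commit."""


def handle_fragile(stream, process, max_tries=3):
    for _ in range(max_tries):
        body = stream.read()  # BUG: a retry reads from an already-drained stream
        try:
            return process(body)
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")


def handle_safe(stream, process, max_tries=3):
    body = stream.read()  # read the request exactly once ...
    for _ in range(max_tries):
        try:
            return process(body)  # ... and retry only the transactional work
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")


def conflict_once():
    """Return a `process` function whose first commit fails, as when two
    near-simultaneous requests hit the same session."""
    calls = 0

    def process(body):
        nonlocal calls
        calls += 1
        if calls == 1:
            raise CommitConflict()
        return body

    return process


if __name__ == "__main__":
    print(handle_fragile(io.BytesIO(b"GET /x"), conflict_once()))  # b'' (the retry saw nothing)
    print(handle_safe(io.BytesIO(b"GET /x"), conflict_once()))     # b'GET /x'
```

The same idea applies to writes: deferring any external side effect until after the commit succeeds keeps a retry from repeating it.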

Does this make sense? I can get more details if you need.

> I am asking this because we do have a similar problem that occurs (rather infrequently) with FastCGI adaptors for Seaside [1]:
> A Seaside gem will become unresponsive after some time. I already managed to find out that the gateSemaphore of a quiet system could still be less than 10 (i.e. some processes got locked and never signaled the semaphore) and that it might have something to do with the front-end server dropping connections. I'm not sure if these problems are related though.

It does not sound as if they are related, but I suppose they could be.

Thanks
Otto

Re: [Glass] Swazoo server hangs

Dale Henrichs-3


----- Original Message -----
| From: "Otto Behrens" <[hidden email]>
| To: "Johan Brichau" <[hidden email]>
| Cc: "Dawie Strauss" <[hidden email]>, [hidden email]
| Sent: Wednesday, November 6, 2013 4:49:25 AM
| Subject: Re: [Glass] Swazoo server hangs
|
| Thanks for the input Johan.
|
| > First off: given the number of problems you have using Swazoo and
| > that Zinc server has not been battle tested in Gemstone (and there
| > are open issues nobody really looked at), I definitely recommend
| > to switch (back) to FastCGI. It is stable and fast. But, of
| > course, it would be great if you can flesh out the remaining
| > issues with Zinc server on Gemstone ;-)
|
| Thanks, really can't work on Zinc now, pressure => battle tested
| FastCGI.
|
| > Second, are you seeing the lock ups occurring frequently? Are they
| > irregular or is there a pattern?
|
| Yes, there's a pattern.
|
| Someone else looked at the problem, but here's my laymens
| interpretation. We picked up the pattern when we had the same ajax
| call on the onblur and onchange events on the same html element. This
| caused virtually simultaneous calls to 2 different Swazoo servers
| with
| the same session (and action) id. This caused a conflict and one
| process retries. When it retries, it tries to read from the socket
| again, which has already been read on the first try (hey, there's no
| 2
| phase commit on reading from sockets?). So, something like that. In
| principle, when we read from / write to external systems and a commit
| fails in GS, we generally have a problem.
|
| Does this make sense? I can get more details if you need.

This makes a lot of sense ... I have never really hammered Swazoo under load, like I have FastCGI, so this pattern of retry on failed commit has probably never been tested ... The Zinc code will have to undergo similar load testing before it's ready...

|
| > I am asking this because we do have a similar problem that occurs
| > (rather infrequently) with FastCGI adaptors for Seaside [1]:
| > A seaside gem will become unresponsive after some time. I already
| > managed to find out that the gateSemaphore of a quiet system could
| > still be less than 10 (i.e. some processes got locked and never
| > signaled the semaphore) and that it might have something to do
| > with the front-end server dropping connections. I'm not sure if
| > these problems are related though.
|
| Does not sound as if they are related, but I suppose it could be.
|

I think it is something different as well, but I would like to get this problem under a microscope some day...

According to Google Code issue #341, there might be a correlation with commit conflicts, and I mention a suspicion about ensure blocks ... the issue with ensure blocks is that when an error occurs during the execution of an ensure block, the remaining ensure blocks might not get evaluated ... so this vulnerability may be causing Swazoo to misbehave as well ...
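The ensure-block hazard can be illustrated outside GemStone. In this Python sketch the cleanup steps run in sequence inside one `finally` block, so when the first step raises, the semaphore release is skipped and a permit leaks. Registering each step separately with `contextlib.ExitStack` runs every callback even if an earlier one raises. `flush_log` and the permit count are hypothetical, and GemStone's own unwinding semantics are not modeled here.

```python
import contextlib
import threading


def flush_log():
    """Hypothetical earlier cleanup step that fails."""
    raise RuntimeError("error inside cleanup")


def fragile_request(gate):
    gate.acquire()
    try:
        pass  # handle the request
    finally:
        flush_log()     # raises ...
        gate.release()  # ... so this never runs and a permit leaks


def robust_request(gate):
    gate.acquire()
    with contextlib.ExitStack() as stack:
        stack.callback(gate.release)  # registered first, so it runs last
        stack.callback(flush_log)     # runs first; its error does not stop the release
        pass  # handle the request


def available(sem, limit=10):
    """Count free permits by taking them without blocking, then giving them back."""
    taken = 0
    while taken < limit and sem.acquire(blocking=False):
        taken += 1
    for _ in range(taken):
        sem.release()
    return taken


if __name__ == "__main__":
    for handler in (fragile_request, robust_request):
        gate = threading.BoundedSemaphore(10)
        try:
            handler(gate)
        except RuntimeError:
            pass  # both variants still propagate the cleanup error
        print(handler.__name__, "permits left:", available(gate))
```

Run as a script, this prints 9 permits left for the fragile variant and 10 for the robust one.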

Johan, it might be worth adding some logging in the ensure blocks associated with the gateSemaphore to eliminate this as a possible problem.
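One way to do that kind of logging, sketched in Python with the standard `logging` module: wrap each acquire/release pair so the log shows whether the release was reached and which exception interrupted the request. `logged_gate` and the logger name are hypothetical; in GLASS the equivalent would be log statements inside the Smalltalk ensure blocks themselves.

```python
import contextlib
import logging
import threading

log = logging.getLogger("gate")


@contextlib.contextmanager
def logged_gate(gate, request_id):
    """Hold one permit of `gate` for a request, logging acquire, release and
    any exception, so a missing "released" line in the log points at a leak."""
    gate.acquire()
    log.debug("acquired gate for %s", request_id)
    try:
        yield
    except Exception:
        log.exception("request %s failed before releasing gate", request_id)
        raise
    finally:
        gate.release()
        log.debug("released gate for %s", request_id)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    with logged_gate(threading.Semaphore(10), "demo-1"):
        pass  # handle the request here
```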

Dale

[1] https://code.google.com/p/glassdb/issues/detail?id=341&q=fastCGI&colspec=ID%20Type%20Status%20Priority%20GLASS%20Version%20Milestone%20Owner%20Summary%20bugid%20Fixed

Re: [Glass] Swazoo server hangs

Johan Brichau-3

On 06 Nov 2013, at 16:27, Dale K. Henrichs <[hidden email]> wrote:

> According to Google Issue #341, there might be a correlation to commit conflicts and I mention a suspicion about ensure blocks ... the issue with ensure blocks is that when an error occurs during the execution of ensure blocks, the rest of the ensure blocks might not get evaluated ... so this vulnerability may be causing Swazoo to misbehave as well ...

That's interesting intel. I never understood your mention of the 'ensure block' bug that way.
I'll take a look, because it does ring a bell: the last entries that show up in the gem log before the unresponsiveness seem to be commit conflict retries...

> Johan, it might be worth adding some logging in the ensure blocks associated with the gateSemaphore to eliminate this as a possible problem..

Yes, that is a good idea.
Just today I eliminated my previous suspicion that socket disconnects by the front-end server might be related to this. I managed to confirm they are not related at all.

But weeks may go by before we hit this bug, at over 50K requests per day on a single stone with three Seaside gems, so it's not _that_ common.

Johan

Re: [Glass] Swazoo server hangs

otto
Hi,

> We are using the Zinc client (version 1.7) in production, not the server.
> We use the FastCGI adaptors behind an nginx server.

Do you mind sending us an example nginx config that works for you?

Thanks
Otto

Re: [Glass] Swazoo server hangs

Paul DeBruicker


Here's one I've used:


#########################################################################
upstream seaside {
    server localhost:9001;
    server localhost:9002;
    server localhost:9003;
}


server {
    listen 80 default_server;
    server_name www.example.com example.com;
    root /var/www/www.example.com;

    gzip on;
    gzip_disable "msie6";
    gzip_static on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

    location @fastCgi {
        include fastcgi_params;
        fastcgi_pass seaside;
    }

    location / {
        # serve the file if it exists, otherwise hand the request to the FastCGI gems
        try_files $uri @fastCgi;
    }

    location ~* ^.+\.(css|js|jpg|jpeg|gif|png|ico|svg)$ {
        add_header Access-Control-Allow-Origin http://www.getcooperation.com ;
        expires max;
        add_header Cache-Control "public, must-revalidate, proxy-revalidate";
        add_header Pragma public;
    }
}

#####################################################################


It first attempts to find a file at the URI and, if none is found, passes the request round-robin style to the FastCGI adaptors. It sets the expiration headers for any static file and gzips both the static and the dynamic content. If you gzip your CSS and JS files and store them next to the uncompressed files, nginx will serve the gzipped files, saving some CPU overhead on each request (e.g. if your CSS file is main.css, put main.css.gz next to it and nginx will serve the .gz one). The dynamic content from FastCGI is gzipped on the fly.
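
For the gzip_static part, the .gz copies have to exist before nginx can serve them (and nginx must be built with the gzip_static module). Below is a minimal sketch, assuming a POSIX shell with find, gzip and touch available; precompress_static is a hypothetical helper name, and the directory you pass is whatever your root directive points at. It keeps the original files and copies their modification times onto the .gz files, which the nginx documentation recommends for gzip_static.

```shell
#!/bin/sh
# Hypothetical helper: write a .gz copy next to every .css and .js file under
# a document root, so nginx's gzip_static can serve the precompressed file.
precompress_static() {
  root=${1:?usage: precompress_static DOCROOT}
  find "$root" -type f \( -name '*.css' -o -name '*.js' \) | while IFS= read -r f; do
    # -9: best compression; -c: write to stdout, leaving the original in place
    gzip -9 -c "$f" > "$f.gz"
    # give the .gz the same modification time as the original
    touch -r "$f" "$f.gz"
  done
}
```

For example, run precompress_static /var/www/www.example.com after each deploy. nginx does not compare the two files, so a stale main.css.gz keeps being served until it is regenerated.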

Re: [Glass] Swazoo server hangs

Paul DeBruicker
Err, this line:

>             try_files $uri @getcooperation;


should be this line:

        try_files $uri @fastCgi;


hth

Paul

Re: [Glass] Swazoo server hangs

otto
Thanks, got it

Re: [Glass] Swazoo server hangs

otto
Has anyone got a script that checks if a FastCGI server is up?

Currently, while using Swazoo, we have a little monitoring process
that does an HTTP GET to the GS server for each of the back-ends. If
the Swazoo server does not respond, we have some code that checks if
it is busy. If we think it is not busy and it does not respond to a
GET, we restart it.

I was thinking of doing a similar GET to the FastCGI server to verify
that it is alive. Does this make sense?
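
A rough sketch of the HTTP GET check described above, assuming curl is installed and each gem answers HTTP directly on 127.0.0.1 at the given port (as Swazoo does); http_status and backend_down are illustrative names, and the busy check and restart are left out because they depend on your own setup. A FastCGI port does not answer plain HTTP, which is why the replies below either send a FastCGI record or go through nginx.

```shell
#!/bin/sh
# Sketch: HTTP GET liveness check for the backend ports given as arguments.

# Print the HTTP status code for GET http://127.0.0.1:PORT/, or 000 when no
# HTTP response arrived within the timeout (default 10 seconds).
http_status() {
  curl -s -o /dev/null -m "${2:-10}" -w '%{http_code}' "http://127.0.0.1:$1/"
}

# A status of 000 means curl never got an HTTP response at all.
backend_down() {
  [ "$1" = "000" ]
}

# usage: check_backends.sh 9001 9002 9003
for port in "$@"; do
  if backend_down "$(http_status "$port")"; then
    echo "$(date) - no response from backend on port $port"
  fi
done
```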

Thanks
Otto

Re: [Glass] Swazoo server hangs

Johan Brichau-3
Hi Otto,

To check if a fastcgi process is up-and-running, you can use monit [1] with the following configuration. I believe I found this online somewhere, perhaps [2].

You will want to adapt the <code to start the seaside gem> part. I just copy/pasted from our configuration and removed that command since we have specific scripts for our setup. I believe it should be something like this: "startSeaside30Adaptor FastCGI 9001".

Here is the part for monit. Just copy/paste it for each fastcgi port you are running.

check process fastcgi_9001 with pidfile /opt/gemstone/product/seaside/data/FastCGI_server-9001.pid
        start program = "<code to start the seaside gem>" as uid sites and gid sites
        stop program = "<code to stop the seaside gem>" as uid sites and gid sites
        # Empty FastCGI request
        if failed port 9001
                  # Send FastCGI packet: version 1 (0x01), cmd FCGI_GET_VALUES (0x09)
                  # padding 8 bytes (0x08), followed by 8xNULLs padding
                  send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
                  # Expect FastCGI packet: version 1 (0x01), resp FCGI_GET_VALUES_RESULT (0x0A)
                  expect "\0x01\0x0A"
                  timeout 10 seconds
        then restart
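
To try the same probe by hand, the 16-byte FCGI_GET_VALUES record from the monit rule can be sent from a shell. This is a sketch, assuming a netcat (nc) that accepts -w for a timeout and a head that supports -c (the GNU and BSD versions do); fcgi_get_values_probe and fcgi_alive are hypothetical names. It succeeds when the first two reply bytes are 0x01 0x0A, the same bytes the expect line checks.

```shell
#!/bin/sh
# Sketch: send the monit FCGI_GET_VALUES probe by hand.

# Write the 16-byte probe: an 8-byte FastCGI header (version 1, type 9 =
# FCGI_GET_VALUES, request id 0, content length 0, padding length 8,
# reserved 0) followed by 8 zero padding bytes. printf reads \ddd as octal,
# so \011 is 9 and \010 is 8.
fcgi_get_values_probe() {
  printf '\001\011\000\000\000\000\010\000\000\000\000\000\000\000\000\000'
}

# Succeed if HOST:PORT replies with version 1 and record type 0x0A
# (FCGI_GET_VALUES_RESULT). Optional third argument: timeout in seconds.
fcgi_alive() {
  reply=$(fcgi_get_values_probe | nc -w "${3:-10}" "$1" "$2" 2>/dev/null \
    | head -c 2 | od -An -tx1 | tr -d ' \n')
  [ "$reply" = "010a" ]
}

# usage: fcgi_probe.sh HOST PORT [TIMEOUT]
if [ $# -ge 2 ]; then
  if fcgi_alive "$1" "$2" "${3:-10}"; then
    echo "up"
  else
    echo "down"
    exit 1
  fi
fi
```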

Hope this helps!
Johan

[1] http://mmonit.com/monit/
[2] http://richard.wallman.org.uk/2010/03/monitor-a-fastcgi-server-using-monit/

On 13 Nov 2013, at 12:18, Otto Behrens <[hidden email]> wrote:

> Has anyone got some script that checks if a fast gci server is up?


Re: [Glass] Swazoo server hangs

Paul DeBruicker
Hi Otto & Johan-

Just to offer another example: I have not been using monit, and instead have a bash script supervised by daemontools, like this:


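#!/bin/bash
# named httpServerHealthCheck.sh
SERVICE_TO_MONITOR=/etc/service/gs_fastcgi-9001
sleep 30
# Delete the previous download so a failed request cannot pass on a stale file
rm -f f9001
curl -s -R -O --max-time 30 http://127.0.0.1/f9001
# First word of the first line of the downloaded page
RESULT=$(awk 'NR==1{print $1}' f9001 2>/dev/null)
if [ "$RESULT" != "/f9001" ] ; then
  svc -t "$SERVICE_TO_MONITOR"
  echo "$(date) - $SERVICE_TO_MONITOR was restarted."
fi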


and in my nginx config I have a location block like this:

location /f9001 {
      include fastcgi_params;
      fastcgi_pass localhost:9001;
}

The bash script:
1. Sleeps for 30 seconds.
2. Uses curl to access a nonexistent Seaside app, so I get the standard Seaside 'not found' error (e.g. "/f9001 not found") stored in a file 'f9001' next to the bash script.
3. Uses awk to store the first word of the downloaded file in the RESULT variable.
4. Compares RESULT with the expected result; if it does not match, the gem is restarted by daemontools.
5. Exits.

daemontools supervises the bash script and restarts it shortly after it exits.

The daemontools 'run' script is:

#!/bin/sh
exec ./httpServerHealthCheck.sh
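
A variant of this script, sketched under the same assumptions (one nginx "location /f<port>" block per gem and daemontools services named /etc/service/gs_fastcgi-<port>), that checks several gems in one pass and reads the page into a variable instead of a file. check_port and first_word are hypothetical names; the ports come in as arguments from the run script.

```shell
#!/bin/sh
# Sketch: one pass of the /f<port> health check for every port given as an
# argument.

# Print the first word of the first line of stdin (empty if stdin is empty).
first_word() {
  awk 'NR == 1 { print $1; exit }'
}

# Fetch the Seaside "not found" page for one port and restart the gem if the
# page does not start with "/f<port>". No answer within 30 seconds also
# counts as a failure, because curl then prints nothing.
check_port() {
  port=$1
  result=$(curl -s -m 30 "http://127.0.0.1/f$port" | first_word)
  if [ "$result" != "/f$port" ]; then
    svc -t "/etc/service/gs_fastcgi-$port"
    echo "$(date) - gs_fastcgi-$port was restarted."
  fi
}

# usage: httpServerHealthCheck.sh 9001 9002 9003
if [ $# -gt 0 ]; then
  sleep 30
  for port in "$@"; do
    check_port "$port"
  done
fi
```

With this variant, the run script would become exec ./httpServerHealthCheck.sh 9001 9002 9003, with a matching location block per port.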

Hope this helps

Paul