Hi,
We are running GS 2.4.4.4 with Seaside30 3.0.7 and Swazoo2 2.2.0.4.

We run 4 swazoo servers reverse proxied behind nginx. The problem is that our Swazoo server hangs up. There is a socket listening on the known port. The process is idle, but it does not respond to requests - connecting to the port times out.

Sending kill -USR1 <pid> gives us the output below. We have a monitoring process that picks up this condition and kills the process. But this causes the site to be unresponsive until it starts up again.

Do you have any suggestion how to solve this? Have you seen this problem on your applications? Will an upgrade to GS 3 help?

Thanks
Otto

GemStone signal handler: signal 10 (SIGUSR1), received from process 28660 userId 1000
si_code: 0, SI_USER, signal from kill(2), sigsend(2), raise(3C) or abort(3C)

Begin attempt to print C-level stack at: Wed Oct 30 11:28:01 SAST 2013

End of C-level stack:

----------- Lock not acquired - retrying LOG ENTRY: Session lock denied: 2075-----------

Printing Smalltalk stack for memory usage diagnosis:
Smalltalk stack: printing to topaz .out file at [10/30/2013 11:28:01 AM.783 SAST]
iS->ARStackPtr = 0x7f6298ea70a0, offset from base = 20
1 = TOP OF STACK, stackDepth = 10

1 ProcessorScheduler >> _reapEvents: @IP 132 [GsMethod 498429953]
    16: 0x7f6265153ff0 (cls:66817 Array) size:0)
    15: 10 (SmallInteger 1)
    14: 10 (SmallInteger 1) <--framePtr=0x7f6298ea7090 AR[18]
    VC at 0x7f6265153f60 VC.unwindBlock= 20 (OOP_NIL) VC.serialNum= 3005013794875392082 (SmallInteger 375626724359424010)
    13: 3005013794875392082 (SmallInteger 375626724359424010)
    12: 20 (OOP_NIL)
    11: 20 (OOP_NIL)
    10: 0x7f6265153ff0 (cls:66817 Array) size:0)
    9: 0x7f6265153ff0 (cls:66817 Array) size:0)
    8: 268 (OOP_TRUE)
    7: 2 (SmallInteger 0)
    6: 26 (SmallInteger 3)
    5: 80002 (SmallInteger 10000)
    4: 10 (SmallInteger 1)
    3: 11065002175282 (SmallInteger 1383125271910)
    2: 0x7f627f5ea790 (cls:92929 SortedCollection) size:3)
    1: 268 (OOP_TRUE)
    rcvr: 0x7f627f49d758 (cls:116481 ProcessorScheduler) size:11) [framePtr=0x7f6298ea7090 AR[18]]

2 ProcessorScheduler >> _findReadyProcess @IP 13 [GsMethod 498434561]
    1: 20 (OOP_NIL)
    rcvr: 0x7f627f49d758 (cls:116481 ProcessorScheduler) size:11) <--framePtr=0x7f6298ea7080 AR[16]

3 ProcessorScheduler >> _reschedule @IP 13 [GsMethod 498439425]
    2: 20 (OOP_NIL)
    1: 0x7f627f5ec4a8 oid:289184257 (cls:99841 GsProcess) size:22)
    rcvr: 0x7f627f49d758 (cls:116481 ProcessorScheduler) size:11) <--framePtr=0x7f6298ea7068 AR[13]

4 GsProcess >> _wait @IP 13 [GsMethod 260628481]
    rcvr: 0x7f627f5ec4a8 oid:289184257 (cls:99841 GsProcess) size:22) <--framePtr=0x7f6298ea7060 AR[12]

5 Delay >> wait @IP 54 [GsMethod 498471937]
    rcvr: 0x7f6265153f10 (cls:115969 Delay) size:3) <--framePtr=0x7f6298ea7058 AR[11]

6 WAGsSwazooAdaptor >> start @IP 20 [GsMethod 3440353793]
    rcvr: 0x7f627f50e858 oid:20135281153 (cls:42137601 FinWorksGsSwazooAdaptor) size:5) <--framePtr=0x7f6298ea7050 AR[10]

7 WAServerAdaptor (C) >> startOn: @IP 27 [GsMethod 3494994433]
    2: 0x7f627f50e858 oid:20135281153 (cls:42137601 FinWorksGsSwazooAdaptor) size:5)
    1: 64026 (SmallInteger 8003)
    rcvr: 0x7f6291984060 oid:42137601 (cls:42128385 FinWorksGsSwazooAdaptor (C) ) size:19) <--framePtr=0x7f6298ea7038 AR[7]

8 WAGemStoneRunSeasideGems >> startOn: @IP 13 [GsMethod 3494978817]
    1: 64026 (SmallInteger 8003)
    rcvr: 0x7f627f5f99e8 (cls:1470662657 WAGemStoneRunSeasideGems) size:3) <--framePtr=0x7f6298ea7028 AR[5]

9 WAGemStoneRunSeasideGems (C) >> startGemServerOn: @IP 21 [GsMethod 3494925057]
    2: 0x7f627f50e948 (cls:1470662657 WAGemStoneRunSeasideGems) size:3)
    1: 64026 (SmallInteger 8003)
    rcvr: 0x7f62919839b0 oid:1470662657 (cls:1470665729 WAGemStoneRunSeasideGems (C) ) size:19) <--framePtr=0x7f6298ea7010 AR[2]

10 (Executed Code) @IP 71 [GsMethod 0x7f6294094058]
    rcvr: 20 (OOP_NIL) <--framePtr=0x7f6298ea7008 AR[1]

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Hi,
Ok, so this is not an obvious one - no responses. We are getting desperate as it is happening constantly now. Any ideas are welcome, please.

We thought that we could reproduce this problem with the stack trace in my original message, but this appears to be less useful because it only shows the stack of the one active thread. It does mean that other threads could be in an unhappy state, but we can't see it.

Do you know if SIGUSR1 will dump more in GS 3?

Is the Zinc HTTP server perhaps a better option (instead of Swazoo)?

Thanks
Otto
Thanks, Paul.
> Does your monitor process kill & restart all the Swazoo servers at once or one at a time as it detects the misbehaving? I know you can setup daemontools to kill one at a time if you're not already e.g.

We kill one at a time. A cron job. Daemontools starts it up again when it sees the process is dead.

> http://stackoverflow.com/questions/10650686/how-to-supervise-a-webserver-with-daemontools/10663912#10663912

This looks like a good solution - perhaps even better than a cron job. But the result should be the same though.

> Can you use the FastCGI server which I think was officially supported for 2.4? As far as I know Zinc works on 2.4 and people use it in production, but others would be better able to address its suitability. As a test you could add a Zinc and/or FastCGI server or two to your pool of 4 Swazoo servers and see if things change for the better.

We initially used FastCGI but then switched to Hyper and then Swazoo, simply because it was easier for us to debug as we can use something like curl to do a GET and see what the server pops up. I suppose that if FastCGI has some nice tools to talk to the GS FastCGI server it would be as good. But don't you like the simplicity of talking HTTP all the way through? Why must FastCGI be better?
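The kind of check a cron-driven monitor could run against each backend can be sketched roughly as follows. This is only an illustration of the "connect, send a GET, see what comes back" idea from the message above, not the monitoring script actually used here; the function name `probe` and the three result labels are made up for the example.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a backend as 'ok', 'hung' or 'down' (labels are hypothetical).

    'hung' means a timeout while connecting or while waiting for the first
    response byte; 'down' means the connection failed outright or the peer
    closed it without answering.
    """
    request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    try:
        # The timeout given here also applies to the later send and recv calls.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            # Any response byte at all counts as a sign of life.
            return "ok" if conn.recv(1) else "down"
    except socket.timeout:
        return "hung"
    except OSError:
        return "down"
```

A monitor could call `probe("localhost", port)` for each backend port and only kill the process that reports `hung`, leaving daemontools to restart it.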
On 01.11.2013, at 15:44, Otto Behrens <[hidden email]> wrote:
> Thanks, Paul.
>
>> Does your monitor process kill & restart all the Swazoo servers at once or one at a time as it detects the misbehaving? I know you can setup daemontools to kill one at a time if you're not already e.g.
>
> We kill one at a time. A cron job. Daemontools start it up again when
> it sees the process is dead.
>
>> http://stackoverflow.com/questions/10650686/how-to-supervise-a-webserver-with-daemontools/10663912#10663912
>
> This looks like a good solution - perhaps even better than a cron job.
> But the result should be the same though.

See https://github.com/Monty/GemStone_daemontools_setup for an application of this.

>> Can you use the FastCGI server which I think was officially supported for 2.4? As far as I know Zinc works on 2.4 and people use it in production, but others would be better able to address its suitability. As a test you could add a Zinc and/or FastCGI server or two to your pool of 4 Swazoo servers and see if things change for the better.
>
> We initially used FastCGI but then switched to Hyper and then Swazoo,
> simply because it was easier for us to debug as we can use something
> like curl to do a get and see what the server pops up. I suppose that
> if fast cgi has some nice tools to talk to the GS FastCGI server it
> would be as good. But don't you like the simplicity of talking HTTP
> all the way through? Why must FastCGI be better?

FastCGI eliminates much of HTTP's verbosity: it is a binary protocol specifically aimed at (reverse-proxy)<->(app-server) setups. I can confirm that this setup works _really_ well for GLASS.

Best
  -Tobias
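To make the "binary protocol" point a little more concrete, here is a minimal sketch of how a FastCGI record is framed, following the record layout in the FastCGI 1.0 specification (8-byte header, then body, then padding). The helper names `fcgi_record` and `begin_request` are invented for this example and are not part of any GLASS or Seaside API.

```python
import struct

# Constants from the FastCGI 1.0 specification.
FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_RESPONDER = 1


def fcgi_record(record_type: int, request_id: int, content: bytes = b"") -> bytes:
    """Frame content as one FastCGI record: 8-byte header, body, padding."""
    if len(content) > 0xFFFF:
        raise ValueError("content too long for a single record")
    # Padding is optional in the spec; aligning to 8 bytes is a common choice.
    padding = -len(content) % 8
    header = struct.pack(
        "!BBHHBx",          # version, type, requestId, contentLength, paddingLength, reserved
        FCGI_VERSION_1,
        record_type,
        request_id,
        len(content),
        padding,
    )
    return header + content + b"\x00" * padding


def begin_request(request_id: int, keep_conn: bool = False) -> bytes:
    """Build the BEGIN_REQUEST record that opens a responder request."""
    body = struct.pack("!HB5x", FCGI_RESPONDER, 1 if keep_conn else 0)
    return fcgi_record(FCGI_BEGIN_REQUEST, request_id, body)
```

Each request the proxy forwards becomes a handful of such fixed-layout records rather than header text the application server has to parse, which is where the reduced verbosity comes from. The price is the one Otto mentions: you cannot simply point curl at the backend.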
Sorry Otto, I have been in Argentina this last week and haven't been able to focus on "hard problems" ... Swazoo has always been a bit dicey on GemStone, which is one of the main reasons to prefer FastCGI. I think that Zinc is more stable - Johan uses Zinc in production (and perhaps others) ... Also Zinc has support for running client-side HTTP which is very convenient ...
I will have to dig in a bit more ... but I just seem to recall that I didn't feel comfortable with the level of bugfixing that needed to go on ...

Dale
> Sorry Otto, I have been in Argentina this last week and haven't been able to focus on "hard problems" ... Swazoo has always been a bit dicey on GemStone, which is one of the main reasons to prefer FastCGI. I think that Zinc is more stable - Johan uses Zinc in production (and perhaps others) ... Also Zinc has support for running client-side HTTP which is very convenient ...
No worries, Dale, thanks. Yes, we are using the Zinc client-side HTTP a lot, so we are thinking that using the Zinc server as well would reduce the number of components.

> I will have to dig in a bit more ... but I just seem to recall that I didn't feel comfortable with the level of bugfixing that needed to go on ...

Another avenue seems to be a better option.

Johan, are you using the Zinc server as reverse proxied servers instead of FastCGI? If so, what's your experience? Is it stable? On GS 3 or GS 2?
Otto,
Johan is running on GemStone 2.x, but I can't say for sure whether or not he might be using a version of Zinc with some private bugfixes ...

Dale
Hey guys,
We are using the Zinc client (version 1.7) in production, not the server.
We use the FastCGI adaptors behind an nginx server. This works perfectly well, so we have had no reason to try using the Zinc server. We did try Swazoo in the early days and we had noticeably slower responses, so we never really continued to try using that one.

Oh, and we are using both GS 2.4.x and GS 3.1.x setups in production.

I would be interested to try the Zinc server sometime, but there is still some work to do I think [1]

cheers,
Johan

[1] https://github.com/glassdb/zinc/issues?state=open
Otto,
I just scanned through the mail thread (seem to have missed it before).

First off: given the number of problems you have using Swazoo and that Zinc server has not been battle tested in Gemstone (and there are open issues nobody really looked at), I definitely recommend switching (back) to FastCGI. It is stable and fast. But, of course, it would be great if you can flesh out the remaining issues with Zinc server on Gemstone ;-)

Second, are you seeing the lock ups occurring frequently? Are they irregular or is there a pattern?
I am asking this because we do have a similar problem that occurs (rather infrequently) with FastCGI adaptors for Seaside [1]:
A seaside gem will become unresponsive after some time. I already managed to find out that the gateSemaphore of a quit system could still be less than 10 (i.e. some processes got locked and never signaled the semaphore) and that it might have something to do with the front-end server dropping connections. I'm not sure if these problems are related though.

Johan

[1] https://code.google.com/p/glassdb/issues/detail?id=341
Thanks for the input Johan.
> First off: given the number of problems you have using Swazoo and that Zinc server has not been battle tested in Gemstone (and there are open issues nobody really looked at), I definitely recommend to switch (back) to FastCGI. It is stable and fast. But, of course, it would be great if you can flesh out the remaining issues with Zinc server on Gemstone ;-)

Thanks, really can't work on Zinc now, pressure => battle tested FastCGI.

> Second, are you seeing the lock ups occurring frequently? Are they irregular or is there a pattern?

Yes, there's a pattern.

Someone else looked at the problem, but here's my layman's interpretation. We picked up the pattern when we had the same Ajax call on the onblur and onchange events on the same HTML element. This caused virtually simultaneous calls to 2 different Swazoo servers with the same session (and action) id. This caused a commit conflict, and one process retries. When it retries, it tries to read from the socket again, which has already been read on the first try (hey, there's no 2 phase commit on reading from sockets?). So, something like that. In principle, when we read from / write to external systems and a commit fails in GS, we generally have a problem.

Does this make sense? I can get more details if you need.

> I am asking this because we do have a similar problem that occurs (rather infrequently) with FastCGI adaptors for Seaside [1]:
> A seaside gem will become unresponsive after some time. I already managed to find out that the gateSemaphore of a quit system could still be less than 10 (i.e. some processes got locked and never signaled the semaphore) and that it might have something to do with the front-end server dropping connections. I'm not sure if these problems are related though.

Does not sound as if they are related, but I suppose it could be.

Thanks
Otto
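The general principle behind that interpretation can be sketched as follows: a read from an external source is consumed once, so it has to happen before the retry loop, and only the transactional part should be repeated. This is an illustration under assumed names (`CommitConflict`, `handle`, and the `process`/`commit`/`abort` callables are all hypothetical); it is not the Swazoo or Seaside code.

```python
class CommitConflict(Exception):
    """Stand-in for a failed commit; not a GemStone or Seaside class."""


def handle(stream, process, commit, abort, max_attempts=3):
    """Serve one request, retrying only the transactional work on conflict."""
    # Read from the external source exactly once, before any retry.
    # A second read on the same socket would find nothing left to read.
    request = stream.read()
    for _ in range(max_attempts):
        response = process(request)   # repeatable: works on the buffered copy
        try:
            commit()
        except CommitConflict:
            abort()                   # discard this attempt's changes, then retry
            continue
        return response
    raise CommitConflict(f"gave up after {max_attempts} attempts")
```

If the read sat inside the loop instead, the second attempt would block or see an empty stream, which looks a lot like the hang described above. Writes to external systems have the mirror-image problem and would need to be deferred until the commit succeeds.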
----- Original Message -----
| From: "Otto Behrens" <[hidden email]>
| To: "Johan Brichau" <[hidden email]>
| Cc: "Dawie Strauss" <[hidden email]>, [hidden email]
| Sent: Wednesday, November 6, 2013 4:49:25 AM
| Subject: Re: [Glass] Swazoo server hangs
|
| Thanks for the input Johan.
|
| > First off: given the number of problems you have using Swazoo and that Zinc server has not been battle tested in Gemstone (and there are open issues nobody really looked at), I definitely recommend to switch (back) to FastCGI. It is stable and fast. But, of course, it would be great if you can flesh out the remaining issues with Zinc server on Gemstone ;-)
|
| Thanks, really can't work on Zinc now, pressure => battle tested FastCGI.
|
| > Second, are you seeing the lock ups occurring frequently? Are they irregular or is there a pattern?
|
| Yes, there's a pattern.
|
| Someone else looked at the problem, but here's my laymens interpretation. We picked up the pattern when we had the same ajax call on the onblur and onchange events on the same html element. This caused virtually simultaneous calls to 2 different Swazoo servers with the same session (and action) id. This caused a conflict and one process retries. When it retries, it tries to read from the socket again, which has already been read on the first try (hey, there's no 2 phase commit on reading from sockets?). So, something like that. In principle, when we read from / write to external systems and a commit fails in GS, we generally have a problem.
|
| Does this make sense? I can get more details if you need.

This makes a lot of sense ... I have never really hammered Swazoo under load, like I have FastCGI, so this pattern of retry on failed commit has probably never been tested ... The Zinc code will have to undergo similar load testing before it's ready...

| > I am asking this because we do have a similar problem that occurs (rather infrequently) with FastCGI adaptors for Seaside [1]:
| > A seaside gem will become unresponsive after some time. I already managed to find out that the gateSemaphore of a quit system could still be less than 10 (i.e. some processes got locked and never signaled the semaphore) and that it might have something to do with the front-end server dropping connections. I'm not sure if these problems are related though.
|
| Does not sound as if they are related, but I suppose it could be.

I think it is something different as well, but I would like to get this problem under a microscope some day...

According to Google Issue #341, there might be a correlation to commit conflicts and I mention a suspicion about ensure blocks ... the issue with ensure blocks is that when an error occurs during the execution of ensure blocks, the rest of the ensure blocks might not get evaluated ... so this vulnerability may be causing Swazoo to misbehave as well ...

Johan, it might be worth adding some logging in the ensure blocks associated with the gateSemaphore to eliminate this as a possible problem..

Dale

[1] https://code.google.com/p/glassdb/issues/detail?id=341&q=fastCGI&colspec=ID%20Type%20Status%20Priority%20GLASS%20Version%20Milestone%20Owner%20Summary%20bugid%20Fixed
On 06 Nov 2013, at 16:27, Dale K. Henrichs <[hidden email]> wrote:

> According to Google Issue #341, there might be a correlation to commit conflicts and I mention a suspicion about ensure blocks ... the issue with ensure blocks is that when an error occurs during the execution of ensure blocks, the rest of the ensure blocks might not get evaluated ... so this vulnerability may be causing Swazoo to misbehave as well ...

That's interesting intel. I never understood your mention of the 'ensure block' bug that way.
I'll take a look because it does ring a bell that the last things that seem to show up in the gem log before the unresponsiveness are commit conflict retries...

> Johan, it might be worth adding some logging in the ensure blocks associated with the gateSemaphore to eliminate this as a possible problem..

Yes, that is a good idea.

Just today I eliminated my previous suspicion that socket disconnects by the front-end server might be related to this. I managed to confirm they are not related at all.
But weeks may go by before we hit this bug at over 50K requests per day on a single stone with 3 seaside gems, so it's not _that_ common.

Johan
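The logging idea can be sketched like this: log on both sides of the gate, and guard each cleanup step so that one failing step cannot prevent the permit from being given back. It is written in Python purely for illustration, and Python's try/finally does not share the GemStone ensure-block behaviour Dale describes, so treat it as a picture of the intent rather than a port. The names `run_gated` and `cleanups` are hypothetical; the count of 10 permits is taken from Johan's remark about the semaphore value.

```python
import logging
import threading

log = logging.getLogger("gate")

GATE_PERMITS = 10  # assumed from the "less than 10" remark in this thread


def run_gated(gate, handler, cleanups):
    """Run handler while holding one gate permit; always give the permit back."""
    gate.acquire()
    log.debug("gate acquired")
    try:
        return handler()
    finally:
        try:
            for cleanup in cleanups:
                try:
                    cleanup()
                except Exception:
                    # Log and carry on so the remaining cleanups still run.
                    log.exception("cleanup %r failed", cleanup)
        finally:
            # Runs even if a cleanup raised something unexpected.
            gate.release()
            log.debug("gate released")
```

With matching "acquired" and "released" lines in the gem log, a request that took a permit and never returned it would show up as an unpaired entry just before the gem goes quiet.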
Hi,
> We are using the Zinc client (version 1.7) in production, not the server.
> We use the FastCGI adaptors behind an nginx server.

Do you mind sending us an example nginx config that works for you?

Thanks
Otto
here's one I've used:

#########################################################################
upstream seaside {
    server localhost:9001;
    server localhost:9002;
    server localhost:9003;
}

server {
    listen 80 default_server;
    server_name www.example.com example.com;
    root /var/www/www.example.com;

    gzip on;
    gzip_disable "msie6";
    gzip_static on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

    location @fastCgi {
        include fastcgi_params;
        fastcgi_pass seaside;
    }

    location / {
        try_files $uri @getcooperation;
    }

    location ~* ^.+\.(css|js|jpg|jpeg|gif|png|ico|svg)$ {
        add_header Access-Control-Allow-Origin http://www.getcooperation.com;
        expires max;
        add_header Cache-Control "public, must-revalidate, proxy-revalidate";
        add_header Pragma public;
    }
}
#####################################################################

It first attempts to find a file at the URI and, if none is found, passes the request round-robin style to the FastCGI adaptors. It sets the expiration headers for any static file and gzips both the static and the dynamic content.

If you gzip your css and js files and store them next to the uncompressed files, nginx will serve the gzipped files, saving some CPU overhead on each request. (E.g. if your css file is main.css, then put main.css.gz next to it and nginx will serve the .gz one.) The dynamic content from FastCGI is gzipped on the fly.

On Nov 11, 2013, at 8:51 AM, Otto Behrens <[hidden email]> wrote:
> Do you mind sending us an example nginx config that works for you?

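A rough sketch of the pre-compression step that gzip_static relies on, in case it is useful. This is not from Paul's setup: the function name precompress and the choice of Python are assumptions made for illustration. It walks a document root and writes a .gz copy next to every .css and .js file, so that main.css gets a sibling main.css.gz.

```python
import gzip
import shutil
from pathlib import Path


def precompress(root, suffixes=(".css", ".js")):
    """Write a .gz copy next to each matching file under root.

    Hypothetical helper: returns the list of .gz paths it wrote.
    """
    written = []
    # sorted() materialises the listing before any new files are created.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            target = path.with_name(path.name + ".gz")
            with path.open("rb") as src, gzip.open(target, "wb", compresslevel=9) as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Something like precompress("/var/www/www.example.com") would be run again whenever the static files change, since a stale .gz would otherwise keep being served.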
err. this line:

> try_files $uri @getcooperation;

should be this line:

    try_files $uri @fastCgi;

hth

Paul

On Nov 11, 2013, at 1:29 PM, Paul DeBruicker <[hidden email]> wrote:
> here's one I've used:

Thanks, got it
www.FinWorks.biz
+27 82 809 2375

On Mon, Nov 11, 2013 at 11:51 PM, Paul DeBruicker <[hidden email]> wrote:
> should be this line:
>
> try_files $uri @fastCgi;

Has anyone got a script that checks whether a FastCGI server is up?

Currently, while using Swazoo, we have a little monitoring process that does an HTTP GET to the GS server for each of the back-ends. If the Swazoo server does not respond, we have some code that checks whether it is busy. If we think it is not busy and it does not respond to a GET, we restart it.

I was thinking of doing a similar GET to the FastCGI server to verify that it is alive. Does this make sense?

Thanks
Otto

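For reference, the check described above could be sketched roughly as follows. This is only an illustration in Python, not the actual monitoring code: the names responds and should_restart are made up, and how "busy" is determined is left as an input because the thread does not say how it is done.

```python
import http.client
import urllib.error
import urllib.request

# Talk to the back-end directly, ignoring any proxy settings in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def responds(url, timeout=5.0):
    """Return True if the back-end answers an HTTP GET at all within the timeout."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An error status (404, 500, ...) still proves the server is answering.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, dropped connection and similar.
        return False


def should_restart(is_responding, is_busy):
    """Restart only when the server is silent and is not believed to be busy."""
    return not is_responding and not is_busy
```

Any HTTP status counts as "alive" here, since the symptom being guarded against is a server that accepts nothing at all.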
Hi Otto,

To check if a fastcgi process is up-and-running, you can use monit [1] with the following configuration. I believe I found this online somewhere, perhaps [2].

You will want to adapt the <code to start the seaside gem> part. I just copy/pasted from our configuration and removed that command since we have specific scripts for our setup. I believe it should be something like this: "startSeaside30Adaptor FastCGI 9001".

Here is the part for monit. Just copy/paste it for each fastcgi port you are running.

check process fastcgi_9001 with pidfile /opt/gemstone/product/seaside/data/FastCGI_server-9001.pid
    start program = "<code to start the seaside gem>" as uid sites and gid sites
    stop program = "<code to stop the seaside gem>" as uid sites and gid sites
    # Empty FastCGI request
    if failed port 9001
        # Send FastCGI packet: version 1 (0x01), cmd FCGI_GET_VALUES (0x09)
        # padding 8 bytes (0x08), followed by 8xNULLs padding
        send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
        # Expect FastCGI packet: version 1 (0x01), resp FCGI_GET_VALUES_RESULT (0x0A)
        expect "\0x01\0x0A"
        timeout 10 seconds
    then restart

Hope this helps!
Johan

[1] http://mmonit.com/monit/
[2] http://richard.wallman.org.uk/2010/03/monitor-a-fastcgi-server-using-monit/

On 13 Nov 2013, at 12:18, Otto Behrens <[hidden email]> wrote:
> Has anyone got some script that checks if a FastCGI server is up?

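To make the bytes in the send and expect lines a little less opaque, here is a hedged Python sketch of the same probe. It is an illustration, not part of the monit setup: the function names build_probe and is_alive are made up. The 8-byte FastCGI record header is version, type, request id (2 bytes), content length (2 bytes), padding length and a reserved byte, so the 16 bytes monit sends are an empty FCGI_GET_VALUES record followed by 8 bytes of padding.

```python
import socket
import struct

FCGI_VERSION_1 = 1
FCGI_GET_VALUES = 9
FCGI_GET_VALUES_RESULT = 10


def build_probe():
    """Build the 16-byte packet: an empty FCGI_GET_VALUES record plus 8 padding bytes."""
    # version, type, requestId, contentLength, paddingLength, reserved
    header = struct.pack("!BBHHBB", FCGI_VERSION_1, FCGI_GET_VALUES, 0, 0, 8, 0)
    return header + b"\x00" * 8


def is_alive(host, port, timeout=10.0):
    """Send the probe and check that the reply starts with version 1, type 0x0A."""
    reply = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_probe())
            while len(reply) < 2:
                chunk = sock.recv(2 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Refused, timed out or reset: treat as not alive.
        return False
    return reply == bytes([FCGI_VERSION_1, FCGI_GET_VALUES_RESULT])
```

For example, is_alive("localhost", 9001) would correspond to the "if failed port 9001" test above, with the 10 second timeout as the default.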
Hi Otto & Johan-

Just to offer another example: I have not been using monit, and instead have a bash script controlled by daemontools, like this:

    #!/bin/bash
    # named httpServerHealthCheck.sh
    SERVICE_TO_MONITOR=/etc/service/gs_fastcgi-9001
    sleep 30
    curl -R -O http://127.0.0.1/f9001
    # first field of the first line of the downloaded file
    RESULT=`awk 'NR==1{print $1}' f9001`
    if [ "$RESULT" != "/f9001" ] ; then
        svc -t $SERVICE_TO_MONITOR
        echo `date` "-" $SERVICE_TO_MONITOR "was restarted." ;
    fi

and in my nginx config I have a location block like this:

    location /f9001 {
        fastcgi_pass localhost:9001;
    }

The bash script:

1. Sleeps for 30 seconds.
2. Uses curl to access a nonexistent Seaside app, so I get the standard Seaside 'not found' error (e.g. "/f9001 not found") stored in a file 'f9001' next to the bash script.
3. Uses awk to store the first bit of the downloaded file in the RESULT variable.
4. Compares the RESULT variable with the expected result; if they do not match, the Gem is restarted by daemontools.
5. Exits.

daemontools monitors the bash script and restarts it shortly after it stops running.

The daemontools 'run' script is:

    #!/bin/sh
    exec ./httpServerHealthCheck.sh

Hope this helps

Paul

On Nov 15, 2013, at 12:20 PM, Johan Brichau <[hidden email]> wrote:
> Here is the part for monit. Just copy/paste it for each fastcgi port you are running.

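The comparison in steps 3 and 4 can be written out as a small Python sketch, purely as an illustration of the logic. The names first_token and is_healthy are hypothetical, and the "/f9001 not found" body is the example text quoted in the post: the check takes the first whitespace-separated field of the first line of the response and compares it with the expected path.

```python
def first_token(body):
    """Return the first whitespace-separated field of the first line, or ''."""
    lines = body.splitlines()
    if not lines:
        return ""
    fields = lines[0].split()
    return fields[0] if fields else ""


def is_healthy(body, expected="/f9001"):
    """True when the response looks like Seaside's own 'not found' page for the path."""
    return first_token(body) == expected
```

An nginx error page such as a 502 would fail this comparison, which is the point of checking for Seaside's own wording rather than just any response.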