OK, I could start writing Monit scripts for GemStone, but I guess many people have already done that, right?
The thing is that there are many processes to monitor: the stone, the NetLDI, each gem, etc. From what I can see, most of them, if not all, have a pid file, so it should be easy to write the Monit scripts. For the Seaside gems we could check the port.

Thoughts?

Mariano
http://marianopeck.wordpress.com
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Hi Mariano,
This is what I set up for each stone. Change the file paths and the user/group names, fill in the TEMPLATE and PORTxx placeholders, adapt the timeouts to your needs, and you should be good to go.

Johan

#################################
## TEMPLATE
#################################
check process fastcgi_PORT1_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/FastCGI_server-PORT1.pid
  start program = "/home/yesplan/yesplanscripts/startYesplanGems TEMPLATE PORT1" as uid yesplan and gid yesplan
  stop program = "/home/yesplan/yesplanscripts/stopYesplanGems TEMPLATE PORT1" as uid yesplan and gid yesplan
  DEPENDS on stone_TEMPLATE
  GROUP TEMPLATE
  # Empty FastCGI request
  if failed port PORT1
    # Send FastCGI packet: version 1 (0x01), cmd FCGI_GET_VALUES (0x09)
    # padding length 8 (0x08), followed by 8 NUL padding bytes
    send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
    # Expect FastCGI packet: version 1 (0x01), resp FCGI_GET_VALUES_RESULT (0x0A)
    expect "\0x01\0x0A"
    timeout 40 seconds
  then restart

check process fastcgi_PORT2_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/FastCGI_server-PORT2.pid
  start program = "/home/yesplan/yesplanscripts/startYesplanGems TEMPLATE PORT2" as uid yesplan and gid yesplan
  stop program = "/home/yesplan/yesplanscripts/stopYesplanGems TEMPLATE PORT2" as uid yesplan and gid yesplan
  DEPENDS on stone_TEMPLATE
  GROUP TEMPLATE
  # Empty FastCGI request
  if failed port PORT2
    # Send FastCGI packet: version 1 (0x01), cmd FCGI_GET_VALUES (0x09)
    # padding length 8 (0x08), followed by 8 NUL padding bytes
    send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
    # Expect FastCGI packet: version 1 (0x01), resp FCGI_GET_VALUES_RESULT (0x0A)
    expect "\0x01\0x0A"
    timeout 40 seconds
  then restart

check process fastcgi_PORT3_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/FastCGI_server-PORT3.pid
  start program = "/home/yesplan/yesplanscripts/startYesplanGems TEMPLATE PORT3" as uid yesplan and gid yesplan
  stop program = "/home/yesplan/yesplanscripts/stopYesplanGems TEMPLATE PORT3" as uid yesplan and gid yesplan
  DEPENDS on stone_TEMPLATE
  GROUP TEMPLATE
  # Empty FastCGI request
  if failed port PORT3
    # Send FastCGI packet: version 1 (0x01), cmd FCGI_GET_VALUES (0x09)
    # padding length 8 (0x08), followed by 8 NUL padding bytes
    send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
    # Expect FastCGI packet: version 1 (0x01), resp FCGI_GET_VALUES_RESULT (0x0A)
    expect "\0x01\0x0A"
    timeout 40 seconds
  then restart

check process servicevm_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/service.pid
  start program = "/home/yesplan/yesplanscripts/startYesplanServiceVM TEMPLATE" as uid yesplan and gid yesplan
  stop program = "/home/yesplan/yesplanscripts/stopYesplanServiceVM TEMPLATE" as uid yesplan and gid yesplan
  DEPENDS on stone_TEMPLATE
  GROUP TEMPLATE

check file extent_TEMPLATE with path /opt/gemstone/stones/TEMPLATE/data/extent0.dbf
  if size > 4 GB then alert
  GROUP TEMPLATE

check process stone_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/TEMPLATE.pid
  start program = "/home/yesplan/yesplanscripts/startYesplanStone TEMPLATE" as uid yesplan and gid yesplan
  stop program = "/home/yesplan/yesplanscripts/stopYesplanStone TEMPLATE" as uid yesplan and gid yesplan
  GROUP TEMPLATE

On 19 Dec 2013, at 15:03, Mariano Martinez Peck <[hidden email]> wrote:

> Ok, I could start writing monit scripts for gemstone, but I guess many people already did that, right?
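Because the three FastCGI stanzas above differ only in the stone name and the port, one way to avoid copy-paste mistakes is to generate them. This is a minimal sketch, not part of Johan's setup: the function name `emit_fastcgi_check` is hypothetical, and it hard-codes the paths, the `yesplan` user/group and the start/stop script names from the configuration above, so adapt those first.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the per-port FastCGI stanza shown above.
# Paths, the yesplan uid/gid and the start/stop script names are copied from
# the example configuration; adjust them for your installation.

emit_fastcgi_check() {
  local stone="${1:?stone name required}"
  local port="${2:?port required}"
  local timeout="${3:-40}"   # seconds monit waits for the FastCGI reply
  local data="/opt/gemstone/stones/${stone}/data"
  local scripts="/home/yesplan/yesplanscripts"

  # Unquoted heredoc: ${...} is expanded, while the \0x.. sequences are
  # passed through literally for monit to interpret.
  cat <<EOF
check process fastcgi_${port}_${stone} with pidfile ${data}/FastCGI_server-${port}.pid
  start program = "${scripts}/startYesplanGems ${stone} ${port}" as uid yesplan and gid yesplan
  stop program = "${scripts}/stopYesplanGems ${stone} ${port}" as uid yesplan and gid yesplan
  DEPENDS on stone_${stone}
  GROUP ${stone}
  # Empty FastCGI request
  if failed port ${port}
    # Send FastCGI packet: version 1 (0x01), type FCGI_GET_VALUES (0x09),
    # request id 0, content length 0, padding length 8 (0x08), 8 NUL padding bytes
    send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
    # Expect FastCGI packet: version 1 (0x01), type FCGI_GET_VALUES_RESULT (0x0A)
    expect "\0x01\0x0A"
    timeout ${timeout} seconds
  then restart
EOF
}
```

For example, `for port in 9001 9002 9003; do emit_fastcgi_check mystone "$port"; done > mystone.monitrc` (stone name and ports are placeholders) writes one file that the main Monit configuration can pull in with an `include` line. The stone, service VM and extent stanzas are left to be written by hand, since each appears only once per stone.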
Hi Johan,

Sorry for the late answer. Thank you very much for this very useful Monit script; I am adapting it for my usage. I have only one small question. You check the stone PID, but in my case, because of the way I start the stone, no pid file is created for the stone. Do you do anything special upon stone startup to write a pid file and remove it upon stop? If so, can I see how?
I was planning to use:

    gslist -p -n myStoneName >> myStoneName.pid

or something like that.

Thanks in advance.

Mariano
http://marianopeck.wordpress.com

On Thu, Dec 19, 2013 at 11:09 AM, Johan Brichau <[hidden email]> wrote:
Hi Mariano,
Yes, that is the procedure we use for the stone pid. In our stone start script, this happens at the end:

    gslist -p -n $GEMSTONE_NAME > $GEMSTONE_DATADIR/$GEMSTONE_NAME.pid

cheers,
Johan

On 18 Aug 2014, at 15:31, Mariano Martinez Peck <[hidden email]> wrote:

> Do you do anything special upon stone startup to write pid file and remove it upon stop? If true, can I see how?
>
> I was planning to use: gslist -p -n myStoneName >> myStoneName.pid
> or something like that.
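Johan's one-liner can be wrapped so that the stone's start script writes the pidfile read by the `check process stone_TEMPLATE` stanza, and the stop script removes it. A minimal sketch, assuming (as in Johan's script) that `gslist -p -n NAME` prints just the stone's pid; both function names are hypothetical. Note that it uses `>` rather than the `>>` of Mariano's first draft: appending would leave the pid of an earlier, dead stone at the top of the file, and Monit could end up watching that stale pid.

```shell
#!/usr/bin/env bash
# Sketch: keep a pidfile for a stone so monit's "check process ... with pidfile"
# can find it. Assumes `gslist -p -n NAME` prints only the stone's pid.
# Function names are hypothetical.

write_stone_pidfile() {
  local name="$1" datadir="$2" pid
  pid="$(gslist -p -n "$name")" || return 1
  # Refuse to write anything that is not made up of digits only.
  case "$pid" in
    '' | *[!0-9]*)
      echo "write_stone_pidfile: no usable pid for stone '$name': '$pid'" >&2
      return 1
      ;;
  esac
  # '>' (not '>>'): the file must contain exactly one pid.
  printf '%s\n' "$pid" > "$datadir/$name.pid"
}

remove_stone_pidfile() {
  local name="$1" datadir="$2"
  rm -f "$datadir/$name.pid"
}
```

In a start script this would be called right after the stone is up, e.g. `write_stone_pidfile "$GEMSTONE_NAME" "$GEMSTONE_DATADIR"`, and `remove_stone_pidfile` with the same arguments after `stopstone` returns.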
Thanks Johan, Monit is working nicely :)

Mariano
http://marianopeck.wordpress.com

On Mon, Aug 18, 2014 at 7:59 PM, Johan Brichau <[hidden email]> wrote:
Johan,

I have been having some problems when running Monit scripts together with init.d scripts. So, first of all: are you using init.d to start your stones as well, or do you let Monit start them lazily?
If the latter, how do you handle a proper shutdown? What happens to me is that some lock files remain even though the processes are gone, so after a reboot I cannot start my processes again. In summary, I am having problems with the lock files in /opt/gemstone/locks when both init.d and Monit scripts are present.
I can explain in more detail exactly what I am doing, but I just wanted to know what you were doing. Thanks!
Mariano
http://marianopeck.wordpress.com

On Tue, Aug 19, 2014 at 3:52 PM, Mariano Martinez Peck <[hidden email]> wrote:
On 20 Aug 2014, at 06:11, Mariano Martinez Peck <[hidden email]> wrote:

> So...before everything...are you using init.d to start your stones as well? or you let monit do it lazily?

In case of a reboot, I have found that Monit does not work well for getting everything up and running quickly. So, yes, I am using init.d to start all stones and gems.

> if the later...how do you handle a proper shutdown? it happens to me that some lock files remind alive but the process are gone. So upon reboot, I cannot start my process again.

Yes, I do recall that this happened, but my experiences are contradictory. On my laptop I often need to remove the lock files when they are present. In contrast, last June all our servers were rebooted because of a SAN network problem and there were no issues with the lock files. Maybe someone from GemTalk can shed more light on that...

> In summary.... I am having problems with locks file in /opt/gemstone/locks and the presence of both, init.d and monit scripts.
>
> I can explain with more details what I am doing exactly, but just wanted to know what you were doing.

At the moment, I'm not doing anything automated with the lock files. If there is an issue starting a stone, they are one of the things I look at, but I don't have an automated strategy for handling them.

It would be great to work out a common strategy for everyone to follow...

Johan
The presence of lock files will prevent processes from starting, so they might need to be cleared. They can be cleared by gslist with the ‘-c’ option, and/or you could clear them as part of an init.d script at system boot time. Obviously, they should not always be cleared as part of a general start-up script, since this would defeat their purpose, though a ‘gslist -c’ can be run at any time (my environments typically have an alias gslist=‘gslist -clv’).
James

On Aug 20, 2014, at 4:17 AM, Johan Brichau <[hidden email]> wrote:

> At the moment, I'm not doing anything automated with the lock files.
> If there is an issue starting a stone, it is one of the things I look at but I don't have an automated strategy for handling them.
In reply to this post by Johan Brichau-3
On Wed, Aug 20, 2014 at 5:17 AM, Johan Brichau <[hidden email]> wrote:
Yes, me too. OK, so we agree here. But I had yet another problem. Upon startup, no problem. But upon shutdown, it seemed that Monit was shutting down AFTER my init.d script. Therefore, I was shutting the processes down in init.d while Monit, still alive, was bringing them back up again. And that, I think, led to strange situations. So I had to change the priorities of my init script to make sure that it runs before Monit on startup and after Monit on shutdown.
You didn't have problems with this? Or does your startup/shutdown order just happen to be like that?
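On Debian-style systems where init.d scripts are ordered by insserv, the ordering Mariano describes (start before Monit, stop after it) can be declared in the script's LSB header rather than by hand-tuned priorities. A sketch, assuming the Monit init script provides the facility name `monit`; `gemstone` is a hypothetical name for this script, and the header fields should be checked against your distribution's insserv documentation. Under systemd, `Before=monit.service` in the GemStone unit gives the same result, because systemd stops units in the reverse of their start order.

```shell
#!/bin/sh
### BEGIN INIT INFO
# Provides:          gemstone
# Required-Start:    $local_fs $remote_fs $network
# Required-Stop:     $local_fs $remote_fs $network
# X-Start-Before:    monit
# X-Stop-After:      monit
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start GemStone stones and Seaside gems before Monit
### END INIT INFO
```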
OK. I see.
What I did is this, in my init script:

    case "$1" in
      start)
        printf "%-50s" "Starting $NAME..."
        find /opt/gemstone/locks/ -type f -not -name 'gemstone.hostid' -delete
        /bin/bash /opt/gemstoneAdditions/scripts/startStonesAll.sh
        /bin/bash /opt/gemstoneAdditions/scripts/startSeasideGemsAll.sh
        ;;
      # ... other cases (stop, restart, ...) ...
    esac

Note the line:

    find /opt/gemstone/locks/ -type f -not -name 'gemstone.hostid' -delete
Why? Basically, I assume that if the system is starting, no process can be alive, hence all lock files should be deleted. Is this assumption safe? In any case, I think this is a workaround. I would like to know why some lock files are not being deleted. From what I understand, lock files should always be removed unless you do a kill -9 or something similar, and I am not doing that.
Thanks,
Mariano
http://marianopeck.wordpress.com
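Mariano's boot-time cleanup can be made a little more defensive, which also speaks to James's point that locks should not be cleared blindly: presumably they exist precisely to stop a process from being started twice. The sketch below deletes lock files only when no stone or NetLDI appears to be running. It assumes those processes show up in the process table as `stoned` and `netldid` (check `ps` on your own system), keeps `gemstone.hostid` as Mariano's `find` does, and uses a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch of a boot-time cleanup of stale GemStone lock files.
# Assumptions: it runs from the init script before anything GemStone is started;
# stones/NetLDIs appear in the process table as "stoned"/"netldid";
# gemstone.hostid must be preserved (as in the find command above).

clear_stale_gemstone_locks() {
  local lockdir="${1:-/opt/gemstone/locks}"
  [ -d "$lockdir" ] || return 0

  # If anything that could own a lock is alive, leave the locks alone.
  # (This safety check is skipped when pgrep is not installed.)
  if command -v pgrep >/dev/null 2>&1 && pgrep -x 'stoned|netldid' >/dev/null; then
    echo "clear_stale_gemstone_locks: GemStone processes running, not touching $lockdir" >&2
    return 1
  fi

  # -maxdepth 1 keeps find out of subdirectories; -print logs each deleted file.
  find "$lockdir" -maxdepth 1 -type f ! -name 'gemstone.hostid' -print -delete
}
```

In the `start)` branch above, `clear_stale_gemstone_locks || exit 1` would then replace the bare `find`, so a boot that somehow finds GemStone already running fails loudly instead of deleting live locks.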
In reply to this post by James Foster-9
On Wed, Aug 20, 2014 at 7:56 AM, James Foster <[hidden email]> wrote:

> The presence of lock files will prevent processes from starting, so they might need to be cleared. They can be cleared by gslist with the ‘-c’ option and/or you could clear them as part of an init.d script at system boot time. Obviously, they should not be always cleared as part of a general start-up script since this would defeat the purpose,

Mmm, why? I always run it in my init.d script as part of my startup. See my other answer to Johan. Is my assumption wrong?

> though a ‘gslist -c’ could be done anytime (my environments typically have an alias for gslist=‘gslist -clv’).

I tried that yesterday on my server when I had these lock files from dead processes, and gslist -c did nothing. It simply displayed the stones; it didn't delete anything from /opt/gemstone/locks, and I am sure the processes associated with those locks are dead.
Thanks James,
Mariano
http://marianopeck.wordpress.com
Oh, and I have yet another question. At night, I have some scripts that run GC, backups, and other tasks. For these, I stop all Seaside gems so that I don't get gslocks or waiting-for-vote situations during MFC or the other tasks. So at the very beginning I stop the Seaside gems, and once everything is done, I start them again. Of course, with Monit it could now happen that Monit restarts the Seaside gems I stopped. So, what do you do? I was thinking of executing:

    sudo monit unmonitor xxxx

where xxxx is each service name in each Monit file (one file per stone)... Shutting down Monit is also an option, but maybe too much. Thoughts?

Mariano
http://marianopeck.wordpress.com

On Wed, Aug 20, 2014 at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:
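Instead of unmonitoring each service by name, the nightly job could use the `GROUP` that every stanza in Johan's configuration already declares, since Monit's `-g NAME` option applies an action to the services in that group. A minimal sketch with a hypothetical wrapper name; check `monit -h` for the exact group syntax your Monit version accepts. Monitoring is re-enabled even if the maintenance command fails, so a failed backup does not leave the gems unwatched.

```shell
#!/usr/bin/env bash
# Sketch: pause monit for one stone's GROUP while nightly maintenance runs.
# MONIT may be overridden, e.g. MONIT="sudo monit" (expanded unquoted on
# purpose, so it can hold a command plus arguments).
MONIT="${MONIT:-monit}"

with_monit_paused() {
  local group="$1"; shift
  local status=0

  $MONIT -g "$group" unmonitor all || return 1

  # Run the maintenance command (e.g. stop gems, MFC, backup, restart gems).
  "$@" || status=$?

  # Always try to resume monitoring, even if the maintenance step failed.
  if ! $MONIT -g "$group" monitor all; then
    echo "with_monit_paused: could not re-enable monitoring for group $group" >&2
    [ "$status" -ne 0 ] || status=1
  fi
  return "$status"
}
```

Usage would look like `MONIT="sudo monit" with_monit_paused TEMPLATE /path/to/nightly-maintenance.sh`, where the script path is a placeholder. Once monitoring resumes, Monit will start any gem that is still down, so the maintenance script should restart its gems before it returns rather than racing Monit afterwards.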
In reply to this post by Mariano Martinez Peck
There is a golden rule for programming by contract which should apply here, too: initialisation and termination should have reverse sequences. A subclass initialiser needs to perform superclass initialisation first, but when cleaning up, it cleans up its own stuff first, then the rest according to its superclass rules. The same pattern should apply to test set-up and tear-down sequences, and it should apply to system start-up and shutdown. I don't know about Unix and init.d, so I don't know whether the system design stops things in the reverse of the order in which it starts them or whether you need to manage that yourself. If I had to guess, I would expect the system to already work the right way, by design.
In reply to this post by Johan Brichau-3
On Thu, Dec 19, 2013 at 11:09 AM, Johan Brichau <[hidden email]> wrote:

Hi Johan,

Sorry for reviving this very old thread, but I have a question. Why do you do that special send/expect of the FastCGI protocol rather than a simple:

    if failed port PORT1 then restart

Is this for the case where the gem could somehow be "hung" while still having its port open? Any other reason?

I ask because I am experiencing something and I am not sure it is the expected behavior. Basically, what happens to me is that if the gem is up and alive, Monit (when it checks the status of the process) waits for those 40 seconds before continuing to check the next process. Is this expected? Is the same happening to you? Which timeout do you have in your web server for the upstreams? 40 seconds too?

Thanks in advance,
On Wed, Feb 18, 2015 at 7:07 PM, Mariano Martinez Peck <[hidden email]> wrote:
Let me re-ask what is the more important question I have... For the Monit timeout ("timeout 40 seconds"), should I set a number like that, or exactly the timeout I define in my web server? For example, I have loooong requests (yes, I know, I should be using a service VM... but until then...), so my upstream timeout at the nginx level is 5 minutes. So which timeout do I need to set here? 5 minutes too? Or still 40 seconds? I ask because I don't want the scenario where I am processing a long request (imagine one of 4 minutes) and then Monit, after 40 seconds, thinks the server is down and triggers a restart. So I wonder: is the FastCGI server able to answer the Monit send even while it is processing a request? Or will the answer to Monit only come once it finishes processing the current request? That might tell me which timeout I should set here. Thanks in advance!!
In reply to this post by GLASS mailing list
We do sometimes have a gem that stops being responsive or a gem that is just processing a request that takes way too long (blocking other users). In those cases, I prefer to kill it and have it restarted.
Aha… I did not notice that, actually; I never checked either. I just assumed Monit has its ‘cycle’ doing these checks every xx minutes (this is configurable). So… as long as the check is done every xxx time, I’m comfortable with it.
Approximately, yes. I do some experimenting with these numbers to see what happens, but they are mostly set at 40 or 50s for Yesplan. If Monit has to wait that long to get a response, it means a user has to wait that long too, and hence we let Monit restart the gem. If it was processing something, we just killed that request; if not, we made the gem responsive again. I’m sure there are better ways, but this is what we are doing for now.

Johan
In reply to this post by GLASS mailing list
Yes, you should use a service VM :) Anyway, our nginx timeout is set to 60s, and I always set the Monit timeout lower than that. We stick to the 60s timeout as the hard deadline for any request in Yesplan. Monit might sometimes cut one short, but that raises an alarm for us to take a look at what’s happening and see if we can fix the performance.
If you want to allow longer requests, then make sure to set the timeout longer. GLASS is made to process only a single Seaside request at a time per gem, and thus a request gets blocked until the previous one has been processed. That includes the Monit request. So, what you say is correct: if a gem is busy processing requests, it will not answer Monit. That’s how we work right now, but I have to admit this is still the same setup we started with a few years ago. I should explore whether there are better ways, so don’t take my advice for granted :)

Johan
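To check by hand whether a gem that is busy with a long request still answers the probe, the Monit `send`/`expect` pair can be reproduced from a shell. A rough sketch, assuming bash (for `/dev/tcp`) plus the coreutils `timeout`, `head` and `od` tools; the function names are hypothetical. It sends the same 16-byte FCGI_GET_VALUES packet as the configuration and reports whether the first two reply bytes are 0x01 0x0A within the given number of seconds.

```shell
#!/usr/bin/env bash
# Sketch: reproduce monit's FastCGI check by hand. Bash-only (/dev/tcp).

fcgi_get_values_packet() {
  # Header: version 1, type FCGI_GET_VALUES (9), request id 0, content
  # length 0, padding length 8, reserved 0; then 8 NUL padding bytes.
  printf '\001\011\000\000\000\000\010\000'
  printf '\000\000\000\000\000\000\000\000'
}

fcgi_probe() {
  local host="$1" port="$2" secs="${3:-40}" reply
  # Open a TCP connection on file descriptor 3 (fails fast if the port is closed).
  exec 3<>"/dev/tcp/${host}/${port}" || return 2
  fcgi_get_values_packet >&3
  # Read the version and type bytes of the reply, giving up after $secs seconds.
  reply="$(timeout "$secs" head -c 2 <&3 | od -An -tx1 | tr -d ' \n')"
  exec 3<&-
  if [ "$reply" = "010a" ]; then
    echo "ok: FCGI_GET_VALUES_RESULT received"
  else
    echo "no valid reply within ${secs}s (got '${reply}')" >&2
    return 1
  fi
}
```

Running `fcgi_probe 127.0.0.1 9001 40` (host and port are placeholders) once against an idle gem and once while a long request is in progress should show directly which of the two cases discussed in this thread applies to your gems.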
On Thu, Feb 19, 2015 at 5:05 PM, Johan Brichau <[hidden email]> wrote:
OK, good points.
Mmm, are we 100% sure about this? I mean, at which level inside the gem is the "lock" that blocks requests? I thought it was kind of at the Seaside session level. If the Monit request is a plain FastCGI request (outside any Seaside context), maybe the gem CAN answer, because the lock happens further down? Let me CC Dale just in case ;)
Hehehe, OK OK. Thanks in any case!
On 2/19/15 1:36 PM, Mariano Martinez Peck wrote:
Depending upon how deep the Monit request goes into the request-handling stack for FastCGI, yes.

There is a session lock that is applied once we've processed enough of the request to know what session we're dealing with. I think if the session is already locked we kick out and retry the HTTP request (I don't recall exactly).

Then, a ways down the stack in GRGemStonePlatform>>seasideProcessRequestWithRetry:resultBlock:, we acquire a transaction mutex, and this is where we ensure that only one request is processed at a time (inside the transaction). If Monit makes it this far, then it will wait for the transaction mutex to be released ...

For FastCGI, there is a gateway semaphore that only lets 10 concurrent threads be forked to queue up on the transaction mutex ... I found in testing that under high enough loads, the gem would blow up with an out-of-memory error caused by too many threads sitting idle waiting for the transaction mutex ...

So Monit could be blocking here if there are a number of outstanding requests ... of course, there isn't much difference between waiting on the gateway semaphore and blocking on the transaction mutex ...

Dale
Hi Dale,
I’m a bit suspicious about the ‘10 concurrent threads’ per gem. I know this is what the code is supposed to do, but I always encounter a deadlock when a (Seaside) request in progress sends a request that is to be handled by the same Seaside gem, but for a completely different application (and thus certainly not the same Seaside session). I would expect the gem to accept the request and be able to process it, because 10 concurrent requests can be accepted; nevertheless, I always get a timeout.

On the other hand, if you remember, we did hit and fix a bug with the semaphore back at ESUG in Edinburgh that clearly indicated that another request _can_ be processed while another one is sitting idle. However, in my experience there does not seem to be a guarantee. What I suspect (but you can probably tell me more :) is that the request in progress does not always yield to a process that would accept the incoming request.

Just yesterday I ran into such a situation because I am fine-tuning the automated tests of our application. In that scenario, there is a single Swazoo (or Zinc) server running, and a request in progress makes a (Zinc) HTTP call to localhost for a different Seaside application (thus served by the same gem). In a real use case, this call is done in the service VM, but for testing purposes this HTTP call is made synchronously. However, I get a timeout for this second call. I hit this in both GemStone 2.4 and 3.1.

Johan
Sorry for the late response; I've been sick the last week or so and I'm just starting to dig out of my email hole :)
On 2/21/15 6:40 AM, Johan Brichau wrote:
The 10 concurrent threads is a function of how many accept() calls are processed (gateway semaphore) ... the gem is still only going to process 1 request at a time (transaction mutex), so I guess I'd like to see some stacks with these errors ... is the timeout happening because of an accept() timeout, or is the timeout happening because the overall processing time has taken too long?

I don't remember those details ... do you have a reference to the code we fixed?

Okay ... you are being deadlocked by the transaction mutex ... with the transaction mutex there will only ever be one request being processed at any one time ... so with a single gem, making a client HTTP call from within the single active Seaside request (which holds the transaction mutex), you are guaranteed to get a deadlock ...