one click death


one click death

NorbertHartl
I created a new Pier instance today. A few minutes later there was a 500 error. It wasn't just one page; it was stone-wide. Every Pier instance just showed 500. Restarting the stone solved it. It took me a while to find the cause: in my newly created instance the /about/syntax page triggered it. One click on this link and I have to restart the whole GemStone process.

To be sure, I deleted the hierarchy under /about and recreated it. But if this happens again I'm not sure where to look. Is there a way to get FastCGI to snap off a continuation when an error occurs? Too spooky for me.

Norbert

Re: one click death

Dale
I have tried to ensure that at a minimum there is a stack that gets dumped to the gem log file. The 500 is an indication that a gem has died because of an error that could not be handled. There are only a few errors that cannot be handled, but the log for the gem should contain the clues.

GemSource has been running continuously (except for a couple of power outages/machine crashes) for nearly 3 years now. One of the "tricks" that I use is to automatically restart the gem whenever it exits. If a particular file exists the restart loop is exited (so I can shut down the system).

I really feel that for most of the cases when a gem crashes this is the correct behavior ... one of the errors that cannot be handled is an out-of-memory condition, which can occur if there is even a small "object leak".

I haven't published the script, because it isn't bulletproof (if you start a gem and the stone is gone, the autostart script runs in a very hot loop)...

I remember that folks have posted potential solutions in the past and I have worked a bit on improving the restart logic, but I don't think I've gotten to a bulletproof solution yet...

This isn't a direct solution to your problem, but if you had some restart logic for the gems, the entire system probably wouldn't fall over (unless the error involves the stone itself :)

Dale
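
A minimal sketch of the kind of restart loop described above, assuming a POSIX shell and a `date` that supports `+%s`. The function name `supervise`, the five-second threshold and the doubling delay are illustrative assumptions, not the unpublished script. The loop ends once the stop file exists, and it backs off when the command exits almost immediately, which is the hot-loop case when the stone is gone.

```sh
#!/bin/sh
# Hypothetical restart loop; all names and thresholds are illustrative.

# supervise STOP_FILE MAX_DELAY COMMAND [ARGS...]
# Re-runs COMMAND each time it exits, until STOP_FILE exists.
supervise() {
    stop_file=$1
    max_delay=$2
    shift 2
    delay=1
    while [ ! -e "$stop_file" ]; do
        started=$(date +%s)
        "$@"
        status=$?
        ran_for=$(( $(date +%s) - started ))
        echo "command exited with status $status after ${ran_for}s" >&2
        # Stop file created while the command ran: shut down without waiting.
        if [ -e "$stop_file" ]; then
            break
        fi
        if [ "$ran_for" -lt 5 ]; then
            # Exited almost at once (for example, the stone is gone):
            # wait before retrying and double the wait, capped at MAX_DELAY.
            sleep "$delay"
            delay=$(( delay * 2 ))
            if [ "$delay" -gt "$max_delay" ]; then
                delay=$max_delay
            fi
        else
            delay=1
        fi
    done
    return 0
}
```

It could be called as `supervise /tmp/stop-gems 60 ./start-gem.sh`, where both the stop-file path and the start script are placeholders. Running the command in the foreground keeps the loop simple; the exit status is only logged here, since the meaning of a gem's exit codes is not established in this thread.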

Re: one click death

NorbertHartl

On 25.02.2010, at 18:45, Dale Henrichs wrote:

> I have tried to ensure that at a minimum there is a stack that gets dumped to the gem log file. The 500 is an indication that a gem has died because of an error that could not be handled. There are only a few errors that cannot be handled, but the log for the gem should contain the clues.
>
OK, I got it again. Suddenly it is all /about/syntax pages in all instances. I have no clue what has changed, but it is an endless loop between PRValueLink and PRDeepRenderer.

> GemSource has been running continuously (except for a couple of power outages/machine crashes) for nearly 3 years now. One of the "tricks" that I use is to automatically restart the gem whenever it exits. If a particular file exists the restart loop is exited (so I can shut down the system).
>
If this is a shell script you can just do

GEMPID=
# On a signal, stop the gem and leave the loop instead of restarting it
trap 'kill -15 "$GEMPID" 2>/dev/null; exit' HUP INT QUIT TERM

while true
do
   ...start process...
   GEMPID=$!
   wait "$GEMPID"
done

This shuts down both intertwingled processes if the outer one gets a signal. Just in case this is interesting for you :)

> I really feel that for most of the cases when a gem crashes this is the correct behavior ... one of the errors that cannot be handled is an out-of-memory condition, which can occur if there is even a small "object leak".
>
In this case a stack overflow.

> I haven't published the script, because it isn't bulletproof (if you start a gem and the stone is gone, the autostart script runs in a very hot loop)...
>
Does the gem provide an exit status code? Then it should be fairly easy to restart based on that.

> I remember that folks have posted potential solutions in the past and I have worked a bit on improving the restart logic, but I don't think I've gotten to a bulletproof solution yet...
>
Well, suggesting solutions seems to be quite attractive. My super cow powers with shell scripts aren't what they used to be, but ... :)

> This isn't a direct solution to your problem, but if you had some restart logic for the gems, the entire system probably wouldn't fall over (unless the error involves the stone itself :)
>
No, my problem is not having a debuggable stack to see the problem.

Norbert


Re: one click death

otto
We've been using daemontools (http://cr.yp.to/daemontools.html). It
has its own little quirks, but works *very* well.

If you'd like some example scripts / more info, let me know.


Re: one click death

NorbertHartl
Hi Otto,

On 26.02.2010, at 15:18, Otto Behrens wrote:

> We've been using daemontools (http://cr.yp.to/daemontools.html). It
> has its own little quirks, but works *very* well.
>
Yes, that is another (not so Unix-style) approach to handling that. My very old deployment with Squeak images used this approach. You could just use /etc/inittab directly, but the djb tools are considerably more flexible if you need socket binding assistance, environment setup and such.

> If you'd like some example scripts / more info, let me know.
>
I'll let you know. But I need to upgrade the Ubuntu system soon anyway. Then I will give upstart a go. This would partially solve Dale's problems: if a gem in upstart depends on the start of the stone, it won't start unless the stone is present.

So many names, so many ways to do it :)

Norbert


Re: one click death

otto
> I'll let you know. But I need to upgrade the Ubuntu system soon anyway. Then I will give upstart a go. This would partially solve Dale's problems: if a gem in upstart depends on the start of the stone, it won't start unless the stone is present.
>
> So many names, so many ways to do it :)

Yes, just perhaps a few too many. I sometimes wonder why everyone does
this kind of work over and over. Perhaps because it's great fun!

Here are some more fun scripts that we use:
http://github.com/ottobehrens/gemstone-scripts

One idea is to use scripts like these to start gems so that starting a
gem knows if the stone is running.

Cheers
Otto

Re: one click death

Dale
Ignorance comes into play as well (at least for me:).

Otto, does the gemstone-scripts gem launcher do the infinite restart trick?

Dale

Re: one click death

NorbertHartl
In reply to this post by otto

On 26.02.2010, at 16:52, Otto Behrens wrote:

>> I'll let you know. But I need to upgrade the Ubuntu system soon anyway. Then I will give upstart a go. This would partially solve Dale's problems: if a gem in upstart depends on the start of the stone, it won't start unless the stone is present.
>>
>> So many names, so many ways to do it :)
>
> Yes, just perhaps a few too many. I sometimes wonder why everyone does
> this kind of work over and over. Perhaps because it's great fun!
>
> Here are some more fun scripts that we use:
> http://github.com/ottobehrens/gemstone-scripts
>
> One idea is to use scripts like these to start gems so that starting a
> gem knows if the stone is running.
>
I scanned the scripts. They are hard for me to read because I don't know Ruby, but I got the basics (I hope). I didn't find anything that deals with automatic restart of gems. Could you please point me to it if it is included? Thanks!

What I haven't understood so far is what really happens. Dale said the gem dies if a severe error occurs. That is fine with me. But who is then responding with the 500 error? If the gem runs the HTTP server and it still returns error 500, that would mean the process that runs the gem has not exited. I'm asking because I can't easily reproduce the problem at will.

Norbert
 

Re: one click death

SeanTAllen
500 is an internal server error.

I'm going to assume you are using 2.8, FastCGI and Apache.
If you attempt to make a connection via a FastCGI server that is down, i.e.

Apache => Fast CGI => Gemstone

where GemStone is down, the FastCGI protocol is going to see that as
an internal server error and Apache will return a 500 (not GemStone).
If you were proxying to GemStone (available in the 3.0 alphas for
GLASS) you would get a 502 Bad Gateway error from Apache.
This isn't Apache-specific: try setting up your FastCGI to go to an
unused port for GLASS and you should get a 500. Anything that doesn't
return the correct FastCGI headers to the web server should result in a 500.
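
The distinction above can be turned into a quick triage helper. This is only a sketch: the function name `explain_status` and the wording of the messages are made up for illustration, and the mapping simply restates the 500-versus-502 rule of thumb from this post.

```sh
#!/bin/sh
# explain_status CODE
# Prints a likely cause for an HTTP status returned by the front-end web server.
# The mapping is a rule of thumb, not a diagnosis.
explain_status() {
    case "$1" in
        500) echo "500: with FastCGI, the backend is probably down or sent bad headers" ;;
        502) echo "502: with HTTP proxying, the backend did not answer" ;;
        2??|3??) echo "$1: the backend answered" ;;
        *) echo "$1: see the web server error log" ;;
    esac
}
```

To feed it a live status code, `curl -s -o /dev/null -w '%{http_code}' URL` prints just the status of a request, with URL standing in for one of your own pages.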

Re: one click death

NorbertHartl

Thanks for the hint. I forgot that I had switched to lighttpd for serving. As I didn't see the usual Apache 500 error page, I assumed it was the gem.
But now my case is even more difficult. I got another 500 error. This time only one Pier instance throws a 500. The gem is still running and all other Pier instances run fine. I don't see anything in the log files that could explain the error, and this time restarting the stone does not help.

How can I debug something like that?

Norbert



Re: one click death

NorbertHartl

OK, I take it back. I think I need a defined test scenario to get this right. After some testing I saw an old gem process lying around. I needed a kill -9 to get rid of it. I don't understand why the other instances worked then. One suspect is lighttpd, as I have less experience with it.

After shutting down everything, killing the stale process and restarting everything, every instance is back online. So I assume the gem process does go away when something goes wrong. That is enough to start on my own restart script.

Norbert
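
Getting rid of a stale process like the one above can be scripted so that `kill -9` is only the last resort. The sketch below assumes the PID is already known (for example from `ps`). The function name `stop_process` and the default grace period of ten seconds are illustrative choices, not part of any GemStone tooling.

```sh
#!/bin/sh
# stop_process PID [GRACE_SECONDS]
# Sends SIGTERM, waits up to GRACE_SECONDS for the process to disappear,
# then falls back to SIGKILL (the "kill -9" case).
stop_process() {
    pid=$1
    grace=${2:-10}
    # If SIGTERM cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$(( grace - 1 ))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    echo "process $pid ignored SIGTERM, sending SIGKILL" >&2
    kill -KILL "$pid" 2>/dev/null
}
```

Trying SIGTERM first gives the process a chance to clean up. A process that needs SIGKILL every time is worth investigating on its own.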

Re: one click death

SeanTAllen
If you can do round robin for FastCGI/proxying scenarios, I would
recommend nginx over lighttpd; we found nginx to be far more stable
than lighttpd.
Your mileage may vary, but from talking to others who have used both, I
have heard that anecdote many times.



Re: one click death

Yanni Chiu
In reply to this post by NorbertHartl
Norbert Hartl wrote:
> Thanks for the hint. I forgot that I had switched to lighttpd for serving. As I didn't see the usual Apache 500 error page, I assumed it was the gem.
> But now my case is even more difficult. I got another 500 error. This time only one Pier instance throws a 500. The gem is still running and all other Pier instances run fine. I don't see anything in the log files that could explain the error, and this time restarting the stone does not help.
>
> How can I debug something like that?

The old-fashioned way, and it's ugly. Take a guess at where the problem
is, put some log messages there (i.e. print statements). If lucky,
you'll find the spot to begin step two. If not, then take another guess.
In step two, refine the log messages to narrow down the problem.

Afterwards, decide which log messages you should keep permanently, to
give you a head start on tracing the next problem that occurs.

If you're lucky, you can enable some logging that's already built in,
but if the problem is in your own code, it'll only take you part of the way.

I'd start with logging each request/response, which should give you some
idea of where to start looking for the cause of your problem.
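
If the front-end web server already writes an access log, that log is one place where each request and its status are recorded. The sketch below counts 500 responses per request path. It assumes the log uses the common log format, where the path is the seventh whitespace-separated field and the status the ninth; the function name `count_500s` is made up for illustration.

```sh
#!/bin/sh
# count_500s LOGFILE...
# Prints "COUNT PATH" for every request path that returned status 500,
# most frequent first. Assumes common log format field positions.
count_500s() {
    awk '$9 == 500 { hits[$7]++ }
         END { for (path in hits) print hits[path], path }' "$@" | sort -rn
}
```

Grouping by path makes it easier to see whether the errors come from a single Pier instance or a single page, which narrows down where to add further log messages.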


Re: one click death

SeanTAllen
This is why I greatly prefer proxying to FastCGI: when things get fubar,
proxying is much easier to debug.

Re: one click death

NorbertHartl

On 01.03.2010, at 19:17, Sean Allen wrote:

> This is why I greatly prefer proxying to FastCGI: when things get fubar,
> proxying is much easier to debug.

Well, me too. What HTTP server do you use?

Norbert

Re: one click death

SeanTAllen
On Mon, Mar 1, 2010 at 2:11 PM, Norbert Hartl <[hidden email]> wrote:

> Well, me too. What HTTP server do you use?

nginx to Swazoo (we are running Seaside 3)
Re: one click death

otto
In reply to this post by Dale
> Otto, does the gemstone-scripts gem launcher do the infinite restart trick?

No, gemstone-scripts does not. We use daemontools
(http://cr.yp.to/daemontools.html); the scripts invoke svc (from
daemontools) to start/stop topaz sessions.

I changed gemstone-scripts/glass_stone.rb on github to include all our
lighty and daemontools stuff. It's not 100% yet, still some small
things to sort out. You should get a good idea how we use it.

If you've got daemontools installed, you'll need a /service directory.
You must run svscanboot /service &, which will bootstrap the thing. The
scripts create the daemontools structures for the services.

start_hypers actually sets a flag in the /service directory
structures, which will ensure that the hyper is always running, "the
infinite restart trick".

Let us know if you find this useful and if you think we should do some
work on it.

Cheers
Otto
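
For readers without daemontools, the "infinite restart trick" discussed in this thread (restart the process whenever it exits, leave the loop when a particular file exists, and avoid a hot loop when the process dies immediately) can be sketched as below. This is only an illustration in Python under assumed names: `supervise`, `stop_file`, `healthy_after` and the delay values are hypothetical and are not taken from gemstone-scripts or from any script posted here. The same loop maps directly onto a bash `while` loop.

```python
import os
import subprocess
import time


def supervise(command, stop_file, min_delay=1.0, max_delay=60.0,
              healthy_after=5.0, run=subprocess.call,
              sleep=time.sleep, clock=time.monotonic):
    """Restart `command` each time it exits, until `stop_file` exists.

    Returns how many times the command was started. `run`, `sleep` and
    `clock` are parameters only so the loop can be exercised without
    starting real processes.
    """
    delay = min_delay
    starts = 0
    while not os.path.exists(stop_file):
        started = clock()
        run(command)  # blocks until the supervised process exits
        starts += 1
        if clock() - started >= healthy_after:
            # The process ran for a while, so restart quickly.
            delay = min_delay
        else:
            # It died almost at once (for example because the stone is
            # gone): back off instead of spinning in a hot loop.
            delay = min(delay * 2, max_delay)
        sleep(delay)
    return starts
```

A call might look like `supervise(["/path/to/start-gem.sh"], "/path/to/gem.stop")`, where both paths are placeholders; creating the stop file ends the loop after the current run and its delay.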
Re: one click death

NorbertHartl
Otto,

thanks again for your scripts. Unfortunately I'm not in favour of using daemontools, nor do I want to learn Ruby right now. I don't have the feeling I'd get a quick result by throwing another language into the mix (one that I don't know). And I don't think I need the capabilities of daemontools.

I will definitely have a look at your scripts to learn what you are doing. But my own stuff will use bash as far as it goes. If there are more complex things to do, I would give GNU Smalltalk a try.

Norbert


Re: one click death

otto
Hi Norbert,

> thanks again for your scripts. Unfortunately I'm not in favour of using daemontools, nor do I want to learn Ruby right now. I don't have the feeling I'd get a quick result by throwing another language into the mix (one that I don't know). And I don't think I need the capabilities of daemontools.

No problem. I can certainly vouch for the usefulness and stability of
daemontools. In concept, you'll need to do something similar to
daemontools in bash then.

> I will definitely have a look at your scripts to learn what you are doing. But my own stuff will use bash as far as it goes. If there are more complex things to do, I would give GNU Smalltalk a try.

Interesting idea, haven't really thought of using Smalltalk for things
like this. Yes, I also like the idea of one language.

We just thought that bash is also a language, and chose Ruby because
it is a proper language and better to program in than bash. We were
also looking for something with a good set of OS and file system
libraries. This panned out nicely.

Let us know if there are more scripts (in concept) that you'd like to look at.

Cheers
Otto
Re: one click death

NorbertHartl

On 04.03.2010, at 09:51, Otto Behrens wrote:

> Hi Norbert,
>
>> thanks again for your scripts. Unfortunately I'm not in favour of using daemontools, nor do I want to learn Ruby right now. I don't have the feeling I'd get a quick result by throwing another language into the mix (one that I don't know). And I don't think I need the capabilities of daemontools.
>
> No problem. I can certainly vouch for the usefulness and stability of
> daemontools. In concept, you'll need to do something similar to
> daemontools in bash then.
>
Yes, but I don't think there is much to do.

>> I will definitely have a look at your scripts to learn what you are doing. But my own stuff will use bash as far as it goes. If there are more complex things to do, I would give GNU Smalltalk a try.
>
> Interesting idea, haven't really thought of using Smalltalk for things
> like this. Yes, I also like the idea of one language.
>
> We just thought that bash is also a language, and chose Ruby because
> it is a proper language and better to program in than bash. We were
> also looking for something with a good set of OS and file system
> libraries. This panned out nicely.
>
Yes, bash is also a language. In my head it goes something like this: I want to run a service in Smalltalk on a Unix OS. My service is about handling my use cases. My OS is about handling resources. Bash is an interface to the OS layer: it is easy to check files, check processes, start processes, and so on. To deal with the OS properly you need a libc-based C program or a shell. Any scripting language with system callouts is a poor replacement for this. Don't get me wrong, I can see that you need some sort of interface between the two.
Second, I like to be able to handle things in the language of my choice at the earliest possible point. And this should be Smalltalk.
The conclusion for me is that all resource-related stuff is better done in bash, and if more expressive power is needed I start GNU Smalltalk. If I imagine building a topaz interface in GNU Smalltalk, then I really have everything I need: I could interface to the OS with Smalltalk and could interface to topaz with Smalltalk. Oh, well, what you do in topaz to control GemStone is also Smalltalk. This should feel more like a complete integration.
But at this point I'm just dreaming :)

> Let us know if there are more scripts (in concept) that you'd like to look at.

I will. Thanks again,

Norbert