[Glass] Monit scripts for gemstone?


Mariano Martinez Peck

[Glass] Monit scripts for gemstone?

OK, I could start writing Monit scripts for GemStone, but I guess many people have already done that, right?

The thing is that there are many processes to monitor: the stone, netldi, each gem, etc. From what I could see, most of them, if not all, record their pid somewhere, so it should be easy to write the Monit scripts. For the Seaside gems we could check the port.

Thoughts?

--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Johan Brichau-3

Re: [Glass] Monit scripts for gemstone?

Hi Mariano,

This is what I set up for each stone.
Change the file paths and the user/group names, fill in the TEMPLATE and PORTxx placeholders, adapt the timeouts to your needs and you should be good to go.

Johan

#################################
## TEMPLATE
#################################
check process fastcgi_PORT1_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/FastCGI_server-PORT1.pid
    start program = "/home/yesplan/yesplanscripts/startYesplanGems TEMPLATE PORT1" as uid yesplan and gid yesplan
    stop program = "/home/yesplan/yesplanscripts/stopYesplanGems TEMPLATE PORT1" as uid yesplan and gid yesplan
    DEPENDS on stone_TEMPLATE
    GROUP TEMPLATE
    # Empty FastCGI request
    if failed port PORT1
        # Send FastCGI packet: version 1 (0x01), cmd FCGI_GET_VALUES (0x09)
        # padding 8 bytes (0x08), followed by 8xNULLs padding
        send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
        # Expect FastCGI packet: version 1 (0x01), resp FCGI_GET_VALUES_RESULT (0x0A)
        expect "\0x01\0x0A"
        timeout 40 seconds
    then restart

check process fastcgi_PORT2_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/FastCGI_server-PORT2.pid
    start program = "/home/yesplan/yesplanscripts/startYesplanGems TEMPLATE PORT2" as uid yesplan and gid yesplan
    stop program = "/home/yesplan/yesplanscripts/stopYesplanGems TEMPLATE PORT2" as uid yesplan and gid yesplan
    DEPENDS on stone_TEMPLATE
    GROUP TEMPLATE
    # Empty FastCGI request
    if failed port PORT2
        # Send FastCGI packet: version 1 (0x01), cmd FCGI_GET_VALUES (0x09)
        # padding 8 bytes (0x08), followed by 8xNULLs padding
        send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
        # Expect FastCGI packet: version 1 (0x01), resp FCGI_GET_VALUES_RESULT (0x0A)
        expect "\0x01\0x0A"
        timeout 40 seconds
    then restart

check process fastcgi_PORT3_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/FastCGI_server-PORT3.pid
    start program = "/home/yesplan/yesplanscripts/startYesplanGems TEMPLATE PORT3" as uid yesplan and gid yesplan
    stop program = "/home/yesplan/yesplanscripts/stopYesplanGems TEMPLATE PORT3" as uid yesplan and gid yesplan
    DEPENDS on stone_TEMPLATE
    GROUP TEMPLATE
    # Empty FastCGI request
    if failed port PORT3
        # Send FastCGI packet: version 1 (0x01), cmd FCGI_GET_VALUES (0x09)
        # padding 8 bytes (0x08), followed by 8xNULLs padding
        send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
        # Expect FastCGI packet: version 1 (0x01), resp FCGI_GET_VALUES_RESULT (0x0A)
        expect "\0x01\0x0A"
        timeout 40 seconds
    then restart

check process servicevm_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/service.pid
    start program = "/home/yesplan/yesplanscripts/startYesplanServiceVM TEMPLATE" as uid yesplan and gid yesplan
    stop program = "/home/yesplan/yesplanscripts/stopYesplanServiceVM TEMPLATE" as uid yesplan and gid yesplan
    DEPENDS on stone_TEMPLATE
    GROUP TEMPLATE

check file extent_TEMPLATE with path /opt/gemstone/stones/TEMPLATE/data/extent0.dbf
    if size > 4 GB then alert
    GROUP TEMPLATE

check process stone_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/TEMPLATE.pid
    start program = "/home/yesplan/yesplanscripts/startYesplanStone TEMPLATE" as uid yesplan and gid yesplan
    stop program = "/home/yesplan/yesplanscripts/stopYesplanStone TEMPLATE" as uid yesplan and gid yesplan
    GROUP TEMPLATE
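
Filling in the TEMPLATE and PORTn placeholders by hand for every stone gets tedious, so the substitution could be scripted. The following is only a sketch, not part of the setup described above: the function name `render_monit_template` and the file names in the usage comment are hypothetical, and it assumes the stone name contains no characters that are special to sed (such as `/` or `&`).

```bash
#!/bin/bash
# Sketch: fill in the TEMPLATE and PORT1..PORT3 placeholders of the Monit
# template above. Reads the template on stdin, writes the result to stdout.
#
# Hypothetical usage:
#   render_monit_template mystone 9001 9002 9003 \
#       < gemstone.monit.template > /etc/monit/conf.d/mystone
render_monit_template() {
    local stone="$1" port1="$2" port2="$3" port3="$4"
    # Ports are replaced first, then the stone name.
    sed -e "s/PORT1/${port1}/g" \
        -e "s/PORT2/${port2}/g" \
        -e "s/PORT3/${port3}/g" \
        -e "s/TEMPLATE/${stone}/g"
}
```

Running `monit -t` after installing the generated file should catch syntax errors before Monit is reloaded.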

On 19 Dec 2013, at 15:03, Mariano Martinez Peck <[hidden email]> wrote:

> OK, I could start writing Monit scripts for GemStone, but I guess many people have already done that, right?
>
> The thing is that there are many processes to monitor: the stone, netldi, each gem, etc. From what I could see, most of them, if not all, record their pid somewhere, so it should be easy to write the Monit scripts. For the Seaside gems we could check the port.
>
> Thoughts?

Mariano Martinez Peck

Re: [Glass] Monit scripts for gemstone?

Hi Johan,

Sorry for the late answer. Thank you very much for this very useful Monit script. I am adapting it for my usage. I have only one small question. You check the stone PID, but in my case, because of the way I start the stone, no pid file gets created for the stone. Do you do anything special upon stone startup to write the pid file and remove it upon stop? If so, can I see how?

I was planning to use: gslist -p -n myStoneName >> myStoneName.pid

or something like that.

Thanks in advance.

On Thu, Dec 19, 2013 at 11:09 AM, Johan Brichau <[hidden email]> wrote:

Hi Mariano,

This is what I set up for each stone.
Change the file paths and the user/group names, fill in the TEMPLATE and PORTxx placeholders, adapt the timeouts to your needs and you should be good to go.


--
Mariano
http://marianopeck.wordpress.com


Johan Brichau-3

Re: [Glass] Monit scripts for gemstone?

Hi Mariano,

Yes, that is the procedure we use for the stone pid.
In our script, this happens at the end:

gslist -p -n $GEMSTONE_NAME > $GEMSTONE_DATADIR/$GEMSTONE_NAME.pid

cheers,
Johan
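
A minimal sketch of how the pid file handling could be wrapped for the start and stop scripts. It relies only on the `gslist -p -n` call shown above; the function names are hypothetical and the paths are up to the caller. Two details are assumptions on top of the one-liner: the output is checked to be numeric so that no empty pid file is written when the stone is not running, and `>` is used rather than `>>`, since appending would accumulate stale pids across restarts.

```bash
#!/bin/bash
# Sketch: write and remove a pid file for a stone so Monit can track it.
# Function names are hypothetical; `gslist -p -n NAME` is the call from the thread.

write_stone_pidfile() {
    local stone="$1" pidfile="$2" pid
    pid="$(gslist -p -n "$stone" 2>/dev/null)"
    pid="${pid//[[:space:]]/}"          # strip any surrounding whitespace
    case "$pid" in
        ''|*[!0-9]*)
            # Nothing (or something non-numeric) came back: do not touch the file.
            echo "no pid found for stone $stone" >&2
            return 1
            ;;
    esac
    # Overwrite, never append: the file must contain exactly one pid.
    echo "$pid" > "$pidfile"
}

remove_stone_pidfile() {
    # Called from the stop script once the stone has been shut down.
    rm -f "$1"
}
```

In a start script this would be called last, for example `write_stone_pidfile "$GEMSTONE_NAME" "$GEMSTONE_DATADIR/$GEMSTONE_NAME.pid"`.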

On 18 Aug 2014, at 15:31, Mariano Martinez Peck <[hidden email]> wrote:

> Hi Johan,
>
> Sorry for the late answer. Thank you very much for this very useful Monit script. I am adapting it for my usage. I have only one small question. You check the stone PID, but in my case, because of the way I start the stone, no pid file gets created for the stone. Do you do anything special upon stone startup to write the pid file and remove it upon stop? If so, can I see how?
>
> I was planning to use: gslist -p -n myStoneName >> myStoneName.pid
> or something like that.
>
> Thanks in advance.

Mariano Martinez Peck

Re: [Glass] Monit scripts for gemstone?

Thanks Johan, Monit is working nicely :)

On Mon, Aug 18, 2014 at 7:59 PM, Johan Brichau <[hidden email]> wrote:

Hi Mariano,

Yes, that is the procedure we use for the stone pid.
In our script, this happens at the end:

gslist -p -n $GEMSTONE_NAME > $GEMSTONE_DATADIR/$GEMSTONE_NAME.pid

cheers,
Johan


--
Mariano
http://marianopeck.wordpress.com


Mariano Martinez Peck

Re: [Glass] Monit scripts for gemstone?

Johan,

I have been having some problems when running monit scripts together with init.d scripts.

So, before anything else: are you using init.d to start your stones as well, or do you let Monit do it lazily?

If the latter, how do you handle a proper shutdown? It happens to me that some lock files remain but the processes are gone. So upon reboot, I cannot start my processes again.

In summary, I am having problems with lock files in /opt/gemstone/locks and with having both init.d and Monit scripts present.

I can explain in more detail what I am doing exactly, but just wanted to know what you were doing.

Thanks!

On Tue, Aug 19, 2014 at 3:52 PM, Mariano Martinez Peck <[hidden email]> wrote:

Thanks Johan, Monit is working nicely :)


--
Mariano
http://marianopeck.wordpress.com


Johan Brichau-3

Re: [Glass] Monit scripts for gemstone?

On 20 Aug 2014, at 06:11, Mariano Martinez Peck <[hidden email]> wrote:

> So, before anything else: are you using init.d to start your stones as well, or do you let Monit do it lazily?

In case of reboot, I have found that monit does not work well to get everything up and running fast.
So, yes, I am using init.d to start all stones and gems.

> If the latter, how do you handle a proper shutdown? It happens to me that some lock files remain but the processes are gone. So upon reboot, I cannot start my processes again.

Yes, I do recall that this happened, but I have contradictory experiences. On my laptop I often need to remove the lock files when they are present. In contrast, last June all our servers got rebooted because of a SAN network problem and there were no issues with the lock files. Maybe someone from GemTalk can shed more light on that...

> In summary, I am having problems with lock files in /opt/gemstone/locks and with having both init.d and Monit scripts present.
>
> I can explain in more detail what I am doing exactly, but just wanted to know what you were doing.

At the moment, I'm not doing anything automated with the lock files.
If there is an issue starting a stone, it is one of the things I look at but I don't have an automated strategy for handling them.

It would be great to know and develop a common strategy for everyone to follow...

Johan

James Foster-9

Re: [Glass] Monit scripts for gemstone?

The presence of lock files will prevent processes from starting, so they might need to be cleared. They can be cleared by gslist with the '-c' option and/or you could clear them as part of an init.d script at system boot time. Obviously, they should not always be cleared as part of a general start-up script since this would defeat the purpose, though a 'gslist -c' could be done anytime (my environments typically have an alias gslist='gslist -clv').

James


Mariano Martinez Peck

Re: [Glass] Monit scripts for gemstone?

In reply to this post by Johan Brichau-3

On Wed, Aug 20, 2014 at 5:17 AM, Johan Brichau <[hidden email]> wrote:

On 20 Aug 2014, at 06:11, Mariano Martinez Peck <[hidden email]> wrote:

> So, before anything else: are you using init.d to start your stones as well, or do you let Monit do it lazily?

In case of reboot, I have found that monit does not work well to get everything up and running fast.
So, yes, I am using init.d to start all stones and gems.

Yes, me too. Ok, so agree here.

But I had yet another problem. Upon startup, no problem. But upon shutdown, it seemed that Monit was shutting down AFTER my init.d script. Therefore, I was shutting the processes down in init.d while Monit was still alive and bringing them back up again. And that (I think) led me to strange situations. So I had to change the priorities of my init script to be sure that upon startup it runs before Monit and upon shutdown, after.

You didn't have problems with this? Or maybe just by chance your startup/shutdown order is like that?
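
One way to express that ordering, sketched under the assumption of a Debian-style system where insserv orders init scripts by their LSB headers: `X-Start-Before` and `X-Stop-After` are Debian extensions to the LSB header, and the `gemstone` service name and the facility lists here are hypothetical. This is not necessarily how the priorities were changed in the setup described above.

```bash
#!/bin/sh
### BEGIN INIT INFO
# Provides:          gemstone
# Required-Start:    $local_fs $remote_fs $network
# Required-Stop:     $local_fs $remote_fs $network
# X-Start-Before:    monit
# X-Stop-After:      monit
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start GemStone stones and Seaside gems
### END INIT INFO
```

With these headers the stones would be started before Monit comes up and stopped only after Monit has gone down, so Monit cannot restart them in the middle of a shutdown.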

> If the latter, how do you handle a proper shutdown? It happens to me that some lock files remain but the processes are gone. So upon reboot, I cannot start my processes again.

Yes, I do recall that this happened, but I have contradictory experiences. On my laptop I often need to remove the lock files when they are present. In contrast, last June all our servers got rebooted because of a SAN network problem and there were no issues with the lock files. Maybe someone from GemTalk can shed more light on that...

OK. I see.

> In summary, I am having problems with lock files in /opt/gemstone/locks and with having both init.d and Monit scripts present.
>
> I can explain in more detail what I am doing exactly, but just wanted to know what you were doing.

At the moment, I'm not doing anything automated with the lock files.
If there is an issue starting a stone, it is one of the things I look at but I don't have an automated strategy for handling them.

It would be great to know and develop a common strategy for everyone to follow...

What I do in my init script is:

    case "$1" in
      start)
        printf "%-50s" "Starting $NAME..."
        find /opt/gemstone/locks/ -type f -not -name 'gemstone.hostid' -delete
        /bin/bash /opt/gemstoneAdditions/scripts/startStonesAll.sh
        /bin/bash /opt/gemstoneAdditions/scripts/startSeasideGemsAll.sh
        ;;

Note the line: find /opt/gemstone/locks/ -type f -not -name 'gemstone.hostid' -delete

Why? Basically, I assume that if the system is starting, no process should be alive, hence all lock files should be deleted.

Is this assumption safe?

In any case, I think this is a workaround. I would like to know why some lock files are not being deleted. From what I understand, lock files should always be removed, unless you do a kill -9 or something. And I am not doing that.
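
One way to make the boot-time cleanup a little safer is to delete the lock files only when no GemStone process is found. The following is a minimal sketch under stated assumptions, not a tested recipe: the process names in GS_PROCS (stoned, netldid) would need extending for a real installation, the function names are made up for illustration, and the /proc scan is Linux-specific.

```shell
# Sketch: clear GemStone lock files only if no GemStone process is running.
# LOCK_DIR, GS_PROCS and the function names are assumptions for illustration.
LOCK_DIR="${LOCK_DIR:-/opt/gemstone/locks}"
GS_PROCS="${GS_PROCS:-stoned netldid}"

# Succeeds (returns 0) if any process named in GS_PROCS is running.
# Scans /proc/<pid>/comm, so this only works on Linux.
gemstone_running() {
    local comm name
    for comm in /proc/[0-9]*/comm; do
        [ -r "$comm" ] || continue
        name=$(cat "$comm" 2>/dev/null) || continue
        [ -n "$name" ] || continue
        case " $GS_PROCS " in
            *" $name "*) return 0 ;;
        esac
    done
    return 1
}

clear_stale_locks() {
    if gemstone_running; then
        echo "GemStone processes found; leaving $LOCK_DIR untouched" >&2
        return 1
    fi
    # Same find expression as in the init script; gemstone.hostid is kept.
    find "$LOCK_DIR" -type f -not -name 'gemstone.hostid' -delete
}
```

In the init script, the bare find line could then be replaced by a call to clear_stale_locks, so that a manual "start" on a live system does not remove locks that are still in use.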

Thanks,

--
Mariano
http://marianopeck.wordpress.com


Mariano Martinez Peck

Re: [Glass] Monit scripts for gemstone?

In reply to this post by James Foster-9

On Wed, Aug 20, 2014 at 7:56 AM, James Foster <[hidden email]> wrote:

The presence of lock files will prevent processes from starting, so they might need to be cleared. They can be cleared by gslist with the ‘-c’ option and/or you could clear them as part of an init.d script at system boot time. Obviously, they should not be always cleared as part of a general start-up script since this would defeat the purpose,

mmmm why? I do run it always in my init.d script as part of my startup. See my other answer to Johan.

Is my assumption wrong?

though a ‘gslist -c’ could be done anytime (my environments typically have an alias for gslist=‘gslist -clv’).

I tried that yesterday on my server when I had these lock files from dead processes, and gslist -c did nothing. It simply displayed the stones and didn't delete anything from /opt/gemstone/locks, and I am sure the processes associated with those locks are dead.

Thanks

James


--
Mariano
http://marianopeck.wordpress.com


Mariano Martinez Peck

Re: [Glass] Monit scripts for gemstone?

Ohh and I have yet another question. At night, I have some scripts that run GC, backups, and other stuff. For this, I stop all seaside gems so that I don't get gslocks or waiting for vote or whatever during MFC or the other tasks. So at the very beginning I turn off the seaside gems, and then, once I am done with everything, I start them again.

Of course, now with monit it could happen that monit restarts the seaside gems I turned off. So, what do you do?

I was thinking to execute:

sudo monit unmonitor xxxx

where xxxx is every name of every monit file (one per stone)...

Shutting down monit is also an option but maybe too much.

Thoughts?
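
The unmonitor idea could be wrapped around the nightly job roughly as follows. This is only a sketch: the service names in MONIT_SERVICES are placeholders taken from the template earlier in the thread, the function names are hypothetical, and it assumes that monit unmonitor / monit monitor are given the service name from the "check process" line rather than a file name.

```shell
# Sketch: suspend monit supervision while a maintenance command runs.
# MONIT_SERVICES holds placeholder names; use the names from your own
# "check process ..." entries. Set MONIT=echo for a dry run.
MONIT="${MONIT:-monit}"
MONIT_SERVICES="${MONIT_SERVICES:-fastcgi_PORT1_TEMPLATE fastcgi_PORT2_TEMPLATE}"

# Apply one monit action (monitor / unmonitor) to every listed service.
monit_each() {
    local action="$1" svc
    for svc in $MONIT_SERVICES; do
        $MONIT "$action" "$svc"
    done
}

# Run the given command with monitoring switched off, then switch it
# back on even if the command failed, and pass on its exit status.
run_maintenance() {
    local rc=0
    monit_each unmonitor
    "$@" || rc=$?
    monit_each monitor
    return "$rc"
}
```

A nightly cron job could then call something like run_maintenance /path/to/nightly.sh (a hypothetical script that stops the gems, runs MFC and backups, and starts the gems again), which avoids shutting monit down completely.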

--
Mariano
http://marianopeck.wordpress.com


Richard Sargent

Re: [Glass] Monit scripts for gemstone?

In reply to this post by Mariano Martinez Peck

Mariano Martinez Peck wrote

But...I had yet another problem. Upon startup...no problem. But upon
shutdown, it seemed that monit was shutting down AFTER my init.d script.
Therefore...I was shutting down in init.d while monit was still alive, bringing them
up again. And that (I think) led me to strange situations. So I had to
change the priorities of my init script to be sure upon startup it runs
before monit and in shutdown, after.

You didn't have problems with this? Or maybe just by chance your
startup/shutdown order is like that?

There is a golden rule for programming by contract which should apply here, too. Initialisation and termination should have reverse sequences.

A subclass initialiser needs to perform super class initialisation first, but when cleaning up, it cleans up its own stuff first, then the rest according to its super class rules.

The same pattern should apply in test set up and tear down sequences, and it should apply to system start up and shutdown. I don't know about Unix and init.d, so I don't know if the system design stops things in reverse from how it starts them or if you need to manage it. If I had to guess, I would expect the system to already work the right way, by design.
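
Applied to the setup discussed in this thread, the reverse-order rule could be sketched in shell as below. The service list, the /etc/init.d/<name> layout and the function names are assumptions for illustration; the point is only that the stop sequence is derived from the start sequence instead of being maintained separately.

```shell
# Sketch: stop services in the reverse of their start order.
# SERVICES and the init script paths are assumptions for illustration.
SERVICES="${SERVICES:-gemstone monit}"   # start order: GemStone first, monit last
RUN="${RUN:-}"                           # set RUN=echo for a dry run

# Print the arguments in reverse order, separated by spaces.
reverse_words() {
    local out="" w
    for w in "$@"; do
        out="$w${out:+ $out}"
    done
    printf '%s\n' "$out"
}

start_all() {
    local svc
    for svc in $SERVICES; do
        $RUN "/etc/init.d/$svc" start
    done
}

stop_all() {
    local svc
    for svc in $(reverse_words $SERVICES); do
        $RUN "/etc/init.d/$svc" stop
    done
}
```

With the default list, stop_all stops monit before GemStone, so monit is no longer around to restart the gems that the shutdown has just stopped.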

GLASS mailing list

Re: [Glass] Monit scripts for gemstone?

In reply to this post by Johan Brichau-3

On Thu, Dec 19, 2013 at 11:09 AM, Johan Brichau <[hidden email]> wrote:

Hi Mariano,

This is what I setup for each stone.
Change the file paths, the user/group names, fill in the TEMPLATE and PORTxx stuff, adapt the timeouts to your need and you should be good to go.

Johan

#################################
## TEMPLATE
#################################
check process fastcgi_PORT1_TEMPLATE with pidfile /opt/gemstone/stones/TEMPLATE/data/FastCGI_server-PORT1.pid
start program = "/home/yesplan/yesplanscripts/startYesplanGems TEMPLATE PORT1" as uid yesplan and gid yesplan
stop program = "/home/yesplan/yesplanscripts/stopYesplanGems TEMPLATE PORT1" as uid yesplan and gid yesplan
DEPENDS on stone_TEMPLATE
GROUP TEMPLATE
# Empty FastCGI request
if failed port PORT1
# Send FastCGI packet: version 1 (0x01), cmd FCGI_GET_VALUES (0x09)
# padding 8 bytes (0x08), followed by 8xNULLs padding
send "\0x01\0x09\0x00\0x00\0x00\0x00\0x08\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00\0x00"
# Expect FastCGI packet: version 1 (0x01), resp FCGI_GET_VALUES_RESULT (0x0A)
expect "\0x01\0x0A"
timeout 40 seconds
then restart
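
For reference, the 16 bytes in the send string above are one FastCGI record: an 8-byte header followed by 8 bytes of padding. A small shell sketch that emits the same bytes, useful for inspecting the probe outside monit, could look like this (the function names are made up for illustration):

```shell
# Sketch: emit the 16-byte FastCGI record that the monit rule sends.
fcgi_get_values_probe() {
    # Header: version=1, type=9 (FCGI_GET_VALUES), requestId=0 (2 bytes),
    # contentLength=0 (2 bytes), paddingLength=8, reserved=0
    printf '\001\011\000\000\000\000\010\000'
    # 8 padding bytes, as announced by paddingLength
    printf '\000\000\000\000\000\000\000\000'
}

# Render stdin as a continuous lowercase hex string.
hex() {
    od -An -v -tx1 | tr -d ' \n'
}
```

Running fcgi_get_values_probe | hex shows the record as hex digits. A gem that is able to answer should reply with a record starting with bytes 0x01 0x0A (version 1, FCGI_GET_VALUES_RESULT), which is what the expect line checks.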

Hi Johan,

Sorry for reviving this very old thread, but I have a question. Why do you do that special send / expect of the FastCGI protocol rather than a simple:

if failed port PORT1 then restart

Is this in case the gem is somehow "hung" while still having the port open? Any other reason?

I ask because I am experiencing something and I am not sure it is the expected behavior. Basically, what happens to me is that if the gem is up and alive, monit (when it checks the status of the process) waits for those 40 seconds before it continues checking the next process. Is this expected? Is the same happening to you?

Which timeout do you have in your webserver for the upstreams? 40 seconds too?

Thanks in advance,

Mariano
http://marianopeck.wordpress.com


GLASS mailing list

Re: [Glass] Monit scripts for gemstone?

Let me re-ask what is the more important question I have.... For that monit timeout ("timeout 40 seconds"), should I set a number like that, or exactly the timeout I define at my web server? For example, I have loooong requests (yes, I know, I should be using a service VM...but until then..), so my timeout for the upstreams at the nginx level is 5 minutes. So which timeout do I need to set here? 5 minutes too? Or still 40 seconds?

I ask because I don't want the scenario where I am processing a long request (imagine one of 4 minutes) and then monit after 40 seconds thinks the server is down and hence triggers a restart. So I wonder...is the FastCGI server able to answer the monit send even if it is processing a request? Or will the reply to monit only come once it finishes processing the current request? That might tell me which timeout I should set here.

Thanks in advance!!

--
Mariano
http://marianopeck.wordpress.com


GLASS mailing list

Re: [Glass] Monit scripts for gemstone?

In reply to this post by GLASS mailing list

On 18 Feb 2015, at 23:07, Mariano Martinez Peck <[hidden email]> wrote:

> if failed port PORT1 then restart
>
> Is this in case the gem is somehow "hung" while still having the port open? Any other reason?

We do sometimes have a gem that stops being responsive or a gem that is just processing a request that takes way too long (blocking other users).

In those cases, I prefer to kill it and have it restarted.

> I ask because I am experiencing something and I am not sure it is the expected behavior. Basically, what happens to me is that if the gem is up and alive, monit (when it checks the status of the process) waits for those 40 seconds before it continues checking the next process. Is this expected? Is the same happening to you?

aha… I did not notice that actually. I never actually checked either.

I just assumed monit has its ‘cycle’ doing these checks every xx minutes (this is configurable).

So… as long as the check is done every xxx time, I’m comfortable with it.

> Which timeout do you have in your webserver for the upstreams? 40 seconds too?

Approximately yes. I do some experimenting with these numbers to see what happens but they are mostly set at 40 or 50s for Yesplan. If monit has to wait that long to get a response, it means a user has to do that too and, hence, we let monit reboot the gem. If it was processing stuff, we just killed that guy and if not, we made it responsive again.

I’m sure there are better ways but this is what we are doing for now.

Johan


GLASS mailing list

Re: [Glass] Monit scripts for gemstone?

In reply to this post by GLASS mailing list

On 19 Feb 2015, at 00:01, Mariano Martinez Peck <[hidden email]> wrote:

> Let me re-ask what is the more important question I have.... For that monit timeout ("timeout 40 seconds"), should I set a number like that, or exactly the timeout I define at my web server? For example, I have loooong requests (yes, I know, I should be using a service VM...but until then..), so my timeout for the upstreams at the nginx level is 5 minutes. So which timeout do I need to set here? 5 minutes too? Or still 40 seconds?

Yes, you should use a service vm :)

Anyway, our nginx timeout is set to 60s and I set the monit timeout always lower than that.

We stick to the 60s timeout as the hard deadline for any request in Yesplan. Monit might sometimes cut one short but this raises an alarm for us to take a look at what’s happening and see if we can fix the performance.

> I ask because I don't want the scenario where I am processing a long request (imagine one of 4 minutes) and then monit after 40 seconds thinks the server is down and hence triggers a restart. So I wonder...is the FastCGI server able to answer the monit send even if it is processing a request? Or will the reply to monit only come once it finishes processing the current request? That might tell me which timeout I should set here.

If you want to allow longer requests, then make sure to set the timeout longer.

GLASS is made to process only a single seaside request at a time and thus a request will get blocked until the previous one was processed. That includes the monit request.

So, what you say is correct: if a gem is busy processing requests, it will not answer to monit.

That’s how we work right now, but I have to admit this is still the same way as how we got started a few years ago. I should try to explore if there are better ways, so don’t take my advice for granted :)

Johan


GLASS mailing list

Re: [Glass] Monit scripts for gemstone?

On Thu, Feb 19, 2015 at 5:05 PM, Johan Brichau <[hidden email]> wrote:

> On 19 Feb 2015, at 00:01, Mariano Martinez Peck <[hidden email]> wrote:
>
>> Let me re-ask what is the more important question I have.... For that monit timeout ("timeout 40 seconds"), should I set a number like that, or exactly the timeout I define at my web server? For example, I have loooong requests (yes, I know, I should be using a service VM...but until then..), so my timeout for the upstreams at the nginx level is 5 minutes. So which timeout do I need to set here? 5 minutes too? Or still 40 seconds?
>
> Yes, you should use a service vm :)
> Anyway, our nginx timeout is set to 60s and I set the monit timeout always lower than that.
> We stick to the 60s timeout as the hard deadline for any request in Yesplan. Monit might sometimes cut one short but this raises an alarm for us to take a look at what’s happening and see if we can fix the performance.

OK..good points.

>> I ask because I don't want the scenario where I am processing a long request (imagine one of 4 minutes) and then monit after 40 seconds thinks the server is down and hence triggers a restart. So I wonder...is the FastCGI server able to answer the monit send even if it is processing a request? Or will the reply to monit only come once it finishes processing the current request? That might tell me which timeout I should set here.
>
> If you want to allow longer requests, then make sure to set the timeout longer.
> GLASS is made to process only a single seaside request at a time and thus a request will get blocked until the previous one was processed. That includes the monit request.
> So, what you say is correct: if a gem is busy processing requests, it will not answer to monit.

mmmmm are we 100% sure about this? I mean...at which level is the "lock" inside the gem that blocks requests? I thought it was kind of at the seaside session level. If the monit request is a plain FastCGI request (outside the seaside context), maybe the gem CAN answer because the lock happens further down?

Let me CC Dale just in case ;)

> That’s how we work right now, but I have to admit this is still the same way as how we got started a few years ago. I should try to explore if there are better ways, so don’t take my advice for granted :)

hehehe okok. Thanks in either case!

Mariano
http://marianopeck.wordpress.com


GLASS mailing list

Re: [Glass] Monit scripts for gemstone?

On 2/19/15 1:36 PM, Mariano Martinez Peck wrote:

> On Thu, Feb 19, 2015 at 5:05 PM, Johan Brichau <[hidden email]> wrote:
>
>> If you want to allow longer requests, then make sure to set the timeout longer.
>>
>> GLASS is made to process only a single seaside request at a time and thus a request will get blocked until the previous one was processed. That includes the monit request.
>>
>> So, what you say is correct: if a gem is busy processing requests, it will not answer to monit.
>
> mmmmm are we 100% sure about this? I mean...at which level is the "lock" inside the gem that blocks requests? I thought it was kind of at the seaside session level. If the monit request is a plain FastCGI request (outside the seaside context), maybe the gem CAN answer because the lock happens further down?
>
> Let me CC Dale just in case ;)

Depending upon how deep the monit request goes into the request handling stack for FastCGI, yes.

There is a session lock that is applied once we've processed enough of the request to know what session we're dealing with. I think if the session is already locked we kick out and retry the http request (don't recall exactly).

Then down the stack a ways in GRGemStonePlatform>>seasideProcessRequestWithRetry:resultBlock: we acquire a transaction mutex, and this is where we ensure that only one request is processed at a time (inside the transaction). If monit makes it this far, then it will wait for the transaction mutex to be released ...

For FastCGI, there is a gateway semaphore that only lets 10 concurrent threads be forked for queuing up on the transaction mutex ... I found in testing that under high enough loads, the gem would blow up with an out of memory error that was caused by too many threads sitting idle waiting for the transaction mutex ...

So monit could be blocking here, if there are a number of outstanding requests ... of course there isn't much difference between waiting on the gateway semaphore and blocking on the transaction mutex ...

Dale


GLASS mailing list

Re: [Glass] Monit scripts for gemstone?

Hi Dale,

I’m a bit suspicious about the ’10 concurrent threads’ per gem.

I know this is what the code is supposed to do, but I always encounter a deadlock when a (seaside) request in process sends a request that is to be treated by the same seaside gem, but for a completely different application (and thus certainly not the same seaside session). I would expect that the gem accepts the request and is able to process it because 10 concurrent requests can be accepted; nevertheless, I always get a timeout.

On the other hand, if you remember, we did hit and fix a bug with the semaphore back at ESUG in Edinburgh that clearly indicated that another request _can_ be processed while another one is sitting idle. However, there does not seem to be a guarantee in my experience. What I suspect (but you can probably tell me more :) is that the request in process does not always yield to a process that would accept the incoming request.

Just yesterday I ran into such a situation because I am fine-tuning the automated tests of our application. In that scenario, there is a single Swazoo (or Zinc) server running and a request in process is making a (Zinc) http call to the localhost for a different Seaside application (but thus served by the same gem). In a real use case, this call is done in the service vm but for testing purposes this http call is made synchronously. However, I get a timeout for this second call. I hit this both in Gemstone 2.4 and 3.1.

Johan

On 19 Feb 2015, at 23:16, Dale Henrichs <[hidden email]> wrote:

On 2/19/15 1:36 PM, Mariano Martinez Peck wrote:

On Thu, Feb 19, 2015 at 5:05 PM, Johan Brichau <[hidden email]> wrote:

If you want to allow longer requests, then make sure to set the timeout longer.

GLASS is made to process only a single seaside request at a time and thus a request will get blocked until the previous one was processed. That includes the monit request.

So, what you say is correct: if a gem is busy processing requests, it will not answer to monit.

mmmmm are we 100% sure about this? I mean...at which level is the "lock" inside the gem to block requests? I thought it was at kind of at seaside session level. If monit request is a plain fast cgi request (outside seaside context) maybe the gem CAN answer because the lock happens further?

Let me CC Dale just in case ;)

Depending upon how deep the monit request goes into the request handling stack for FastCGI, yes.

There is a session lock that is applied once we've processed enough of the request to know what session we're dealing with. I think if the session is already locked we kick out and retry the http request (don't recall exactly).

Then down the stack a ways in GRGemStonePlatform>>seasideProcessRequestWithRetry:resultBlock: we acquire a transaction mutex, and this is where we ensure that only one request is processed at a time (inside the transaction). If monit makes it this far, then it will wait for the transaction mutex to be released ...

For FastCGI, there is a gateway semaphore that only lets 10 concurrent threads be forked for queuing up on the transaction mutex ... I found in testing that under high enough loads, the gem would blow up with an out-of-memory error caused by too many threads sitting idle waiting for the transaction mutex ...

So monit could be blocking here, if there are a number of outstanding requests ... of course there isn't much difference between waiting on the gateway semaphore and blocking on the transaction mutex ...
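
To make the two levels described above concrete, here is a minimal Python sketch. It is only a model of the idea, not GemStone or Seaside code: the limit of 10 comes from the message above, while the names `gateway`, `transaction_mutex`, `handle_request` and `run` are hypothetical and chosen for illustration.

```python
import threading
import time

GATEWAY_LIMIT = 10  # figure quoted in the thread; everything else is illustrative

gateway = threading.BoundedSemaphore(GATEWAY_LIMIT)  # caps admitted requests
transaction_mutex = threading.Lock()                 # one request in a transaction at a time

stats_lock = threading.Lock()
admitted = processing = 0
max_admitted = max_processing = 0


def handle_request(work_seconds=0.01):
    """Hypothetical handler: pass the gateway, then queue on the mutex."""
    global admitted, processing, max_admitted, max_processing
    with gateway:
        with stats_lock:
            admitted += 1
            max_admitted = max(max_admitted, admitted)
        with transaction_mutex:
            with stats_lock:
                processing += 1
                max_processing = max(max_processing, processing)
            time.sleep(work_seconds)  # stand-in for the real request work
            with stats_lock:
                processing -= 1
        with stats_lock:
            admitted -= 1


def run(clients=25):
    """Fire `clients` concurrent requests and report the observed peaks."""
    threads = [threading.Thread(target=handle_request) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_admitted, max_processing
```

In this model a monit probe is just one more caller of `handle_request`: it waits either at the gateway or at the mutex, and from the outside the two waits look the same, which matches the remark above.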

Dale


GLASS mailing list

Re: [Glass] Monit scripts for gemstone?

Sorry for the late response. I've been sick for the last week or so and I'm just starting to dig out of my email hole :)

On 2/21/15 6:40 AM, Johan Brichau wrote:

Hi Dale,

I’m a bit suspicious about the ’10 concurrent threads’ per gem.

I know this is what the code is supposed to do, but I always encounter a deadlock when a (Seaside) request in process sends a request that is to be handled by the same Seaside gem, but for a completely different application (and thus certainly not the same Seaside session). I would expect the gem to accept the request and be able to process it, because 10 concurrent requests can be accepted. Nevertheless, I always get a timeout.

The 10 concurrent threads are a function of how many accept() calls are processed (gateway semaphore) ... the gem is still only going to process 1 request at a time (transaction mutex), so I guess I'd like to see some stacks with these errors ... is the timeout happening because of an accept() timeout, or is the timeout happening because the overall processing time has taken too long?

On the other hand, if you remember, we did hit and fix a bug with the semaphore back at ESUG in Edinburgh that clearly indicated that another request _can_ be processed while another one is sitting idle. However, in my experience there does not seem to be a guarantee. What I suspect (but you can probably tell me more :) is that the request in process does not always yield to a process that would accept the incoming request.

I don't remember those details ... do you have a reference to the code we fixed?

Just yesterday I ran into such a situation while fine-tuning the automated tests of our application. In that scenario, there is a single Swazoo (or Zinc) server running, and a request in process makes a (Zinc) HTTP call to localhost for a different Seaside application (which is therefore served by the same gem). In a real use case, this call is done in the service VM, but for testing purposes the HTTP call is made synchronously. However, I get a timeout for this second call. I hit this in both GemStone 2.4 and 3.1.

Okay ... you are being deadlocked by the transaction mutex ... with the transaction mutex there will only ever be one request being processed at any one time ... so with a single gem, making a client HTTP call from within the single active Seaside request (which holds the transaction mutex) is guaranteed to deadlock ...
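
The deadlock described here can be reproduced in miniature. The following Python sketch is an assumption-laden model rather than the actual Seaside/GemStone code: `transaction_mutex`, `handle_request` and `http_call_to_same_gem` are hypothetical names, and a timeout on the lock stands in for the HTTP client timeout that was observed.

```python
import threading

transaction_mutex = threading.Lock()  # non-reentrant, like the mutex described above


def handle_request(name, nested_call=None, timeout=0.2):
    """Hypothetical handler: returns a status string instead of a real response."""
    if not transaction_mutex.acquire(timeout=timeout):
        return f"{name}: timeout"
    try:
        if nested_call is not None:
            inner = nested_call()  # runs while the outer request still holds the mutex
            return f"{name}: done ({inner})"
        return f"{name}: done"
    finally:
        transaction_mutex.release()


def http_call_to_same_gem():
    """Models a synchronous HTTP call served by another thread of the same gem."""
    result = []
    worker = threading.Thread(
        target=lambda: result.append(handle_request("inner")))
    worker.start()
    worker.join()  # the outer request blocks here until the inner one gives up
    return result[0]


outcome = handle_request("outer", nested_call=http_call_to_same_gem)
```

The inner request can never obtain the mutex while the outer one is waiting for it to finish, so only the timeout breaks the cycle. Serving the second application from a different gem, or making the call asynchronously as in the service VM setup mentioned above, avoids holding the mutex across the nested call.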

Johan
