[Glass] GC causing system reboot?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[Glass] GC causing system reboot?

Mariano Martinez Peck
Hi guys, 

Today morning I found out that my server was restarted automatically at 6:54 am. I was weird, since it was an unexpected restart. I tried to find the cause and I didn't find anything from the OS itself nor from VirtualBox logs (yes, the Linux runs over a VirtualBox machine). 

Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs (and the timestamps):

seaside_20477admingcgem.log  shows:

--- 12/19/2013 06:08:21.317 EST ---
[Info]: Successfully acquired GC lockKind 6 on attempt 1.
Starting SweepWsUnion, WSU size=18  PD size=7

seaside_20475reclaimgcgem0.log shows:

12/19/2013 06:19:51 EST
   1 reclaims  40 pagesProcessed  40 pagesReclaimed 2 allValidPages  38 singleObjPages
   60 processedObjs  21 liveObjs  39 shadowObjs 0 deadObjs    1.5 avgObjsPerPa


All 3 fastcgi logs have several lines like:

--transcript--'handled sigabort: 2013-12-19T03:46:52.31182599067688-08:00'

Here I think the difference in the time is that the previous ones were the server local time while in this one is the default timezone I have in GemStone (could that be it?).

Finally, maintenance_gem.log has modification timestamp at 6:52 but no meaningful entry. It has plenty of entries like this:

Unregistering...2013-12-19T03:52:42.86958003044128-08:00
...Expired: 0 sessions.

(again, different timestampt, but maybe the difference I mention above).


And at 6:54 (2 minutes after timestamp of mainenance_gem.log), the system was rebooted. So....is there anything particular strange here?
I have the same setup in my machine (OSX) and I notice that every day at some point, a topaz process starts to run in my machine and takes 100% of CPU. It literally kills my machine, I mean, I cannot do anything else. And the process takes quite a while. I GUESS that is the GC running. So maybe the GC was simply trying to run in my server and the fact of being CPU intensive put it down? mmmmm

Thanks in advance for any help,



--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GC causing system reboot?

Dale Henrichs-3
By default the maintenance vm  runs an mfc once an hour, so if you haven't changed the frequncy for that I'm not sure what the "once a day job would be". The MFC's should show up in your maintenance vm log .... from your description a backup or mfc could cause "sluggishness" on a machine if the repository exceeds the size of your SPC by a significant margin ... Nothing GemStone does should cause a system to reboot, however, if you start running low on real memory (how much swap space do you nave allocated) linux will start killing processes that are memory hogs and perhaps it killed a process that resulted in a reboot?

Dale


From: "Mariano Martinez Peck" <[hidden email]>
To: [hidden email]
Sent: Thursday, December 19, 2013 5:28:34 AM
Subject: [Glass] GC causing system reboot?

Hi guys, 

Today morning I found out that my server was restarted automatically at 6:54 am. I was weird, since it was an unexpected restart. I tried to find the cause and I didn't find anything from the OS itself nor from VirtualBox logs (yes, the Linux runs over a VirtualBox machine). 

Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs (and the timestamps):

seaside_20477admingcgem.log  shows:

--- 12/19/2013 06:08:21.317 EST ---
[Info]: Successfully acquired GC lockKind 6 on attempt 1.
Starting SweepWsUnion, WSU size=18  PD size=7

seaside_20475reclaimgcgem0.log shows:

12/19/2013 06:19:51 EST
   1 reclaims  40 pagesProcessed  40 pagesReclaimed 2 allValidPages  38 singleObjPages
   60 processedObjs  21 liveObjs  39 shadowObjs 0 deadObjs    1.5 avgObjsPerPa


All 3 fastcgi logs have several lines like:

--transcript--'handled sigabort: 2013-12-19T03:46:52.31182599067688-08:00'

Here I think the difference in the time is that the previous ones were the server local time while in this one is the default timezone I have in GemStone (could that be it?).

Finally, maintenance_gem.log has modification timestamp at 6:52 but no meaningful entry. It has plenty of entries like this:

Unregistering...2013-12-19T03:52:42.86958003044128-08:00
...Expired: 0 sessions.

(again, different timestampt, but maybe the difference I mention above).


And at 6:54 (2 minutes after timestamp of mainenance_gem.log), the system was rebooted. So....is there anything particular strange here?
I have the same setup in my machine (OSX) and I notice that every day at some point, a topaz process starts to run in my machine and takes 100% of CPU. It literally kills my machine, I mean, I cannot do anything else. And the process takes quite a while. I GUESS that is the GC running. So maybe the GC was simply trying to run in my server and the fact of being CPU intensive put it down? mmmmm

Thanks in advance for any help,



--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GC causing system reboot?

Mariano Martinez Peck



On Thu, Dec 19, 2013 at 12:40 PM, Dale K. Henrichs <[hidden email]> wrote:
By default the maintenance vm  runs an mfc once an hour, so if you haven't changed the frequncy for that I'm not sure what the "once a day job would be". The MFC's should show up in your maintenance vm log .... from your description a backup or mfc could cause "sluggishness" on a machine if the repository exceeds the size of your SPC by a significant margin

Dale, I am JUST NOW facing this problem in my local machine. I have a topaz running taking 100% since 1:30 hours. The extent file is 4GB. Is this normal to take 1:30 hours??? mmm I don't think so.

My grep tells me that it is the gc indeed:

 501 12858     1   0 12:38PM ??        99:12.24 /opt/gemstone/product/bin/topaz -l -e /opt/gemstone/product/seaside/etc/maintenance30.conf

maintenance30.conf has a GEM_TEMPOBJ_CACHE_SIZE = 200000;

(although from the OSX monitor such a process is using 600MB)

My system.conf has:

SHR_PAGE_CACHE_SIZE_KB = 2000000;
GEM_TEMPOBJ_CACHE_SIZE = 1800000;

the big temp cache is just because I had some big reports that were crashing with an out of memory.

My sysctl has:

kern.sysv.shmmax=68719476736
kern.sysv.shmmin=1
kern.sysv.shmmni=256
kern.sysv.shmseg=64
kern.sysv.shmall=4294967296

And my machine has 8GB of RAM.  yes, I know, maybe those limits are too much.

So...should i let the topaz running? Since it started, I started having entries in maintenance_gem.log and  seaside_11810reclaimgcgem0.log but nothing strange to me. 

seaside_11810reclaimgcgem0.log shows a reclaiming like the following every 15 mins:

12/19/2013 15:04:48 ART
   90 reclaims  3953 pagesProcessed  3953 pagesReclaimed 0 allValidPages  0 singleObjPages
   1831336 processedObjs  219 liveObjs  0 shadowObjs 1831117 deadObjs  463.3 avgObjsPerPage

maintenance_gem.log shows:

Unregistering...2013-12-19T08:38:37.92392206192017-08:00
...Expired: 0 sessions.
--transcript--'Starting markForCollect.: 2013-12-19T08:38:37.94464492797852-08:00'
--transcript--'Warning: markForCollection found 17142463 live objects, 17298 dead objects(occupying approx 1556820 bytes)'
--transcript--'...finished markForCollect.2013-12-19T10:21:02.28886604309082-08:00'
Unregistering...2013-12-19

So as you can see, it took almost 2 hours to do the markForCollect: in a 4GB repository. 

Normal or problem?

Thanks!
 
... Nothing GemStone does should cause a system to reboot, however, if you start running low on real memory (how much swap space do you nave allocated) linux will start killing processes that are memory hogs and perhaps it killed a process that resulted in a reboot?

Dale


From: "Mariano Martinez Peck" <[hidden email]>
To: [hidden email]
Sent: Thursday, December 19, 2013 5:28:34 AM
Subject: [Glass] GC causing system reboot?


Hi guys, 

Today morning I found out that my server was restarted automatically at 6:54 am. I was weird, since it was an unexpected restart. I tried to find the cause and I didn't find anything from the OS itself nor from VirtualBox logs (yes, the Linux runs over a VirtualBox machine). 

Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs (and the timestamps):

seaside_20477admingcgem.log  shows:

--- 12/19/2013 06:08:21.317 EST ---
[Info]: Successfully acquired GC lockKind 6 on attempt 1.
Starting SweepWsUnion, WSU size=18  PD size=7

seaside_20475reclaimgcgem0.log shows:

12/19/2013 06:19:51 EST
   1 reclaims  40 pagesProcessed  40 pagesReclaimed 2 allValidPages  38 singleObjPages
   60 processedObjs  21 liveObjs  39 shadowObjs 0 deadObjs    1.5 avgObjsPerPa


All 3 fastcgi logs have several lines like:

--transcript--'handled sigabort: 2013-12-19T03:46:52.31182599067688-08:00'

Here I think the difference in the time is that the previous ones were the server local time while in this one is the default timezone I have in GemStone (could that be it?).

Finally, maintenance_gem.log has modification timestamp at 6:52 but no meaningful entry. It has plenty of entries like this:

Unregistering...2013-12-19T03:52:42.86958003044128-08:00
...Expired: 0 sessions.

(again, different timestampt, but maybe the difference I mention above).


And at 6:54 (2 minutes after timestamp of mainenance_gem.log), the system was rebooted. So....is there anything particular strange here?
I have the same setup in my machine (OSX) and I notice that every day at some point, a topaz process starts to run in my machine and takes 100% of CPU. It literally kills my machine, I mean, I cannot do anything else. And the process takes quite a while. I GUESS that is the GC running. So maybe the GC was simply trying to run in my server and the fact of being CPU intensive put it down? mmmmm

Thanks in advance for any help,



--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass




--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GC causing system reboot?

Dale Henrichs-3
Mariano,

This does sound way too long ... with a 2GB SPC and a 4GB repo the mfc should be done at once:)

I'm suspicious that your Virtual Linux Machine is somehow paging it's brains out on your mac? I am not an expert in getting these thing setup, but if OSx is only giving your virtual machine 600Mb and your virtual machine thinks it has 2GB+ then that is probably your problem .... you need to find a way to give your virtual machine more real memory in OSx ... I think...

Dale


From: "Mariano Martinez Peck" <[hidden email]>
To: "Dale K. Henrichs" <[hidden email]>
Cc: [hidden email]
Sent: Thursday, December 19, 2013 10:24:51 AM
Subject: Re: [Glass] GC causing system reboot?




On Thu, Dec 19, 2013 at 12:40 PM, Dale K. Henrichs <[hidden email]> wrote:
By default the maintenance vm  runs an mfc once an hour, so if you haven't changed the frequncy for that I'm not sure what the "once a day job would be". The MFC's should show up in your maintenance vm log .... from your description a backup or mfc could cause "sluggishness" on a machine if the repository exceeds the size of your SPC by a significant margin

Dale, I am JUST NOW facing this problem in my local machine. I have a topaz running taking 100% since 1:30 hours. The extent file is 4GB. Is this normal to take 1:30 hours??? mmm I don't think so.

My grep tells me that it is the gc indeed:

 501 12858     1   0 12:38PM ??        99:12.24 /opt/gemstone/product/bin/topaz -l -e /opt/gemstone/product/seaside/etc/maintenance30.conf

maintenance30.conf has a GEM_TEMPOBJ_CACHE_SIZE = 200000;

(although from the OSX monitor such a process is using 600MB)

My system.conf has:

SHR_PAGE_CACHE_SIZE_KB = 2000000;
GEM_TEMPOBJ_CACHE_SIZE = 1800000;

the big temp cache is just because I had some big reports that were crashing with an out of memory.

My sysctl has:

kern.sysv.shmmax=68719476736
kern.sysv.shmmin=1
kern.sysv.shmmni=256
kern.sysv.shmseg=64
kern.sysv.shmall=4294967296

And my machine has 8GB of RAM.  yes, I know, maybe those limits are too much.

So...should i let the topaz running? Since it started, I started having entries in maintenance_gem.log and  seaside_11810reclaimgcgem0.log but nothing strange to me. 

seaside_11810reclaimgcgem0.log shows a reclaiming like the following every 15 mins:

12/19/2013 15:04:48 ART
   90 reclaims  3953 pagesProcessed  3953 pagesReclaimed 0 allValidPages  0 singleObjPages
   1831336 processedObjs  219 liveObjs  0 shadowObjs 1831117 deadObjs  463.3 avgObjsPerPage

maintenance_gem.log shows:

Unregistering...2013-12-19T08:38:37.92392206192017-08:00
...Expired: 0 sessions.
--transcript--'Starting markForCollect.: 2013-12-19T08:38:37.94464492797852-08:00'
--transcript--'Warning: markForCollection found 17142463 live objects, 17298 dead objects(occupying approx 1556820 bytes)'
--transcript--'...finished markForCollect.2013-12-19T10:21:02.28886604309082-08:00'
Unregistering...2013-12-19

So as you can see, it took almost 2 hours to do the markForCollect: in a 4GB repository. 

Normal or problem?

Thanks!
 
... Nothing GemStone does should cause a system to reboot, however, if you start running low on real memory (how much swap space do you nave allocated) linux will start killing processes that are memory hogs and perhaps it killed a process that resulted in a reboot?

Dale


From: "Mariano Martinez Peck" <[hidden email]>
To: [hidden email]
Sent: Thursday, December 19, 2013 5:28:34 AM
Subject: [Glass] GC causing system reboot?


Hi guys, 

Today morning I found out that my server was restarted automatically at 6:54 am. I was weird, since it was an unexpected restart. I tried to find the cause and I didn't find anything from the OS itself nor from VirtualBox logs (yes, the Linux runs over a VirtualBox machine). 

Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs (and the timestamps):

seaside_20477admingcgem.log  shows:

--- 12/19/2013 06:08:21.317 EST ---
[Info]: Successfully acquired GC lockKind 6 on attempt 1.
Starting SweepWsUnion, WSU size=18  PD size=7

seaside_20475reclaimgcgem0.log shows:

12/19/2013 06:19:51 EST
   1 reclaims  40 pagesProcessed  40 pagesReclaimed 2 allValidPages  38 singleObjPages
   60 processedObjs  21 liveObjs  39 shadowObjs 0 deadObjs    1.5 avgObjsPerPa


All 3 fastcgi logs have several lines like:

--transcript--'handled sigabort: 2013-12-19T03:46:52.31182599067688-08:00'

Here I think the difference in the time is that the previous ones were the server local time while in this one is the default timezone I have in GemStone (could that be it?).

Finally, maintenance_gem.log has modification timestamp at 6:52 but no meaningful entry. It has plenty of entries like this:

Unregistering...2013-12-19T03:52:42.86958003044128-08:00
...Expired: 0 sessions.

(again, different timestampt, but maybe the difference I mention above).


And at 6:54 (2 minutes after timestamp of mainenance_gem.log), the system was rebooted. So....is there anything particular strange here?
I have the same setup in my machine (OSX) and I notice that every day at some point, a topaz process starts to run in my machine and takes 100% of CPU. It literally kills my machine, I mean, I cannot do anything else. And the process takes quite a while. I GUESS that is the GC running. So maybe the GC was simply trying to run in my server and the fact of being CPU intensive put it down? mmmmm

Thanks in advance for any help,



--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass




--
Mariano
http://marianopeck.wordpress.com


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GC causing system reboot?

baulamon
In reply to this post by Mariano Martinez Peck
Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs