Smalltalk › Gemtalk › GLASS

[Glass] GC causing system reboot?

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

5 messages Options

Mariano Martinez Peck

[Glass] GC causing system reboot?

Hi guys,

Today morning I found out that my server was restarted automatically at 6:54 am. I was weird, since it was an unexpected restart. I tried to find the cause and I didn't find anything from the OS itself nor from VirtualBox logs (yes, the Linux runs over a VirtualBox machine).

Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs (and the timestamps):

seaside_20477admingcgem.log shows:

--- 12/19/2013 06:08:21.317 EST ---

[Info]: Successfully acquired GC lockKind 6 on attempt 1.

Starting SweepWsUnion, WSU size=18 PD size=7

seaside_20475reclaimgcgem0.log shows:

12/19/2013 06:19:51 EST

1 reclaims 40 pagesProcessed 40 pagesReclaimed 2 allValidPages 38 singleObjPages

60 processedObjs 21 liveObjs 39 shadowObjs 0 deadObjs 1.5 avgObjsPerPa

All 3 fastcgi logs have several lines like:

--transcript--'handled sigabort: 2013-12-19T03:46:52.31182599067688-08:00'

Here I think the difference in the time is that the previous ones were the server local time while in this one is the default timezone I have in GemStone (could that be it?).

Finally, maintenance_gem.log has modification timestamp at 6:52 but no meaningful entry. It has plenty of entries like this:

Unregistering...2013-12-19T03:52:42.86958003044128-08:00

...Expired: 0 sessions.

(again, different timestampt, but maybe the difference I mention above).

And at 6:54 (2 minutes after timestamp of mainenance_gem.log), the system was rebooted. So....is there anything particular strange here?

I have the same setup in my machine (OSX) and I notice that every day at some point, a topaz process starts to run in my machine and takes 100% of CPU. It literally kills my machine, I mean, I cannot do anything else. And the process takes quite a while. I GUESS that is the GC running. So maybe the GC was simply trying to run in my server and the fact of being CPU intensive put it down? mmmmm

Thanks in advance for any help,

--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Dale Henrichs-3

Re: [Glass] GC causing system reboot?

By default the maintenance vm runs an mfc once an hour, so if you haven't changed the frequncy for that I'm not sure what the "once a day job would be". The MFC's should show up in your maintenance vm log .... from your description a backup or mfc could cause "sluggishness" on a machine if the repository exceeds the size of your SPC by a significant margin ... Nothing GemStone does should cause a system to reboot, however, if you start running low on real memory (how much swap space do you nave allocated) linux will start killing processes that are memory hogs and perhaps it killed a process that resulted in a reboot?

Dale

From: "Mariano Martinez Peck" <[hidden email]>
To: [hidden email]
Sent: Thursday, December 19, 2013 5:28:34 AM
Subject: [Glass] GC causing system reboot?

Hi guys,

Today morning I found out that my server was restarted automatically at 6:54 am. I was weird, since it was an unexpected restart. I tried to find the cause and I didn't find anything from the OS itself nor from VirtualBox logs (yes, the Linux runs over a VirtualBox machine).

Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs (and the timestamps):

seaside_20477admingcgem.log shows:

--- 12/19/2013 06:08:21.317 EST ---
[Info]: Successfully acquired GC lockKind 6 on attempt 1.
Starting SweepWsUnion, WSU size=18 PD size=7

seaside_20475reclaimgcgem0.log shows:

12/19/2013 06:19:51 EST
1 reclaims 40 pagesProcessed 40 pagesReclaimed 2 allValidPages 38 singleObjPages
60 processedObjs 21 liveObjs 39 shadowObjs 0 deadObjs 1.5 avgObjsPerPa

All 3 fastcgi logs have several lines like:

--transcript--'handled sigabort: 2013-12-19T03:46:52.31182599067688-08:00'

Here I think the difference in the time is that the previous ones were the server local time while in this one is the default timezone I have in GemStone (could that be it?).

Finally, maintenance_gem.log has modification timestamp at 6:52 but no meaningful entry. It has plenty of entries like this:

Unregistering...2013-12-19T03:52:42.86958003044128-08:00
...Expired: 0 sessions.

(again, different timestampt, but maybe the difference I mention above).

And at 6:54 (2 minutes after timestamp of mainenance_gem.log), the system was rebooted. So....is there anything particular strange here?
I have the same setup in my machine (OSX) and I notice that every day at some point, a topaz process starts to run in my machine and takes 100% of CPU. It literally kills my machine, I mean, I cannot do anything else. And the process takes quite a while. I GUESS that is the GC running. So maybe the GC was simply trying to run in my server and the fact of being CPU intensive put it down? mmmmm

Thanks in advance for any help,

--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Mariano Martinez Peck

Re: [Glass] GC causing system reboot?

On Thu, Dec 19, 2013 at 12:40 PM, Dale K. Henrichs <[hidden email]> wrote:

By default the maintenance vm runs an mfc once an hour, so if you haven't changed the frequncy for that I'm not sure what the "once a day job would be". The MFC's should show up in your maintenance vm log .... from your description a backup or mfc could cause "sluggishness" on a machine if the repository exceeds the size of your SPC by a significant margin

Dale, I am JUST NOW facing this problem in my local machine. I have a topaz running taking 100% since 1:30 hours. The extent file is 4GB. Is this normal to take 1:30 hours??? mmm I don't think so.

My grep tells me that it is the gc indeed:

501 12858 1 0 12:38PM ?? 99:12.24 /opt/gemstone/product/bin/topaz -l -e /opt/gemstone/product/seaside/etc/maintenance30.conf

maintenance30.conf has a GEM_TEMPOBJ_CACHE_SIZE = 200000;

(although from the OSX monitor such a process is using 600MB)

My system.conf has:

SHR_PAGE_CACHE_SIZE_KB = 2000000;

GEM_TEMPOBJ_CACHE_SIZE = 1800000;

the big temp cache is just because I had some big reports that were crashing with an out of memory.

My sysctl has:

kern.sysv.shmmax=68719476736

kern.sysv.shmmin=1

kern.sysv.shmmni=256

kern.sysv.shmseg=64

kern.sysv.shmall=4294967296

And my machine has 8GB of RAM. yes, I know, maybe those limits are too much.

So...should i let the topaz running? Since it started, I started having entries in maintenance_gem.log and seaside_11810reclaimgcgem0.log but nothing strange to me.

seaside_11810reclaimgcgem0.log shows a reclaiming like the following every 15 mins:

12/19/2013 15:04:48 ART

90 reclaims 3953 pagesProcessed 3953 pagesReclaimed 0 allValidPages 0 singleObjPages

1831336 processedObjs 219 liveObjs 0 shadowObjs 1831117 deadObjs 463.3 avgObjsPerPage

maintenance_gem.log shows:

Unregistering...2013-12-19T08:38:37.92392206192017-08:00

...Expired: 0 sessions.

--transcript--'Starting markForCollect.: 2013-12-19T08:38:37.94464492797852-08:00'

--transcript--'Warning: markForCollection found 17142463 live objects, 17298 dead objects(occupying approx 1556820 bytes)'

--transcript--'...finished markForCollect.2013-12-19T10:21:02.28886604309082-08:00'

Unregistering...2013-12-19

So as you can see, it took almost 2 hours to do the markForCollect: in a 4GB repository.

Normal or problem?

Thanks!

... Nothing GemStone does should cause a system to reboot, however, if you start running low on real memory (how much swap space do you nave allocated) linux will start killing processes that are memory hogs and perhaps it killed a process that resulted in a reboot?

Dale

From: "Mariano Martinez Peck" <[hidden email]>
To: [hidden email]
Sent: Thursday, December 19, 2013 5:28:34 AM
Subject: [Glass] GC causing system reboot?

Hi guys,

Today morning I found out that my server was restarted automatically at 6:54 am. I was weird, since it was an unexpected restart. I tried to find the cause and I didn't find anything from the OS itself nor from VirtualBox logs (yes, the Linux runs over a VirtualBox machine).

Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs (and the timestamps):

seaside_20477admingcgem.log shows:

--- 12/19/2013 06:08:21.317 EST ---
[Info]: Successfully acquired GC lockKind 6 on attempt 1.
Starting SweepWsUnion, WSU size=18 PD size=7

seaside_20475reclaimgcgem0.log shows:

12/19/2013 06:19:51 EST
1 reclaims 40 pagesProcessed 40 pagesReclaimed 2 allValidPages 38 singleObjPages
60 processedObjs 21 liveObjs 39 shadowObjs 0 deadObjs 1.5 avgObjsPerPa

All 3 fastcgi logs have several lines like:

--transcript--'handled sigabort: 2013-12-19T03:46:52.31182599067688-08:00'

Here I think the difference in the time is that the previous ones were the server local time while in this one is the default timezone I have in GemStone (could that be it?).

Finally, maintenance_gem.log has modification timestamp at 6:52 but no meaningful entry. It has plenty of entries like this:

Unregistering...2013-12-19T03:52:42.86958003044128-08:00
...Expired: 0 sessions.

(again, different timestampt, but maybe the difference I mention above).

And at 6:54 (2 minutes after timestamp of mainenance_gem.log), the system was rebooted. So....is there anything particular strange here?
I have the same setup in my machine (OSX) and I notice that every day at some point, a topaz process starts to run in my machine and takes 100% of CPU. It literally kills my machine, I mean, I cannot do anything else. And the process takes quite a while. I GUESS that is the GC running. So maybe the GC was simply trying to run in my server and the fact of being CPU intensive put it down? mmmmm

Thanks in advance for any help,

--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Dale Henrichs-3

Re: [Glass] GC causing system reboot?

Mariano,

This does sound way too long ... with a 2GB SPC and a 4GB repo the mfc should be done at once:)

I'm suspicious that your Virtual Linux Machine is somehow paging it's brains out on your mac? I am not an expert in getting these thing setup, but if OSx is only giving your virtual machine 600Mb and your virtual machine thinks it has 2GB+ then that is probably your problem .... you need to find a way to give your virtual machine more real memory in OSx ... I think...

Dale

From: "Mariano Martinez Peck" <[hidden email]>
To: "Dale K. Henrichs" <[hidden email]>
Cc: [hidden email]
Sent: Thursday, December 19, 2013 10:24:51 AM
Subject: Re: [Glass] GC causing system reboot?

On Thu, Dec 19, 2013 at 12:40 PM, Dale K. Henrichs <[hidden email]> wrote:

By default the maintenance vm runs an mfc once an hour, so if you haven't changed the frequncy for that I'm not sure what the "once a day job would be". The MFC's should show up in your maintenance vm log .... from your description a backup or mfc could cause "sluggishness" on a machine if the repository exceeds the size of your SPC by a significant margin

Dale, I am JUST NOW facing this problem in my local machine. I have a topaz running taking 100% since 1:30 hours. The extent file is 4GB. Is this normal to take 1:30 hours??? mmm I don't think so.

My grep tells me that it is the gc indeed:

501 12858 1 0 12:38PM ?? 99:12.24 /opt/gemstone/product/bin/topaz -l -e /opt/gemstone/product/seaside/etc/maintenance30.conf

maintenance30.conf has a GEM_TEMPOBJ_CACHE_SIZE = 200000;

(although from the OSX monitor such a process is using 600MB)

My system.conf has:

SHR_PAGE_CACHE_SIZE_KB = 2000000;
GEM_TEMPOBJ_CACHE_SIZE = 1800000;

the big temp cache is just because I had some big reports that were crashing with an out of memory.

My sysctl has:

kern.sysv.shmmax=68719476736
kern.sysv.shmmin=1
kern.sysv.shmmni=256
kern.sysv.shmseg=64
kern.sysv.shmall=4294967296

And my machine has 8GB of RAM. yes, I know, maybe those limits are too much.

So...should i let the topaz running? Since it started, I started having entries in maintenance_gem.log and seaside_11810reclaimgcgem0.log but nothing strange to me.

seaside_11810reclaimgcgem0.log shows a reclaiming like the following every 15 mins:

12/19/2013 15:04:48 ART
90 reclaims 3953 pagesProcessed 3953 pagesReclaimed 0 allValidPages 0 singleObjPages

1831336 processedObjs 219 liveObjs 0 shadowObjs 1831117 deadObjs 463.3 avgObjsPerPage

maintenance_gem.log shows:

Unregistering...2013-12-19T08:38:37.92392206192017-08:00

...Expired: 0 sessions.
--transcript--'Starting markForCollect.: 2013-12-19T08:38:37.94464492797852-08:00'
--transcript--'Warning: markForCollection found 17142463 live objects, 17298 dead objects(occupying approx 1556820 bytes)'

--transcript--'...finished markForCollect.2013-12-19T10:21:02.28886604309082-08:00'
Unregistering...2013-12-19

So as you can see, it took almost 2 hours to do the markForCollect: in a 4GB repository.

Normal or problem?

Thanks!

... Nothing GemStone does should cause a system to reboot, however, if you start running low on real memory (how much swap space do you nave allocated) linux will start killing processes that are memory hogs and perhaps it killed a process that resulted in a reboot?

Dale

From: "Mariano Martinez Peck" <[hidden email]>
To: [hidden email]
Sent: Thursday, December 19, 2013 5:28:34 AM
Subject: [Glass] GC causing system reboot?

Hi guys,

Today morning I found out that my server was restarted automatically at 6:54 am. I was weird, since it was an unexpected restart. I tried to find the cause and I didn't find anything from the OS itself nor from VirtualBox logs (yes, the Linux runs over a VirtualBox machine).

Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs (and the timestamps):

seaside_20477admingcgem.log shows:

--- 12/19/2013 06:08:21.317 EST ---
[Info]: Successfully acquired GC lockKind 6 on attempt 1.
Starting SweepWsUnion, WSU size=18 PD size=7

seaside_20475reclaimgcgem0.log shows:

12/19/2013 06:19:51 EST
1 reclaims 40 pagesProcessed 40 pagesReclaimed 2 allValidPages 38 singleObjPages
60 processedObjs 21 liveObjs 39 shadowObjs 0 deadObjs 1.5 avgObjsPerPa

All 3 fastcgi logs have several lines like:

--transcript--'handled sigabort: 2013-12-19T03:46:52.31182599067688-08:00'

Here I think the difference in the time is that the previous ones were the server local time while in this one is the default timezone I have in GemStone (could that be it?).

Finally, maintenance_gem.log has modification timestamp at 6:52 but no meaningful entry. It has plenty of entries like this:

Unregistering...2013-12-19T03:52:42.86958003044128-08:00
...Expired: 0 sessions.

(again, different timestampt, but maybe the difference I mention above).

And at 6:54 (2 minutes after timestamp of mainenance_gem.log), the system was rebooted. So....is there anything particular strange here?
I have the same setup in my machine (OSX) and I notice that every day at some point, a topaz process starts to run in my machine and takes 100% of CPU. It literally kills my machine, I mean, I cannot do anything else. And the process takes quite a while. I GUESS that is the GC running. So maybe the GC was simply trying to run in my server and the fact of being CPU intensive put it down? mmmmm

Thanks in advance for any help,

--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

baulamon

Re: [Glass] GC causing system reboot?

In reply to this post by Mariano Martinez Peck

Then I started to take a look to see if GemStone could have been the cause...and indeed, there might be something related. Look at the logs