standby

otto
Hi,

We are running a warm standby for some of our production systems. We
run GS 2.4.4.4 on Ubuntu 10.04 LTS.

Today our database crashed when we replayed a tranlog, with the
following in the stone log:

UTL_GUARANTEE failed, File
/export/toronto3/users/buildgss/244x-1/src/shrpcclient.c line 72

I think we pushed the RAM usage on the machine over its limits. We
created large RAM disks (tmpfs) totalling 15.7 GB on a machine
with 16 GB of RAM.

We ran 2 GS databases + the standby on the machine, each with an SPC of
around 1 GB.

The tmpfs mounts are not full, so some of the RAM is still available for the
OS to allocate to the SPCs. But when the filesystem fills up those RAM
disks, we can get it to break. I guess it has something to do with the OS
trying to swap.
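
To put rough numbers on that, here is a small Python sketch of the worst case, where the tmpfs mounts are completely full and every SPC is fully resident. The figures are the ones quoted above; the function names are only illustrative, and treating it as three SPCs of 1 GB each is an assumption based on "2 GS databases + the standby".

```python
def worst_case_commitment_gb(tmpfs_gb, spc_sizes_gb):
    """RAM needed if every tmpfs mount is full and every SPC is fully resident."""
    return tmpfs_gb + sum(spc_sizes_gb)


def shortfall_gb(ram_gb, tmpfs_gb, spc_sizes_gb):
    """How far the worst case exceeds physical RAM (0.0 if it fits)."""
    return max(0.0, worst_case_commitment_gb(tmpfs_gb, spc_sizes_gb) - ram_gb)


if __name__ == "__main__":
    # Figures from this post; three SPCs of about 1 GB each is an assumption.
    spcs = [1.0, 1.0, 1.0]
    print(f"worst case: {worst_case_commitment_gb(15.7, spcs):.1f} GB")
    print(f"shortfall:  {shortfall_gb(16.0, 15.7, spcs):.1f} GB")
```

That works out to about 18.7 GB of demand against 16 GB of RAM, so roughly 2.7 GB would have to go to swap, before counting the OS itself or any other processes.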

Cheers
Otto

Re: standby

Dale Henrichs
Otto,

The UTL_GUARANTEE is hit when the connection to the ShrPCMonitor is interrupted. When a Linux system is about to run out of swap space, it picks some processes to kill, and in the case of GemStone this process can be the ShrPCMonitor. When the ShrPCMonitor is killed, the stone comes tumbling down.
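
One way to check whether a process was killed this way is to look for "Killed process" entries in the kernel log. A minimal Python sketch, assuming the kernel reports kills in the form `Killed process <pid> (<name>)`; the exact wording varies between kernel versions, so the pattern may need adjusting, and the function name is only illustrative:

```python
import re

# Assumed format of the kernel's kill report: "Killed process 1234 (name)".
# The wording differs between kernel versions, so treat this as a starting point.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a (pid, process_name) tuple for each kill report found."""
    kills = []
    for line in log_lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


if __name__ == "__main__":
    import sys

    # Reads kernel log text from stdin, e.g. the output of dmesg.
    for pid, name in find_oom_kills(sys.stdin):
        print(f"kernel killed pid {pid} ({name})")
```

Feeding it the output of `dmesg` (or the contents of the kernel log file) lists any processes the kernel killed, which can then be compared with the pids in the stone and SPC monitor logs.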

So as you suggest, you should make sure that you've got plenty of swap space allocated.
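
As a quick way to keep an eye on this, the swap figures can be read from `/proc/meminfo`. A minimal Python sketch, assuming a Linux host; the helper names are made up for illustration, and what counts as "plenty" of free swap is left to you:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_free_fraction(fields):
    """Fraction of swap still free; 0.0 if no swap is configured."""
    total = fields.get("SwapTotal", 0)
    return fields.get("SwapFree", 0) / total if total else 0.0


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    # kB divided by 1024**2 gives GiB.
    print(f"swap total: {info.get('SwapTotal', 0) / 1024 ** 2:.1f} GiB")
    print(f"swap free:  {swap_free_fraction(info):.0%}")
```

Running this from cron or a monitoring script while the tmpfs mounts fill up would show how close the machine gets to exhausting swap.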

If you want to send us the logs (stone and other processes), we could take a closer look and see if there's something else going on.

Dale

----- Original Message -----
| From: "Otto Behrens" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Cc: "Pieter Jacobs" <[hidden email]>
| Sent: Thursday, October 4, 2012 3:19:24 AM
| Subject: [GS/SS Beta] standby