GemSource server troubles

Dale Henrichs
The machine hosting GemSource crashed this morning ... two (count 'em)
two disks failed at the same time, which ended up corrupting both the
extent file and the tranlog file ... we're running RAID 5 on the machine, so
this sort of thing shouldn't happen ... but it did ...

We have not been moved to the VMware data center, so we're still running
on our older GemStone hardware ...

The IS guys suspect disk controller issues, but have no evidence so
there's no hardware to be replaced (yet).

We have a backup from 06/21/10 17:41:11 PDT (last night at 5:41 pm PDT),
but without a good tranlog we lost any commits that were made after
5:41 pm ... If you made commits last night, then you will need to commit
again.

Most of today has been spent putting together and testing a collection
of scripts that will back up and validate the extent and tranlogs on a more
regular basis (we can't trust the disk ... sheesh) to minimize the
amount of data lost if we experience more corruption before we can move
GemSource to the VMware data center or to more reliable hardware.

I expect GemSource to be back up in an hour or so ...

I sure am glad that I added secondary repository support to Metacello
... I did a Seaside30 build this morning on Pharo with no trouble ... it
wasn't until I tried to load Seaside30 on Squeak 4.1 that I noticed that
GemSource was down ... apparently there is a bug in
Exception>>return: in Squeak 4.1. #return: is supposed to unwind to the
on:do: that protects the block and answer its argument from there, but it
appears that #return: acts like #resume:, since control returned to the
source of the error when I stepped through using the debugger....
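
For anyone who wants to check their own image, here is a minimal workspace sketch of the difference I would expect between the two selectors. It is only an illustration: the variable names and strings are made up, and Warning is used just because it is a resumable exception class in Squeak and Pharo.

```smalltalk
| viaReturn viaResume |

"#return: abandons the protected block; the whole on:do: expression
 answers the argument, so the concatenation is never evaluated."
viaReturn := [ (Warning signal: 'boom') , ' and continued' ]
    on: Warning
    do: [ :ex | ex return: 'returned' ].

"#resume: hands its argument back as the value of #signal:, so the
 protected block carries on from the point of the error."
viaResume := [ (Warning signal: 'boom') , ' and continued' ]
    on: Warning
    do: [ :ex | ex resume: 'resumed' ].

"Expected per the standard exception semantics:
 viaReturn = 'returned'
 viaResume = 'resumed and continued'"
```

If viaReturn comes back as 'returned and continued', then #return: is behaving like #resume:, which matches what I saw in the debugger.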

To top things off, my desktop machine crashed not long after I started
investigating the GemSource crash, and when it came back up, I couldn't
run the Pharo or Squeak VMs ... needless to say the IS guys have been
too busy to take care of my poor desktop ...

When it rains it pours ... oh and the weather here in Portland has been
mostly cloudy, but it hasn't physically rained here ... that would have
really taken the cake :)

I'll send mail to a wider audience when GemSource is back up...

Dale

Re: GemSource server troubles

Dale Henrichs
Hmmm, we also lost pieces of our mailman setup with this crash...

Dale