Problem Starting Gemstone on Backed Up Extent

Joel Turnbull-3

Nightly, we do a runBackup on our production machines and copy the resulting extent to a different location. We keep 7 days' worth of those.

The final step of our testing process involves shutting down GemStone in our development VM, swapping in the latest backed-up extent from our production machine, starting GemStone with startstone -N, loading any new packages, running any migrations, and testing the results.

Yesterday, when I went to start GemStone on a particular backup, it failed. Adding -R to startstone gave the same failure. The output is below. Is there any particular reason why a given backed-up extent would fail to start? If it had become corrupted in some way (other than during the runBackup), would you expect startstone to throw that error? The backed-up extent from a day prior started fine. Are there additional steps we can take to make sure our backed-up extents are clean?


startstone[Error]: Stone process (id=4745) has died.
startstone[Error]: Examine '/opt/gemstone/log/seaside.log' for more information.  Excerpt follows:
    Space available = 1695 Mbytes = 108505 pages

    Totals
    ------
    Repository Size = 1834 Mbytes = 117376 pages
    Free Space = 1695 Mbytes = 108505 pages
    ---------------------------------------------------
    Extent 0 was not cleanly shutdown; recovery is needed.
    In extent 0, file size is inconsistent; recovery is needed.
    Extent changes detected, recovery needed.
    Repository was not shutdown cleanly, recovery needed.
    In recoverExtentSizes: extent 0 has been truncated.
      required size = 1834 bytes, actual size = 1048 bytes

    Stone startup has failed.

Re: Problem Starting Gemstone on Backed Up Extent

Dale
Joel,

That error implies that you are copying extents to do your backup. I assume that you are using the runBackup script? That script doesn't explicitly check whether the 'cp' operation took longer than the 5 minutes allocated for the backup. If the cp takes longer than 5 minutes, it is quite possible that you'll get that error.
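
One way to guard against the overrun described above is to time the copy and refuse to trust the result if it ran past the window. The following is only a minimal sketch, not the runBackup script itself: the function name copy_within_window, the constant CHECKPOINT_WINDOW_SECONDS, and the extra size comparisons are all assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path

# Assumption: mirrors the 5 minutes mentioned in this thread; adjust to your script.
CHECKPOINT_WINDOW_SECONDS = 5 * 60


def copy_within_window(source, destination, limit_seconds=CHECKPOINT_WINDOW_SECONDS):
    """Copy an extent file and report whether the copy looks trustworthy.

    Returns a tuple (ok, elapsed_seconds, reason).
    """
    source, destination = Path(source), Path(destination)

    size_before = source.stat().st_size
    started = time.monotonic()
    shutil.copyfile(source, destination)
    elapsed = time.monotonic() - started
    size_after = source.stat().st_size
    copied_size = destination.stat().st_size

    # The copy overran the window in which the source was expected to be stable.
    if elapsed > limit_seconds:
        return False, elapsed, "copy took longer than the allotted window"
    # The source grew or shrank while it was being copied.
    if size_before != size_after:
        return False, elapsed, "source extent changed size during the copy"
    # The destination is shorter or longer than the source.
    if copied_size != size_after:
        return False, elapsed, "copied file size does not match the source"
    return True, elapsed, "ok"
```

A passing result here is only a heuristic. It catches an overrun or an obviously short file, but it does not prove that the copied extent will start.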

If I remember correctly, the only way to be certain that an online backup (cp the extent while checkpoints are suspended) is good is to test it by trying to start a stone with it ... Perhaps some of the other GemStone guys could chime in here.
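
If that startup test is automated, the stone log can be scanned afterwards for the failure lines shown in the excerpt above. This is a rough sketch under stated assumptions: the marker strings and the "required size / actual size" pattern are copied from the log in this thread, other GemStone versions may word them differently, and the function name diagnose_stone_log is hypothetical.

```python
import re

# Marker strings taken from the log excerpt quoted in this thread.
FAILURE_MARKERS = (
    "has been truncated",
    "Stone startup has failed",
)

# Matches e.g. "required size = 1834 bytes, actual size = 1048 bytes".
SIZE_PATTERN = re.compile(
    r"required size = (\d+) (\w+), actual size = (\d+) (\w+)"
)


def diagnose_stone_log(log_text):
    """Summarise startup failures found in a stone log excerpt."""
    markers = [marker for marker in FAILURE_MARKERS if marker in log_text]
    sizes = None
    match = SIZE_PATTERN.search(log_text)
    if match:
        sizes = {
            "required": int(match.group(1)),
            "actual": int(match.group(3)),
            "unit": match.group(2),  # reported verbatim from the log line
        }
    return {"failed": bool(markers), "markers": markers, "sizes": sizes}
```

Feeding it the text of the stone log after a test start gives a quick pass/fail signal, plus the size mismatch when the log reports one.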

I'm curious what error you got when you ran 'startstone -R'...

Dale

Re: Problem Starting Gemstone on Backed Up Extent

Joel Turnbull-3


On Tue, Mar 2, 2010 at 1:31 PM, Dale Henrichs <[hidden email]> wrote:
| Joel,
|
| That error implies that you are copying extents to do your backup. I assume that you are using the runBackup script?

That's correct.

| That script doesn't explicitly check whether the 'cp' operation took longer than the 5 minutes allocated for the backup. If the cp takes longer than 5 minutes, it is quite possible that you'll get that error.

It is possible that the copy took longer than five minutes. I haven't investigated why the extents get so large, but I know Sean has and is working on remedying that.

So would you recommend trying to increase that 5-minute limit for now?

| If I remember correctly, the only way to be certain that an online backup (cp the extent while checkpoints are suspended) is good is to test it by trying to start a stone with it ... Perhaps some of the other GemStone guys could chime in here.
|
| I'm curious what error you got when you ran 'startstone -R'...

Well, unfortunately I deleted the backup. I'm pulling it down again, but it's large. What I did was modify the startGemstone script and add -N after startstone. When I got the error, I tried adding -R after -N, and I believe the error was exactly the same.
 


Re: Problem Starting Gemstone on Backed Up Extent

Dale
Joel,

I think you should have used -R without -N. It might be a bug that you are allowed to specify both, since they are contradictory directives.
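
A wrapper script can refuse that combination before startstone is ever invoked. The sketch below is only an illustration: the helper build_startstone_command and its parameter names are hypothetical, it models only the two options discussed in this thread, and the stone name passed in is whatever your installation uses.

```python
def build_startstone_command(stone_name, option_n=False, option_r=False):
    """Build the argument list for startstone, rejecting -N together with -R.

    Only the -N and -R options mentioned in this thread are modelled.
    """
    if option_n and option_r:
        raise ValueError("-N and -R are contradictory; pass only one of them")

    command = ["startstone"]
    if option_n:
        command.append("-N")
    if option_r:
        command.append("-R")
    command.append(stone_name)
    return command
```

The returned list can be handed to subprocess.run, so the check happens in the script and does not depend on startstone rejecting the pair itself.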

For production backups, I would transition to using Repository>>fullBackupTo:. The backup files will be smaller than the extents (extents include empty pages and don't normally shrink).

For making copies to use in development, copying the extent while checkpoints are suspended is probably more convenient, so bumping up the time limit should do the trick.

Dale