[Glass] Out of space and now cannot start stone again :(

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[Glass] Out of space and now cannot start stone again :(

Mariano Martinez Peck
Hi guys,

My server got out of disk space and gemstone had the following error while trying to connect via topaz:

Login denied to other than SystemUser or DataCurator. All tranlog
directories are full. Stone process waiting for operator to archive
tranlogs or add more directories.,

OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog. 

What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request? 


Thanks in advance,




========================================================================
    Now starting GemStone monitor.

Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
No server process protection from OOM killer
 _____________________________________________________________________________
|     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
|_____________________________________________________________________________|

    Attaching the Shared Cache using Stone name: XXX
    Successfully started 1 free frame page servers.

    -------------------------------------------------------
    Summary of Configured Transaction Logs
      Directory   0:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
      Directory   1:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
    -------------------------------------------------------

    Extent #0
    -----------
    Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
    Maximum size = NONE
    File size = 10586 Mbytes = 677504 pages
    Space available = 7927 Mbytes = 507353 pages

    Totals
    ------
    Repository Size = 10586 Mbytes = 677504 pages
    Free Space = 7927 Mbytes = 507353 pages
    ---------------------------------------------------
    Extent 0 was not cleanly shutdown.


    Repository startup statistics:
        Pages Need Reclaiming =10
        Free Oops=133758598
        Oop Number High Water Mark=157752737
        Possible Dead Objects=12405000
        Dead Objects=0
        Epoch Transaction Count=0
        Epoch New Objects Union=0
        Epoch Written Objects Union=0
        Epoch DependencyMap Objects Union=0

    Repository startup is from checkpoint = (fileId 13, blockId 388232)

   SearchForMostRecentLog did not find any tranlogs

 :: (wildcard) found in listening addresses, ignoring other addresses
created listening socket for ::, on :: port 43405


    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
EOF while reading log record.
EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
    Waiting for all tranlog writes to complete

    Stone startup has failed.




--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

Mariano Martinez Peck



On Wed, May 21, 2014 at 3:47 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options. 


Hi James, thanks for this great answer!! 

Ohh wow....I should have clearly not rebooted the machine right ahead to add more hard disk space. Next time: either remove some files/directories so that to resume operations and then do a clean shutdown. Then fix the low space for real and then restore operations.

 
It is best to try to keep the stone running so that you do not lose data.

OK...didn't know. I learned today.
 
If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).


Indeed, that's exactly what happened in my case. 
 
If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#’startCheckpointAsync’ and Stone class>>#’startCheckpointSync’) and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).


Thanks, I have just checked and yes, it is 5 minutes (300 seconds). 
 
Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint. 


OK, good to know. In my case it was not that critical (I could lost 5 minutes of edits). But good to know. I have just tried, and yes, I can see the checkpoint lists.  
 
It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this rename the existing (bad) log file, and copydbf to the former name and try to restart the stone. If that works then everything except the last transaction is fine. 


OK, I tried. I made the new one and yes, it was some bytes less than the "broken" one, but my stone would not launch anyway..saying the same error. So I proceed with the -N option. But it was a good shot. Don't know why it didn't work. 
 
Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)


:) I appreciate your detailed answer. Starting with a -N was enough for the moment. Next time: do not reboot nor kill stone. First try to shut it down cleanly. 

Thanks james!
 
Let us know how it goes,

James Foster


On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:

Hi guys,

My server got out of disk space and gemstone had the following error while trying to connect via topaz:

Login denied to other than SystemUser or DataCurator. All tranlog
directories are full. Stone process waiting for operator to archive
tranlogs or add more directories.,

OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog. 

What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request? 


Thanks in advance,




========================================================================
    Now starting GemStone monitor.

Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
No server process protection from OOM killer
 _____________________________________________________________________________
|     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
|_____________________________________________________________________________|

    Attaching the Shared Cache using Stone name: XXX
    Successfully started 1 free frame page servers.

    -------------------------------------------------------
    Summary of Configured Transaction Logs
      Directory   0:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
      Directory   1:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
    -------------------------------------------------------

    Extent #0
    -----------
    Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
    Maximum size = NONE
    File size = 10586 Mbytes = 677504 pages
    Space available = 7927 Mbytes = 507353 pages

    Totals
    ------
    Repository Size = 10586 Mbytes = 677504 pages
    Free Space = 7927 Mbytes = 507353 pages
    ---------------------------------------------------
    Extent 0 was not cleanly shutdown.


    Repository startup statistics:
        Pages Need Reclaiming =10
        Free Oops=133758598
        Oop Number High Water Mark=157752737
        Possible Dead Objects=12405000
        Dead Objects=0
        Epoch Transaction Count=0
        Epoch New Objects Union=0
        Epoch Written Objects Union=0
        Epoch DependencyMap Objects Union=0

    Repository startup is from checkpoint = (fileId 13, blockId 388232)

   SearchForMostRecentLog did not find any tranlogs

 :: (wildcard) found in listening addresses, ignoring other addresses
created listening socket for ::, on :: port 43405


    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
EOF while reading log record.
EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
    Waiting for all tranlog writes to complete

    Stone startup has failed.




--
Mariano
http://marianopeck.wordpress.com
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass




--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

James Foster-9
Thanks for the report on how it went. At least you learned something and maybe others will have gained a lesson as well.

James

On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 3:47 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options. 


Hi James, thanks for this great answer!! 

Ohh wow....I should have clearly not rebooted the machine right ahead to add more hard disk space. Next time: either remove some files/directories so that to resume operations and then do a clean shutdown. Then fix the low space for real and then restore operations.

 
It is best to try to keep the stone running so that you do not lose data.

OK...didn't know. I learned today.
 
If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).


Indeed, that's exactly what happened in my case. 
 
If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#’startCheckpointAsync’ and Stone class>>#’startCheckpointSync’) and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).


Thanks, I have just checked and yes, it is 5 minutes (300 seconds). 
 
Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint. 


OK, good to know. In my case it was not that critical (I could lost 5 minutes of edits). But good to know. I have just tried, and yes, I can see the checkpoint lists.  
 
It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this rename the existing (bad) log file, and copydbf to the former name and try to restart the stone. If that works then everything except the last transaction is fine. 


OK, I tried. I made the new one and yes, it was some bytes less than the "broken" one, but my stone would not launch anyway..saying the same error. So I proceed with the -N option. But it was a good shot. Don't know why it didn't work. 
 
Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)


:) I appreciate your detailed answer. Starting with a -N was enough for the moment. Next time: do not reboot nor kill stone. First try to shut it down cleanly. 

Thanks james!
 
Let us know how it goes,

James Foster


On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:

Hi guys,

My server got out of disk space and gemstone had the following error while trying to connect via topaz:

Login denied to other than SystemUser or DataCurator. All tranlog
directories are full. Stone process waiting for operator to archive
tranlogs or add more directories.,

OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog. 

What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request? 


Thanks in advance,




========================================================================
    Now starting GemStone monitor.

Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
No server process protection from OOM killer
 _____________________________________________________________________________
|     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
|_____________________________________________________________________________|

    Attaching the Shared Cache using Stone name: XXX
    Successfully started 1 free frame page servers.

    -------------------------------------------------------
    Summary of Configured Transaction Logs
      Directory   0:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
      Directory   1:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
    -------------------------------------------------------

    Extent #0
    -----------
    Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
    Maximum size = NONE
    File size = 10586 Mbytes = 677504 pages
    Space available = 7927 Mbytes = 507353 pages

    Totals
    ------
    Repository Size = 10586 Mbytes = 677504 pages
    Free Space = 7927 Mbytes = 507353 pages
    ---------------------------------------------------
    Extent 0 was not cleanly shutdown.


    Repository startup statistics:
        Pages Need Reclaiming =10
        Free Oops=133758598
        Oop Number High Water Mark=157752737
        Possible Dead Objects=12405000
        Dead Objects=0
        Epoch Transaction Count=0
        Epoch New Objects Union=0
        Epoch Written Objects Union=0
        Epoch DependencyMap Objects Union=0

    Repository startup is from checkpoint = (fileId 13, blockId 388232)

   SearchForMostRecentLog did not find any tranlogs

 :: (wildcard) found in listening addresses, ignoring other addresses
created listening socket for ::, on :: port 43405


    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
EOF while reading log record.
EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
    Waiting for all tranlog writes to complete

    Stone startup has failed.




-- 
Mariano
http://marianopeck.wordpress.com
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass




-- 
Mariano
http://marianopeck.wordpress.com


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

Mariano Martinez Peck



On Wed, May 21, 2014 at 6:26 PM, James Foster <[hidden email]> wrote:
Thanks for the report on how it went. At least you learned something and maybe others will have gained a lesson as well.


Ohhh I remembered something....once you start the stone again with -R it won't start and log will say something like:

    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog24.dbf
    Starting up without recovery and the log file 24 exists
      Remove log files and restart.
    Terminating stone.

So I needed to delete the tranlog file and start again. And there it worked.

Thanks!


 
James


On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 3:47 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options. 


Hi James, thanks for this great answer!! 

Ohh wow....I should have clearly not rebooted the machine right ahead to add more hard disk space. Next time: either remove some files/directories so that to resume operations and then do a clean shutdown. Then fix the low space for real and then restore operations.

 
It is best to try to keep the stone running so that you do not lose data.

OK...didn't know. I learned today.
 
If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).


Indeed, that's exactly what happened in my case. 
 
If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#’startCheckpointAsync’ and Stone class>>#’startCheckpointSync’) and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).


Thanks, I have just checked and yes, it is 5 minutes (300 seconds). 
 
Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint. 


OK, good to know. In my case it was not that critical (I could lost 5 minutes of edits). But good to know. I have just tried, and yes, I can see the checkpoint lists.  
 
It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this rename the existing (bad) log file, and copydbf to the former name and try to restart the stone. If that works then everything except the last transaction is fine. 


OK, I tried. I made the new one and yes, it was some bytes less than the "broken" one, but my stone would not launch anyway..saying the same error. So I proceed with the -N option. But it was a good shot. Don't know why it didn't work. 
 
Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)


:) I appreciate your detailed answer. Starting with a -N was enough for the moment. Next time: do not reboot nor kill stone. First try to shut it down cleanly. 

Thanks james!
 
Let us know how it goes,

James Foster


On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:

Hi guys,

My server got out of disk space and gemstone had the following error while trying to connect via topaz:

Login denied to other than SystemUser or DataCurator. All tranlog
directories are full. Stone process waiting for operator to archive
tranlogs or add more directories.,

OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog. 

What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request? 


Thanks in advance,




========================================================================
    Now starting GemStone monitor.

Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
No server process protection from OOM killer
 _____________________________________________________________________________
|     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
|_____________________________________________________________________________|

    Attaching the Shared Cache using Stone name: XXX
    Successfully started 1 free frame page servers.

    -------------------------------------------------------
    Summary of Configured Transaction Logs
      Directory   0:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
      Directory   1:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
    -------------------------------------------------------

    Extent #0
    -----------
    Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
    Maximum size = NONE
    File size = 10586 Mbytes = 677504 pages
    Space available = 7927 Mbytes = 507353 pages

    Totals
    ------
    Repository Size = 10586 Mbytes = 677504 pages
    Free Space = 7927 Mbytes = 507353 pages
    ---------------------------------------------------
    Extent 0 was not cleanly shutdown.


    Repository startup statistics:
        Pages Need Reclaiming =10
        Free Oops=133758598
        Oop Number High Water Mark=157752737
        Possible Dead Objects=12405000
        Dead Objects=0
        Epoch Transaction Count=0
        Epoch New Objects Union=0
        Epoch Written Objects Union=0
        Epoch DependencyMap Objects Union=0

    Repository startup is from checkpoint = (fileId 13, blockId 388232)

   SearchForMostRecentLog did not find any tranlogs

 :: (wildcard) found in listening addresses, ignoring other addresses
created listening socket for ::, on :: port 43405


    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
EOF while reading log record.
EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
    Waiting for all tranlog writes to complete

    Stone startup has failed.




-- 
Mariano
http://marianopeck.wordpress.com
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass




-- 
Mariano
http://marianopeck.wordpress.com




--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

James Foster-9
On a production system I would replace “delete the tranlog file” with “move the tranlog file elsewhere”!

On May 21, 2014, at 2:59 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 6:26 PM, James Foster <[hidden email]> wrote:
Thanks for the report on how it went. At least you learned something and maybe others will have gained a lesson as well.

Ohhh I remembered something....once you start the stone again with -R it won't start and log will say something like:

    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog24.dbf
    Starting up without recovery and the log file 24 exists
      Remove log files and restart.
    Terminating stone.

So I needed to delete the tranlog file and start again. And there it worked.

Thanks!


 
James


On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 3:47 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options. 


Hi James, thanks for this great answer!! 

Ohh wow....I should have clearly not rebooted the machine right ahead to add more hard disk space. Next time: either remove some files/directories so that to resume operations and then do a clean shutdown. Then fix the low space for real and then restore operations.

 
It is best to try to keep the stone running so that you do not lose data.

OK...didn't know. I learned today.
 
If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).


Indeed, that's exactly what happened in my case. 
 
If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#’startCheckpointAsync’ and Stone class>>#’startCheckpointSync’) and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).


Thanks, I have just checked and yes, it is 5 minutes (300 seconds). 
 
Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint. 


OK, good to know. In my case it was not that critical (I could lost 5 minutes of edits). But good to know. I have just tried, and yes, I can see the checkpoint lists.  
 
It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this rename the existing (bad) log file, and copydbf to the former name and try to restart the stone. If that works then everything except the last transaction is fine. 


OK, I tried. I made the new one and yes, it was some bytes less than the "broken" one, but my stone would not launch anyway..saying the same error. So I proceed with the -N option. But it was a good shot. Don't know why it didn't work. 
 
Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)


:) I appreciate your detailed answer. Starting with a -N was enough for the moment. Next time: do not reboot nor kill stone. First try to shut it down cleanly. 

Thanks james!
 
Let us know how it goes,

James Foster


On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:

Hi guys,

My server got out of disk space and gemstone had the following error while trying to connect via topaz:

Login denied to other than SystemUser or DataCurator. All tranlog
directories are full. Stone process waiting for operator to archive
tranlogs or add more directories.,

OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog. 

What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request? 


Thanks in advance,




========================================================================
    Now starting GemStone monitor.

Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
No server process protection from OOM killer
 _____________________________________________________________________________
|     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
|_____________________________________________________________________________|

    Attaching the Shared Cache using Stone name: XXX
    Successfully started 1 free frame page servers.

    -------------------------------------------------------
    Summary of Configured Transaction Logs
      Directory   0:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
      Directory   1:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
    -------------------------------------------------------

    Extent #0
    -----------
    Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
    Maximum size = NONE
    File size = 10586 Mbytes = 677504 pages
    Space available = 7927 Mbytes = 507353 pages

    Totals
    ------
    Repository Size = 10586 Mbytes = 677504 pages
    Free Space = 7927 Mbytes = 507353 pages
    ---------------------------------------------------
    Extent 0 was not cleanly shutdown.


    Repository startup statistics:
        Pages Need Reclaiming =10
        Free Oops=133758598
        Oop Number High Water Mark=157752737
        Possible Dead Objects=12405000
        Dead Objects=0
        Epoch Transaction Count=0
        Epoch New Objects Union=0
        Epoch Written Objects Union=0
        Epoch DependencyMap Objects Union=0

    Repository startup is from checkpoint = (fileId 13, blockId 388232)

   SearchForMostRecentLog did not find any tranlogs

 :: (wildcard) found in listening addresses, ignoring other addresses
created listening socket for ::, on :: port 43405


    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
EOF while reading log record.
EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
    Waiting for all tranlog writes to complete

    Stone startup has failed.




-- 
Mariano
http://marianopeck.wordpress.com
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass




-- 
Mariano
http://marianopeck.wordpress.com




--
Mariano
http://marianopeck.wordpress.com


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

Mariano Martinez Peck



On Wed, May 21, 2014 at 7:02 PM, James Foster <[hidden email]> wrote:
On a production system I would replace “delete the tranlog file” with “move the tranlog file elsewhere”!


Indeed that was what I did. Now I wonder...say copydbf failed to make the tranlog to work again (like in my case), is the tranlog still useful?
What kind of black magic could you do? I guess there is something because you are saying to keep it instead of remove it. 

Thanks!
 

On May 21, 2014, at 2:59 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 6:26 PM, James Foster <[hidden email]> wrote:
Thanks for the report on how it went. At least you learned something and maybe others will have gained a lesson as well.

Ohhh I remembered something....once you start the stone again with -R it won't start and log will say something like:

    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog24.dbf
    Starting up without recovery and the log file 24 exists
      Remove log files and restart.
    Terminating stone.

So I needed to delete the tranlog file and start again. And there it worked.

Thanks!


 
James


On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 3:47 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options. 


Hi James, thanks for this great answer!! 

Ohh wow....I should have clearly not rebooted the machine right ahead to add more hard disk space. Next time: either remove some files/directories so that to resume operations and then do a clean shutdown. Then fix the low space for real and then restore operations.

 
It is best to try to keep the stone running so that you do not lose data.

OK...didn't know. I learned today.
 
If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).


Indeed, that's exactly what happened in my case. 
 
If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#’startCheckpointAsync’ and Stone class>>#’startCheckpointSync’) and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).


Thanks, I have just checked and yes, it is 5 minutes (300 seconds). 
 
Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint. 


OK, good to know. In my case it was not that critical (I could lost 5 minutes of edits). But good to know. I have just tried, and yes, I can see the checkpoint lists.  
 
It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this rename the existing (bad) log file, and copydbf to the former name and try to restart the stone. If that works then everything except the last transaction is fine. 


OK, I tried. I made the new one and yes, it was some bytes less than the "broken" one, but my stone would not launch anyway..saying the same error. So I proceed with the -N option. But it was a good shot. Don't know why it didn't work. 
 
Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)


:) I appreciate your detailed answer. Starting with a -N was enough for the moment. Next time: do not reboot nor kill stone. First try to shut it down cleanly. 

Thanks james!
 
Let us know how it goes,

James Foster


On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:

Hi guys,

My server got out of disk space and gemstone had the following error while trying to connect via topaz:

Login denied to other than SystemUser or DataCurator. All tranlog
directories are full. Stone process waiting for operator to archive
tranlogs or add more directories.,

OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog. 

What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request? 


Thanks in advance,




========================================================================
    Now starting GemStone monitor.

Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
No server process protection from OOM killer
 _____________________________________________________________________________
|     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
|_____________________________________________________________________________|

    Attaching the Shared Cache using Stone name: XXX
    Successfully started 1 free frame page servers.

    -------------------------------------------------------
    Summary of Configured Transaction Logs
      Directory   0:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
      Directory   1:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
    -------------------------------------------------------

    Extent #0
    -----------
    Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
    Maximum size = NONE
    File size = 10586 Mbytes = 677504 pages
    Space available = 7927 Mbytes = 507353 pages

    Totals
    ------
    Repository Size = 10586 Mbytes = 677504 pages
    Free Space = 7927 Mbytes = 507353 pages
    ---------------------------------------------------
    Extent 0 was not cleanly shutdown.


    Repository startup statistics:
        Pages Need Reclaiming =10
        Free Oops=133758598
        Oop Number High Water Mark=157752737
        Possible Dead Objects=12405000
        Dead Objects=0
        Epoch Transaction Count=0
        Epoch New Objects Union=0
        Epoch Written Objects Union=0
        Epoch DependencyMap Objects Union=0

    Repository startup is from checkpoint = (fileId 13, blockId 388232)

   SearchForMostRecentLog did not find any tranlogs

 :: (wildcard) found in listening addresses, ignoring other addresses
created listening socket for ::, on :: port 43405


    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
EOF while reading log record.
EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
    Waiting for all tranlog writes to complete

    Stone startup has failed.




-- 
Mariano
http://marianopeck.wordpress.com
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass




-- 
Mariano
http://marianopeck.wordpress.com




--
Mariano
http://marianopeck.wordpress.com




--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

James Foster-9
On May 22, 2014, at 7:25 AM, Mariano Martinez Peck <[hidden email]> wrote:

> Now I wonder...say copydbf failed to make the tranlog to work again (like in my case), is the tranlog still useful?
> What kind of black magic could you do? I guess there is something because you are saying to keep it instead of remove it.

First, the general rule is to not deleting anything during a time of crisis. It might be that you delete the wrong file; it might be that the tranlog isn’t corrupt (you misunderstood the message or it was for a different time, stone, or situation); …

Second, there is the possibility of using the tranlog in a restore from backup. You can restore to a checkpoint (point in time) and explicitly ignore the records after the checkpoint (including the corrupt record). See the SAG for details.

Third, we might be able to manually repair the tranlog or trick the stone into ignoring the invalid record (for a price!).

James
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

James Foster-9
In reply to this post by Mariano Martinez Peck
Mariano,

Just a follow-up on this issue. As you might notice from my recent answer to a question from Johan, an out-of-disk-space situation should not result in a corrupt log—just an incomplete log. At worst, it should result in an incomplete final record and the system should be smart enough to read everything after the most recent checkpoint up to the last complete record (or up to an incomplete record at the end). You can demonstrate this by committing something simple and then do a ‘kill -9’ on the stone pid (anything less will just do a clean shutdown!). Then you can use the ‘truncate -s’ command to trim a few bytes from the end of the last transaction log. On restart the stone should report that recovery is needed and it should recover without complaints and when you login the final transaction should be missing.

Obviously, this isn’t what happened in your case. What we think happened is that the final tranlog record was not actually a regular transaction but a record of a checkpoint (on an otherwise quiet system with a checkpoint every five minutes this would be quite reasonable). We believe that we have a bug in the system (I’ve entered this as #44318) in which the system writes the “root page” to the extent with the most recent checkpoint information before noticing that the matching tranlog record failed to write. The checkpoint could succeed because it doesn’t need new space on the disk—it is just flushing dirty pages from the cache to disk in the existing extent files. During the restart it recognized that the shutdown was not clean and it got the checkpoint log record ID from the root page and went to the transaction log to try to recover things after the checkpoint. What it found is that the checkpoint record in the root page did not exist in the transaction log and it (rightly) reported this as an error. (The error was in writing the extent’s root page without checking for an error on the tranlog write.) The good news is that a restart from the most recent checkpoint (with -N) did not result in any lost transactions since the last record was marking the checkpoint from which recovery would have been attempted.

James

On May 21, 2014, at 2:59 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 6:26 PM, James Foster <[hidden email]> wrote:
Thanks for the report on how it went. At least you learned something and maybe others will have gained a lesson as well.

Ohhh I remembered something....once you start the stone again with -R it won't start and log will say something like:

    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog24.dbf
    Starting up without recovery and the log file 24 exists
      Remove log files and restart.
    Terminating stone.

So I needed to delete the tranlog file and start again. And there it worked.

Thanks!


 
James


On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 3:47 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options. 


Hi James, thanks for this great answer!! 

Ohh wow....I should have clearly not rebooted the machine right ahead to add more hard disk space. Next time: either remove some files/directories so that to resume operations and then do a clean shutdown. Then fix the low space for real and then restore operations.

 
It is best to try to keep the stone running so that you do not lose data.

OK...didn't know. I learned today.
 
If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).


Indeed, that's exactly what happened in my case. 
 
If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#’startCheckpointAsync’ and Stone class>>#’startCheckpointSync’) and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).


Thanks, I have just checked and yes, it is 5 minutes (300 seconds). 
 
Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint. 


OK, good to know. In my case it was not that critical (I could lost 5 minutes of edits). But good to know. I have just tried, and yes, I can see the checkpoint lists.  
 
It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this rename the existing (bad) log file, and copydbf to the former name and try to restart the stone. If that works then everything except the last transaction is fine. 


OK, I tried. I made the new one and yes, it was some bytes less than the "broken" one, but my stone would not launch anyway..saying the same error. So I proceed with the -N option. But it was a good shot. Don't know why it didn't work. 
 
Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)


:) I appreciate your detailed answer. Starting with a -N was enough for the moment. Next time: do not reboot nor kill stone. First try to shut it down cleanly. 

Thanks james!
 
Let us know how it goes,

James Foster


On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:

Hi guys,

My server got out of disk space and gemstone had the following error while trying to connect via topaz:

Login denied to other than SystemUser or DataCurator. All tranlog
directories are full. Stone process waiting for operator to archive
tranlogs or add more directories.,

OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog. 

What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request? 


Thanks in advance,




========================================================================
    Now starting GemStone monitor.

Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
No server process protection from OOM killer
 _____________________________________________________________________________
|     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
|_____________________________________________________________________________|

    Attaching the Shared Cache using Stone name: XXX
    Successfully started 1 free frame page servers.

    -------------------------------------------------------
    Summary of Configured Transaction Logs
      Directory   0:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
      Directory   1:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
    -------------------------------------------------------

    Extent #0
    -----------
    Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
    Maximum size = NONE
    File size = 10586 Mbytes = 677504 pages
    Space available = 7927 Mbytes = 507353 pages

    Totals
    ------
    Repository Size = 10586 Mbytes = 677504 pages
    Free Space = 7927 Mbytes = 507353 pages
    ---------------------------------------------------
    Extent 0 was not cleanly shutdown.


    Repository startup statistics:
        Pages Need Reclaiming =10
        Free Oops=133758598
        Oop Number High Water Mark=157752737
        Possible Dead Objects=12405000
        Dead Objects=0
        Epoch Transaction Count=0
        Epoch New Objects Union=0
        Epoch Written Objects Union=0
        Epoch DependencyMap Objects Union=0

    Repository startup is from checkpoint = (fileId 13, blockId 388232)

   SearchForMostRecentLog did not find any tranlogs

 :: (wildcard) found in listening addresses, ignoring other addresses
created listening socket for ::, on :: port 43405


    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
EOF while reading log record.
EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
    Waiting for all tranlog writes to complete

    Stone startup has failed.




-- 
Mariano
http://marianopeck.wordpress.com
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass




-- 
Mariano
http://marianopeck.wordpress.com




--
Mariano
http://marianopeck.wordpress.com


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

Johan Brichau-3
Hi James,

Thanks for your answers (in both threads).

In our experience, on unexpected shutdowns, we do experience the 'EOF encountered while reading log record.' regularly.
Over the past year, we had a couple of server crashes and there would always be a couple of stones with this problem.

What you say makes sense because these crashes always occurred when the system was relatively idle (i.e. at night).

If I understand correctly, such a tranlog would still be usable for a restore from tranlogs if it would be needed.

thx
Johan

On 03 Jun 2014, at 02:06, James Foster <[hidden email]> wrote:

> Mariano,
>
> Just a follow-up on this issue. As you might notice from my recent answer to a question from Johan, an out-of-disk-space situation should not result in a corrupt log—just an incomplete log. At worst, it should result in an incomplete final record and the system should be smart enough to read everything after the most recent checkpoint up to the last complete record (or up to an incomplete record at the end). You can demonstrate this by committing something simple and then do a ‘kill -9’ on the stone pid (anything less will just do a clean shutdown!). Then you can use the ‘truncate -s’ command to trim a few bytes from the end of the last transaction log. On restart the stone should report that recovery is needed and it should recover without complaints and when you login the final transaction should be missing.
>
> Obviously, this isn’t what happened in your case. What we think happened is that the final tranlog record was not actually a regular transaction but a record of a checkpoint (on an otherwise quiet system with a checkpoint every five minutes this would be quite reasonable). We believe that we have a bug in the system (I’ve entered this as #44318) in which the system writes the “root page” to the extent with the most recent checkpoint information before noticing that the matching tranlog record failed to write. The checkpoint could succeed because it doesn’t need new space on the disk—it is just flushing dirty pages from the cache to disk in the existing extent files. During the restart it recognized that the shutdown was not clean and it got the checkpoint log record ID from the root page and went to the transaction log to try to recover things after the checkpoint. What it found is that the checkpoint record in the root page did not exist in the transaction log and it (rightly) reported this as an error. (The error was in writing the extent’s root page without checking for an error on the tranlog write.) The good news is that a restart from the most recent checkpoint (with -N) did not result in any lost transactions since the last record was marking the checkpoint from which recovery would have been attempted.
>
> James
>
> On May 21, 2014, at 2:59 PM, Mariano Martinez Peck <[hidden email]> wrote:
>
>>
>>
>>
>> On Wed, May 21, 2014 at 6:26 PM, James Foster <[hidden email]> wrote:
>> Thanks for the report on how it went. At least you learned something and maybe others will have gained a lesson as well.
>>
>> Ohhh I remembered something....once you start the stone again with -R it won't start and log will say something like:
>>
>>     Opened a transaction log file for read_nolocks.
>>        filename = /XXX/Sites/XXX/gemstone/data/tranlog24.dbf
>>     Starting up without recovery and the log file 24 exists
>>       Remove log files and restart.
>>     Terminating stone.
>>
>> So I needed to delete the tranlog file and start again. And there it worked.
>>
>> Thanks!
>>
>>
>>  
>> James
>>
>>
>> On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <[hidden email]> wrote:
>>
>>>
>>>
>>>
>>> On Wed, May 21, 2014 at 3:47 PM, James Foster <[hidden email]> wrote:
>>> Hi Mariano,
>>>
>>> When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options.
>>>
>>>
>>> Hi James, thanks for this great answer!!
>>>
>>> Ohh wow....I should have clearly not rebooted the machine right ahead to add more hard disk space. Next time: either remove some files/directories so that to resume operations and then do a clean shutdown. Then fix the low space for real and then restore operations.
>>>
>>>  
>>> It is best to try to keep the stone running so that you do not lose data.
>>>
>>> OK...didn't know. I learned today.
>>>  
>>> If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).
>>>
>>>
>>> Indeed, that's exactly what happened in my case.
>>>  
>>> If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#’startCheckpointAsync’ and Stone class>>#’startCheckpointSync’) and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).
>>>
>>>
>>> Thanks, I have just checked and yes, it is 5 minutes (300 seconds).
>>>  
>>> Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint.
>>>
>>>
>>> OK, good to know. In my case it was not that critical (I could lost 5 minutes of edits). But good to know. I have just tried, and yes, I can see the checkpoint lists.  
>>>  
>>> It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this rename the existing (bad) log file, and copydbf to the former name and try to restart the stone. If that works then everything except the last transaction is fine.
>>>
>>>
>>> OK, I tried. I made the new one and yes, it was some bytes less than the "broken" one, but my stone would not launch anyway..saying the same error. So I proceed with the -N option. But it was a good shot. Don't know why it didn't work.
>>>  
>>> Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)
>>>
>>>
>>> :) I appreciate your detailed answer. Starting with a -N was enough for the moment. Next time: do not reboot nor kill stone. First try to shut it down cleanly.
>>>
>>> Thanks james!
>>>  
>>> Let us know how it goes,
>>>
>>> James Foster
>>>
>>>
>>> On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> My server got out of disk space and gemstone had the following error while trying to connect via topaz:
>>>>
>>>> Login denied to other than SystemUser or DataCurator. All tranlog
>>>> directories are full. Stone process waiting for operator to archive
>>>> tranlogs or add more directories.,
>>>>
>>>> OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog.
>>>>
>>>> What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request?
>>>>
>>>>
>>>> Thanks in advance,
>>>>
>>>>
>>>>
>>>>
>>>> ========================================================================
>>>>     Now starting GemStone monitor.
>>>>
>>>> Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
>>>> No server process protection from OOM killer
>>>>  _____________________________________________________________________________
>>>> |     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
>>>> |_____________________________________________________________________________|
>>>>
>>>>     Attaching the Shared Cache using Stone name: XXX
>>>>     Successfully started 1 free frame page servers.
>>>>
>>>>     -------------------------------------------------------
>>>>     Summary of Configured Transaction Logs
>>>>       Directory   0:
>>>>         configured name /XXX/Sites/XXX/gemstone/data
>>>>         expanded name /XXX/Sites/XXX/gemstone/data/
>>>>         configuredSize 1000 MB
>>>>       Directory   1:
>>>>         configured name /XXX/Sites/XXX/gemstone/data
>>>>         expanded name /XXX/Sites/XXX/gemstone/data/
>>>>         configuredSize 1000 MB
>>>>     -------------------------------------------------------
>>>>
>>>>     Extent #0
>>>>     -----------
>>>>     Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
>>>>     Maximum size = NONE
>>>>     File size = 10586 Mbytes = 677504 pages
>>>>     Space available = 7927 Mbytes = 507353 pages
>>>>
>>>>     Totals
>>>>     ------
>>>>     Repository Size = 10586 Mbytes = 677504 pages
>>>>     Free Space = 7927 Mbytes = 507353 pages
>>>>     ---------------------------------------------------
>>>>     Extent 0 was not cleanly shutdown.
>>>>
>>>>
>>>>     Repository startup statistics:
>>>>         Pages Need Reclaiming =10
>>>>         Free Oops=133758598
>>>>         Oop Number High Water Mark=157752737
>>>>         Possible Dead Objects=12405000
>>>>         Dead Objects=0
>>>>         Epoch Transaction Count=0
>>>>         Epoch New Objects Union=0
>>>>         Epoch Written Objects Union=0
>>>>         Epoch DependencyMap Objects Union=0
>>>>
>>>>     Repository startup is from checkpoint = (fileId 13, blockId 388232)
>>>>
>>>>    SearchForMostRecentLog did not find any tranlogs
>>>>
>>>>  :: (wildcard) found in listening addresses, ignoring other addresses
>>>> created listening socket for ::, on :: port 43405
>>>>
>>>>
>>>>     Opened a transaction log file for read_nolocks.
>>>>        filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
>>>> EOF while reading log record.
>>>> EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
>>>>     Waiting for all tranlog writes to complete
>>>>
>>>>     Stone startup has failed.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Mariano
>>>> http://marianopeck.wordpress.com
>>>> _______________________________________________
>>>> Glass mailing list
>>>> [hidden email]
>>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>>
>>>
>>>
>>>
>>> --
>>> Mariano
>>> http://marianopeck.wordpress.com
>>
>>
>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>
> _______________________________________________
> Glass mailing list
> [hidden email]
> http://lists.gemtalksystems.com/mailman/listinfo/glass

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

Mariano Martinez Peck
In reply to this post by James Foster-9



On Mon, Jun 2, 2014 at 9:06 PM, James Foster <[hidden email]> wrote:
Mariano,

Just a follow-up on this issue.


Hi James, thanks for the follow-up!
 
As you might notice from my recent answer to a question from Johan, an out-of-disk-space situation should not result in a corrupt log—just an incomplete log. At worst, it should result in an incomplete final record and the system should be smart enough to read everything after the most recent checkpoint up to the last complete record (or up to an incomplete record at the end). You can demonstrate this by committing something simple and then do a ‘kill -9’ on the stone pid (anything less will just do a clean shutdown!). Then you can use the ‘truncate -s’ command to trim a few bytes from the end of the last transaction log. On restart the stone should report that recovery is needed and it should recover without complaints and when you login the final transaction should be missing.


I understand.
 
Obviously, this isn’t what happened in your case. What we think happened is that the final tranlog record was not actually a regular transaction but a record of a checkpoint (on an otherwise quiet system with a checkpoint every five minutes this would be quite reasonable). We believe that we have a bug in the system (I’ve entered this as #44318) in which the system writes the “root page” to the extent with the most recent checkpoint information before noticing that the matching tranlog record failed to write. The checkpoint could succeed because it doesn’t need new space on the disk—it is just flushing dirty pages from the cache to disk in the existing extent files. During the restart it recognized that the shutdown was not clean and it got the checkpoint log record ID from the root page and went to the transaction log to try to recover things after the checkpoint. What it found is that the checkpoint record in the root page did not exist in the transaction log and it (rightly) reported this as an error. (The error was in writing the extent’s root page without checking for an error on the tranlog write.) The good news is that a restart from the most recent checkpoint (with -N) did not result in any lost transactions since the last record was marking the checkpoint from which recovery would have been attempted.

Wow...nice catch!!! I can confirm you that as much as I could talk with the users that were using my app, they told me they did NOT loose any of the "edits" they were doing in the app (thing I thought it was going to happen). So it seems I have experienced what you have just described!

Thanks!

 

James


On May 21, 2014, at 2:59 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 6:26 PM, James Foster <[hidden email]> wrote:
Thanks for the report on how it went. At least you learned something and maybe others will have gained a lesson as well.

Ohhh I remembered something....once you start the stone again with -R it won't start and log will say something like:

    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog24.dbf
    Starting up without recovery and the log file 24 exists
      Remove log files and restart.
    Terminating stone.

So I needed to delete the tranlog file and start again. And there it worked.

Thanks!


 
James


On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <[hidden email]> wrote:




On Wed, May 21, 2014 at 3:47 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options. 


Hi James, thanks for this great answer!! 

Ohh wow....I should have clearly not rebooted the machine right ahead to add more hard disk space. Next time: either remove some files/directories so that to resume operations and then do a clean shutdown. Then fix the low space for real and then restore operations.

 
It is best to try to keep the stone running so that you do not lose data.

OK...didn't know. I learned today.
 
If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).


Indeed, that's exactly what happened in my case. 
 
If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#’startCheckpointAsync’ and Stone class>>#’startCheckpointSync’) and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).


Thanks, I have just checked and yes, it is 5 minutes (300 seconds). 
 
Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint. 


OK, good to know. In my case it was not that critical (I could lost 5 minutes of edits). But good to know. I have just tried, and yes, I can see the checkpoint lists.  
 
It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this rename the existing (bad) log file, and copydbf to the former name and try to restart the stone. If that works then everything except the last transaction is fine. 


OK, I tried. I made the new one and yes, it was some bytes less than the "broken" one, but my stone would not launch anyway..saying the same error. So I proceed with the -N option. But it was a good shot. Don't know why it didn't work. 
 
Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)


:) I appreciate your detailed answer. Starting with a -N was enough for the moment. Next time: do not reboot nor kill stone. First try to shut it down cleanly. 

Thanks james!
 
Let us know how it goes,

James Foster


On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:

Hi guys,

My server got out of disk space and gemstone had the following error while trying to connect via topaz:

Login denied to other than SystemUser or DataCurator. All tranlog
directories are full. Stone process waiting for operator to archive
tranlogs or add more directories.,

OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog. 

What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request? 


Thanks in advance,




========================================================================
    Now starting GemStone monitor.

Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
No server process protection from OOM killer
 _____________________________________________________________________________
|     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
|_____________________________________________________________________________|

    Attaching the Shared Cache using Stone name: XXX
    Successfully started 1 free frame page servers.

    -------------------------------------------------------
    Summary of Configured Transaction Logs
      Directory   0:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
      Directory   1:
        configured name /XXX/Sites/XXX/gemstone/data
        expanded name /XXX/Sites/XXX/gemstone/data/
        configuredSize 1000 MB
    -------------------------------------------------------

    Extent #0
    -----------
    Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
    Maximum size = NONE
    File size = 10586 Mbytes = 677504 pages
    Space available = 7927 Mbytes = 507353 pages

    Totals
    ------
    Repository Size = 10586 Mbytes = 677504 pages
    Free Space = 7927 Mbytes = 507353 pages
    ---------------------------------------------------
    Extent 0 was not cleanly shutdown.


    Repository startup statistics:
        Pages Need Reclaiming =10
        Free Oops=133758598
        Oop Number High Water Mark=157752737
        Possible Dead Objects=12405000
        Dead Objects=0
        Epoch Transaction Count=0
        Epoch New Objects Union=0
        Epoch Written Objects Union=0
        Epoch DependencyMap Objects Union=0

    Repository startup is from checkpoint = (fileId 13, blockId 388232)

   SearchForMostRecentLog did not find any tranlogs

 :: (wildcard) found in listening addresses, ignoring other addresses
created listening socket for ::, on :: port 43405


    Opened a transaction log file for read_nolocks.
       filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
EOF while reading log record.
EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
    Waiting for all tranlog writes to complete

    Stone startup has failed.




-- 
Mariano
http://marianopeck.wordpress.com
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass




-- 
Mariano
http://marianopeck.wordpress.com




--
Mariano
http://marianopeck.wordpress.com




--
Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

James Foster-9
In reply to this post by Johan Brichau-3
On Jun 2, 2014, at 10:26 PM, Johan Brichau <[hidden email]> wrote:

> Hi James,
>
> Thanks for your answers (in both threads).
>
> In our experience, on unexpected shutdowns, we do experience the 'EOF encountered while reading log record.' regularly.
> Over the past year, we had a couple of server crashes and there would always be a couple of stones with this problem.

I’ve been unable to reproduce this problem internally (all my efforts to truncate the log still result in the stone starting successfully—this is with 3.1.0.6 and 3.2.0). We will continue to investigate this but if this happens again and you are willing to share the extent and transaction log with us please let me know.

> What you say makes sense because these crashes always occurred when the system was relatively idle (i.e. at night).
>
> If I understand correctly, such a tranlog would still be usable for a restore from tranlogs if it would be needed.

That is certainly the design and what we do in our internal testing. On the other hand, you might have a situation that we can’t reproduce so it would be prudent to try the restore situation yourself (always a good practice!).

James

> thx
> Johan
>
> On 03 Jun 2014, at 02:06, James Foster <[hidden email]> wrote:
>
>> Mariano,
>>
>> Just a follow-up on this issue. As you might notice from my recent answer to a question from Johan, an out-of-disk-space situation should not result in a corrupt log—just an incomplete log. At worst, it should result in an incomplete final record and the system should be smart enough to read everything after the most recent checkpoint up to the last complete record (or up to an incomplete record at the end). You can demonstrate this by committing something simple and then do a ‘kill -9’ on the stone pid (anything less will just do a clean shutdown!). Then you can use the ‘truncate -s’ command to trim a few bytes from the end of the last transaction log. On restart the stone should report that recovery is needed and it should recover without complaints and when you login the final transaction should be missing.
>>
>> Obviously, this isn’t what happened in your case. What we think happened is that the final tranlog record was not actually a regular transaction but a record of a checkpoint (on an otherwise quiet system with a checkpoint every five minutes this would be quite reasonable). We believe that we have a bug in the system (I’ve entered this as #44318) in which the system writes the “root page” to the extent with the most recent checkpoint information before noticing that the matching tranlog record failed to write. The checkpoint could succeed because it doesn’t need new space on the disk—it is just flushing dirty pages from the cache to disk in the existing extent files. During the restart it recognized that the shutdown was not clean and it got the checkpoint log record ID from the root page and went to the transaction log to try to recover things after the checkpoint. What it found is that the checkpoint record in the root page did not exist in the transaction log and it (rightly) reported this as an error. (The error was in writing the extent’s root page without checking for an error on the tranlog write.) The good news is that a restart from the most recent checkpoint (with -N) did not result in any lost transactions since the last record was marking the checkpoint from which recovery would have been attempted.
>>
>> James
>>
>> On May 21, 2014, at 2:59 PM, Mariano Martinez Peck <[hidden email]> wrote:
>>
>>>
>>>
>>>
>>> On Wed, May 21, 2014 at 6:26 PM, James Foster <[hidden email]> wrote:
>>> Thanks for the report on how it went. At least you learned something and maybe others will have gained a lesson as well.
>>>
>>> Ohhh I remembered something....once you start the stone again with -R it won't start and log will say something like:
>>>
>>>    Opened a transaction log file for read_nolocks.
>>>       filename = /XXX/Sites/XXX/gemstone/data/tranlog24.dbf
>>>    Starting up without recovery and the log file 24 exists
>>>      Remove log files and restart.
>>>    Terminating stone.
>>>
>>> So I needed to delete the tranlog file and start again. And there it worked.
>>>
>>> Thanks!
>>>
>>>
>>>
>>> James
>>>
>>>
>>> On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <[hidden email]> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Wed, May 21, 2014 at 3:47 PM, James Foster <[hidden email]> wrote:
>>>> Hi Mariano,
>>>>
>>>> When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options.
>>>>
>>>>
>>>> Hi James, thanks for this great answer!!
>>>>
>>>> Ohh wow....I should have clearly not rebooted the machine right ahead to add more hard disk space. Next time: either remove some files/directories so that to resume operations and then do a clean shutdown. Then fix the low space for real and then restore operations.
>>>>
>>>>
>>>> It is best to try to keep the stone running so that you do not lose data.
>>>>
>>>> OK...didn't know. I learned today.
>>>>
>>>> If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).
>>>>
>>>>
>>>> Indeed, that's exactly what happened in my case.
>>>>
>>>> If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#’startCheckpointAsync’ and Stone class>>#’startCheckpointSync’) and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).
>>>>
>>>>
>>>> Thanks, I have just checked and yes, it is 5 minutes (300 seconds).
>>>>
>>>> Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint.
>>>>
>>>>
>>>> OK, good to know. In my case it was not that critical (I could lost 5 minutes of edits). But good to know. I have just tried, and yes, I can see the checkpoint lists.  
>>>>
>>>> It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this rename the existing (bad) log file, and copydbf to the former name and try to restart the stone. If that works then everything except the last transaction is fine.
>>>>
>>>>
>>>> OK, I tried. I made the new one and yes, it was some bytes less than the "broken" one, but my stone would not launch anyway..saying the same error. So I proceed with the -N option. But it was a good shot. Don't know why it didn't work.
>>>>
>>>> Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)
>>>>
>>>>
>>>> :) I appreciate your detailed answer. Starting with a -N was enough for the moment. Next time: do not reboot nor kill stone. First try to shut it down cleanly.
>>>>
>>>> Thanks james!
>>>>
>>>> Let us know how it goes,
>>>>
>>>> James Foster
>>>>
>>>>
>>>> On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <[hidden email]> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> My server got out of disk space and gemstone had the following error while trying to connect via topaz:
>>>>>
>>>>> Login denied to other than SystemUser or DataCurator. All tranlog
>>>>> directories are full. Stone process waiting for operator to archive
>>>>> tranlogs or add more directories.,
>>>>>
>>>>> OK, I make the virtual hard disk , expand partitions etc...now the OS hard disk space looks with 400GB free. But when I try to start the stone again, it fails to do it from the last translog.
>>>>>
>>>>> What I am supposed to do?  startstone -N ?   If true...what info do I lost? what "info" is up to the last checkpoint?  In my case this is a seaside app with GLASS transaction management. Would the lost be just the last request?
>>>>>
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ========================================================================
>>>>>    Now starting GemStone monitor.
>>>>>
>>>>> Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
>>>>> No server process protection from OOM killer
>>>>> _____________________________________________________________________________
>>>>> |     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
>>>>> |_____________________________________________________________________________|
>>>>>
>>>>>    Attaching the Shared Cache using Stone name: XXX
>>>>>    Successfully started 1 free frame page servers.
>>>>>
>>>>>    -------------------------------------------------------
>>>>>    Summary of Configured Transaction Logs
>>>>>      Directory   0:
>>>>>        configured name /XXX/Sites/XXX/gemstone/data
>>>>>        expanded name /XXX/Sites/XXX/gemstone/data/
>>>>>        configuredSize 1000 MB
>>>>>      Directory   1:
>>>>>        configured name /XXX/Sites/XXX/gemstone/data
>>>>>        expanded name /XXX/Sites/XXX/gemstone/data/
>>>>>        configuredSize 1000 MB
>>>>>    -------------------------------------------------------
>>>>>
>>>>>    Extent #0
>>>>>    -----------
>>>>>    Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
>>>>>    Maximum size = NONE
>>>>>    File size = 10586 Mbytes = 677504 pages
>>>>>    Space available = 7927 Mbytes = 507353 pages
>>>>>
>>>>>    Totals
>>>>>    ------
>>>>>    Repository Size = 10586 Mbytes = 677504 pages
>>>>>    Free Space = 7927 Mbytes = 507353 pages
>>>>>    ---------------------------------------------------
>>>>>    Extent 0 was not cleanly shutdown.
>>>>>
>>>>>
>>>>>    Repository startup statistics:
>>>>>        Pages Need Reclaiming =10
>>>>>        Free Oops=133758598
>>>>>        Oop Number High Water Mark=157752737
>>>>>        Possible Dead Objects=12405000
>>>>>        Dead Objects=0
>>>>>        Epoch Transaction Count=0
>>>>>        Epoch New Objects Union=0
>>>>>        Epoch Written Objects Union=0
>>>>>        Epoch DependencyMap Objects Union=0
>>>>>
>>>>>    Repository startup is from checkpoint = (fileId 13, blockId 388232)
>>>>>
>>>>>   SearchForMostRecentLog did not find any tranlogs
>>>>>
>>>>> :: (wildcard) found in listening addresses, ignoring other addresses
>>>>> created listening socket for ::, on :: port 43405
>>>>>
>>>>>
>>>>>    Opened a transaction log file for read_nolocks.
>>>>>       filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
>>>>> EOF while reading log record.
>>>>> EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
>>>>>    Waiting for all tranlog writes to complete
>>>>>
>>>>>    Stone startup has failed.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Mariano
>>>>> http://marianopeck.wordpress.com
>>>>> _______________________________________________
>>>>> Glass mailing list
>>>>> [hidden email]
>>>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Mariano
>>>> http://marianopeck.wordpress.com
>>>
>>>
>>>
>>>
>>> --
>>> Mariano
>>> http://marianopeck.wordpress.com
>>
>> _______________________________________________
>> Glass mailing list
>> [hidden email]
>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>
> _______________________________________________
> Glass mailing list
> [hidden email]
> http://lists.gemtalksystems.com/mailman/listinfo/glass

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] Out of space and now cannot start stone again :(

Johan Brichau-3
Hi James,

On 03 Jun 2014, at 20:32, James Foster <[hidden email]> wrote:

> I’ve been unable to reproduce this problem internally (all my efforts to truncate the log still result in the stone starting successfully—this is with 3.1.0.6 and 3.2.0). We will continue to investigate this but if this happens again and you are willing to share the extent and transaction log with us please let me know.

We are still on 2.4.4.1 for Yesplan. So it's probably not present anymore in 3.1 and 3.2
Working on upgrading to 3.1...

Thanks for your efforts!

Johan
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass