[Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

[Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Pieter Nagel-3
My development repository occasionally gets into a state where the session
dies and GemTools tells me "Please report to your GemStone Administrator:
Fatal Internal Error, 5015, 15621, IO Error" when running certain
Smalltalk expressions or doing a search for strings in source. Same in Topaz.

This persists across restarting the stone, I need to rebuild the extent
from scratch.

The only strange thing I did is "WAFastCGIAdaptor startOn: 8383" which I
then needed to interrupt by pressing ALT-. Thereafter any Smalltalk, even
just evaluating '1' would cause a stacktrace which I hope to reproduce
here when it happens again.

Logfile produced by using GEM_HALT_ON_ERROR = 4126 attached.



_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

gemnetobject10663revelations.finworks.biz.log (31K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Dale Henrichs-3
Pieter,

Thanks for the report ... I've passed this along ... I'm out of the office for the day and most folks are out with their kids for "spring break" ... I'll get back to you with more information when I have more information ...

Dale


On Thu, Mar 27, 2014 at 7:00 AM, Pieter Nagel <[hidden email]> wrote:
My development repository occasionally gets into a state where the session
dies and GemTools tells me "Please report to your GemStone Administrator:
Fatal Internal Error, 5015, 15621, IO Error" when running certain
Smalltalk expressions or doing a search for strings in source. Same in Topaz.

This persists across restarting the stone, I need to rebuild the extent
from scratch.

The only strange thing I did is "WAFastCGIAdaptor startOn: 8383" which I
then needed to interrupt by pressing ALT-. Thereafter any Smalltalk, even
just evaluating '1' would cause a stacktrace which I hope to reproduce
here when it happens again.

Logfile produced by using GEM_HALT_ON_ERROR = 4126 attached.



_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass



_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Dale Henrichs-3
In reply to this post by Pieter Nagel-3
Pieter,

In order to get to the bottom of this please run a page audit and object audit on the extent producing the IO errors and send us any error output.

after that we may need a copy of the extent which is producing the errors,
along with tranlogs.

Dale


On Thu, Mar 27, 2014 at 7:00 AM, Pieter Nagel <[hidden email]> wrote:
My development repository occasionally gets into a state where the session
dies and GemTools tells me "Please report to your GemStone Administrator:
Fatal Internal Error, 5015, 15621, IO Error" when running certain
Smalltalk expressions or doing a search for strings in source. Same in Topaz.

This persists across restarting the stone, I need to rebuild the extent
from scratch.

The only strange thing I did is "WAFastCGIAdaptor startOn: 8383" which I
then needed to interrupt by pressing ALT-. Thereafter any Smalltalk, even
just evaluating '1' would cause a stacktrace which I hope to reproduce
here when it happens again.

Logfile produced by using GEM_HALT_ON_ERROR = 4126 attached.



_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass



_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Pieter Nagel-3
Dale

Pageaudit found errors, log attached.

Object audit fails to run because it hits the same Invalid pageKind error
that the page audit already reported.

Not sure what else I can do to run an object audit, since the documented
way to recover from the page audit is to restore from backup, at which
point the repository is in its pre-error state anyway.


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

pageaudit.log (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Pieter Nagel-3
These fatal errors are still going on. so far it affects only development
and staging repositories, so we can work around it, but we're growing
increasingly nervous that it might hit production as well.

To recap: repositories are becoming randomly corrupted into a state where
any meaningfull Smalltalk evaluation causes topaz to dump errors like:

RepRead Error hdr.pageId 0 != pageId
14635-----------------------------------------------------
GemStone: Error         Fatal
Please report to your GemStone Administrator:  Fatal Internal Error,
5015, 14635, IO Error
Error Category: 231169 [Internal Fatal] Number: 4126  Arg Count: 3 Context
: 20 exception : 20
Arg 1: 40122
Arg 2: 117082
Arg 3: 20

These bugs seem to be random. Rebuilding a clean development extent will
fail one time and pass the next even though there are no changes to the
code that is being filed in on top a virgin exten0.dbf in the two
situations. Sometimes reboots inbetween seem to be needed for the problem
to go away.

Any advice for getting to the bottom of this?

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Dale Henrichs-3
Pieter,

start with a fresh extent0.seaside.dbf and
before starting stone , add
   STN_TRAN_FULL_LOGGING=true;
   STN_TRAN_LOG_DEBUG_LEVEL=3;
to the stone config file .
Then after repository gets corrupted, run pageaudit and
if errors are similar , send us the extent and the tranlogs
for analysis (if possible).  Otherwise we will want you to
run some tranlog searches for us ...


On Mon, Jul 7, 2014 at 1:47 AM, <[hidden email]> wrote:
These fatal errors are still going on. so far it affects only development
and staging repositories, so we can work around it, but we're growing
increasingly nervous that it might hit production as well.

To recap: repositories are becoming randomly corrupted into a state where
any meaningfull Smalltalk evaluation causes topaz to dump errors like:

RepRead Error hdr.pageId 0 != pageId
14635-----------------------------------------------------
GemStone: Error         Fatal
Please report to your GemStone Administrator:  Fatal Internal Error,
5015, 14635, IO Error
Error Category: 231169 [Internal Fatal] Number: 4126  Arg Count: 3 Context
: 20 exception : 20
Arg 1: 40122
Arg 2: 117082
Arg 3: 20

These bugs seem to be random. Rebuilding a clean development extent will
fail one time and pass the next even though there are no changes to the
code that is being filed in on top a virgin exten0.dbf in the two
situations. Sometimes reboots inbetween seem to be needed for the problem
to go away.

Any advice for getting to the bottom of this?

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Pieter Nagel-3
> Pieter,
>
> start with a fresh extent0.seaside.dbf and
> before starting stone , add
>    STN_TRAN_FULL_LOGGING=true;
>    STN_TRAN_LOG_DEBUG_LEVEL=3;
> to the stone config file .
> Then after repository gets corrupted, run pageaudit and
> if errors are similar , send us the extent and the tranlogs
> for analysis (if possible).  Otherwise we will want you to
> run some tranlog searches for us ...

Will do.

But note that we are using /dev/null as STN_TRAN_LOG_DIRECTORIES, so there
will not be any tranlogs for us to send/search through. I sometimes wonder
if using /dev/null in itself may be behind this bug?

Do you recommend taking pains that STN_TRAN_LOG_DEBUG_LEVEL is *not* set
on production repositories?

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Dale Henrichs-3
Pieter,

Recommendations from engineering:

You need to write tranlogs to actual file system if we are to
have tranlogs to debug.  Suggest reducing to
  STN_TRAN_LOG_DEBUG_LEVEL=1;
if performance with STN_TRAN_LOG_DEBUG_LEVEL=3;
is a problem.
For production we recommend
  STN_TRAN_FULL_LOGGING=true;
  STN_TRAN_LOG_DEBUG_LEVEL=0;
Suggest tranlogs written to separate file system from extents
to reduce performance impact of writing them.

You should probably start thinking of upgrading to 3.1.0.6 or 3.2.1 there are no specific bugfixes for the issue you are seeing, but when hitting bugs, it is always prudent  to be running with the latest version.

Dale


On Tue, Jul 8, 2014 at 6:14 AM, <[hidden email]> wrote:
> Pieter,
>
> start with a fresh extent0.seaside.dbf and
> before starting stone , add
>    STN_TRAN_FULL_LOGGING=true;
>    STN_TRAN_LOG_DEBUG_LEVEL=3;
> to the stone config file .
> Then after repository gets corrupted, run pageaudit and
> if errors are similar , send us the extent and the tranlogs
> for analysis (if possible).  Otherwise we will want you to
> run some tranlog searches for us ...

Will do.

But note that we are using /dev/null as STN_TRAN_LOG_DIRECTORIES, so there
will not be any tranlogs for us to send/search through. I sometimes wonder
if using /dev/null in itself may be behind this bug?

Do you recommend taking pains that STN_TRAN_LOG_DEBUG_LEVEL is *not* set
on production repositories?



_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Pieter Nagel-3
Just an update:

So far it seems as if the bug goes away when we roll out the
STN_TRAN_LOG_DEBUG_LEVEL setting - hard to say for sure though with a
Heisenbug.

As part of setting STN_TRAN_LOG_DEBUG_LEVEL, we also have to switch
tranlogs from /dev/null to real directories.

So I strongly suspect that tranlogging to /dev/null is the root cause of
these RepRead errors.

I won't be surprised if the code paths around that are not as well tested
- I think we here are a bit unusual in the GemStone world in spending so
much time in that "mode" which other people only use occasionally to speed
up bulk imports.

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Pieter Nagel-3
It really seems this bug only occurs when tranlogging to /dev/null.

It's now two weeks later after switching tranlogs to real directories, and
this bug still has not resurfaced. Used to happen a handful of times a
week.

But on development test-suites etc. run about 3 times faster when
tranlogging to /dev/null.

Is there other way to get debug information on this while still
tranlogging to /dev/null?

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Bill.Erickson
Setting GEM_HALT_ON_ERROR in the session's config file to the number of the error you want more diagnostic data on will cause a stack trace to be printed out to the gem log.  That would be a start...


------------------------------------------------------------------------
Bill Erickson
GemTalk Systems Engineering
15220 NW Greenbrier Parkway #240, Beaverton OR 97006
------------------------------------------------------------------------


On Mon, Jul 28, 2014 at 2:28 AM, <[hidden email]> wrote:
It really seems this bug only occurs when tranlogging to /dev/null.

It's now two weeks later after switching tranlogs to real directories, and
this bug still has not resurfaced. Used to happen a handful of times a
week.

But on development test-suites etc. run about 3 times faster when
tranlogging to /dev/null.

Is there other way to get debug information on this while still
tranlogging to /dev/null?

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Dale Henrichs-3
In reply to this post by Pieter Nagel-3
Pieter,

Despite your observations that /dev/null seems to be implicated in the page faults, we don't really think that it is the root cause.

We have had some bugs that we have fixed in 3.1.0.6 and 3.2.1 that might address the a problem where timing issues with stopping and restarting stones could cause corruption. If you could send us the stone logs for the old sessions when you saw the corruption we might be able to see a familiar pattern ...

If it is this problem, then /dev/null tranlogs could make things a bit worse...

Dale


On Mon, Jul 28, 2014 at 2:28 AM, <[hidden email]> wrote:
It really seems this bug only occurs when tranlogging to /dev/null.

It's now two weeks later after switching tranlogs to real directories, and
this bug still has not resurfaced. Used to happen a handful of times a
week.

But on development test-suites etc. run about 3 times faster when
tranlogging to /dev/null.

Is there other way to get debug information on this while still
tranlogging to /dev/null?



_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Pieter Nagel-3
Dale,

We're going to upgrade to 3.1.0.6 or 3.2.1 first, and get back to you if
the bug persists.

But note that this bug only ever occured to stones that had never been
shut down at all. In other words, to a pristine extent0 that was fired up,
used to restore or had our code loaded to it, and had not yet the first
shutdown of its lifetime.

Once a stone was broken, not even stopstone could log in to it.
Conversely, any stone that could be shutdown and restarted would
thereafter be totally immune to this bug, in my experience.

So I'm not sure if the shutdown code is involved here at all.


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Dale Henrichs-3
Pieter,

You should be glad to know that based on your feedback and experience we decided to turn on /dev/null tranlogs in some of our stress tests and have indeed found some problems that could lead to the errors that you are seeing ... no schedule for when the bugfixes will show up ...

BTW the fact that you never stopped/restarted your stone kinda sealed the deal on implicating /dev/null tranlogs:) 

So thanks for your patience and help in isolating the problem ... 

Dale


On Tue, Jul 29, 2014 at 6:30 AM, <[hidden email]> wrote:
Dale,

We're going to upgrade to 3.1.0.6 or 3.2.1 first, and get back to you if
the bug persists.

But note that this bug only ever occured to stones that had never been
shut down at all. In other words, to a pristine extent0 that was fired up,
used to restore or had our code loaded to it, and had not yet the first
shutdown of its lifetime.

Once a stone was broken, not even stopstone could log in to it.
Conversely, any stone that could be shutdown and restarted would
thereafter be totally immune to this bug, in my experience.

So I'm not sure if the shutdown code is involved here at all.




_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: [Glass] GS 3.1.0.5: Fatal Internal Error, 5015, 15621, IO Error (RepRead Error)

Pieter Nagel-3
Dale,

I see from the release notes that 3.2.2 fixes this.

Thanks for all the help.


> Pieter,
>
> You should be glad to know that based on your feedback and experience we
> decided to turn on /dev/null tranlogs in some of our stress tests and have
> indeed found some problems that could lead to the errors that you are
> seeing ... no schedule for when the bugfixes will show up ...
>
> BTW the fact that you never stopped/restarted your stone kinda sealed the
> deal on implicating /dev/null tranlogs:)
>
> So thanks for your patience and help in isolating the problem ...
>
> Dale


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass