MFC Hangs in 3.1.0.1

Ken Treis
We're running 3.1.0.1, and I've seen processes running MFC hang almost indefinitely. Our maintenance gem runs MFC during off-hours, but it seldom finishes. I've been able to run MFC if I shut down my Zinc Server and service VMs (we have several), but obviously this isn't a good long-term solution.

My first thought was that I just wasn't running MFC at a high enough priority, but even the aggressive #fastMarkForCollection hangs on me. When it's running, I see server load averages up around 10, but after a while server load drops back to normal. At this point, none of my ctrl-C presses in topaz seem to awaken MFC from its coma: soft break, hard break, panic … I end up hitting it with a `kill -9`.

Questions:

* Is this a known issue in 3.1.0.1?
* Is there any good way to diagnose what the cause might be?

I've been holding off on an upgrade in the hope that 3.1.0.3 will be out soon, and nothing in the release notes for 3.1.0.2 mentioned anything about this problem. But maybe my untrained eye just missed it.

--
Ken Treis
Miriam Technologies, Inc.
(866) 652-2040 x221

Re: MFC Hangs in 3.1.0.1

Dale Henrichs
Ken,

I'm checking into this ... I'll let you know...

Dale

----- Original Message -----
| From: "Ken Treis" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Tuesday, April 9, 2013 10:05:03 AM
| Subject: [GS/SS Beta] MFC Hangs in 3.1.0.1

Re: MFC Hangs in 3.1.0.1

Dale Henrichs
In reply to this post by Ken Treis
Ken,

No known problems, but it's possible that the issue is fixed in 3.1.0.3. We've got a candidate build that we've provided to a customer for evaluation, so we are getting real close to a 3.1.0.3 release, but we're not quite there yet.

To characterize the problem we'd like you to run a statmon with 1-second samples, then fire up the MFC (a normal MFC is fine) and take periodic pstacks of the topaz process ($GEMSTONE/bin/pstack <topaz pid>) until it is firmly hung.

Then send us the pstack output and the statmon file and we'll go from there.
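The periodic-pstack step can be scripted. What follows is a minimal POSIX shell sketch. It assumes pstack lives at `$GEMSTONE/bin/pstack`, as in the instructions above. The function name `sample_stacks`, the `PSTACK` override variable, the output file naming, and the choice of count and interval are all hypothetical, not anything GemStone prescribes. statmon is not scripted here: start it separately with 1-second samples before launching the MFC.

```shell
# Hypothetical helper (not a GemStone tool): take COUNT pstack snapshots of
# PID, INTERVAL seconds apart, writing each to OUTDIR/pstack.<pid>.<n>.txt.
# Usage: sample_stacks <topaz pid> <count> <interval-seconds> <outdir>
sample_stacks() {
  pid=$1
  count=$2
  interval=$3
  outdir=$4
  # PSTACK is a hypothetical override; by default use the pstack script
  # assumed to ship under $GEMSTONE/bin.
  pstack_cmd=${PSTACK:-"$GEMSTONE/bin/pstack"}

  mkdir -p "$outdir" || return 1
  i=1
  while [ "$i" -le "$count" ]; do
    # kill -0 sends no signal; it only checks that the process still exists.
    # Stop early if the hung topaz has already been killed (e.g. kill -9).
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "process $pid is gone after $((i - 1)) samples" >&2
      return 1
    fi
    # Capture this snapshot (stdout and stderr) into its own numbered file.
    "$pstack_cmd" "$pid" > "$outdir/pstack.$pid.$i.txt" 2>&1
    # Sleep between samples, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 0
}
```

For example, `sample_stacks 12345 30 10 ./mfc-pstacks` would take 30 snapshots of topaz process 12345 (a placeholder PID), 10 seconds apart. Numbered files make it easy to see whether the stacks stop changing once the MFC is firmly hung.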

Dale

Re: MFC Hangs in 3.1.0.1

Dale Henrichs
In reply to this post by Ken Treis
Ken,

We have finally been able to characterize this bug (another customer ran into it and was able to help us nail it down). Here's some information about the bug:

  ...the window for hitting this bug occurs when the MFC threads
  are advancing their view to avoid causing a commit record backlog.  So having
  the customer increase the STN_SIGNAL_ABORT_CR_BACKLOG
  (#StnSignalAbortCrBacklog) during the MFC will further reduce the likelihood
  of hitting this bug...

I understand that the 'oopHighWater grew more than expected' error that you saw is:

  ...likely a side effect of the aborts occurring too frequently...

The extra aborts would be coming from the MFC as it attempts to "avoid causing a commit record backlog".

At the moment I am not sure if we plan to try to fix this in the "soon to be released" 3.1.0.3.

We've been planning on releasing 3.1.0.3, but we have been hitting show-stopper bugs (either isolated to 3.1.0.3 or found in earlier versions that needed to be fixed) that have caused us to reopen 3.1.0.3.

We are probably days from release of 3.1.0.3 unless we attempt to fix this bug or find another:)

Dale

Re: MFC Hangs in 3.1.0.1

Ken Treis
Good news, thanks for the update. What you've described makes sense with what I've observed: we have better odds of completing MFC if

1. There isn't much other activity going on; and
2. We run fastMarkForCollection.

Both of these would mean fewer situations where the MFC threads' view needed to be advanced, right? There are fewer commits overall, and MFC gets in-and-out faster.

I'd love it if this could be fixed in 3.1.0.3, but I promise not to be offended if it has to wait.

--
Ken Treis
Miriam Technologies, Inc.
(866) 652-2040 x221

On May 7, 2013, at 4:40 PM, Dale Henrichs wrote:

Re: MFC Hangs in 3.1.0.1

Dale Henrichs
Looks like the fix will make it into 3.1.0.3....

Dale


From: "Ken Treis" <[hidden email]>
To: "GemStone Seaside beta discussion" <[hidden email]>
Sent: Wednesday, May 8, 2013 11:48:01 AM
Subject: Re: [GS/SS Beta] MFC Hangs in 3.1.0.1
