I made a major mistake yesterday and had to restore from a backup copy of the extent file. Thankfully, the copy was quite recent, so I expected recovery to be more or less instant because there were so few transactions to restore.
But.. recovery took over 9 hours: Recovery waited 0.630 seconds for freeFrameCount to recover. I watched the DeadNotReclaimedObjs stat during this time (thanks Dale for pointing me to that), and I watched it slowly reclaim 65 million dead objects. Once the reclaim was done, things came alive really quickly. I'd finished a MFC earlier in the day, so apparently we had a long list of un-reclaimed objects. While the whole world was waiting for reclaim, the server appeared to be idle. Load average was about 0.01, GemStone processes were using very little CPU and doing no measurable I/O. Even after the entire 9-hour process, the reclaim gem had only accumulated 33 seconds of CPU time. During this agonizing process, I read up on tuning reclaim, and I fully expected to find that my system had a non-zero #sleepTimeBetweenReclaimMs:
... because that's what it was acting like. But no, after recovery finished, I checked and found that parameter to be 0. What else could cause reclaim-during-recovery to run so slowly? The reclaim gem didn't seem to be working any harder than it does during normal operation. I kind of appreciate that it doesn't over-tax the system normally, though but during recovery I want it to work its tail off. This is a GLASS app, so we have a small SPC (2 GB) but our extent is large (77 GB). But again, I didn't see that this work was either processor- or I/O-bound. We're running on an i2.xlarge EC2 instance with SSD storage, so we can pull all of that data through the small SPC pretty quickly. MFCs take us less than 20 minutes. -- Ken TreisMiriam Technologies, Inc. (866) 652-2040 x221 _______________________________________________ GemStone-Smalltalk mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk |
Ken, I'm pretty sure that there are some known bugs in this area with regards to slow recovery ... and we do have fixes for these bugs, but I'm not sure which version they have been fixed in, but if they are fixed they should be in 3.2.1....
I'll check with the guys in engineering to see if they recognize this problem, but there are some folks on vacation so I won't necessarily be getting quick responses:)
Dale On Fri, Jul 18, 2014 at 9:29 AM, Ken Treis <[hidden email]> wrote:
_______________________________________________ GemStone-Smalltalk mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk |
In reply to this post by Ken Treis
Ken, It is indeed a known bug: "Well observed on the impact of reclaimSleepTime. This parameter is not supposed to impact active reclaim, as Bill notes; it should only apply when reclaim is not busy and ready to go to sleep. We've reproduced the problem, however; it turns out that there is a bug that causes this to have an impact similar to sleepTimeBetweenReclaimMs, and so, significantly slows down reclaim. This has been reported as bug 44292." The bug _has_ been fixed in 3.2.1 and is queued up for release in 3.1.0.7, but we do not have a release date for 3.1.0.7. Here's a bugnote[1] describing the bug and workarounds...
Dale
On Fri, Jul 18, 2014 at 9:29 AM, Ken Treis <[hidden email]> wrote:
_______________________________________________ GemStone-Smalltalk mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk |
Great, thanks Dale. I'll drop the reclaimSleepTime to 1 as the workaround suggests.
-- Ken Treis Miriam Technologies, Inc. (866) 652-2040 x221 On Jul 18, 2014, at 11:20 AM, Dale Henrichs <[hidden email]> wrote:
_______________________________________________ GemStone-Smalltalk mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk |
Free forum by Nabble | Edit this page |