Solving some Garbage Collection issues


Solving some Garbage Collection issues

Carla F. Griggio
Hi everyone! (Especially Dale :P)

I've been having some garbage collection issues: I think my repository should hold at most about 1 GB of stored objects, yet it's always around 2.5 GB (sometimes it even gets annoyingly large, like 7 GB or so). The thing is that although a Mark For Collection runs from time to time, the reclaim of dead objects always fails because the Seaside gems keep running and hold on to the repository.
So last Friday during Smalltalks 2011 here in Argentina I grabbed Dale and he explained what to do about it.

So, Dale, you told me to follow this list:

1) Run the "SAFE" list from this post to clean the image: http://forum.world.st/cleaning-shrinking-the-stone-td1679041.html#a1679436
2) Shut Down
3) File Size Report (I think this should actually be the first step, maybe?)
4) Start Stone
5) MFC
6) Do SystemRepository reclaimAll
7) Check again the File Size Report
8) If the file size is not what I expect, try to find the references holding onto what I thought should be dead objects, and get rid of them. Then repeat from step 5?

This should become a script executed by a cron job during times I know the system is not being used (for example, Saturday nights).
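
Something like this is what I have in mind for the Smalltalk body of that script (just a sketch, run from a topaz session after the Seaside and maintenance gems have been stopped; the shell/cron wrapper and error handling are left out, and I'm assuming markForCollection and commitTransaction are the right calls here):

  "clean the image first (excerpt of the SAFE list, see step 2 below)"
  ObjectLogEntry emptyLog.
  MCRepositoryGroup default repositoriesDo: [:rep | rep flushCache].
  MethodVersionHistory uniqueInstance cleanUp.
  System commitTransaction.
  "then mark and reclaim"
  SystemRepository markForCollection.
  SystemRepository reclaimAll.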

While trying to do this, I found some unexpected problems... So here's the list I actually followed and what happened:

1) File Size Report:
File size =       10502.00 Megabytes
Space available = 8968.34 Megabytes

Note: It's funny that today the repository wasn't following the pattern I described in the beginning... If I got these file size report results every day I wouldn't really have a problem :P But well... I still continued through the list.

2) Clean Image
ObjectLogEntry emptyLog.
 MCRepositoryGroup default repositoriesDo: [:rep | rep flushCache ].
 MCDefinition clearInstances.
 MCMethodDefinition cachedDefinitions  removeKeys:
    (MCMethodDefinition cachedDefinitions keys).
 MCMethodDefinition shutDown.
 MethodVersionHistory uniqueInstance cleanUp.

3) Here I should have done a shutdown of GemStone and then started only the stone, but for some reason I couldn't. Although I stopped GemStone, some processes were still running (admingc, reclaim, shared page monitor, etc.) and apparently one of them was holding the repository, so I couldn't start the stone.
I tried rebooting the whole machine and doing everything again, but I got the same results.
What I ended up doing was rebooting and, with the whole GemStone system running OK, only shutting down the Seaside gems and the maintenance gem.

4) MFC
5) SystemRepository reclaimAll

6) File Size Report:

File size =       10502.00 Megabytes 
Space available = 8321.94 Megabytes

Wow, you can see I then had less space available!

But I waited a little while longer and then I got these results:
  
File size =       10502.00 Megabytes
Space available = 9109.39 Megabytes


:) That's better :) But I still think the size could be a little smaller.

The admingcgem log said this:

Starting doSweepWsUnion at 11/07/11 11:56:52 ART
  Starting values: WSU size=0  PD size=1584716
  Finished first GarRemoveNotDead at 11/07/11 11:56:52 ART. Removed 0 objects from possibleDead
[Info]: WsUnion during first sweep size = 0
[Info]: Finished second GarRemoveNotDead at 11/07/11 11:56:52 ART.  Removed 0 objs from possibleDead
[Info]: SweepWsUnion ending possibleDead size = 1584716
  Finished sweeping possible dead at 11/07/11 11:56:52 ART.
  possible dead size=1584716  notDead size=0 

And the reclaimgcgem log said this:

11/07/11 11:57:55 ART
   1 reclaims  130 pagesProcessed  130 pagesReclaimed  10 allValidPages  8 singleObjPages
   20243 processedObjs  14940 liveObjs  13 shadowObjs 5290 deadObjs  155.7 avgObjsPerPage
[Info]: Parameter changes noticed on '11/07/11 12:07:23 ART':
  reclaimMinPages = 1 Pages.


Dale, you explained something to me about the relation between possible dead size and notDead size. What was that? Does notDead size = 0 mean that there's nothing more to be reclaimed? Or should possible dead size be 0?

Gracias!

Carla.


Re: Solving some Garbage Collection issues

Johan Brichau-2
Hi Carla,

We are experiencing similar repository growth on a continuous basis.

In a steadily used GLASS installation, we see a weekly build-up of roughly 2 to 3 gigabytes of data that is continuously being collected by the MFC but never reclaimed. This starts to put a lot more stress on the MFC cycles (we do them nightly only).

The only solution we found (and which was suggested by Dale on this list) is to restart all Seaside gems (and the maintenance gem, if you have one) without restarting the stone. The MFC/reclaim cycle that follows this restart is guaranteed to clean up all of the dead objects. Somehow, the Seaside gems are preventing the reclaim.

We execute this restart on a weekly basis. Sometimes I even do a backup/restore of the stone; according to the manual, this compacts the object table. I am not sure the backup/restore really helps, but I have sometimes gotten the impression that MFC operations became quicker after that.
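
For what it's worth, the backup side of that boils down to something like this (a sketch, not our exact script; the path is made up):

  "write a full backup of the repository to a file"
  SystemRepository fullBackupTo: '/backups/seaside-stone.dat'.

Restoring it means starting the stone on a fresh extent and running restoreFromBackup: against that file (the details are in the System Administration Guide); that restore is what rebuilds, and thereby compacts, the object table.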

I am also not very comfortable with this behavior and, from time to time, I try to investigate the cause but I have been unsuccessful so far.
Dale also suggested the following in a previous mail on this list. I did not try this yet.

> You might try the following in your system.conf:
>
>  GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90
>
> #=========================================================================
> # GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE: Percent of pom generation area
> #   to be cleared when voting on possible dead objects.
> #   If value is > 0 and < 100, subspaces of pom generation older
> #   than 5 minutes are cleared; the number of subspaces cleared is
> #   the specified percentage of spaces in use rounded down to an
> #   integral number of spaces.
> #   If value == 100, all subspaces of pom generation are cleared without
> #   regard to their age.
> #
> # If this value is not specified, or the specified value is out of range,
> # the default is used.
>
> You won't get a complete flush of POM objects on vote but eventually you'll flush out the older references... If you are experiencing this problem.


cheers
Johan



Re: Solving some Garbage Collection issues

Dale Henrichs
Johan, Carla, et al.,

The phenomenon that you are seeing (where shutting down the Seaside/maintenance gems allows a full collection of the dead objects) is related to how GemStone manages references to persistent objects in the vm ...

During the normal MFC process, after the number of possible dead is calculated, all of the running gems are asked to check for references to possible dead objects in their temporary object space and their perm gen space. If a gem finds references to possibly dead objects, those objects are voted down, and the transitive closure of dead objects reachable from the voted-down objects is not garbage collected.

When we investigated this problem (Issue 136 and internal Bug #40842) we found that references from POM (persistent object memory in the vm) can live for pretty long periods of time, so even setting GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE to 90 may not completely clean out old (and obsolete) references to dead objects.

The fix in GemStone 3.0 was to allow setting GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE to 100, which would completely flush POM during a vote, thus removing the stale references to dead objects ... In 2.x you can set GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE to 90, but that is not always enough to force out the ancient references.

The only other way to "flush POM" is to restart the vm.
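
Concretely, that means something along these lines in the gem config (a sketch; the file is the system.conf Johan mentioned above, and the value of 100 is only honored in 3.0, while 90 is the most you can usefully ask for in 2.x):

  # flush POM (completely, on 3.0) when the gem votes on possible dead
  GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=100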

Dale


Re: Solving some Garbage Collection issues

Dale Henrichs
In reply to this post by Carla F. Griggio
Carla,

The gc process goes through the following sequence.

1. MFC produces a list of possible dead objects; the following stat gives you the size of that list:

  (System cacheStatistics: 1)
    at: (System cacheStatisticsDescription indexOf: 'PossibleDeadKobjs').

2. The gems vote on the possible dead objects and the known dead are put on a "dead not reclaimed" list. The following stat gives you the size of that list:

  (System cacheStatistics: 1)
    at: (System cacheStatisticsDescription indexOf: 'DeadNotReclaimedKobjs').

3. The reclaim gems do their job and reclaim the dead objects. The total dead reclaimed is given by the following stat:

  (System cacheStatistics: 1)
    at: (System cacheStatisticsDescription indexOf: 'DeadObjsReclaimedCount').

Soooo, the MFC prints out the number of possible dead and the voting starts. Once voting is completed, the dead-not-reclaimed count goes up and then approaches zero as objects are reclaimed ... as long as there are possible dead or dead not reclaimed, you won't see any improvement in the free space for the repository.

When the reclaimAll command finished, you should have seen DeadNotReclaimedKobjs go to zero ... The final technicality is that a checkpoint must complete (after the final pages are reclaimed) before the free space can be seen ... a checkpoint is performed when the stone shuts down normally ...
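
For example, you can watch all three of those stats at once with something like this (a rough sketch, run from any logged-in session; the stat names are the ones given above):

  | descr stats |
  descr := System cacheStatisticsDescription.
  stats := System cacheStatistics: 1.
  Dictionary new
    at: 'PossibleDeadKobjs' put: (stats at: (descr indexOf: 'PossibleDeadKobjs'));
    at: 'DeadNotReclaimedKobjs' put: (stats at: (descr indexOf: 'DeadNotReclaimedKobjs'));
    at: 'DeadObjsReclaimedCount' put: (stats at: (descr indexOf: 'DeadObjsReclaimedCount'));
    yourself
  "once PossibleDeadKobjs and DeadNotReclaimedKobjs are both zero (and a checkpoint has completed), the freed space should show up in the file size report"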

Does this cover your questions?

Dale


Re: Solving some Garbage Collection issues

Dale Henrichs
In reply to this post by Johan Brichau-2
Johan,

I was curious if you have set

  GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90

in your config files ... I'm running with this setting for SS3 and I am not seeing unreasonable growth (I see a steady-state number of objects voted down). Either the GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is helping (even with only 90% flushed), or the character of the data structures involved is different, or we are looking at a different issue ...

Dale


Re: Solving some Garbage Collection issues

Johan Brichau-2
Hi Dale,

I just now added them to the configs and will let you know in a couple of days if it worked ;-)




Re: Solving some Garbage Collection issues

Carla F. Griggio
Thanks Dale, that covers my questions :P
And Johan, thanks for the tip also.




Re: Solving some Garbage Collection issues

NorbertHartl
Dale,

can you explain what POM is used for? The basic retrieval of persistent objects happens via shared memory from the shared page cache, right? So there is no extra memory for a gem to use. If there is some xxx-on-write strategy that creates new versions of objects that have been altered, is POM used for that? If so, that would roughly be the equivalent of a transaction. But then I don't understand why anything is kept around for some time.

thanks in advance,

Norbert



Reply | Threaded
Open this post in threaded view
|

Re: Solving some Garbage Collection issues

Dale Henrichs
Norbert,

In the 32 bit product, the vm referenced the persistent objects directly in the SPC, so no GEM memory was used for non-modified persistent objects.

In the 64 bit product, the vm copies persistent objects from the SPC into POM space _on reference_. So each vm has copies of the objects in its memory space. If a persistent object is modified it is copied to TEMPOBJ space. So there can be references to "stale" persistent objects.

There are all kinds of performance advantages for using copy on read ... at the cost of memory.

Dale
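
(To make the memory cost concrete: the per-gem copies described above live in the gem's temporary object cache, so the knobs involved are the ones in the gem.conf sketch below. The cache size is only a placeholder for illustration, not a recommendation from this thread.)

  # Total temporary object memory per gem (in KB); the POM generation that
  # holds the copy-on-read copies of persistent objects is carved out of it.
  GEM_TEMPOBJ_CACHE_SIZE = 50000
  # Prune most of the POM generation when the gem votes on possible dead,
  # so stale copies stop keeping otherwise-dead objects alive (see the
  # GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE discussion elsewhere in this thread).
  GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE = 90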

----- Original Message -----
| From: "Norbert Hartl" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Thursday, November 17, 2011 2:09:55 AM
| Subject: Re: [GS/SS Beta] Solving some Garbage Collection issues
|
| Dale,
|
|
| can you explain what the POM is used for? The basic retrieval of
| persistent objects is via shared memory from the shared page cache,
| right? So no extra memory for a gem to use. If there is some
| xxx-on-write strategy that creates new versions of the objects that
| have been altered, is the POM used for this? If so, that would
| roughly be the equivalent of a transaction. But then I don't
| understand why anything is kept around for some time.
|
|
| thanks in advance,
|
|
| Norbert
|
Reply | Threaded
Open this post in threaded view
|

Re: Solving some Garbage Collection issues

NorbertHartl

Am 17.11.2011 um 17:56 schrieb Dale Henrichs:

> Norbert,
>
> In the 32 bit product, the vm referenced the persistent objects directly in the SPC, so no GEM memory was used for non-modified persistent objects.
>
> In the 64 bit product, the vm copies persistent objects from the SPC into POM space _on reference_. So each vm has copies of the objects in its memory space. If a persistent object is modified it is copied to TEMPOBJ space. So there can be references to "stale" persistent objects.
>
But temp obj memory is flushed if a transaction is committed, right?

> There are all kinds of performance advantages for using copy on read...at the cost of memory.
>
There is a big set of possibilities for optimizing things. And there is another set, similar but slightly bigger, and that's for creating problems ;)

Norbert

> Dale
>

Reply | Threaded
Open this post in threaded view
|

Re: Solving some Garbage Collection issues

Dale Henrichs
Norbert,

temp obj space is not flushed at commit time ... All of the temporary objects (non-committed) live in temp obj space and survive across transaction boundaries. The dirty persistent objects that are committed are left in place in temp obj space and are eligible to be scavenged like any other object in temp obj space.

At commit time the dirty objects are written to the SPC and otherwise the state of object memory is pretty much left alone (modulo dirty object bookkeeping). Objects that were changed by other sessions are marked as invalid in the vm's object space and will be faulted back into memory on the next reference.

There were some pretty nasty bugs with the old scheme as well, and those bugs weren't as easy to fix :)

Dale
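
(A tiny sketch of what that means for a single transaction; the domain class and its selectors are made up for illustration, only System commitTransaction / abortTransaction are real API:)

  | order |
  order := OrderBook default orderAt: 42.   "reading faults a copy of the persistent object into POM space"
  order markShipped.                        "modifying it puts a dirty copy into temp obj space"
  System commitTransaction
      ifFalse: [System abortTransaction].   "commit writes the dirty object to the SPC..."
  "...but the in-memory copy stays in temp obj space until it is scavenged;
   it is not flushed by the commit."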

Reply | Threaded
Open this post in threaded view
|

Re: Solving some Garbage Collection issues

Johan Brichau-2
In reply to this post by Johan Brichau-2
Hi Dale, Carla,

It's now been a week since I set the parameter below and I have not seen any unreasonable growth since.

GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90

Thanks for the info!
Johan

On 10 Nov 2011, at 21:27, Johan Brichau wrote:

> Hi Dale,
>
> I just now added them to the configs and will let you know in a couple of days if it worked ;-)
>

Reply | Threaded
Open this post in threaded view
|

Re: Solving some Garbage Collection issues

Carla F. Griggio
Hi!
I added that configuration yesterday; I'll keep track of the repository's growth behaviour and next week I'll tell you if I noticed any difference.

However, I 'smell' that the growth I have is very related to the design of the project...

The thing is that although I can garbage collect as explained in the first mail, for example every night, it's not enough, because some use cases cause BIG growth (like 2GB in a few minutes). So now I have to figure out why it grows so much in those specific use cases. Too many temporary objects, I guess...
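
(For anyone wanting to pin down one of those use cases, a rough way to bracket it is the same file size report quoted earlier in this thread. A sketch only, assuming fileSizeReport is the selector behind that report:)

  | before after |
  before := SystemRepository fileSizeReport.
  "... drive the suspect use case from the application here ..."
  after := SystemRepository fileSizeReport.
  Array with: before with: after    "compare 'Space available' before and after"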

On Sat, Nov 19, 2011 at 4:39 AM, Johan Brichau <[hidden email]> wrote:
Hi Dale, Carla,

It's now been a week since I set the parameter below and I have not seen any unreasonable growth since.

GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90

Thanks for the info!
Johan



Reply | Threaded
Open this post in threaded view
|

Re: Solving some Garbage Collection issues

Dale Henrichs
Carla,

Yes it would be good to separate the object growth into two 'piles':

  1. objects persisted from session state
  2. objects persisted from object model

The stuff that is persisted from session state should be gc'd after the sessions have been expired (modulo the objects kept alive by not pruning).

The stuff that is persisted from the object model will be long lived and require object model surgery before it will go away...

Using GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90 should give you good yield on gc'ing objects that were persisted from session state ... and after a while you should be able to separate the big growth spurts into the correct pile ... then those spurts can be addressed.

Dale
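
(For reference, the nightly cycle the cron job keeps coming back to in this thread boils down to something like the sketch below, run from a single session. Per the earlier posts, the reclaim only succeeds fully once the seaside and maintenance gems have been stopped or restarted; this is a sketch, not a drop-in script.)

  "Mark for collection (the MFC mentioned above) builds the possible-dead set."
  SystemRepository markForCollection.
  "Reclaim the pages freed once the vote has finished."
  SystemRepository reclaimAll.
  "Then check the result."
  SystemRepository fileSizeReport.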


----- Original Message -----
| From: "Carla F. Griggio" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Tuesday, November 22, 2011 4:08:00 PM
| Subject: Re: [GS/SS Beta] Solving some Garbage Collection issues
|
| Hi!
| I added that configuration yesterday, I'll keep track of the
| repository growing behaviour and next week I'll tell you if I
| noticed any difference.
|
|
| However, I 'smell' that the growth I have is very related to the
| design of the project...
|
|
| The thing is that although I can garbage collect as explained in the
| first mail, for example everynight, it's not enough, because some
| use cases cause BIG growth (like 2GB in a few minutes), so now I
| have to figure out why does it grow so much in those specific use
| cases. Too many temporary objects, I guess...
|
|
| On Sat, Nov 19, 2011 at 4:39 AM, Johan Brichau < [hidden email] >
| wrote:
|
|
| Hi Dale, Carla,
|
| It's now been a week since I set the parameter below and I have not
| seen any unreasonable growth since.
|
| GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90
|
| Thanks for the info!
| Johan
|
|
|
|
| On 10 Nov 2011, at 21:27, Johan Brichau wrote:
|
| > Hi Dale,
| >
| > I just now added them to the configs and will let you know in a
| > couple of days if it worked ;-)
| >
| >
| > On 10 Nov 2011, at 18:16, Dale Henrichs wrote:
| >
| >> Johan,
| >>
| >> I was curious if you have set
| >>
| >> GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90
| >>
| >> in your config files ... I'm running with this setting for SS3 and
| >> I am not seeing unreasonable growth (I see a steady state number
| >> of objects voted down). Either the
| >> GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is helping (even with only 90%
| >> flushed) or the character of the data structures involved are
| >> different or we are looking at a different issue ...
| >>
| >> Dale
| >>
| >> ----- Original Message -----
| >> | From: "Johan Brichau" < [hidden email] >
| >> | To: "GemStone Seaside beta discussion" <
| >> | [hidden email] >
| >> | Sent: Tuesday, November 8, 2011 2:21:43 AM
| >> | Subject: Re: [GS/SS Beta] Solving some Garbage Collection issues
| >> |
| >> | Hi Carla,
| >> |
| >> | We are experiencing similar repository growth on a continuous
| >> | basis.
| >> |
| >> | In a steadily used GLASS installation, we see a weekly build-up
| >> | of
| >> | roughly 2 to 3 gigabytes of data that is continuously being
| >> | collected by the MFC but never reclaimed. This starts to put a
| >> | lot
| >> | more stress on the MFC cycles (we do them nightly only).
| >> |
| >> | The only solution we found (and which was suggested by Dale on
| >> | this
| >> | list) is that we restart all seaside gems (and the maintenance
| >> | gem
| >> | if you have one) without restarting the stone. The MFC/reclaim
| >> | cycle
| >> | that follows this restart is guaranteed to clean up all of the
| >> | dead
| >> | objects. Somehow, the seaside gems are preventing the reclaim.
| >> |
| >> | We execute this restart on a weekly basis. Sometimes, I even do
| >> | a
| >> | backup/restore of the stone. According to the manual, this
| >> | compacts
| >> | the object table. I am not sure if this backup/restore really
| >> | helps,
| >> | but sometimes I got the impression that MFC operations became
| >> | quicker after that.
| >> |
| >> | I am also not very comfortable with this behavior and, from time
| >> | to
| >> | time, I try to investigate the cause but I have been
| >> | unsuccessful so
| >> | far.
| >> | Dale also suggested the following in a previous mail on this
| >> | list. I
| >> | did not try this yet.
| >> |
| >> | > You might try the following in your system.conf:
| >> | >
| >> | > GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90
| >> | >
| >> | > #=========================================================================
| >> | > # GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE: Percent of pom generation
| >> | > area
| >> | > # to be cleared when voting on possible dead objects.
| >> | > # If value is > 0 and < 100, subspaces of pom generation older
| >> | > # than 5 minutes are cleared; the number of subspaces cleared
| >> | > is
| >> | > # the specified percentage of spaces in use rounded down to an
| >> | > # integral number of spaces.
| >> | > # If value == 100, all subspaces of pom generation are cleared
| >> | > without
| >> | > # regard to their age.
| >> | > #
| >> | > # If this value is not specified, or the specified value is
| >> | > out of
| >> | > range,
| >> | > # the default is used.
| >> | >
| >> | > You won't get a complete flush of POM objects on vote but
| >> | > eventually you'll flush out the older references... If you are
| >> | > experiencing this problem.
| >> |
| >> |
| >> | cheers
| >> | Johan
| >> |
Reply | Threaded
Open this post in threaded view
|

Is there a way to trace what objects are being read from disk?

Johan Brichau-2
Hi everyone, 

My best wishes for 2012!

Here is my first question of the year: is there any way to trace which objects are getting read from disk (into the SPC)? I would like to understand what exactly GemStone is loading from disk when it is doing (long) disk accesses.

The reason I have this question is that on the stones where I activated the configuration option below, we now experience a lot of page reads (disk access) when the application gets used again after the MFC/reclaim.
This does not happen on the stones where we did not activate this config option; it does happen there occasionally, but on the activated stones it now seems to be consistent behavior.

GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90

The result is that the first access to the application after the MFC becomes *really* slow. I am talking about a delay of roughly 30-60s. During that time, the disk read rate is maxed out and I record high values (compared to normal operation) in the cache statistics for FrameFromFindFree, PageRead, CacheMisses, etc.

Although I expect a slower response time after the MFC because the SPC needs to refill, I am stunned at how long it takes. I do not expect a large number of objects to need loading. This makes me wonder whether something is wrong in our application or in the GemStone configuration.

Bumping up the size of the SPC did not make a difference.
I also already increased the size of the private page cache because there was a LocalCacheOverflow happening sometimes.
I tried locking the SPC in memory but did not notice any difference.

cheers,
Johan
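
[A quick sanity check before digging into i/o statistics is to confirm, from the running Gem itself, that the prune-on-vote value actually took effect. A minimal sketch, assuming System gemConfigurationReport is available in this GemStone version; the report key #GemTempObjPomgenPruneOnVote is an assumption about how the config-file name maps to a symbol, so check the keys your report actually returns.]

  "Dump the Gem's configuration to the gem log, then look up the assumed key."
  | report |
  report := System gemConfigurationReport.
  report keysAndValuesDo: [:key :value |
      GsFile gciLogServer: key asString , ' = ' , value printString].
  "Key name assumed; look for the entry matching GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE."
  report at: #GemTempObjPomgenPruneOnVote ifAbsent: ['not present in this report'].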
Reply | Threaded
Open this post in threaded view
|

Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
Johan,

Best wishes to you for 2012 as well...

To answer your initial question: no, there is no way to trace which objects are getting read from disk.

With that said, I would like to take a look at your statmon file before making any specific recommendations.

At first blush, setting `GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=90` means that your gems are flushing their working set of persistent objects and the working set needs to be refreshed on the next application access.

LocalPageCacheMisses and LocalCacheOverflow are related to the private page cache, but it is not likely that the private page cache is the source of the i/o issues. The private page cache is only used for communication with the stone these days...

The fact that you are getting any FrameFromFindFree hits at all means that the SPC is under extreme pressure. A large enough SPC should address that problem unless there are other pressures on the cache, like an unusually large number of dirty pages, or something else (page reclaim can be a cause of additional disk i/o and cache pressure).

A look at the statmon file should make it possible for us to get a handle on the direction to head to start making recommendations...

Dale
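
[If the pruning on vote is indeed the cause, one way to soften the post-MFC cold start is to touch the application's persistent roots from the maintenance gem right after the reclaim, so the page reads happen before real users arrive. A minimal sketch, assuming a hypothetical #MyAppRoots collection registered in UserGlobals as a stand-in for whatever roots the application actually keeps.]

  "Warm-up sketch: #MyAppRoots is a placeholder for your own persistent roots.
   Any cheap message send is enough to fault an object (and its page) back
   into the shared page cache."
  System abortTransaction.
  (UserGlobals at: #MyAppRoots ifAbsent: [#()]) do: [:root |
      root class.
      (root isKindOf: Collection)
          ifTrue: [root do: [:each | each class]]].
  System abortTransaction.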

Reply | Threaded
Open this post in threaded view
|

Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2
Hi Dale,

I am sending you, in a separate email, two statmonitor files in which the problem occurs.

In the "statmonitor" file:
 - Topaz 761-6 experiences a very high "time waiting for io" & "pagereads".
 - Also: very high "FramesFromFindFree" & FreeFrameCount drops fast

In the "statmonit20120102" file:
- In the first couple of minutes (9h24 - 9h28), Topaz 10-7 and Topaz 9-5 experience a very high "time waiting for io" & "pagereads".
- Only one of those also has "FramesFromFindFree" (but lower)
- The stats come from a stone where I had bumped up the "TargetFreeFrameCount" because I noticed that the FreeFrames often dropped near the FreeFrameLimit, triggering a "framefromfindfree".

Hopefully, the statistics provide some insights to identify the problem. We notice these slow responses consistently now after an MFC but they also happen sometimes during the day.

The server where these stats were taken was not under any other kind of stress (no swapping, 10 of the 20Gb RAM free, no other disk-access process,… ). The iostat/vmstat output also showed that the disks (SAN) are operating at 100% throughput during the page reads.

I bumped the SPC size up to 1Gb (from 500Mb) but did not notice any difference; in fact, running with half the SPC size gives the same results, which are *normally* very good. Most of the time there is no disk access at all and the app is served blazingly fast. But these irregular episodes of long wait times, often lasting longer than 60s, are very annoying.

Thanks for doing this. I really appreciate your help.

Johan
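
[For reference, the cache sizes under discussion are set in the stone and gem config files; a sketch of the relevant entries, in the same style as the system.conf excerpt earlier in this thread. The parameter names SHR_PAGE_CACHE_SIZE_KB, GEM_PRIVATE_PAGE_CACHE_KB and SHR_PAGE_CACHE_LOCKED are from memory and should be verified against $GEMSTONE/data/system.conf for the version in use.]

# Shared page cache (SPC) size, in KB; 1Gb as discussed above.
SHR_PAGE_CACHE_SIZE_KB = 1000000

# Per-gem private page cache, raised when LocalCacheOverflow shows up.
GEM_PRIVATE_PAGE_CACHE_KB = 1000

# Optionally lock the shared cache in memory so it cannot be swapped out.
SHR_PAGE_CACHE_LOCKED = TRUE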




Reply | Threaded
Open this post in threaded view
|

Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
Johan,

Would it be possible to send the statmon files immediately preceding the ones you sent? I'd like to be able to see what's going on between the tail end of the MFC and the high PageRead episode.

From what I've seen so far (focusing on Topaz 761-6), dirty pages are not the issue.

The system is designed to sit right at the free frame limit, so Topaz 761-6 resorted to FramesFromFindFree because the instantaneous demand for free pages exceeded the ability of the free frame page server to respond. Adding another free frame page server (or two) should help meet that instantaneous demand and should eliminate the FramesFromFindFree, but it doesn't answer the question of why Topaz 761-6 needs so many pages in the first place. It shouldn't be necessary to change the TargetFreeFrameCount unless you are literally running out of free frames, which doesn't appear to be the case.

If you could increase the size of the SPC to hold the entire repository then you'd obviously not have these types of PageRead issues :), but since you are only seeing spiky behavior and have good performance the majority of the time, I don't think you need to monkey with the size of the SPC ...

I'm still looking at stats, but probably need the preceding statmon files to get a better picture...

Dale
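
[If the extra free frame page servers turn out to be the fix, that is a stone-side configuration change plus a restart; a sketch in the same config-file style. The parameter name STN_FREE_FRAME_PAGE_SERVERS is an assumption from memory; confirm it in the System Administration Guide for the GemStone version in use.]

# Number of free frame page servers feeding the shared cache (default 1).
# A couple of extra servers help absorb bursts of demand for free frames
# without gems falling back to FramesFromFindFree.
STN_FREE_FRAME_PAGE_SERVERS = 3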

Reply | Threaded
Open this post in threaded view
|

Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2

On 06 Jan 2012, at 21:45, Dale Henrichs wrote:

> Would it be possible to send the statmon files immediately preceding the ones you sent? I'd like to be able to see what's going on between the tail end of the MFC and the high PageRead episode.

Unfortunately, I don't have any stats for those. I always start statmonitor manually.

I started a monitor now so I can include that entire time period in there.

I'll be back… ;-)

Johan
Reply | Threaded
Open this post in threaded view
|

Re: Is there a way to trace what objects are being read from disk?

Stephan Eggermont-3
In reply to this post by Johan Brichau-2

On 6 jan 2012, at 21:01, Johan Brichau wrote:
> The server where these stats were taken was not under any other kind of stress (no swapping, 10 of the 20Gb RAM free, no other disk-access process,… ). The iostat/vmstat output also showed that the disks (SAN) are operating at 100% throughput during the page reads.

Is this SAN under your own control or a hosting provider's? With hosting providers it is easy to lose a factor of 100 in throughput compared to locally connected disks.

Stephan