As the garbage collection doesn't seem to free much space and the database is growing a bit too fast, I wanted to check with ScanBackup which objects are involved. The latest version (12) doesn't seem to be able to open backup files, as the code to determine the file type is missing. The earlier version (9), which has that code, also doesn't work with it.

See Issue 207.

Growth:
23 nov. 224MB
24 nov. 320MB
30 nov. 352MB
1 dec. 496MB
3 dec. 512MB
5 dec. 528MB
6 dec. 672MB

Stephan Eggermont
Stephan,
Thanks very much for the bug report. I'll take a look. Until then you can try with version 8.

James

On Dec 6, 2010, at 4:30 AM, Stephan Eggermont wrote:
> [...]
Stephan,
I see that James is taking a look at the ScanBackup issue. In order to address the growth issues (pardon me if you already answered these questions):

- make sure you are running the maintenance vm script (more specifically the Seaside session expiration code)
- try stopping/restarting all of your gems and then run an MFC

(see Issue 136: http://code.google.com/p/glassdb/issues/detail?id=136)

Dale

On 12/06/2010 04:30 AM, Stephan Eggermont wrote:
> [...]
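For reference, a minimal sketch of the "stop the gems, then run an MFC" step from a GemTools workspace; the explicit abort is an assumption about session state, not part of Dale's instructions:

"after stopping the seaside gems (e.g. runSeasideGems stop), from a GemTools workspace logged into the stone"
System abortTransaction.             "drop any old view this session is holding"
SystemRepository markForCollection.  "marks unreachable objects as possible dead; it does not reclaim them"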
Hello Dale,
I'm not sure I understand how this session expiration works (it's a Seaside 2.8 app). We work with runSeasideGems start/stop, and I ran Norbert's script with the Seaside gems stopped.

In GemTools, we printed the following:

|classes sizes result|
result := Dictionary new.
classes := (PackageInfo named: 'Kompas-Web') classes.
sizes := SystemRepository countInstances: classes.
sizes doWithIndex: [:each :i |
	each > 1000 ifTrue: [result at: (classes at: i) put: each]].
result

It showed us we had 11070 KSSession instances. We are not yet that successful; the number should be below 100 currently.

Only one of them was expired:

(KSSession allInstances
	select: [:each | each expired]) size

We then expired them all at the Seaside level:

KSSession allInstances do: [:each | each expire].

Then doing a SystemRepository markForCollection gave impressive results: it counted 800,000 live objects and 5,000,000 possible dead. That was not enough to actually garbage collect them.

What is the next step? Should I run/adapt the maintenance tasks of StartMaintenance30?

Stephan
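As a side note, a minimal sketch of the same expiry step with an explicit commit, so the expired state is persisted before the next MFC (the commit may be redundant if GemTools is set to auto-commit; that part is an assumption):

KSSession allInstances do: [:each | each expire].
System commitTransaction.    "make the expiry visible to other sessions and to the GC gems"
(KSSession allInstances select: [:each | each expired]) size.    "should now match the instance count"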
On 07.12.2010, at 12:51, Stephan Eggermont wrote:
> We work with runSeasideGems start/stop, and I ran Norbert's script with the Seaside gems stopped.

Which script do you mean?

Norbert
Hello Norbert,
On 7 dec 2010, at 14:19, Norbert Hartl wrote:
> Which script do you mean?

The one from Issue 136.

> It seems there is something wrong with the maintenance script. The script adds some things to the object log which is available under /tools/objectLog. If there is nothing about the expiry of sessions then the script does not work properly.

Yeah, in logs I find in the current admingcgem.log:

[Info]: Finished second GarRemoveNotDead at 12/07/2010 02:16:34 PM CET. Removed 0 objs from possibleDead
[Info]: SweepWsUnion ending possibleDead size = 717
Finished sweeping possible dead at 12/07/2010 02:16:34 PM CET.
possible dead size=717 notDead size=5253458
Starting doSweepWsUnion at 12/07/2010 03:03:10 PM CET
Starting values: WSU size=46898 PD size=5262338
MarkSweepArgsSType::(): buffer size= 320 pages
GarRemoveNotDead: removing live from PD. live size: 5262338
Finished first GarRemoveNotDead at 12/07/2010 03:03:15 PM CET. Removed 5262338 objects from possibleDead
[Info]: WsUnion during first sweep size = 21
[Info]: Finished second GarRemoveNotDead at 12/07/2010 03:03:15 PM CET. Removed 0 objs from possibleDead
[Info]: SweepWsUnion ending possibleDead size = 0
Finished sweeping possible dead at 12/07/2010 03:03:15 PM CET.
possible dead size=0 notDead size=5262338
Starting doSweepWsUnion at 12/07/2010 03:16:30 PM CET
Starting values: WSU size=46912 PD size=5264924
MarkSweepArgsSType::(): buffer size= 320 pages
GarRemoveNotDead: removing live from PD. live size: 5264924
Finished first GarRemoveNotDead at 12/07/2010 03:16:35 PM CET. Removed 5264924 objects from possibleDead
[Info]: WsUnion during first sweep size = 21
[Info]: Finished second GarRemoveNotDead at 12/07/2010 03:16:35 PM CET. Removed 0 objs from possibleDead
[Info]: SweepWsUnion ending possibleDead size = 0
Finished sweeping possible dead at 12/07/2010 03:16:35 PM CET.
possible dead size=0 notDead size=5264924

and in maintenance_gem.log:

Before commit: true
2010-12-07T06:09:10.70276093482971-08:00
Unregistering...
Begin AutoTransaction: true
Finish expiration loop: true
...Expired: 0 sessions.
Before commit: true
2010-12-07T06:10:10.70736598968506-08:00
Unregistering...
Begin AutoTransaction: true
Finish expiration loop: true
...Expired: 1 sessions.
Before commit: true
2010-12-07T06:11:10.72887492179871-08:00
Unregistering...
Begin AutoTransaction: true
Finish expiration loop: true
...Expired: 0 sessions.
Before commit: true
2010-12-07T06:12:10.73608803749084-08:00
Unregistering...
Begin AutoTransaction: true
Finish expiration loop: true
...Expired: 1 sessions.
Before commit: true
2010-12-07T06:13:10.76529002189636-08:00
Unregistering...
Begin AutoTransaction: true

> If objects are "possible dead" then the other gems et. al have the chance to veto the collection of objects. I would say it is unlikely that all of the 5.000.000 objects are not really dead. Do you know that markForCollection does just a "mark for collection"? It does not garbage collect the objects. Log out all of your DataCurator sessions and wait a few minutes. If things are normal you should see the empty space in the extent grow. If not, complain again.

It doesn't. Do I need to have more running than:

glass 14143 0.0 4.5 691288 370744 ? Ssl Nov23 0:28 /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/sys/stoned seaside -z/opt/gemstone/product/seaside/data/system.conf -l/opt/gemstone/log/seaside.log
glass 14144 0.0 1.9 602620 163120 ? Sl Nov23 8:31 /opt/gemstone/product/sys/shrpcmonitor 'seaside@127.0.1.1' setlocationstr 31250 setnumpages 49 setnumprocs 0 setmemlock 5000 setspinlockcount 3906 settargetfflim
glass 14150 0.0 1.6 579240 136456 ? S Nov23 0:10 /opt/gemstone/product/sys/pgsvrmain seaside@127.0.1.1 0 1 -1
glass 14151 0.0 5.7 579520 472232 ? S Nov23 0:29 /opt/gemstone/product/sys/pgsvrmain TCP 48160 90
glass 14160 0.0 1.5 615732 124176 ? Sl Nov23 13:41 /opt/gemstone/product/sys/gem pagemanager 'seaside' -T 5000
glass 14165 0.0 5.7 617520 467352 ? Sl Nov23 0:27 /opt/gemstone/product/sys/gem reclaimgcgem 'seaside' 0 0 -T 5000
glass 14166 0.0 5.6 617396 464204 ? Sl Nov23 13:39 /opt/gemstone/product/sys/gem admingcgem 'seaside' -T 5000
glass 14167 0.0 1.6 632712 133976 ? Sl Nov23 0:21 /opt/gemstone/product/sys/gem symbolgem 'seaside' -T 10000
glass 14339 0.0 5.5 1232224 456512 ? Sl Nov23 3:00 /opt/gemstone/product/bin/topaz -l -T200000
glass 14468 0.0 0.0 28840 2096 ? Ss Nov23 0:00 /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/sys/netldid gs64ldi -g -n -aglass -l/opt/gemstone/log/gs64ldi.log

Do I need to manually add a task to the admin process? I'm using 2.8.

Stephan
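For what it's worth, a quick way to see which sessions are still attached to the stone (and could therefore hold a reference or veto the collection) is the sketch below; checking sessions this way is a suggestion here, not something from the original exchange:

"printIt in a GemTools workspace: one line per session currently logged into the stone,
 including the user and the host process id"
System currentSessionNames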
Stephan,
The possible dead size of 5M does mean that you have that many objects that are not directly connected into the persistent object graph. The WSUnion sweep removes from the possible dead set those objects that have been referenced by objects that were written since the MFC completed AND (and this is an important and) objects that were voted down by gems.

Since you are presumably running a fairly standard seaside setup (minus the maintenance vm), one of the seaside gems (or a GemTools gem) is voting down all of the dead, probably because the vm has a root reference of some sort stuck in its head ... which is likely to be a manifestation of Issue 136, and you are just unlucky that an object stuck in a vm's head is keeping 5M objects alive ...

There are two things that we could do:

1. recycle all of your gems and run another MFC
2. start a statmonitor, run another MFC and see if we can figure out which gem (or gems) are voting down objects

Since it only takes one well-placed object to keep a bunch of objects alive, we may not have much success in tracking down _the culprit_...

Restarting all of the gems that are attached to the stone (including GemTools gems) and then running an MFC followed by a reclaimAll (which should force the immediate reclaim of the dead objects) looks like the way to go:

run
SystemRepository markForCollection
%
run
SystemRepository postReclaimAll: SystemRepository reclaimAll
%

Note that reclaimAll will expect all gems to vote, so it is best to close any development gems while attempting to run reclaimAll (the seaside gems will vote automatically). This should get most if not all of your 5M dead disposed of...

Dale

On 12/07/2010 07:21 AM, Stephan Eggermont wrote:
> [...]
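A minimal topaz sketch of that sequence; the stone name 'seaside' and the default DataCurator password are assumptions from a stock GLASS install, and a plain reclaimAll is shown here rather than the postReclaimAll: variant above:

topaz -l
set gemstone seaside
set username DataCurator
set password swordfish
login
run
SystemRepository markForCollection
%
run
SystemRepository reclaimAll
%
logout
exit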
Thanks Dale,
Found the culprit. This topaz that has been running since Nov 23:

>> glass 14339 0.0 5.5 1232224 456512 ? Sl Nov23 3:00 /opt/gemstone/product/bin/topaz -l -T200000

made the difference. After killing it I'm back to 803566 live objects and 142052 possible dead (12784680 bytes). After making a full backup and restoring from that, I'm back at a file size of 130,023,424 bytes with 2015232 (bytes?) free.

Stephan
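For reference, a minimal sketch of the backup-and-restore shrink described above; the file path is a placeholder, and the restore step assumes the stone has been restarted on a fresh copy of the empty extent:

"1. write a full backup from a logged-in session"
SystemRepository fullBackupTo: '/opt/gemstone/backups/seaside-full.dat'.

"2. after restarting the stone on a clean extent, restore and commit the restore"
SystemRepository restoreFromBackup: '/opt/gemstone/backups/seaside-full.dat'.
SystemRepository commitRestore.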
On 12/07/2010 03:12 PM, Stephan Eggermont wrote:
> Found the culprit. [...]

Stephan,

Excellent ... I'm glad we got that one figured out :)

Dale