Lots of seaside objects not being GCed (need gemstone advise)


Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
Hi guys,

Sorry to bother you, but I have started to find many instances of things I think I should not have. I have a stone that nobody has used for a couple of hours (so I am sure all sessions should have expired). I ran this code to clean up:

ObjectLogEntry emptyLog.
WAGemStoneMaintenanceTask maintenanceTaskMarkForCollect performTask: 0.
WAGemStoneMaintenanceTask maintenanceTaskExpiration performTask: 0.
System beginTransaction. 
SystemRepository reclaimAll.
SystemRepository startNewLog.
System commitTransaction.

Then I check the size of #allInstances for some classes and get this:

DpWebSession allInstances size 32
WACallbackRegistry allInstances size 217
JQueryClass allInstances size 16519
WACache  allInstances size 35
WAApplication allInstances size 3
WARenderVisitor allInstances size 217
WARenderContext allInstances size 217
WAHtmlCanvas allInstances size 909
.....

(just as some examples).

The good news is that ALL the sessions do look expired:

(DpWebSession allInstances select: [ :each | (each instVarNamed: 'parent') isNil ]) size 32

(expired sessions have a nil 'parent').

However, I cannot explain why I still have all that garbage above if all sessions are expired. Is that normal? I would expect to have nothing.

The worst is WACallbackRegistry, whose instances refer to the closures and contexts of my callbacks, which in turn could refer to a large amount of temporary data (like lots of XML objects).

Is this normal? Any hints on how I can get rid of those?

Thanks in advance, 


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list

The good news is that ALL the sessions do look expired:

(DpWebSession allInstances select: [ :each | (each instVarNamed: 'parent') isNil ]) size 32

(expired sessions have a nil 'parent'). 


I forgot to say: this is because I do have the Seaside maintenance VM running.




Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list

Mariano,

This looks familiar. I *think* it was Dario who encountered a similar problem. Check past message threads. If memory serves (and it usually doesn't), this was about 2-4 months ago.

I'm sure Dale will respond, but tomorrow is the 4th of July. So perhaps not as quickly as you would like.

Let us know whether you find that earlier exchange and if so whether it is the same issue.


Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
Just some ideas...

HTH

> WAGemStoneMaintenanceTask maintenanceTaskMarkForCollect performTask: 0.
> WAGemStoneMaintenanceTask maintenanceTaskExpiration performTask: 0.

These 2 steps should be swapped, I think. First expire and then MFC.

Then, double-check the transaction management of the maintenance task.
It should commit correctly.

> System beginTransaction.

Don't need this.

> SystemRepository reclaimAll.

I found that this does not work directly after MFC in same GS session.
I would log out and back in. Your session may still hold onto the
objects you're trying to get rid of (although not explicitly).

> SystemRepository startNewLog.
> System commitTransaction.

Nothing to do with transactions
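
Putting these suggestions together, the cleanup could be reordered roughly as below. This is only a sketch: it reuses the calls from the original script, and it assumes the maintenance tasks do not commit on their own (which is worth checking, as noted above).

```smalltalk
"Session 1: expire first, then mark for collection."
WAGemStoneMaintenanceTask maintenanceTaskExpiration performTask: 0.
System commitTransaction.   "assumption: commit the expirations before the MFC"
WAGemStoneMaintenanceTask maintenanceTaskMarkForCollect performTask: 0.
"Log out here, so that this session's memory no longer references the dead objects."

"Session 2 (fresh login): reclaim what the MFC marked."
SystemRepository reclaimAll.
```

The split into two logins is the point of the sketch; the exact commit placement is a guess and may need adjusting.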

Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list

On Mon, Jul 6, 2015 at 5:51 AM, Otto Behrens <[hidden email]> wrote:
> Just some ideas...
>
> HTH

Hi Otto,
Thanks. My answers below.

>> WAGemStoneMaintenanceTask maintenanceTaskMarkForCollect performTask: 0.
>> WAGemStoneMaintenanceTask maintenanceTaskExpiration performTask: 0.
>
> These 2 steps should be swapped, I think. First expire and then MFC.

Yes, I swapped them in my script after sending this message. However, it is a cosmetic detail, since I WAS already running the maintenance VM (which expires sessions every minute). So the call to #maintenanceTaskExpiration here is actually unnecessary.

> Then, double check the transaction management of the maintenance task.
> Should commit correctly.

Good idea. I will check the code.

>> System beginTransaction.
>
> Don't need this.

Oops, sorry, that should have been "System commitTransaction." to commit what I had done up to that point.
Otherwise #reclaimAll fails with a warning telling me it needs to abort because the current transaction needs a commit.

>> SystemRepository reclaimAll.
>
> I found that this does not work directly after MFC in same GS session.
> I would log out and back in. Your session may still hold onto the
> objects you're trying to get rid of (although not explicitly).

That reminds me of reading about issues like that. Anyway, I just tried doing it in another session, but still no luck :(

>> SystemRepository startNewLog.
>> System commitTransaction.
>
> Nothing to do with transactions

Sorry, that was part of my cleanup that crept into this script, hahaha.

Thanks for your ideas...I am still digging a bit...


Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
Mariano,

I've read over your other messages and I guess you are still struggling to clean these guys up ... Rest of my comments in line.

Dale

On 07/03/2015 08:22 PM, Mariano Martinez Peck via Glass wrote:
Hi guys,

Sorry to bother but I started to find many instances of things I think I should not have.... I have a stone that nobody uses since a couple of hours (so I am sure all sessions should have been expired). I have run this code to clean:

ObjectLogEntry emptyLog.
WAGemStoneMaintenanceTask maintenanceTaskMarkForCollect performTask: 0.
WAGemStoneMaintenanceTask maintenanceTaskExpiration performTask: 0.
System beginTransaction. 
SystemRepository reclaimAll.
SystemRepository startNewLog.
System commitTransaction.

Then...I check some #allInstances size and I get this:

DpWebSession allInstances size 32
WACallbackRegistry allInstances size 217
JQueryClass allInstances size 16519
WACache  allInstances size 35
WAApplication allInstances size 3
WARenderVisitor allInstances size 217
WARenderContext allInstances size 217
WAHtmlCanvas allInstances size 909
.....

Right off the bat, my observation is that this doesn't seem like a lot of uncollected objects. Presumably you churn through a lot more sessions than this on a regular basis, so these objects appear to be the exception instead of the rule...

(just as some examples).

The good news is that ALL the sessions do look expired:

(DpWebSession allInstances select: [ :each | (each instVarNamed: 'parent') isNil ]) size 32

(expired sessions have a nil 'parent').
One of the interesting things that came out of Larry's "ordeal" is that we found a bug in WACache>>gemstoneReap, where an error while running this method can result in objects getting stuck in the WACache. Basically, objects are marked as expired in the WARcLastAccessExpiryPolicy but, due to the error, they may not be removed from the objectsByKey and keysByObject dictionaries ... thus keeping them alive "forever".

If you check your maintenance vm logs, you might find an error in WACache>>gemstoneReap (an Almost Out of Memory error is how we found the bug in Larry's case).

Since you have so few sessions, we can test whether the object leak is due to this bug:

  | sessions |
  System abortTransaction.
  "Collect the expired sessions (expired sessions have a nil parent)."
  sessions := WASession allInstances
    select: [ :each | (each instVarNamed: 'parent') isNil ].
  System abortTransaction.
  WAApplication allInstances
    do: [ :app |
      | cache keysByObject |
      cache := app cache.
      keysByObject := cache instVarNamed: 'keysByObject'.
      sessions
        do: [ :session |
          "Halt if an expired session is still registered in the cache."
          (keysByObject includesKey: session)
            ifTrue: [ self halt ] ] ]

If you get a halt running the above, then you've been bitten by the bug and you need to arrange to remove the session objects from both dicts. See WACache>>gemstoneReap for example code ...
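
If the check above does halt, the stuck entries have to be removed from both dictionaries by hand. The following is only a rough sketch of that idea, not the code from WACache>>gemstoneReap. It assumes the cache's instance variables are named 'keysByObject' (object to key) and 'objectsByKey' (key to object), that both answer the usual dictionary protocol, and that DpWebSession is the session class in question. It also bypasses the expiry policy's own bookkeeping, so try it on a backup first.

```smalltalk
| expired |
System abortTransaction.
"Expired sessions have a nil parent."
expired := DpWebSession allInstances
  select: [ :each | (each instVarNamed: 'parent') isNil ].
WAApplication allInstances
  do: [ :app |
    | cache keysByObject objectsByKey |
    cache := app cache.
    keysByObject := cache instVarNamed: 'keysByObject'.    "assumed name"
    objectsByKey := cache instVarNamed: 'objectsByKey'.    "assumed name"
    expired
      do: [ :session |
        | key |
        key := keysByObject at: session ifAbsent: [ nil ].
        "Only touch sessions that are still registered in this cache."
        key isNil
          ifFalse: [
            objectsByKey removeKey: key ifAbsent: [ nil ].
            keysByObject removeKey: session ifAbsent: [ nil ] ] ] ].
System commitTransaction
```

After the commit, the sessions should only become collectable on a later MFC, and only if nothing else references them.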

If the WASessions are not stuck in a WAApplication, then it's likely that you have some accidental reference to the WASession objects, and you'll have to trace the reference path back to a persistent root using Repository>>findReferencePathToObject:. This method returns only one reference path. In 3.2 we've created Repository>>findAllReferencePathsToObject:, which finds and returns all of the reference paths (in a pinch you could upgrade your repository to 3.2.6 just to run the analysis) ...

[1] https://github.com/GsDevKit/Seaside31/issues/68

However...I cannot explain why I still have all that garbage above if all sessions are expired. Is that normal? I would expect to have nothing.
It's not normal:)

The worst is the WACallbackRegistry which then refer to closures and contexts of my callbacks, which could refer to large amount of temp data (like lots of XML objects...).

Is this normal? Any hint how can I get rid of those?
WACallbackRegistry instances are referenced from the WASession instance via the `continuations` WACache ... if you get rid of your WASession instances you'll get rid of the WACallbackRegistry instances.

Dale


Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list


On Mon, Jul 6, 2015 at 2:33 PM, Dale Henrichs via Glass <[hidden email]> wrote:
Mariano,

I've read over your other messages and I guess you are still struggling to clean these guys up ... Rest of my comments in line.


Hi Dale.
Thanks, I answer inline. 
 
Dale

On 07/03/2015 08:22 PM, Mariano Martinez Peck via Glass wrote:
Hi guys,

Sorry to bother but I started to find many instances of things I think I should not have.... I have a stone that nobody uses since a couple of hours (so I am sure all sessions should have been expired). I have run this code to clean:

ObjectLogEntry emptyLog.
WAGemStoneMaintenanceTask maintenanceTaskMarkForCollect performTask: 0.
WAGemStoneMaintenanceTask maintenanceTaskExpiration performTask: 0.
System beginTransaction. 
SystemRepository reclaimAll.
SystemRepository startNewLog.
System commitTransaction.

Then...I check some #allInstances size and I get this:

DpWebSession allInstances size 32
WACallbackRegistry allInstances size 217
JQueryClass allInstances size 16519
WACache  allInstances size 35
WAApplication allInstances size 3
WARenderVisitor allInstances size 217
WARenderContext allInstances size 217
WAHtmlCanvas allInstances size 909
.....

Right off the bat, my observation is that this doesn't seem like a lot of uncollected objects. Presumably you churn through a lot more sessions than this on a regular basis, so these objects appear to be the exception instead of the rule...

Of course. All those numbers come from a system that hasn't received a single request in a whole day, and these are the results after all the cleanup I could do. That is why I expect to have zero instances of those here (and, in the real system, not zero but far fewer than what I have now).
 


(just as some examples).

The good news is that ALL the sessions do look expired:

(DpWebSession allInstances select: [ :each | (each instVarNamed: 'parent') isNil ]) size 32

(expired sessions have a nil 'parent').
One of the interesting things that came out of Larry's "ordeal" is that we found a bug in WACache>>gemstoneReap, where an error while running this method can result in objects getting stuck in the WACache. Basically, objects are marked as expired in the WARcLastAccessExpiryPolicy but, due to the error, they may not be removed from the objectsByKey and keysByObject dictionaries ... thus keeping them alive "forever".

If you check your maintenance vm logs, you might find an error in WACache>>gemstoneReap (an Almost Out of Memory error is how we found the bug in Larry's case).


I grepped but found no errors in my maintenance logs.

 

Since you have so few sessions, we can test whether the object  leak is due to this bug:

  | sessions |
  System abortTransaction.
  sessions := WASession allInstances
    select: [ :each | (each instVarNamed: 'parent') isNil ].
  System abortTransaction.
  WAApplication allInstances
    do: [ :app |
      | cache keysByObject |
      cache := app cache.
      keysByObject := cache instVarNamed: 'keysByObject'.
      sessions
        do: [ :session |
          (keysByObject includesKey: session)
            ifTrue: [ self halt ] ] ]

If you get a halt running the above, then you've been bitten by the bug and you need to arrange to remove the session objects from both dicts. See WACache>>gemstoneReap for example code ...

I did not get a halt running the above code.


If the WASessions are not stuck in a WAApplication, then it's likely that you have some accidental reference to the WASession objects, and you'll have to trace the reference path back to a persistent root using Repository>>findReferencePathToObject:. This method returns only one reference path. In 3.2 we've created Repository>>findAllReferencePathsToObject:, which finds and returns all of the reference paths (in a pinch you could upgrade your repository to 3.2.6 just to run the analysis) ...

[1] https://github.com/GsDevKit/Seaside31/issues/68


Yes, in fact, earlier today I tried #findReferencePathToObject: with (MySessionSubclass allInstances any) and guess what?
I get an array of only 2 entries: the first element is the target object and the second element is false. According to the method comment, that means there is no path to that object. WTF! So why don't they go away? As I said, I do run MFC and I do run #reclaimAll. In which scenario would instances be held onto (and in fact be found via #allInstances) while #findReferencePathToObject: says there is no path?
 


However...I cannot explain why I still have all that garbage above if all sessions are expired. Is that normal? I would expect to have nothing.
It's not normal:)

That's cool to hear. So even if this is a small number of objects, it gives me a small-scale version of what happens in the real system. If this stone has not received a single request in hours, then I should get ZERO instances of those :) Cool.
 


The worst is the WACallbackRegistry which then refer to closures and contexts of my callbacks, which could refer to large amount of temp data (like lots of XML objects...).

Is this normal? Any hint how can I get rid of those?
WACallbackRegistry instances are referenced from the WASession instance via the `continuations` WACache ... if you get rid of your WASession instances you'll get rid of the WACallbackRegistry instances.


Ok... 

 
Dale


Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
While I keep searching, I have a couple of questions/findings:

1) ObjectLogEntry initialize DID get rid of many things, while ObjectLogEntry emptyLog did not.
The former does "ObjectQueue := RcQueue new: 100." while the latter does "self objectQueue removeAll."
So could it be that, in the case of an RC collection, removing the elements is not the same as building a new collection? Could the RC collection hold onto some stuff we don't want it to?

2) Could it be that some objects get GCed ONLY after a second MFC is run? I know it sounds weird, but in Pharo it was like that (not sure if it still is): to be really sure, you had to run the GC 3 times (this was due to #finalize and that part of the GC being done in 2 steps).

3) Can #allInstances include instances that are already marked for GC? In other words, if I run MFC and then send #allInstances, would I get the instances that were marked as garbage (until the reclaim happens)? I ask in order to know whether I should run #reclaimAll in the tests I am doing.

Thanks in advance,






Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list

On 07/06/2015 12:28 PM, Mariano Martinez Peck wrote:


On Mon, Jul 6, 2015 at 2:33 PM, Dale Henrichs via Glass <[hidden email]> wrote:
Mariano,

I've read over your other messages and I guess you are still struggling to clean these guys up ... Rest of my comments in line.


Hi Dale.
Thanks, I answer inline. 
 
Dale

On 07/03/2015 08:22 PM, Mariano Martinez Peck via Glass wrote:

Then...I check some #allInstances size and I get this:

DpWebSession allInstances size 32
WACallbackRegistry allInstances size 217
JQueryClass allInstances size 16519
WACache  allInstances size 35
WAApplication allInstances size 3
WARenderVisitor allInstances size 217
WARenderContext allInstances size 217
WAHtmlCanvas allInstances size 909
.....

Right off the bat, my observation is that this doesn't seem like a lot of uncollected objects. Presumably you churn through a lot more sessions than this on a regular basis, so these objects appear to be the exception instead of the rule...

Of course. All those numbers come from a system that hasn't received a single request in a whole day, and these are the results after all the cleanup I could do. That is why I expect to have zero instances of those here (and, in the real system, not zero but far fewer than what I have now).

Okay, you didn't have a single request today, so these objects must be hanging around from a previous day. Did you have zero instances the day before?

Without any other information, it is possible that these objects got left behind because of a voting issue (i.e., a reference left in the head of a vm) ... did you cycle all of the gems before running the mfc? What is your setting for GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE? If I'm not mistaken, GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE does not guarantee that there aren't other references to objects in the gem's head ...

These instances did not appear out of thin air and there is a logical reason for them to be still hanging around ... This is a complicated system with a number of moving parts and there is no way to rule out bugs either ...

Without a detailed accounting of the "starting point" and of the gems started and stopped between that point and now, we cannot guess what might have happened ...

I would suggest that at some point you record the oops of the session objects so that we don't end up finding that every time we look we are looking at a different set of sessions ...
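
Recording the oops could look something like the two snippets below. This is a minimal sketch: it assumes DpWebSession is the session class to track, it uses a made-up key, #SuspectSessionOops, in UserGlobals to keep the list between logins, and it assumes the oops are not recycled for other objects in the meantime.

```smalltalk
"Step 1: record the oops of the currently expired sessions."
| oops |
System abortTransaction.
oops := (DpWebSession allInstances
  select: [ :each | (each instVarNamed: 'parent') isNil ])
  collect: [ :each | each asOop ].
UserGlobals at: #SuspectSessionOops put: oops.   "hypothetical key"
System commitTransaction
```

```smalltalk
"Step 2 (later, after an MFC and reclaim): which recorded oops are still around?"
| recorded current |
System abortTransaction.
recorded := UserGlobals at: #SuspectSessionOops.
current := DpWebSession allInstances collect: [ :each | each asOop ].
recorded select: [ :oop | current includes: oop ]
```

Only the integers are stored, not the sessions themselves, so the recorded list should not itself keep the sessions alive.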


 


(just as some examples).

The good news is that ALL the sessions do look expired:

(DpWebSession allInstances select: [ :each | (each instVarNamed: 'parent') isNil ]) size 32

(expired sessions have a nil 'parent').
One of the interesting things that came out of Larry's "ordeal" is that we found a bug in WACache>>gemstoneReap, where an error while running this method can result in objects getting stuck in the WACache. Basically, objects are marked as expired in the WARcLastAccessExpiryPolicy but, due to the error, they may not be removed from the objectsByKey and keysByObject dictionaries ... thus keeping them alive "forever".

If you check your maintenance vm logs, you might find an error in WACache>>gemstoneReap (an Almost Out of Memory error is how we found the bug in Larry's case).


I grepped but found no errors in my maintenance logs.

 

Since you have so few sessions, we can test whether the object  leak is due to this bug:

  | sessions |
  System abortTransaction.
  sessions := WASession allInstances
    select: [ :each | (each instVarNamed: 'parent') isNil ].
  System abortTransaction.
  WAApplication allInstances
    do: [ :app |
      | cache keysByObject |
      cache := app cache.
      keysByObject := cache instVarNamed: 'keysByObject'.
      sessions
        do: [ :session |
          (keysByObject includesKey: session)
            ifTrue: [ self halt ] ] ]

If you get a halt running the above, then you've been bitten by the bug and you need to arrange to remove the session objects from both dicts. See WACache>>gemstoneReap for example code ...

I did not get a halt running the above code.

Did you replace `WASession allInstances` with `DpWebSession allInstances`?


If the WASessions are not stuck in a WAApplication, then it's likely that you have some accidental reference to the WASession objects, and you'll have to trace the reference path back to a persistent root using Repository>>findReferencePathToObject:. This method returns only one reference path. In 3.2 we've created Repository>>findAllReferencePathsToObject:, which finds and returns all of the reference paths (in a pinch you could upgrade your repository to 3.2.6 just to run the analysis) ...

[1] https://github.com/GsDevKit/Seaside31/issues/68


Yes, in fact, earlier today I tried #findReferencePathToObject: with (MySessionSubclass allInstances any) and guess what?
I get an array of only 2 entries: the first element is the target object and the second element is false. According to the method comment, that means there is no path to that object. WTF! So why don't they go away? As I said, I do run MFC and I do run #reclaimAll. In which scenario would instances be held onto (and in fact be found via #allInstances) while #findReferencePathToObject: says there is no path?

If I'm not mistaken, #findReferencePathToObject: scans for references in the repository, but does not take into account instances in a vm's memory ...

At this point I don't know whether these objects are staying alive because of persistent references or because they are in a vm's head and being voted down ...

 


However...I cannot explain why I still have all that garbage above if all sessions are expired. Is that normal? I would expect to have nothing.
It's not normal:)

That's cool to hear. So even if this is a small number of objects, it gives me a small-scale version of what happens in the real system. If this stone has not received a single request in hours, then I should get ZERO instances of those :) Cool.
 

To know with certainty whether or not an object is considered truly dead, you can look at System class>>_deadNotReclaimed and see whether the oops of the suspect sessions are in it or not (see the comment in the method for conditions of use). Barring any nasty bugs, they are likely to have been voted down ...

If you set STN_TRAN_LOG_DEBUG_LEVEL=3 in your system.conf and restart your stone ... it is possible to find the list of objects in the possible dead set, the list of objects voted down (and the session id that voted them down) and the original list of deadNotReclaimed ...

Of course, if you restart your stone then the heads of the various gems will be cleared and it is likely that the objects will go away on the next mfc ... Note that in 3.1.0.6, it is possible that the gem doing the mfc is hanging onto some objects in its head, so unless you log out after the mfc, that might be the reason for voting guys down ...

Dale



Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list


On Mon, Jul 6, 2015 at 6:47 PM, Dale Henrichs <[hidden email]> wrote:

On 07/06/2015 12:28 PM, Mariano Martinez Peck wrote:


On Mon, Jul 6, 2015 at 2:33 PM, Dale Henrichs via Glass <[hidden email]> wrote:
Mariano,

I've read over your other messages and I guess you are still struggling to clean these guys up ... Rest of my comments in line.


Hi Dale.
Thanks, I answer inline. 
 
Dale

On 07/03/2015 08:22 PM, Mariano Martinez Peck via Glass wrote:

Then...I check some #allInstances size and I get this:

DpWebSession allInstances size 32
WACallbackRegistry allInstances size 217
JQueryClass allInstances size 16519
WACache  allInstances size 35
WAApplication allInstances size 3
WARenderVisitor allInstances size 217
WARenderContext allInstances size 217
WAHtmlCanvas allInstances size 909
.....

Right off the bat, my observation is that this doesn't seem like a  lot of uncollected objects, presumably you churn through a lot more sessions than this on a regular basis, so these objects appear to be the exception instead of the rule...

Of course. All those numbers are from a system which didn't receive a single request in a whole day. And these are the results after all the cleaning I could do. So this is why I expect to have zero instances of those (meaning: not zero in the real system, but far fewer than what I have now).
Okay, you didn't have a single request today, so these objects must be hanging around from a previous day. Did you have zero instances the day before?

Without any other information, it is possible that these objects got left behind because of a voting issue (i.e., reference left in the head of a vm) ... did you cycle all of the gems before running the mfc? What is your setting for GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE? If I'm not mistaken GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE does not guarantee that there aren't other references in the gem's head to objects ... 

These instances did not appear out of thin air and there is a logical reason for them to be still hanging around ... This is a complicated system with a number of moving parts and there is no way to rule out bugs either ...

Without a detailed accounting of the "starting point" and the gems started and stopped between that point and now, we cannot guess what might have happened ...

I would suggest that at some point you record the oops of the session objects so that we don't end up finding that every time we look we are looking at a different set of sessions ...


Good point. Thanks. I will remember it for next time: each time I am dealing with this kind of stuff: cycle all seaside gems first! 
Thanks. BTW, my GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is 100% now to avoid having to cycle gems. 
I will continue with the tests with cycling/killing the gems... but.... continue reading below... 
 




 


(just as some examples).

The good news is that ALL the sessions do look expired:

(DpWebSession allInstances select: [ :each | (each instVarNamed: 'parent') isNil ]) size 32

(expired sessions have a nil 'parent').
One of the interesting things that came out of Larry's "ordeal" is that we found a bug in WACache>>gemstoneReap, where an error while running this method can result in objects getting stuck in the WACache. Basically objects are marked as expired in the WARcLastAccessExpiryPolicy, but due to the error, they may not be removed from the objectsByKey and keysByObject dictionaries ... thus keeping them alive "forever".

If you check your maintenance vm logs, you might find an error with WACache>>gemstoneReap (Almost Out of Memory is how we found the bug in Larry's case).


I grepped but found no error in my maintenance logs. 

 

Since you have so few sessions, we can test whether the object  leak is due to this bug:

  | sessions |
  System abortTransaction.
  sessions := WASession allInstances
    select: [ :each | (each instVarNamed: 'parent') isNil ].
  System abortTransaction.
  WAApplication allInstances
    do: [ :app |
      | cache keysByObject |
      cache := app cache.
      keysByObject := cache instVarNamed: 'keysByObject'.
      sessions
        do: [ :session |
          (keysByObject includesKey: session)
            ifTrue: self halt ] ]

If you get a halt running the above, then you've been bitten by the bug and you need to arrange to remove the session objects from both dicts. See WACache>>gemstoneReap for example code ...

I did not get a Halt in above code.
Did you replace `WASession allInstances` with DpWebSession?


hahahahaha how can I keep my respect after this? hahaha. Sorry. What another pair of eyes can do... Thanks Dale.. Sorry. Having a 2-month-old baby is killing me ahahah (perfect excuse!)
OK...so yeah, it halted. 
So if I understand correctly, a possible fix is what you submitted to https://github.com/GsDevKit/Seaside31/issues/68 ?
As for the workaround (to remove existing ones), I am not sure what I should do. I guess the first step is to apply the above fix. Then... wouldn't a simple 
WACache allInstances do: [:each | each gemstoneReap]   do it? 
 
 

If the WASessions are not stuck in a WAApplication, then it's likely that you have some accidental reference to the WASession objects and you'll have to trace the reference path back to a persistent root using Repository>>findReferencePathToObject: ... this method only returns one reference path. In 3.2 we've created Repository>>findAllReferencePathsToObject: that finds and returns all of the reference paths (in a pinch you could upgrade your repository to 3.2.6 just to run the analysis) ...

[1] https://github.com/GsDevKit/Seaside31/issues/68


Yes, in fact, earlier today I tried #findReferencePathToObject:  with (MySessionSubclass allInstances any) and guess what????
I get an array of only 2 entries: the first element is the target object and the second element is false. The method comment says this means there is no path to that object. WTF!!!! So then why do they not go away??? As said, I do run MFC, I do run #reclaimAll... so..... in which scenario would I hold onto instances (and in fact find them via #allInstances), yet #findReferencePathToObject: would say there is no path?
If I'm not mistaken, #findReferencePathToObject: scans for references in the repository, but does not take into account instances in a vm's memory

Ahhhh!!!! while #allInstances answers both!
 
...

At this point I don't know  whether these objects are staying alive because of persistent references or because they are in a vms head and being voted down ...

 


However...I cannot explain why I still have all that garbage above if all sessions are expired. Is that normal? I would expect to have nothing.
It's not normal:)

That's cool to hear. So...even if that is a small number of objects, this gives me a small scenario of the real system. If this stone has not received a single request in hours, then I should get ZERO instances of those :) Cool. 
 

To know with certainty whether or not an object is considered truly dead, you can look at System class>>_deadNotReclaimed and see if the oops of the suspect sessions are in it or not (see the comment in the method for conditions of use). Barring any nasty bugs they are likely to have been voted down ...

If you set STN_TRAN_LOG_DEBUG_LEVEL=3 in your system.conf and restart your stone ... it is possible to find the list of objects in the possible dead set, the list of objects voted down (and the session id that voted them down) and the original list of deadNotReclaimed ...

Of course if you restart your stone then the heads of the various gems will be cleared and it is likely that the objects will go away on the next mfc ... Note that in 3.1.0.6, it is possible that the gem doing the mfc is hanging onto some objects in its head, so unless you logout after the mfc, that might be the reason for voting guys down ...


OK, thanks for this explanation.  




Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
 

Since you have so few sessions, we can test whether the object  leak is due to this bug:

  | sessions |
  System abortTransaction.
  sessions := WASession allInstances
    select: [ :each | (each instVarNamed: 'parent') isNil ].
  System abortTransaction.
  WAApplication allInstances
    do: [ :app |
      | cache keysByObject |
      cache := app cache.
      keysByObject := cache instVarNamed: 'keysByObject'.
      sessions
        do: [ :session |
          (keysByObject includesKey: session)
            ifTrue: self halt ] ]

If you get a halt running the above, then you've been bitten by the bug and you need to arrange to remove the session objects from both dicts. See WACache>>gemstoneReap for example code ...

I did not get a Halt in above code.
Did you replace `WASession allInstances` with DpWebSession?


hahahahaha how can I keep my respect after this? hahaha. Sorry. What another pair of eyes can do... Thanks Dale.. Sorry. Having a 2-month-old baby is killing me ahahah (perfect excuse!)
OK...so yeah, it halted. 
So if I understand correctly, a possible fix is what you submitted to https://github.com/GsDevKit/Seaside31/issues/68 ?
As for the workaround (to remove existing ones), I am not sure what I should do. I guess the first step is to apply the above fix. Then... wouldn't a simple 
WACache allInstances do: [:each | each gemstoneReap]   do it? 
 

Ouch... no... It wasn't that.. the Halt I got was simply because... the code you pasted says:

  do: [ :session | 
          (keysByObject includesKey: session)
            ifTrue: self halt

While it should be:

  do: [ :session | 
          (keysByObject includesKey: session)
            ifTrue: [ Halt halt ]
And doing that it doesn't halt.
 
Damn... I need to keep hunting then...
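
For reference, a corrected version of the whole check might look like the sketch below. It assumes DpWebSession is the session class to look for (as asked above) and it collects any stuck sessions into a temporary collection instead of halting; the variable name stuck is only illustrative:

  | sessions stuck |
  System abortTransaction.
  "Expired sessions have a nil 'parent'"
  sessions := DpWebSession allInstances
    select: [ :each | (each instVarNamed: 'parent') isNil ].
  stuck := OrderedCollection new.
  WAApplication allInstances
    do: [ :app |
      | keysByObject |
      keysByObject := app cache instVarNamed: 'keysByObject'.
      sessions
        do: [ :session |
          "The argument of ifTrue: must be a block, otherwise it is evaluated unconditionally"
          (keysByObject includesKey: session)
            ifTrue: [ stuck add: session ] ] ].
  stuck size

A result of 0 would mean that none of the expired sessions is still registered in a WAApplication cache, which matches what I see.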

 



Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
In reply to this post by GLASS mailing list


On 07/06/2015 02:35 PM, Mariano Martinez Peck wrote:
> While I keep searching... I have a couple of questions/findings
>
> 1) ObjectLogEntry initialize DID get rid of many things while
> ObjectLogEntry emptyLog did not.
> Former does "  ObjectQueue := RcQueue new: 100." while latter does "  
>   self objectQueue removeAll.  "
> So..could it be that in case of a RC collection, removing elements is
> not the same as building a new collection? could the RC hold to some
> stuff we don't want to?
Well #removeAll should get rid of everything in the queue. One thing is
that removeAll will touch all of the objects in the object log, so the
vm doing the #removeAll will likely have a bunch of those objects in
its head and depending upon exactly what you do (all other gems shut
down, logout after doing `ObjectLogEntry emptyLog`, record and share the
results of the mfc, and logout immediately after doing the mfc and
before doing the reclaimAll) ...

Recording oops and running with STN_TRAN_LOG_DEBUG_LEVEL=3 will make it
possible to find out in detail what is going on ... but do keep in mind
that at this debug level we are recording a lot of information about all
of the system's operations in the tranlogs, so the size of the tranlogs
can dramatically increase not to mention the system will run slower in
some places because of the increased tranlog activity ...

>
> 2) Could it be that some objects get GCed ONLY after a second MFC is
> run? I know it sounds weird, but in Pharo it was like that (not sure
> if still)... to be really sure you had to run it 3 times (this was due
> to #finalize and that part of GC done in 2 steps)

No, I don't think so ... if an object is kept alive after an mfc it is
either because there is a path to the object from a persistent root or a
gem voted the object down ... now keep in mind that if you have things
going on in a live system, the system state is changing at each commit
... so an mfc can see an object that is alive while a concurrent commit
may break the last living link and by the time you look the reference is
gone ... only when you have all gems shutdown and only your one topaz
session is alive can you be sure that another session isn't
changing things "out from under you"
>
> 3) #allInstances can include instances already for GC? In other words,
> if I run MFC and then I do #allInstances... would I get those that
> were marked as GC (until reclaim happens)? I ask this in order to know
> if I should run a #reclaimAll for my tests I am doing...
>
It depends upon the view that you get when you abort. The comments in
the listInstances code do not specify whether or not possibleDead or
deadNotReclaimed objects are excluded from the scanned objects or not,
so presumably it is possible to pull objects to life by the very act of
scanning for instances "too soon" ...

It is very tricky to test these things down to the single object
resolution, since the act of looking can skew your results ... Even
#reclaimAll is not 100% accurate as we've made improvements to
#reclaimAll in 3.2 and more in 3.3....

Dale



Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
In reply to this post by GLASS mailing list


On 07/06/2015 03:09 PM, Mariano Martinez Peck wrote:


On Mon, Jul 6, 2015 at 6:47 PM, Dale Henrichs <[hidden email]> wrote:

On 07/06/2015 12:28 PM, Mariano Martinez Peck wrote:


On Mon, Jul 6, 2015 at 2:33 PM, Dale Henrichs via Glass <[hidden email]> wrote:
Mariano,

I've read over your other messages and I guess you are still struggling to clean these guys up ... Rest of my comments in line.


Hi Dale.
Thanks, I answer inline. 
 
Dale

On 07/03/2015 08:22 PM, Mariano Martinez Peck via Glass wrote:

Then...I check some #allInstances size and I get this:

DpWebSession allInstances size 32
WACallbackRegistry allInstances size 217
JQueryClass allInstances size 16519
WACache  allInstances size 35
WAApplication allInstances size 3
WARenderVisitor allInstances size 217
WARenderContext allInstances size 217
WAHtmlCanvas allInstances size 909
.....

Right off the bat, my observation is that this doesn't seem like a  lot of uncollected objects, presumably you churn through a lot more sessions than this on a regular basis, so these objects appear to be the exception instead of the rule...

Of course. All those numbers are from a system which didn't receive a single request in a whole day. And these are the results after all the cleaning I could do. So this is why I expect to have zero instances of those (meaning: not zero in the real system, but far fewer than what I have now).
Okay, you didn't have a single request today, so these objects must be hanging around from a previous day. Did you have zero instances the day before?

Without any other information, it is possible that these objects got left behind because of a voting issue (i.e., reference left in the head of a vm) ... did you cycle all of the gems before running the mfc? What is your setting for GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE? If I'm not mistaken GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE does not guarantee that there aren't other references in the gem's head to objects ... 

These instances did not appear out of thin air and there is a logical reason for them to be still hanging around ... This is a complicated system with a number of moving parts and there is no way to rule out bugs either ...

Without a detailed accounting of the "starting point" and the gems started and stopped between that point and now, we cannot guess what might have happened ...

I would suggest that at some point you record the oops of the session objects so that we don't end up finding that every time we look we are looking at a different set of sessions ...


Good point. Thanks. I will remember it for next time: each time I am dealing with this kind of stuff: cycle all seaside gems first! 
Thanks. BTW, my GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is 100% now to avoid having to cycle gems. 
I will continue with the tests with cycling/killing the gems... but.... continue reading below...
Do you also have the marksweep guy running? GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE only dumps the pomgen on the floor ... a stale direct reference to one of these in the TOC can also keep these guys alive ... I think that was why I marveled in the earlier message about it only being 32 instances ... with the referencePath method not finding anything you should be able to declare a victory:)

With a dynamic system it is difficult to get a complete answer without shutting things down, so you have to settle for approximate answers ..

For these 32 sessions, perhaps you should snapshot the extents[1] and verify that the objects will go away in a separate sandbox stone so that you will not perturb production while satisfying your curiosity...

[1] http://downloads.gemtalksystems.com/docs/GemStone64/3.2.x/GS64-SysAdmin-3.2/9-BackupAndRestore.htm#pgfId-1069325

Dale


Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
In reply to this post by GLASS mailing list


On 07/06/2015 03:31 PM, Mariano Martinez Peck wrote:
 

Since you have so few sessions, we can test whether the object  leak is due to this bug:

  | sessions |
  System abortTransaction.
  sessions := WASession allInstances
    select: [ :each | (each instVarNamed: 'parent') isNil ].
  System abortTransaction.
  WAApplication allInstances
    do: [ :app |
      | cache keysByObject |
      cache := app cache.
      keysByObject := cache instVarNamed: 'keysByObject'.
      sessions
        do: [ :session |
          (keysByObject includesKey: session)
            ifTrue: self halt ] ]

If you get a halt running the above, then you've been bitten by the bug and you need to arrange to remove the session objects from both dicts. See WACache>>gemstoneReap for example code ...

I did not get a Halt in above code.
Did you replace `WASession allInstances` with DpWebSession?


hahahahaha how can I keep my respect after this? hahaha. Sorry. What another pair of eyes can do... Thanks Dale.. Sorry. Having a 2-month-old baby is killing me ahahah (perfect excuse!)
OK...so yeah, it halted. 
So if I understand correctly, a possible fix is what you submitted to https://github.com/GsDevKit/Seaside31/issues/68 ?
As for the workaround (to remove existing ones), I am not sure what I should do. I guess the first step is to apply the above fix. Then... wouldn't a simple 
WACache allInstances do: [:each | each gemstoneReap]   do it? 
 

Ouch... no... It wasn't that.. the Halt I got was simply because... the code you pasted says:


I guess we could have inferred that this was true, once the reference paths came back clean ...

Dale


Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
Dale,

I continued with the efforts and, based on your comments, did a couple of things. First, I killed all seaside gems, stopped the stone, started the stone again (not the seaside gems), and ran:

((SystemRepository listInstances: { DpWebSession }) at: 1)    

This gives me about 50 instances, meaning there are 50 instances ON DISK. Then I tried to find a path for any of THOSE:

SystemRepository findReferencePathToObject: ((SystemRepository listInstances: { DpWebSession }) at: 1) first 

But still, #findReferencePathToObject: answers an empty path. 

If the path is empty, then "((SystemRepository listInstances: { DpWebSession }) at: 1)    " should give me zero. Correct? Because both are checking on disk...

So there is something I don't understand. 

Thanks in advance, 




Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
Mariano,

The fact that there are no reference paths to the instances of DpWebSession means that the objects are eligible for garbage collection. Until you have run an mfc, the objects were not voted down, and the possibleDead have been reclaimed, the objects are still alive ..

So record the objIds of the 50 or so instances of DpWebSession. Logout. Do an mfc. Logout. Wait for the deadNotReclaimed count to drop to zero (you can monitor this live in statmon or by sampling the stat in a new session that aborts frequently)  ...

To really know what's going on, we want STN_TRAN_LOG_DEBUG_LEVEL=3 so that we can use the tranlogs to understand what went on if we don't see the expected results ...

When the deadNotReclaimed is at 0, login, grab allInstances and look at the oops of the new set (if any) and compare to the oops of the original 50 ... if there are duplicates we will want to search the tranlogs for those oops and try to understand what went on ...
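
The bookkeeping for that comparison could be sketched roughly as follows. This is only a sketch: the UserGlobals key #SuspectSessionOops is a made-up name, and only the oops (plain integers) are stored, so the record itself should not keep any session alive. The two parts are meant to be run in separate sessions:

  "Run 1: before the mfc. Store the oops, commit, then logout."
  | oops |
  System abortTransaction.
  oops := (DpWebSession allInstances collect: [ :each | each asOop ]) asArray.
  UserGlobals at: #SuspectSessionOops put: oops.
  System commitTransaction

  "Run 2: in a new session, after the mfc, once deadNotReclaimed is at 0."
  | before after |
  System abortTransaction.
  before := UserGlobals at: #SuspectSessionOops.
  after := DpWebSession allInstances collect: [ :each | each asOop ].
  "Answers the oops that are present in both sets"
  before select: [ :oop | after includes: oop ]

Any oops answered by the second run belong to sessions that survived, and those are the ones to search for in the tranlogs.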

Dale




Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list
In reply to this post by GLASS mailing list
Dale,

I have continued analyzing this in other stones and after some testing it is clear that some sessions (how many depends on the system usage) are NOT GCed unless I shut all seaside gems down or cycle them. Originally I had GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE at 90% and I was cycling seaside gems once a day as part of GC. Then I changed it to 100% and stopped restarting gems. Now...it COULD have happened that I did not restart all gems after I modified GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE and so the system was still running with 90% even though I was not restarting seaside gems anymore. That could explain why I hold onto some instances, right?  Another possibility is the "stale reference" you mention below. I continue answering below:
 
Good point. Thanks. I will remember it for next time: each time I am dealing with this kind of stuff: cycle all seaside gems first! 
Thanks. BTW, my GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is 100% now to avoid having to cycle gems. 
I will continue with the tests with cycling/killing the gems... but.... continue reading below...
Do you also have the marksweep guy running?

The guy that every 30 minutes performs "System _generationScavenge_vmMarkSweep."?  Then yes. Why do you ask? How could this guy have an effect? He does not hold any seaside session as far as I know... it simply sends "System _generationScavenge_vmMarkSweep.". Could it be that the #wait: freezes the gem and therefore it does not answer the voting? 

Mmmmm now I read in the sysadmin guide: "Gems do not vote until they complete their current transaction. If a Gem is sleeping or otherwise engaged in a long transaction, the vote cannot be
finalized and garbage collection pauses at this point. Commit records accumulate, garbage accumulates, and a variety of problems can ensue."

Uffff maybe since this guy practically sleeps all the time and yet does not do a commit or abort in each iteration of the loop...maybe this guy is preventing the vote?
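
A minimal sketch of such a loop with a transaction boundary in every iteration is shown below. It assumes the gem can afford to abort on each pass and that a 30 minute delay matches the existing setup; whether the missing abort is really what holds up the vote is not established here:

  [ true ]
    whileTrue: [
      "Abort first so the gem reaches a transaction boundary on every pass"
      System abortTransaction.
      System _generationScavenge_vmMarkSweep.
      (Delay forSeconds: 30 * 60) wait ]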

Even more......the sysadmin guide also says: "If a committed object in the pom area has been modified, it is copied to the old area if a scavenge occurs before the change is committed."


If it is not that ...maybe you asked because....if it happened that I modified the session (by any seaside request) and the _generationScavenge_vmMarkSweep happened before the request processing finished, then the session would have been moved to "old" space? But even in this case, when the request processing finishes, it would commit the "modified persistent object" (seaside session)...  

 
GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE only dumps the pomgen on the floor ... a stale direct reference to one of these in the TOC can also keep these guys alive ... I think that was why I marveled in the earlier message about it only being 32 instances ... with the referencePath method not finding anything you should be able to declare a victory:)

What do you mean by a stale direct reference?  
 
Thanks 



Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list


On 07/07/2015 05:49 AM, Mariano Martinez Peck wrote:
Dale,

I have continued analyzing this in other stones, and after some testing it is clear that some sessions (how many depends on the system usage) are NOT GCed unless I shut all Seaside gems down or cycle them. Originally I had GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE at 90% and I was cycling Seaside gems once a day as part of GC. Then I changed it to 100% and stopped restarting gems. Now... it COULD have happened that I did not restart all gems after I modified GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE, so the system was still running with 90% while I was no longer restarting Seaside gems.
Yes. The meaning of GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=100 is that all pomgen spaces are dropped ... this does not mean that all references to persistent objects in the vm are dropped ....
That could explain why I hold onto some instances, right?  Another possibility is the "stale reference" you mention below. I continue answering below:
 
Good point. Thanks. I will remember it for next time: each time I am dealing with this kind of stuff: cycle all seaside gems first! 
Thanks. BTW, my GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is 100% now to avoid having to cycle gems. 
I will continue with the tests with cycling/killing the gems... but.... continue reading below...
Do you also have the marksweep guy running?

The guy that performs "System _generationScavenge_vmMarkSweep." every 30 minutes? Then yes. Why do you ask? How could this guy affect anything? He does not hold any Seaside session as far as I know... he simply sends "System _generationScavenge_vmMarkSweep.". Could it be that the #wait freezes the gem so that it does not answer the vote?
No. If a gem is busy, the stone patiently waits for the gem to hit a transaction boundary - the vote happens on a transaction boundary. This is one of the factors that causes reclaimAll to be non-deterministic (our goal is for reclaimAll to be deterministic, but the system _is_ a complex state machine). Gems can be busy doing a long-running transaction, or a gem can be idle sitting in transaction - like an idle topaz or GemTools - and unless the system triggers an event to cause the gem to wake up, like hitting the commit record limit thresholds, the system patiently waits for the gem to finish its "work".

Mmmmm now I read in the sysadmin guide: "Gems do not vote until they complete their current transaction. If a Gem is sleeping or otherwise engaged in a long transaction, the vote cannot be
finalized and garbage collection pauses at this point. Commit records accumulate, garbage accumulates, and a variety of problems can ensue."

Uffff maybe since this guys practically sleeps all the time and yet does not do a commit nor abort in each iteration of the loop...maybe this guy is preventing the vote?
Recall the little process in which you installed the vm marksweep code? That particular process is there so that a Seaside gem is guaranteed to have a Smalltalk process ready and available to respond to the SigAbort ... The SigAbort is sent by the stone when commit records accumulate ...

Even more......the sysadmin guide also says: "If a committed object in the pom area has been modified, it is copied to the old area if a scavenge occurs before the change is committed."
If it is not that ...maybe you asked because....if it happened that I modified the session (by any seaside request) and the _generationScavenge_vmMarkSweep happened before the request processing finished, then the session would have been moved to "old" space? But even in this case, when the request processing finishes, it would commit the "modified persistent object" (seaside session)...  

 
GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE only dumps the pomgen on the floor ... a stale direct reference to one of these in the TOC can also keep these guys alive ... I think that was why I marveled in the earlier message about it only being 32 instances ... with the referencePath method not finding anything you should be able to declare a victory:)

What do you mean by a stale direct reference?  
 
The case I was thinking about is that a purely temporary object could hold a reference to a piece of session state that ultimately refers back to a session. If that temporary object is alive in the TOC when it's time to vote and no sweep has been run, then that reference _could_ cause the WASession to be voted down ...

At the end of the day, when we are talking about a handful of sessions being kept alive for an mfc or two,  I don't think it is a major problem ... you can survive with this temporary leakage and it shouldn't become necessary to shut the entire system down and restart to make sure that the last crumb is swept from the table ...

All of these checks and balances ensure that we do not garbage collect an object that shouldn't be garbage collected and in a dynamic system that means that we have to err on the side of caution.

It might be worth verifying that we don't have a bug in the system by copying the extent into a sandbox, where you can make a clinical attempt to run an mfc and satisfy yourself that these 50 session objects can indeed be collected in a tightly controlled system ... if you're still unable to collect these objects under controlled conditions and the reference paths are empty, we are looking at the real possibility of a bug and we will want to get to the bottom of it...
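
A minimal sketch of such a sandbox check, assuming a copy of the extent with no Seaside gems or other sessions logged in. It only reuses the maintenance calls and the DpWebSession class from the first message of this thread, with session expiration moved ahead of the mark-for-collect task; whether that order matters, and whether the transaction handling needs adjusting for your GemStone version, are open assumptions:

```smalltalk
"Sketch only: run from a single topaz or GemTools session against a sandbox copy."
| before after |
before := DpWebSession allInstances size.

"Same cleanup calls as in the first message of this thread."
ObjectLogEntry emptyLog.
WAGemStoneMaintenanceTask maintenanceTaskExpiration performTask: 0.
WAGemStoneMaintenanceTask maintenanceTaskMarkForCollect performTask: 0.
System beginTransaction.
SystemRepository reclaimAll.
System commitTransaction.

"Refresh this session's view of the repository before counting again."
System abortTransaction.
after := DpWebSession allInstances size.

"Answer both counts so they can be compared."
Array with: before with: after
```

If the second count does not drop in the sandbox and the reference paths are still empty, that would point toward the bug scenario rather than ordinary voting behavior.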

Dale


Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list


On Tue, Jul 7, 2015 at 3:56 PM, Dale Henrichs <[hidden email]> wrote:


On 07/07/2015 05:49 AM, Mariano Martinez Peck wrote:
Dale,

I have continue analyzing this in other stones and after some testing it is clear that some sessions (the size would depend on the system usage) are NOT GCed unless I shut all seaside gems down or cycle them. Originally I was having  GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE on 90% and I was cycling seaside gems once a day as part of GC. Then, I changed it to 100% and stop restarting gems. Now...it COULD have happened that I did not restarted all gems since I modified the GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE and so, the system was still running with 90% and yet I was not restarting seaside gems anymore.
Yes. The meaning of GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=100 is that all pomgen spaces are dropped ... this does not mean that all references to persistent objects in the vm are dropped ....

Indeed. That's why, to be 100% sure of dropping all references to persistent objects, you likely need to recycle Seaside gems (even with GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=100).
 
That could explain why I hold onto some instances, right?  Another possibility is the "stale reference" you mention below. I continue answering below:
 
Good point. Thanks. I will remember it for next time: each time I am dealing with this kind of stuff: cycle all seaside gems first! 
Thanks. BTW, my GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is 100% now to avoid having to cycle gems. 
I will continue with the tests with cycling/killing the gems... but.... continue reading below...
Do you also have the marksweep guy running?

The guy that performs "System _generationScavenge_vmMarkSweep." every 30 minutes? Then yes. Why do you ask? How could this guy affect anything? He does not hold any Seaside session as far as I know... he simply sends "System _generationScavenge_vmMarkSweep.". Could it be that the #wait freezes the gem so that it does not answer the vote?
No if a gem is busy, the stone patiently waits for the gem to hit a transaction boundary - the vote happens on a transaction boundary.

Dale, given this comment, I do not understand the statement in the sysadmin guide that I pasted below: "Gems do not vote until they complete their current transaction. If a Gem is sleeping or otherwise engaged in a long transaction, the vote cannot be finalized and garbage collection pauses at this point."
 
This is one of the factors that causes reclaimAll to be non-deterministic (our goal is for reclaimAll to be deterministic, but the system _is_ a complex state machine). Gems can be busy doing a long-running transaction, or a gem can be idle sitting in transaction - like an idle topaz or GemTools - and unless the system triggers an event to cause the gem to wake up, like hitting the commit record limit thresholds, the system patiently waits for the gem to finish its "work".

Ok... so it will wait. Ok, I got that. 
 

Mmmmm now I read in the sysadmin guide: "Gems do not vote until they complete their current transaction. If a Gem is sleeping or otherwise engaged in a long transaction, the vote cannot be
finalized and garbage collection pauses at this point. Commit records accumulate, garbage accumulates, and a variety of problems can ensue."

Uffff maybe since this guys practically sleeps all the time and yet does not do a commit nor abort in each iteration of the loop...maybe this guy is preventing the vote?
Recall the little process that you installed the vm marksweep code? This particular process is there so that a Seaside gem is guaranteed to have a Smalltalk process ready and available to respond to the SigAbort ... The SigAbort is sent by the stone, when commit records accumulate ...

Well. Here is where I have the last question. That little process we are talking about does this code:

 [
   | count minutesToForceGemGC |
   count := 0.
   minutesToForceGemGC := 30.
   [ true ] whileTrue: [
     "Wake up every 30 seconds."
     (Delay forSeconds: 30) wait.
     count := count + 1.
     "Two wake-ups per minute, so this fires every minutesToForceGemGC minutes."
     (count \\ (minutesToForceGemGC * 2)) = 0 ifTrue: [
       System _generationScavenge_vmMarkSweep.
       count := 0 ] ]
 ] forkAt: Processor lowestPriority.

So my question is... in that code you see I NEVER do a commit or abort. So I don't see how this code can reach what you describe as "the vote happens on a transaction boundary". I mean... that code spends 99.9% of its time in a #wait, doing neither commit nor abort. So... wouldn't that make the voting process wait for it forever? Or is the SigAbort what prevents that?


 
Even more......the sysadmin guide also says: "If a committed object in the pom area has been modified, it is copied to the old area if a scavenge occurs before the change is committed."
If it is not that ...maybe you asked because....if it happened that I modified the session (by any seaside request) and the _generationScavenge_vmMarkSweep happened before the request processing finished, then the session would have been moved to "old" space? But even in this case, when the request processing finishes, it would commit the "modified persistent object" (seaside session)...  

 
GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE only dumps the pomgen on the floor ... a stale direct reference to one of these in the TOC can also keep these guys alive ... I think that was why I marveled in the earlier message about it only being 32 instances ... with the referencePath method not finding anything you should be able to declare a victory:)

What do you mean by a stale direct reference?  
 
The case I was thinking about is that you could create a reference to a piece of session state that ultimately refers back to a session from a purely temporary object and if that temporary object is alive in the TOC when it's time to vote and no sweep has been run then that reference _could_ cause the WASession to be voted down ...

OK , I got that scenario. I think I prefer this than cycling gems. 


At the end of the day, when we are talking about a handful of sessions being kept alive for an mfc or two,  I don't think it is a major problem ... you can survive with this temporary leakage and it shouldn't become necessary to shut the entire system down and restart to make sure that the last crumb is swept from the table ...

Indeed. Fully agree. I just wanted to confirm that my leftover sessions were THIS scenario and not a major leak.
 


All of these checks and balances ensure that we do not garbage collect an object that shouldn't be garbage collected and in a dynamic system that means that we have to err on the side of caution.

It might be worth verifying that we don't have a bug in the system by copying the extent into a sandbox, where you can make a clinical attempt to run an mfc and satisfy yourself that these 50 session objects can indeed be collected in a tightly controlled system ... if you're still unable to collect these objects under controlled conditions and the reference paths are empty, we are looking at the real possibility of a bug and we will want to get to the bottom of it...

Yes, I will run BackScan  too in the servers and check that too. 

Thanks Dale, 

--


Re: Lots of seaside objects not being GCed (need gemstone advise)

GLASS mailing list


On 07/11/2015 02:28 PM, Mariano Martinez Peck wrote:


On Tue, Jul 7, 2015 at 3:56 PM, Dale Henrichs <[hidden email]> wrote:


On 07/07/2015 05:49 AM, Mariano Martinez Peck wrote:
Dale,

I have continue analyzing this in other stones and after some testing it is clear that some sessions (the size would depend on the system usage) are NOT GCed unless I shut all seaside gems down or cycle them. Originally I was having  GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE on 90% and I was cycling seaside gems once a day as part of GC. Then, I changed it to 100% and stop restarting gems. Now...it COULD have happened that I did not restarted all gems since I modified the GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE and so, the system was still running with 90% and yet I was not restarting seaside gems anymore.
Yes. The meaning of GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=100 is that all pomgen spaces are dropped ... this does not mean that all references to persistent objects in the vm are dropped ....

Indeed. That's why, to be 100% sure of dropping all references to persistent objects, you likely need to recycle Seaside gems (even with GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=100).
Right ... the odds of dead object references drop lower when using this approach, but to reach 100% drastic measures are needed ... Frankly this is why I made the initial comment about it being only 32 sessions ....
 
That could explain why I hold onto some instances, right?  Another possibility is the "stale reference" you mention below. I continue answering below:
 
Good point. Thanks. I will remember it for next time: each time I am dealing with this kind of stuff: cycle all seaside gems first! 
Thanks. BTW, my GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is 100% now to avoid having to cycle gems. 
I will continue with the tests with cycling/killing the gems... but.... continue reading below...
Do you also have the marksweep guy running?

The guy that performs "System _generationScavenge_vmMarkSweep." every 30 minutes? Then yes. Why do you ask? How could this guy affect anything? He does not hold any Seaside session as far as I know... he simply sends "System _generationScavenge_vmMarkSweep.". Could it be that the #wait freezes the gem so that it does not answer the vote?
No if a gem is busy, the stone patiently waits for the gem to hit a transaction boundary - the vote happens on a transaction boundary.

Dale, with this comment, I do not understand why then the comment in the sys admin guide I pasted below "Gems do not vote until they complete their current transaction. If a Gem is sleeping or otherwise engaged in a long transaction, the vote cannot be finalized and garbage collection pauses at this point."
 
I'm not sure how the "the stone waits for the gem to hit a transaction boundary" is inconsistent with "gems do not vote until they complete their current transaction"...
This is one of the factors that causes reclaimAll to be non-deterministic (our goal is for reclaimAll to be deterministic, but the system _is_ a complex state machine). Gems can be busy doing a long-running transaction, or a gem can be idle sitting in transaction - like an idle topaz or GemTools - and unless the system triggers an event to cause the gem to wake up, like hitting the commit record limit thresholds, the system patiently waits for the gem to finish its "work".

Ok... so it will wait. Ok, I got that.
Ah, good:)
 

Mmmmm now I read in the sysadmin guide: "Gems do not vote until they complete their current transaction. If a Gem is sleeping or otherwise engaged in a long transaction, the vote cannot be
finalized and garbage collection pauses at this point. Commit records accumulate, garbage accumulates, and a variety of problems can ensue."

Uffff maybe since this guys practically sleeps all the time and yet does not do a commit nor abort in each iteration of the loop...maybe this guy is preventing the vote?
Recall the little process that you installed the vm marksweep code? This particular process is there so that a Seaside gem is guaranteed to have a Smalltalk process ready and available to respond to the SigAbort ... The SigAbort is sent by the stone, when commit records accumulate ...

Well. Here is where I have the last question. That little process we are talking about does this code:

 [
   | count minutesToForceGemGC |
   count := 0.
   minutesToForceGemGC := 30.
   [ true ] whileTrue: [
     "Wake up every 30 seconds."
     (Delay forSeconds: 30) wait.
     count := count + 1.
     "Two wake-ups per minute, so this fires every minutesToForceGemGC minutes."
     (count \\ (minutesToForceGemGC * 2)) = 0 ifTrue: [
       System _generationScavenge_vmMarkSweep.
       count := 0 ] ]
 ] forkAt: Processor lowestPriority.

So my question is.... in that code you see I do NOT ever do a commit or abort. So I don't see how this code can enter what you describe as "the vote happens on a transaction boundary". I mean...that code is 99.9% time in a #wait doing no commit nor abort. So...wouldn't that make the voting process to wait for it forever?  Or the SigAbort is what would prevent that?

Good question ... Immediately before the code you've shown, you will find the following code:

 Exception
  installStaticException:
    [:ex :cat :num :args |
      "Run the abort in a lowPriority process, since we must acquire the
       transactionMutex."
      [
        GRPlatform current transactionMutex
          critical: [
            GRPlatform current doAbortTransaction ].
        System enableSignaledAbortError.
      ] forkAt: Processor lowestPriority.
    ]
  category: GemStoneError
  number: 6009
  subtype: nil.
 System enableSignaledAbortError.
 

The above code installs a static exception handler for the SigAbort exception (error number 6009). The SigAbort is an asynchronous signal that is raised upon notification from the stone. The vm signals the SigAbort in the context of the currently active GsProcess. If there are no explicit handlers on the stack, the list of static handlers is searched. If a static handler is found, the handler is run by the currently active GsProcess... If there are no active processes (i.e., all of the processes are blocked on a semaphore or a socket call), then the vm waits for the first process to go active ... if no process wakes up before the stone hits the STN_GEM_ABORT_TIMEOUT, the stone will signal a lost OT, effectively killing the session ... Since Seaside gems could very well be blocked sitting on an accept() call, the "extra" process was created to wake up every 30 seconds (half of the default STN_GEM_ABORT_TIMEOUT) to try to guarantee that there will always be an active GsProcess available to abort when a Seaside gem is idle and waiting for requests ...
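
For comparison, a sketch of the same idea without relying on the SigAbort handler: a background process that aborts on every wake-up reaches a transaction boundary by itself. This variant is a hypothetical illustration, not what the GLASS startup script does. It reuses only `GRPlatform current transactionMutex` and `doAbortTransaction` from the handler above, and takes the mutex on the assumption that request handling holds it while a transaction is in progress:

```smalltalk
"Hypothetical variant: abort explicitly every 30 seconds
 instead of waiting for the stone to send a SigAbort."
[
  [ true ] whileTrue: [
    (Delay forSeconds: 30) wait.
    "Acquire the transactionMutex first, as the SigAbort handler does."
    GRPlatform current transactionMutex
      critical: [ GRPlatform current doAbortTransaction ] ]
] forkAt: Processor lowestPriority.
```

The trade-off is an abort every 30 seconds even when the stone has not asked for one, which the SigAbort-driven handler avoids.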

Dale
