Smalltalk › Instantiations

[8.6.1] GC and Weak Arrays

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

6 messages Options

Richard Sargent

[8.6.1] GC and Weak Arrays

Administrator

I came across a puzzling problem recently. I haven't been able to narrow it down to a simple test case.

I have a weak array, I nil out one of its slots. 1.5 seconds later I get a GC event telling me the object has been collected from the weak array. That seems like a very long time.

17:30:25:576 #clearForSession: finished; keys contain '(() nil)'

17:30:27:063 [18632]MyWeakArray class>>#gcEvent: mask = 9
17:30:27:063 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization started
17:30:27:065 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization protect granted
17:30:27:065 [9164]finalizedSlot: 2 removedObject: ((2015 41 2 10 17 30 16.1435880661011))

I have confirmed that the #gcEvent: callback and all the related GC activity occurs nearly instantaneously. There is no waiting for semaphores holding things up. (The numbers inside square brackets are the identity hashes of the currently running processes. You can see that I receive the event in one process, signal my semaphore, and have my high-priority process take over instantly.)

Is it possible that the scavenge that triggered the #gcEvent: started before I nilled the slot at 17:30:25:57 but finished after another second or more? Can anyone (Seth?) shed any light on this?

--
You received this message because you are subscribed to the Google Groups "VA Smalltalk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at http://groups.google.com/group/va-smalltalk.
For more options, visit https://groups.google.com/d/optout.

Seth Berman

Re: [8.6.1] GC and Weak Arrays

Hi Richard,

Just for my own clarification, are you under the impression that nilling a slot in a weak object should trigger some action that would get the finalizer running? Or is it that you feel within 1.5 seconds of that event there should have been a number of opportunities, via GCs, to finalize the object?

Can you tell me what the basic load is during this time. In other words, since the #finalizeCycle processing takes place during idle time, could it be the case that this isn't getting a chance to run for awhile?

I would ask if the finalize queue is overflowing, which would cause an object to be delayed in making its way into the queue, but I think you have adjusted your queue size to be pretty large if I remember correctly.

I might be missing something from what you said, so I will just wait for your reply before I start trying some things out.

-- Seth

On Wednesday, February 11, 2015 at 1:21:46 PM UTC-5, Richard Sargent wrote:

I came across a puzzling problem recently. I haven't been able to narrow it down to a simple test case.

I have a weak array, I nil out one of its slots. 1.5 seconds later I get a GC event telling me the object has been collected from the weak array. That seems like a very long time.

17:30:25:576 #clearForSession: finished; keys contain '(() nil)'

17:30:27:063 [18632]MyWeakArray class>>#gcEvent: mask = 9
17:30:27:063 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization started
17:30:27:065 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization protect granted
17:30:27:065 [9164]finalizedSlot: 2 removedObject: ((2015 41 2 10 17 30 16.1435880661011))

I have confirmed that the #gcEvent: callback and all the related GC activity occurs nearly instantaneously. There is no waiting for semaphores holding things up. (The numbers inside square brackets are the identity hashes of the currently running processes. You can see that I receive the event in one process, signal my semaphore, and have my high-priority process take over instantly.)

Is it possible that the scavenge that triggered the #gcEvent: started before I nilled the slot at 17:30:25:57 but finished after another second or more? Can anyone (Seth?) shed any light on this?

Richard Sargent

Re: [8.6.1] GC and Weak Arrays

Administrator

On Wednesday, February 11, 2015 at 12:30:55 PM UTC-8, Seth Berman wrote:

Just for my own clarification, are you under the impression that nilling a slot in a weak object should trigger some action that would get the finalizer running? Or is it that you feel within 1.5 seconds of that event there should have been a number of opportunities, via GCs, to finalize the object?

Almost the opposite. I am finding the garbage collector is telling me *it* collected the thing, when I was the one that dropped the reference. (Actually, let me qualify that. I *think* I was the one to nil it. I suppose it is possible the GC has nilled it but not yet told me about it and I think the reference is still being held. I need to check this out!)

Can you tell me what the basic load is during this time. In other words, since the #finalizeCycle processing takes place during idle time, could it be the case that this isn't getting a chance to run for awhile?

I am using a high-priority process for "MyWeakArray" finalization. On receipt of a #gcEvent:, I signal the high-priority process. It grabs the finialization protect semaphore and iterates through the finalization queue (which is large enough).

I would ask if the finalize queue is overflowing, which would cause an object to be delayed in making its way into the queue, but I think you have adjusted your queue size to be pretty large if I remember correctly.

Correct.

I might be missing something from what you said, so I will just wait for your reply before I start trying some things out.

I appreciate your looking into this. Since I first posted, I have gleaned more information, but it still remains a puzzle.

1) I explicitly invoke my specialized finalization code and learn that the finalize queue is empty.
2) I next iterate over 32K "buckets", nilling out the entries I wish to drop. These buckets hold the instances of MyWeakArray.
3) I immediately explicitly invoke my specialized finalization code again and learn that the finalize queue is not longer empty. It now contains two references to the specific object I am using as the focus of this investigation. [Background: I have two weak array maps, from two symmetric perspectives.] Apparently because of the activities in step 2, the object in question has been added to the finalization queue for each of the two maps. Step 2 took 3.06 seconds and allowed me to log more than 60,000 lines to a log file. Formatting of time stamps and print strings should generate considerable short-lived garbage. Not once during this interval did my registered #gcEvent: handler get called.

I'm finding it hard to believe no scavenge cycles finished during this. At 11:46:14:815, I get my next #gcEvent: callback. Step 2 above finished at 11:46:14:773. The previous #gcEvent: callback occurred at 11:46:11:492. (Funny. I never pay attention to how fast computers have gotten, until looking at this situation!)

Does it sound reasonable that all of the above could not have triggered a scavenge in the middle? (By the way, I get pretty much the same situation without all the logging. If anything, I would have expected that the logging should increase the number of scavenges.)

Is there anything that could have prevented the VM from making the #gcEvent: callback, if a scavenge had occurred?

Thanks!

-- Seth

On Wednesday, February 11, 2015 at 1:21:46 PM UTC-5, Richard Sargent wrote:
I came across a puzzling problem recently. I haven't been able to narrow it down to a simple test case.

I have a weak array, I nil out one of its slots. 1.5 seconds later I get a GC event telling me the object has been collected from the weak array. That seems like a very long time.

17:30:25:576 #clearForSession: finished; keys contain '(() nil)'

17:30:27:063 [18632]MyWeakArray class>>#gcEvent: mask = 9
17:30:27:063 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization started
17:30:27:065 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization protect granted
17:30:27:065 [9164]finalizedSlot: 2 removedObject: ((2015 41 2 10 17 30 16.1435880661011))

I have confirmed that the #gcEvent: callback and all the related GC activity occurs nearly instantaneously. There is no waiting for semaphores holding things up. (The numbers inside square brackets are the identity hashes of the currently running processes. You can see that I receive the event in one process, signal my semaphore, and have my high-priority process take over instantly.)

Is it possible that the scavenge that triggered the #gcEvent: started before I nilled the slot at 17:30:25:57 but finished after another second or more? Can anyone (Seth?) shed any light on this?

Seth Berman

Re: [8.6.1] GC and Weak Arrays

Hi Richard,

Does it sound reasonable that all of the above could not have triggered a scavenge in the middle?
I image a scavenge would have occurred during this time...unless the newspace setting is very very large. Any special space settings?

Is it possible that the scavenge that triggered the #gcEvent: started before I nilled the slot at 17:30:25:57 but finished after another second or more? (From the first post)

The application thread is halted from scavenge start to end...so I think the answer to your question here is no.

Is there anything that could have prevented the VM from making the #gcEvent: callback, if a scavenge had occurred?
These gc events are posted to the async message queue, so the only other thing I can think of is something relating to this.

I fear I'm not being much help, but nothing is jumping out at me on this one.

-- Seth

On Wednesday, February 11, 2015 at 4:01:19 PM UTC-5, Richard Sargent wrote:

On Wednesday, February 11, 2015 at 12:30:55 PM UTC-8, Seth Berman wrote:
Just for my own clarification, are you under the impression that nilling a slot in a weak object should trigger some action that would get the finalizer running? Or is it that you feel within 1.5 seconds of that event there should have been a number of opportunities, via GCs, to finalize the object?

Almost the opposite. I am finding the garbage collector is telling me *it* collected the thing, when I was the one that dropped the reference. (Actually, let me qualify that. I *think* I was the one to nil it. I suppose it is possible the GC has nilled it but not yet told me about it and I think the reference is still being held. I need to check this out!)

Can you tell me what the basic load is during this time. In other words, since the #finalizeCycle processing takes place during idle time, could it be the case that this isn't getting a chance to run for awhile?

I am using a high-priority process for "MyWeakArray" finalization. On receipt of a #gcEvent:, I signal the high-priority process. It grabs the finialization protect semaphore and iterates through the finalization queue (which is large enough).

I would ask if the finalize queue is overflowing, which would cause an object to be delayed in making its way into the queue, but I think you have adjusted your queue size to be pretty large if I remember correctly.

Correct.

I might be missing something from what you said, so I will just wait for your reply before I start trying some things out.

I appreciate your looking into this. Since I first posted, I have gleaned more information, but it still remains a puzzle.

1) I explicitly invoke my specialized finalization code and learn that the finalize queue is empty.
2) I next iterate over 32K "buckets", nilling out the entries I wish to drop. These buckets hold the instances of MyWeakArray.
3) I immediately explicitly invoke my specialized finalization code again and learn that the finalize queue is not longer empty. It now contains two references to the specific object I am using as the focus of this investigation. [Background: I have two weak array maps, from two symmetric perspectives.] Apparently because of the activities in step 2, the object in question has been added to the finalization queue for each of the two maps. Step 2 took 3.06 seconds and allowed me to log more than 60,000 lines to a log file. Formatting of time stamps and print strings should generate considerable short-lived garbage. Not once during this interval did my registered #gcEvent: handler get called.

I'm finding it hard to believe no scavenge cycles finished during this. At 11:46:14:815, I get my next #gcEvent: callback. Step 2 above finished at 11:46:14:773. The previous #gcEvent: callback occurred at 11:46:11:492. (Funny. I never pay attention to how fast computers have gotten, until looking at this situation!)

Does it sound reasonable that all of the above could not have triggered a scavenge in the middle? (By the way, I get pretty much the same situation without all the logging. If anything, I would have expected that the logging should increase the number of scavenges.)

Is there anything that could have prevented the VM from making the #gcEvent: callback, if a scavenge had occurred?

Thanks!

-- Seth

On Wednesday, February 11, 2015 at 1:21:46 PM UTC-5, Richard Sargent wrote:
I came across a puzzling problem recently. I haven't been able to narrow it down to a simple test case.

I have a weak array, I nil out one of its slots. 1.5 seconds later I get a GC event telling me the object has been collected from the weak array. That seems like a very long time.

17:30:25:576 #clearForSession: finished; keys contain '(() nil)'

17:30:27:063 [18632]MyWeakArray class>>#gcEvent: mask = 9
17:30:27:063 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization started
17:30:27:065 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization protect granted
17:30:27:065 [9164]finalizedSlot: 2 removedObject: ((2015 41 2 10 17 30 16.1435880661011))

I have confirmed that the #gcEvent: callback and all the related GC activity occurs nearly instantaneously. There is no waiting for semaphores holding things up. (The numbers inside square brackets are the identity hashes of the currently running processes. You can see that I receive the event in one process, signal my semaphore, and have my high-priority process take over instantly.)

Is it possible that the scavenge that triggered the #gcEvent: started before I nilled the slot at 17:30:25:57 but finished after another second or more? Can anyone (Seth?) shed any light on this?

Richard Sargent

Re: [8.6.1] GC and Weak Arrays

Administrator

On Wednesday, February 11, 2015 at 5:03:47 PM UTC-8, Seth Berman wrote:

Hi Richard,

Does it sound reasonable that all of the above could not have triggered a scavenge in the middle?
I image a scavenge would have occurred during this time...unless the newspace setting is very very large. Any special space settings?

No. newSpaceSize=2097152 (2MB). Everything else is commented out.

Is it possible that the scavenge that triggered the #gcEvent: started before I nilled the slot at 17:30:25:57 but finished after another second or more? (From the first post)
The application thread is halted from scavenge start to end...so I think the answer to your question here is no.

Good! I had hoped that was the case.

Is there anything that could have prevented the VM from making the #gcEvent: callback, if a scavenge had occurred?
These gc events are posted to the async message queue, so the only other thing I can think of is something relating to this.

That may be it. I was able to determine that entries in the weak array were being nilled during that 3 seconds. I have had problems with async queue overflow, so this is a very real possibility. If the async handling were suppressed all that time, one or more scavenges might have occurred with the corresponding nilling of slots, but without reporting them. I assume the #gcEvent: would come through later, if the async queue didn't actually overflow. Is that the case?

I have modified the code in the "clear for session" logic to tolerate the GC having already nilled the slot. That seems to have cleaned up my problem. Now to start stripping out all the probe code and make certain it really is fixed.

Then, I'll start investigating why the async queue is not being processed. It seems likely something I have has turned it off "for the duration", but finding that is easier said than done. :-)

Thanks for helping fill in the blanks!

I fear I'm not being much help, but nothing is jumping out at me on this one.

-- Seth

On Wednesday, February 11, 2015 at 4:01:19 PM UTC-5, Richard Sargent wrote:
On Wednesday, February 11, 2015 at 12:30:55 PM UTC-8, Seth Berman wrote:
Just for my own clarification, are you under the impression that nilling a slot in a weak object should trigger some action that would get the finalizer running? Or is it that you feel within 1.5 seconds of that event there should have been a number of opportunities, via GCs, to finalize the object?

Almost the opposite. I am finding the garbage collector is telling me *it* collected the thing, when I was the one that dropped the reference. (Actually, let me qualify that. I *think* I was the one to nil it. I suppose it is possible the GC has nilled it but not yet told me about it and I think the reference is still being held. I need to check this out!)

Can you tell me what the basic load is during this time. In other words, since the #finalizeCycle processing takes place during idle time, could it be the case that this isn't getting a chance to run for awhile?

I am using a high-priority process for "MyWeakArray" finalization. On receipt of a #gcEvent:, I signal the high-priority process. It grabs the finialization protect semaphore and iterates through the finalization queue (which is large enough).

I would ask if the finalize queue is overflowing, which would cause an object to be delayed in making its way into the queue, but I think you have adjusted your queue size to be pretty large if I remember correctly.

Correct.

I might be missing something from what you said, so I will just wait for your reply before I start trying some things out.

I appreciate your looking into this. Since I first posted, I have gleaned more information, but it still remains a puzzle.

1) I explicitly invoke my specialized finalization code and learn that the finalize queue is empty.
2) I next iterate over 32K "buckets", nilling out the entries I wish to drop. These buckets hold the instances of MyWeakArray.
3) I immediately explicitly invoke my specialized finalization code again and learn that the finalize queue is not longer empty. It now contains two references to the specific object I am using as the focus of this investigation. [Background: I have two weak array maps, from two symmetric perspectives.] Apparently because of the activities in step 2, the object in question has been added to the finalization queue for each of the two maps. Step 2 took 3.06 seconds and allowed me to log more than 60,000 lines to a log file. Formatting of time stamps and print strings should generate considerable short-lived garbage. Not once during this interval did my registered #gcEvent: handler get called.

I'm finding it hard to believe no scavenge cycles finished during this. At 11:46:14:815, I get my next #gcEvent: callback. Step 2 above finished at 11:46:14:773. The previous #gcEvent: callback occurred at 11:46:11:492. (Funny. I never pay attention to how fast computers have gotten, until looking at this situation!)

Does it sound reasonable that all of the above could not have triggered a scavenge in the middle? (By the way, I get pretty much the same situation without all the logging. If anything, I would have expected that the logging should increase the number of scavenges.)

Is there anything that could have prevented the VM from making the #gcEvent: callback, if a scavenge had occurred?

Thanks!

-- Seth

On Wednesday, February 11, 2015 at 1:21:46 PM UTC-5, Richard Sargent wrote:
I came across a puzzling problem recently. I haven't been able to narrow it down to a simple test case.

I have a weak array, I nil out one of its slots. 1.5 seconds later I get a GC event telling me the object has been collected from the weak array. That seems like a very long time.

17:30:25:576 #clearForSession: finished; keys contain '(() nil)'

17:30:27:063 [18632]MyWeakArray class>>#gcEvent: mask = 9
17:30:27:063 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization started
17:30:27:065 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization protect granted
17:30:27:065 [9164]finalizedSlot: 2 removedObject: ((2015 41 2 10 17 30 16.1435880661011))

I have confirmed that the #gcEvent: callback and all the related GC activity occurs nearly instantaneously. There is no waiting for semaphores holding things up. (The numbers inside square brackets are the identity hashes of the currently running processes. You can see that I receive the event in one process, signal my semaphore, and have my high-priority process take over instantly.)

Is it possible that the scavenge that triggered the #gcEvent: started before I nilled the slot at 17:30:25:57 but finished after another second or more? Can anyone (Seth?) shed any light on this?

Seth Berman

Re: [8.6.1] GC and Weak Arrays

Hi Richard,

No problem...happy to help.

I assume the #gcEvent: would come through later, if the async queue didn't actually overflow. Is that the case?

Yes, that looks correct to me.

-- Seth

On Wednesday, February 11, 2015 at 8:13:50 PM UTC-5, Richard Sargent wrote:

On Wednesday, February 11, 2015 at 5:03:47 PM UTC-8, Seth Berman wrote:
Hi Richard,

Does it sound reasonable that all of the above could not have triggered a scavenge in the middle?
I image a scavenge would have occurred during this time...unless the newspace setting is very very large. Any special space settings?

No. newSpaceSize=2097152 (2MB). Everything else is commented out.

Is it possible that the scavenge that triggered the #gcEvent: started before I nilled the slot at 17:30:25:57 but finished after another second or more? (From the first post)
The application thread is halted from scavenge start to end...so I think the answer to your question here is no.

Good! I had hoped that was the case.

Is there anything that could have prevented the VM from making the #gcEvent: callback, if a scavenge had occurred?
These gc events are posted to the async message queue, so the only other thing I can think of is something relating to this.

That may be it. I was able to determine that entries in the weak array were being nilled during that 3 seconds. I have had problems with async queue overflow, so this is a very real possibility. If the async handling were suppressed all that time, one or more scavenges might have occurred with the corresponding nilling of slots, but without reporting them. I assume the #gcEvent: would come through later, if the async queue didn't actually overflow. Is that the case?

I have modified the code in the "clear for session" logic to tolerate the GC having already nilled the slot. That seems to have cleaned up my problem. Now to start stripping out all the probe code and make certain it really is fixed.

Then, I'll start investigating why the async queue is not being processed. It seems likely something I have has turned it off "for the duration", but finding that is easier said than done. :-)

Thanks for helping fill in the blanks!

I fear I'm not being much help, but nothing is jumping out at me on this one.

-- Seth

On Wednesday, February 11, 2015 at 4:01:19 PM UTC-5, Richard Sargent wrote:
On Wednesday, February 11, 2015 at 12:30:55 PM UTC-8, Seth Berman wrote:
Just for my own clarification, are you under the impression that nilling a slot in a weak object should trigger some action that would get the finalizer running? Or is it that you feel within 1.5 seconds of that event there should have been a number of opportunities, via GCs, to finalize the object?

Almost the opposite. I am finding the garbage collector is telling me *it* collected the thing, when I was the one that dropped the reference. (Actually, let me qualify that. I *think* I was the one to nil it. I suppose it is possible the GC has nilled it but not yet told me about it and I think the reference is still being held. I need to check this out!)

Can you tell me what the basic load is during this time. In other words, since the #finalizeCycle processing takes place during idle time, could it be the case that this isn't getting a chance to run for awhile?

I am using a high-priority process for "MyWeakArray" finalization. On receipt of a #gcEvent:, I signal the high-priority process. It grabs the finialization protect semaphore and iterates through the finalization queue (which is large enough).

I would ask if the finalize queue is overflowing, which would cause an object to be delayed in making its way into the queue, but I think you have adjusted your queue size to be pretty large if I remember correctly.

Correct.

I might be missing something from what you said, so I will just wait for your reply before I start trying some things out.

I appreciate your looking into this. Since I first posted, I have gleaned more information, but it still remains a puzzle.

1) I explicitly invoke my specialized finalization code and learn that the finalize queue is empty.
2) I next iterate over 32K "buckets", nilling out the entries I wish to drop. These buckets hold the instances of MyWeakArray.
3) I immediately explicitly invoke my specialized finalization code again and learn that the finalize queue is not longer empty. It now contains two references to the specific object I am using as the focus of this investigation. [Background: I have two weak array maps, from two symmetric perspectives.] Apparently because of the activities in step 2, the object in question has been added to the finalization queue for each of the two maps. Step 2 took 3.06 seconds and allowed me to log more than 60,000 lines to a log file. Formatting of time stamps and print strings should generate considerable short-lived garbage. Not once during this interval did my registered #gcEvent: handler get called.

I'm finding it hard to believe no scavenge cycles finished during this. At 11:46:14:815, I get my next #gcEvent: callback. Step 2 above finished at 11:46:14:773. The previous #gcEvent: callback occurred at 11:46:11:492. (Funny. I never pay attention to how fast computers have gotten, until looking at this situation!)

Does it sound reasonable that all of the above could not have triggered a scavenge in the middle? (By the way, I get pretty much the same situation without all the logging. If anything, I would have expected that the logging should increase the number of scavenges.)

Is there anything that could have prevented the VM from making the #gcEvent: callback, if a scavenge had occurred?

Thanks!

-- Seth

On Wednesday, February 11, 2015 at 1:21:46 PM UTC-5, Richard Sargent wrote:
I came across a puzzling problem recently. I haven't been able to narrow it down to a simple test case.

I have a weak array, I nil out one of its slots. 1.5 seconds later I get a GC event telling me the object has been collected from the weak array. That seems like a very long time.

17:30:25:576 #clearForSession: finished; keys contain '(() nil)'

17:30:27:063 [18632]MyWeakArray class>>#gcEvent: mask = 9
17:30:27:063 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization started
17:30:27:065 [9164]MyWeakArray class>>#finalizeAllMyWeakReferences Finalization protect granted
17:30:27:065 [9164]finalizedSlot: 2 removedObject: ((2015 41 2 10 17 30 16.1435880661011))

I have confirmed that the #gcEvent: callback and all the related GC activity occurs nearly instantaneously. There is no waiting for semaphores holding things up. (The numbers inside square brackets are the identity hashes of the currently running processes. You can see that I receive the event in one process, signal my semaphore, and have my high-priority process take over instantly.)

Is it possible that the scavenge that triggered the #gcEvent: started before I nilled the slot at 17:30:25:57 but finished after another second or more? Can anyone (Seth?) shed any light on this?