Is it possible to suspend the garbage collector?


Is it possible to suspend the garbage collector?

Max Leske

Hi,

I have a rather annoying problem. I’m running a time-critical piece of code that reads a big (~90MB) image segment from a file. I’ve optimized loading as far as possible, and now GC takes far longer than the loading itself (see the MessageTally output below).
I’m wondering if there’s any way to defer garbage collection during the load.

For completeness, here’s the use case: the process is socket-activated, which means that the first request coming in starts the process. When the image starts, it loads the segment to restore the last state of the application and, once that’s done, serves the request. The critical time includes VM startup, image startup, starting the server in the image, and loading the snapshot. With a big snapshot, loading the snapshot itself is the most significant contributor.

Maybe I could preallocate the needed memory to prevent the garbage collector from running?

I’d appreciate any ideas you have.


Cheers,
Max


PS: This needs to run on a Squeak 4.0.3 VM (no JIT)




Output from MessageTally:

 - 1624 tallies, 1624 msec.

**Tree**
--------------------------------
Process: (40s)  592969728: nil
--------------------------------
4.4% {72ms} CBImageSegment class(NSImageSegment class)>>basicSnapshot:from:do:
  4.4% {72ms} CBImageSegment class(NSImageSegment class)>>installSegmentFrom:andDo:
    4.4% {72ms} CBImageSegment class(NSImageSegment class)>>readSegmentFrom:
      4.4% {72ms} NSSegmentStream>>readObject
        4.4% {72ms} SmartRefStream>>nextAndClose
          4.4% {72ms} SmartRefStream>>next
            4.3% {70ms} SmartRefStream(ReferenceStream)>>next
              4.3% {70ms} SmartRefStream(DataStream)>>next
                3.2% {52ms} NSImageSegment(ImageSegment)>>comeFullyUpOnReload:
                  3.2% {52ms} NSImageSegment>>restoreEndiannessAndRehash
**Leaves**
3.2% {52ms} NSImageSegment>>restoreEndiannessAndRehash

**Memory**
        old +92,704,656 bytes
        young -8,008,252 bytes
        used +84,696,404 bytes
        free +1,287,768 bytes

**GCs**
        full 2 totalling 954ms (59.0% uptime), avg 477.0ms
        incr 5 totalling 165ms (10.0% uptime), avg 33.0ms
        tenures 1 (avg 5 GCs/tenure)
        root table 0 overflows

Re: Is it possible to suspend the garbage collector?

Eliot Miranda-2
 
Hi Max,

    pre-Spur, to avoid GC one has to a) grow memory by enough to do all the processing you're going to do, and b) change the shrinkage parameter so the VM won't shrink the heap back down before the processing is complete.  To do b) I suggest you modify setGCParameters.  VM parameter 24 sets the shrinkage threshold; see vmParameterAt:put: "24 memory threshold above which to shrink object memory (read-write)".  For a), I had thought that there's a growMemoryBy: primitive in v3, but it appears there isn't.  So simply allocate a ByteArray of the desired size and then GC to get rid of it.  That should leave that much free space, and then your load should proceed without needing to GC.

Anyway, it's worth a try.
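
In code, that recipe might look roughly like this (a sketch only; the 90MB figure and the factor of two are illustrative, not taken from your setup):

    | segmentSize |
    segmentSize := 90 * 1024 * 1024. "approximate size of the image segment on disk"
    "Raise the shrink threshold (VM parameter 24) so the VM keeps the grown heap."
    Smalltalk vmParameterAt: 24 put: (Smalltalk vmParameterAt: 24) + (segmentSize * 2).
    "Grow the heap by allocating a large ByteArray, then collect it to leave that space free."
    ByteArray new: segmentSize * 2.
    Smalltalk garbageCollect.
    "Roughly segmentSize * 2 bytes should now be free; load the segment here."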

--
_,,,^..^,,,_
best, Eliot

Re: Is it possible to suspend the garbage collector?

Max Leske
In reply to this post by Max Leske
 

Thanks Eliot.

Setting the memory threshold helped. I’m still seeing one full GC which I’m trying to avoid. I’ve experimented with #setGCBiasToGrow: and #setGCBiasToGrowGCLimit: but I don’t fully understand what they do. 
#setGCBiasToGrow: seems to turn memory growth on and off. But if this is turned off, how can the VM then allocate more memory?
#setGCBiasToGrowGCLimit: seems to control if the growth should trigger a full GC, which seems pretty much like what I need.

Unfortunately, while setting these options seems to have an influence, I can’t quite see the pattern, and that one full GC is still there. Maybe you could explain how these options work exactly?

One other question: the MessageTally output seems to be missing about 50% of the running time. Summing up full GC + incremental GC + time spent in the tree leaves about 500ms unaccounted for. Do you have any idea where that half second goes missing?


Here’s the code I use experimentally:

MessageTally spyOn: [
	| size shrinkThreshold |
	size := (self settings segmentsDirectory fileNamed: 'snapshot.bin') size.
	shrinkThreshold := Smalltalk vmParameterAt: 24.
	Smalltalk vmParameterAt: 24 put: shrinkThreshold + (size * 2). "8MB default + twice the file size"
	Smalltalk setGCBiasToGrowGCLimit: shrinkThreshold + (size * 2).
	Smalltalk setGCBiasToGrow: 1. "enable growth??"
	ByteArray new: size * 2.

	"incremental GC should take care of collecting the ByteArray, so I’m not doing anything
	manually here"

	<load snapshot> ].



Cheers,
Max



Output from current MessageTally:

 - 1123 tallies, 1125 msec.

**Tree**
--------------------------------
Process: (40s)  123994112: nil
--------------------------------
12.7% {143ms} CBImageSegment class(NSImageSegment class)>>basicSnapshot:from:do:
  12.6% {141ms} CBImageSegment class(NSImageSegment class)>>installSegmentFrom:andDo:
    12.6% {141ms} CBImageSegment class(NSImageSegment class)>>readSegmentFrom:
      12.6% {141ms} NSSegmentStream>>readObject
        12.6% {141ms} SmartRefStream>>nextAndClose
          12.6% {141ms} SmartRefStream>>next
            12.3% {138ms} SmartRefStream(ReferenceStream)>>next
              12.3% {138ms} SmartRefStream(DataStream)>>next
                10.6% {119ms} CBImageSegment(ImageSegment)>>comeFullyUpOnReload:
                  |10.6% {119ms} CBImageSegment(NSImageSegment)>>restoreEndiannessAndRehash
                  |  5.5% {62ms} Dictionary>>rehash
                  |    |2.8% {31ms} Dictionary>>associationsDo:
                  |    |  |2.2% {25ms} Array(SequenceableCollection)>>do:
                  |    |1.7% {19ms} Dictionary>>noCheckAdd:
                  |    |  1.7% {19ms} Dictionary(HashedCollection)>>findElementOrNil:
                  |    |    1.2% {13ms} Dictionary>>scanFor:
                  |  4.5% {51ms} primitives
                1.2% {13ms} SmartRefStream(DataStream)>>readArray
                  1.2% {13ms} SmartRefStream>>next
                    1.2% {13ms} SmartRefStream(ReferenceStream)>>next
                      1.2% {13ms} SmartRefStream(DataStream)>>next
**Leaves**

**Memory**
old +94,031,228 bytes
young -9,207,660 bytes
used +84,823,568 bytes
free +90,024,824 bytes

**GCs**
full 1 totalling 85ms (8.0% uptime), avg 85.0ms
incr 15 totalling 271ms (24.0% uptime), avg 18.0ms
tenures 10 (avg 1 GCs/tenure)
root table 0 overflows





Re: Is it possible to suspend the garbage collector?

johnmci
 
OK, as the author of setGCBiasToGrowGCLimit & setGCBiasToGrow I was looking for a paper I gave at the OOPSLA 2005 Squeakfest on this, but can't find it. However, the original note from January 18th, 2005 explains it:

Mac 3.8.6b4

Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024. "Set growth limit before full GC to 16MB"
Smalltalk setGCBiasToGrow: 1.

Set the bias to grow up to the GC limit. This can avoid a problem where we attempt to avoid growing but end up with thousands of incremental GC events as we approach a knee in the curve of space used versus the growth/compaction decision.

Plus I added this note:
" A VM change will consider that after a tenure if the young space is less than 4MB then growth will happen to make young space greater than 4MB plus a calculated slack. Then after we've tenured N MB we will do a full GC, versus doing a full GC on every grow operation, this will trigger a shrink if required.  For example we'll tenure at 75% and be bias to grow to 16MB before doing full GC."




The Problem:

Last weekend I built a new VM which has instrumentation to describe exactly what the GC is doing, to
trigger a semaphore when a GC finishes, and to allow you to poke at more interesting things that control GC activity.

What I found was an issue we hadn't realized was there; I'm sure people have seen it, but didn't know why.
What happens is that as we tenure objects we shrink the young space from 4MB down to zero.

Now, as the table below shows, if conditions are right (a couple of cases in the macrobenchmarks) the
number of objects we can allocate decreases to zero, and we actually don't tenure anymore once the survivors fall below 2000.
The rate of young-space GC activity goes from, say, 8 per second towards 1000 per second. Mind you, on fast machines
the young-space millisecond accumulation count doesn't move much, because each of these GCs takes under 1 millisecond (or 0),
skewing those statistics and hiding the GC time.

AllocationCount         Survivors
4000    5400
3209    3459
2269    2790
1760    1574
1592    2299
1105    1662
427     2355
392     2374
123     1472
89      1478
79      2
78      2
76      2
76      2

Note how we allocate 76 objects, do a young-space GC, then have two survivors; finally we reach the 200K minimum GC
threshold and do a full GC followed by growing young space. However, this process is very painful. It's also why the low-space dialog
doesn't appear in a timely manner: we are approaching the 200K limit and trying really hard, by doing thousands of
young-space GCs, to avoid going over that limit. If conditions are right, we get close, but not close enough...

What will change in the future:

a) A GC monitoring class (new) will look at mark/sweep/root-table counts and decide when to do a tenure operation if iterating
over the root table objects takes too many iterations. A better solution would be to remember which old objects hold young references, and in which slots, but that is harder to do.

b) A VM change will consider that after a tenure, if the young space is less than 4MB, growth will happen to make young space greater than 4MB plus a calculated slack. Then, after we've tenured N MB, we will do a full GC, versus doing a full GC on every grow operation; this will trigger a shrink if required. For example, we'll tenure at 75% and be biased to grow to 16MB before doing a full GC.

c) To solve hitting the hard boundary when we cannot allocate more space, we need to rethink when the low-space semaphore is signaled and the rate of young-space GC activity; signaling the semaphore earlier will allow a user to take action before things grind to a halt. I'm not quite sure how to do that yet.


--
===========================================================================
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
===========================================================================

Re: Is it possible to suspend the garbage collector?

Max Leske
 
Hi John,

Thanks for the explanation.

As far as I understand, these settings won’t help in my particular case (I experimented with turning the bias on and off before allocating the ByteArray but didn’t see any influence). Still good to know about them.

I still have a few questions regarding garbage collection.

1. When I allocate a ByteArray, as Eliot suggested, like so:

Smalltalk vmParameterAt: 24 put: 150*1024*1024.
ByteArray new: 70*1024*1024.

then I see that the young space has grown a lot, as expected, and free space is low:

before allocation:
memory 57,032,988 bytes
old 44,259,392 bytes (77.6%)
young 5,824 bytes (0.0%)
used 44,265,216 bytes (77.6%)
free 12,767,772 bytes (22.4%)

after allocation:
memory 117,463,180 bytes
old 39,666,076 bytes (33.8%)
young 76,523,392 bytes (65.1%)
used 116,189,468 bytes (98.9%)
free 1,273,712 bytes (1.1%)


If I now force an incremental GC with #garbageCollectMost then *nothing changes*, i.e. young space is still full and free space is low. What I would expect, since incremental GC, as I understand it, marks and sweeps young space, is that young space becomes small and free space becomes large (the memory allocated for the ByteArray moves from young space to free space):

after #garbageCollectMost:
memory 117,463,180 bytes
old 39,666,076 bytes (33.8%)
young 75,339,048 bytes (64.1%)
used 115,005,124 bytes (97.9%)
free 2,458,056 bytes (2.1%)

Only once I do a full garbageCollect does the ByteArray get collected:
memory 121,025,356 bytes
old 39,703,404 bytes (32.8%)
young 3,265,428 bytes (2.7%)
used 42,968,832 bytes (35.5%)
free 78,056,524 bytes (64.5%)

Why do I see this behaviour? In Cog I see the expected outcome (free space big, young space small) even without triggering an incremental GC manually.
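
For reference, the comparison above can be reproduced with a doit along these lines (a sketch; it assumes, as in standard Squeak, that #garbageCollectMost and #garbageCollect answer the number of bytes available afterwards, and the sizes are the same as in the snippet above):

    Smalltalk vmParameterAt: 24 put: 150 * 1024 * 1024.
    ByteArray new: 70 * 1024 * 1024. "result is discarded, so the array is immediately garbage"
    Transcript
        show: 'free after incremental GC: ', Smalltalk garbageCollectMost printString; cr;
        show: 'free after full GC: ', Smalltalk garbageCollect printString; cr.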


2. There’s also still the question about where those 500ms disappear to (see last e-mail), which are unaccounted for in the MessageTally output.


Cheers,
Max




Re: Is it possible to suspend the garbage collector?

timrowledge


> On 12-01-2016, at 8:39 AM, Max Leske <[hidden email]> wrote:
>
> Why do I see this behaviour? In Cog I see the expected outcome (free space big, young space small) even without triggering an incremental GC manually.

Not sure about details but do remember that Spur uses a fairly different object space arrangement, so if that’s what you’re using, John’s advice may no longer be very relevant.

>
>
> 2. There’s also still the question about where those 500ms disappear to (see last e-mail), which are unaccounted for in the MessageTally output.

The tally code has never been particularly good at handling long-running primitives. I *think* there is better profiling code somewhere that Andreas developed for Qwaq, but IIRC it is not part of the current system.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
<-------- The information went data way -------->



Re: Is it possible to suspend the garbage collector?

johnmci
In reply to this post by Max Leske
 
Lots of questions. 
I had a memory policy that would gather data; you can find it here.

It also sets up targets, like trying to get an incremental GC to happen within N milliseconds, by adjusting parameters.
If you search the archive for SophieMemoryPolicy or MemoryPolicy you'll find more discussion.
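
As a very rough illustration of that kind of policy (a sketch only, not the SophieMemoryPolicy code; the parameter indices are assumed from the interpreter VM's vmParameterAt: comment, i.e. 5 = allocations between GCs, 9 = incremental GC count, 10 = total incremental GC milliseconds, and are worth double-checking in the image):

    | targetMs gcCount gcMs avgMs allocationsBetweenGCs |
    targetMs := 2. "aim for incremental GCs of roughly this length"
    gcCount := Smalltalk vmParameterAt: 9. "incremental GCs since startup (assumed index)"
    gcMs := Smalltalk vmParameterAt: 10. "total ms spent in incremental GCs (assumed index)"
    avgMs := gcCount = 0 ifTrue: [0] ifFalse: [gcMs / gcCount].
    allocationsBetweenGCs := Smalltalk vmParameterAt: 5. "read-write on the interpreter VM (assumed index)"
    avgMs > targetMs
        ifTrue: ["incremental GCs take too long: run them more often, over fewer new objects"
            Smalltalk vmParameterAt: 5 put: (allocationsBetweenGCs // 2 max: 1000)]
        ifFalse: ["they are cheap: allow more allocation between them"
            Smalltalk vmParameterAt: 5 put: allocationsBetweenGCs * 2].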


--
===========================================================================
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
===========================================================================

Re: Is it possible to suspend the garbage collector?

David T. Lewis
In reply to this post by timrowledge
 
On Tue, Jan 12, 2016 at 10:23:25AM -0800, tim Rowledge wrote:

>
>
> > On 12-01-2016, at 8:39 AM, Max Leske <[hidden email]> wrote:
> >
> > Why do I see this behaviour? In Cog I see the expected outcome (free space big, young space small) even without triggering an incremental GC manually.
>
> Not sure about details but do remember that Spur uses a fairly different object space arrangement, so if that’s what you’re using John’s advice may no longer be very relevant.
>
> >
> >
> > 2. There’s also still the question about where those 500ms disappear to (see last e-mail), which are unaccounted for in the MessageTally output.
>
> The tally code has never been particularly good at handling long running primitives. I *think* there is better profiling code somewhere that Andreas developed for Qwaq but IIRC it is not part of the current system.
>

Andreas' profiler is definitely available:

  http://lists.squeakfoundation.org/pipermail/squeak-dev/2013-January/168233.html

The package is at http://ss3.gemtalksystems.com/ss/AndreasSystemProfiler/ and
you can load it in Squeak from SqueakMap.
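
If it loads, it is presumably driven the same way as MessageTally is elsewhere in this thread (an assumption; the class-side #spyOn: is not confirmed here), and should account for time that MessageTally loses inside long-running primitives:

    AndreasSystemProfiler spyOn: [
        <load snapshot> ].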

The necessary support is in Cog/Spur. I made an attempt to get it working for
the base interpreter but was not successful.

Dave
 

Re: Is it possible to suspend the garbage collector?

Max Leske
In reply to this post by johnmci
 

Cool, thanks. I’ve already played around with that a bit. I didn’t see anything special in the stats yet but I’ll need some more time.



On Tue, Jan 12, 2016 at 8:39 AM, Max Leske <[hidden email]> wrote:
Hi John,

Thanks for the explanation.

As far as I understand, these settings won’t help in my particular case (I experimented with turning the bias on and off before allocating the ByteArray but didn’t see any influence). Still good to know about them.

I still have a few questions regarding garbage collection.

1. When I allocate a ByteArray, as Eliot suggested, like so:

Smalltalk vmParameterAt: 24 put: 150*1024*1024.
ByteArray new: 70*1024*1024.

then I see that the young space has grown a lot, as expected, and free space is low:

before allocation:
memory 57,032,988 bytes
old 44,259,392 bytes (77.60000000000001%)
young 5,824 bytes (0.0%)
used 44,265,216 bytes (77.60000000000001%)
free 12,767,772 bytes (22.400000000000002%)

after allocation:
memory 117,463,180 bytes
old 39,666,076 bytes (33.800000000000004%)
young 76,523,392 bytes (65.10000000000001%)
used 116,189,468 bytes (98.9%)
free 1,273,712 bytes (1.1%)


If I now force an incremental GC with #garbageCollectMost then then *nothing changes*, i.e. young space is still full and free space is low. What I would expect, since incremental GC, as I understand it, marks and sweeps young space, is that young space becomes small and free space becomes large (allocated memory of the ByteArray moves from young to free space):

after #garbageCollectMost:
memory 117,463,180 bytes
old 39,666,076 bytes (33.800000000000004%
 young 75,339,048 bytes (64.10000000000001%)
used 115,005,124 bytes (97.9%)
free 2,458,056 bytes (2.1%)

Only once I do a full garbageCollect does the ByteArray get collected:
memory 121,025,356 bytes
old 39,703,404 bytes (32.800000000000004%)
young 3,265,428 bytes (2.7%)
used 42,968,832 bytes (35.5%)
free 78,056,524 bytes (64.5%)

Why do I see this behaviour? In Cog I see the expected outcome (free space big, young space small) even without triggering an incremental GC manually.


2. There’s also still the question about where those 500ms disappear to (see last e-mail), which are unaccounted for in the MessageTally output.


Cheers,
Max


On 11 Jan 2016, at 20:03, John McIntosh <[hidden email]> wrote:

Ok as the author of setGCBiasToGrowGCLimit & setGCBiasToGrow I was looking for a paper I gave at OOPSLA 2005 squeakfest on this. But can't find it.  However the original note from January 18th 2005 explains

Mac 3.8.6b4

Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024. "Set growth limit before full GC to 16MB"
                                Smalltalk setGCBiasToGrow: 1.
                                Set bias to grow upto GCLimit, this can avoid a problem where we attempt to avoid growing but results in thousands of incremental GC events as we approach a knee in a curve of space used versus the growth/compaction decision.

Plus added this
" A VM change will consider that after a tenure if the young space is less than 4MB then growth will happen to make young space greater than 4MB plus a calculated slack. Then after we've tenured N MB we will do a full GC, versus doing a full GC on every grow operation, this will trigger a shrink if required.  For example we'll tenure at 75% and be bias to grow to 16MB before doing full GC."




The Problem:

Last weekend I built a new VM which has instrumentation to describe exactly what the GC is doing, also to
trigger a semaphore when an GC finishes, and to allow you to poke at more interesting things that control GC activity.

What I found was an issue which we hadn't realized is there, well I'm sure people have seen it, but don't know why...
What happens is that as we are tenuring objects we are decreasing the young space from 4MB to Zero.

Now as indicated in the table below if conditions are right (a couple of cases in the macrobenchmarks) why as you see the
number of objects we can allocate decreases to zero, and we actually don't tenure anymore once the survivors fall below 2000.
The rate at which young space GC activity occurs goes from say 8 per second towards 1000 per second, mind on fast machines
the young space ms accumulation count doesn't move much because the time taken to do this is under 1 millisecond, or 0, skewing
those statistics and hiding the GC time.

AllocationCount         Survivors
4000    5400
3209    3459
2269    2790
1760    1574
1592    2299
1105    1662
427     2355
392     2374
123     1472
89      1478
79      2
78      2
76      2
76      2

Note how we allocate 76 objects, do a young space GC, then have two survivors; finally we reach the 200K minimum GC threshold and do a full GC, followed by growing young space. However, this process is very painful. It is also why the low space dialog
doesn't appear in a timely manner: we are approaching the 200K limit and trying really hard, by doing thousands of
young space GCs, to avoid going over that limit. If conditions are right, we get close but not close enough...

What will change in the future:

a) A new GC monitoring class will look at mark/sweep/root table counts and decide when to do a tenure operation if iterating
over the root table objects takes too many iterations. A better solution would be to remember which old objects hold a young reference, and in which slot, but that is harder to do.

b) A VM change will consider that after a tenure, if the young space is less than 4MB, then growth will happen to make young space greater than 4MB plus a calculated slack. Then, after we've tenured N MB, we will do a full GC, versus doing a full GC on every grow operation; this will trigger a shrink if required. For example, we'll tenure at 75% and be biased to grow to 16MB before doing a full GC.

c) To solve hitting the hard boundary when we cannot allocate more space, we need to rethink when the low space semaphore is signaled and the rate of young space GC activity; signaling the semaphore earlier will allow a user to take action before things grind to a halt. I'm not quite sure how to do that yet.
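
For what it's worth, you can watch the relevant counters from the image side while this is happening. Here's a minimal sketch (the parameter indices follow the comment in #vmParameterAt: and should be treated as an assumption if your VM differs):

| fullGCs fullGCms incrGCs incrGCms tenures |
fullGCs := Smalltalk vmParameterAt: 7. "full GCs since startup (assumed index)"
fullGCms := Smalltalk vmParameterAt: 8. "total ms in full GCs since startup"
incrGCs := Smalltalk vmParameterAt: 9. "incremental GCs since startup"
incrGCms := Smalltalk vmParameterAt: 10. "total ms in incremental GCs since startup"
tenures := Smalltalk vmParameterAt: 11. "tenures of surviving objects since startup"
Transcript
	cr; show: 'full ', fullGCs printString, ' totalling ', fullGCms printString, ' ms';
	cr; show: 'incr ', incrGCs printString, ' totalling ', incrGCms printString, ' ms';
	cr; show: 'tenures ', tenures printString

Sampling these before and after a run gives you the same sort of numbers MessageTally reports in its **GCs** section.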


On Mon, Jan 11, 2016 at 3:21 AM, Max Leske <[hidden email]> wrote:
 

On 10 Jan 2016, at 20:22, [hidden email] wrote:

Hi Max,

   pre-Spur to avoid GC one has to a) grow memory by enough to do all the
processing you're going to do and b) change the shrinkage parameter so the
Vm won't shrink the heap back down before the processing is complete.  To
do b) I suggest you modify setGCParameters.  vm parameters 24 sets the
shrinkage threshold; see vmParameterAt:put:: "24 memory threshold above
whichto shrink object memory (read-write)".  growMemory.  Hmmm, I had
thoguht that there's a growMemoryBy: primitive in v3, but it appears there
isn't.  So simply allocate a ByteArray of the desired size and then GC to
get rid of it.  That should leave that much free space and then your load
should proceed without needing to GC.

Anyway, it's worth a try.

Thanks Eliot.

Setting the memory threshold helped. I’m still seeing one full GC which I’m trying to avoid. I’ve experimented with #setGCBiasToGrow: and #setGCBiasToGrowGCLimit: but I don’t fully understand what they do. 
#setGCBiasToGrow: seems to turn memory growth on and off. But if this is turned off, how can the VM then allocate more memory?
#setGCBiasToGrowGCLimit: seems to control if the growth should trigger a full GC, which seems pretty much like what I need.

Unfortunately, while setting these options seems to have an influence, I can’t quite see the pattern, and that one full GC is still there. Maybe you could explain how these options work exactly?

One other question: the MessageTally output seems to be missing about 50% of the running time. Summing up full GC, incremental GC and the time spent in the tree leaves about 500ms unaccounted for. Do you have any idea where that half second goes missing?


Here’s the code I use experimentally:

MessageTally spyOn: [
	| size shrinkThreshold |
	size := (self settings segmentsDirectory fileNamed: 'snapshot.bin') size.
	shrinkThreshold := Smalltalk vmParameterAt: 24.
	Smalltalk vmParameterAt: 24 put: shrinkThreshold + (size * 2). "(8MB + twice file size)"
	Smalltalk setGCBiasToGrowGCLimit: shrinkThreshold + (size * 2).
	Smalltalk setGCBiasToGrow: 1. "enable growth??"
	ByteArray new: size * 2.

	"incremental GC should take care of collecting the ByteArray, so I'm not doing anything
	manually here"

	<load snapshot> ].
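
Once this works I'll probably also restore the settings after the load, so the heap can shrink again afterwards. Just a sketch, untested, and assuming the parameters can simply be written back (this would go right after <load snapshot>, still inside the block where shrinkThreshold is visible):

Smalltalk vmParameterAt: 24 put: shrinkThreshold. "restore the original shrink threshold"
Smalltalk setGCBiasToGrow: 0. "turn the grow bias back off"
Smalltalk garbageCollect. "one full GC, now outside the timed path, which should let the heap shrink back down"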



Cheers,
Max



Output from current MessageTally:

 - 1123 tallies, 1125 msec.

**Tree**
--------------------------------
Process: (40s)  123994112: nil
--------------------------------
12.7% {143ms} CBImageSegment class(NSImageSegment class)>>basicSnapshot:from:do:
  12.6% {141ms} CBImageSegment class(NSImageSegment class)>>installSegmentFrom:andDo:
    12.6% {141ms} CBImageSegment class(NSImageSegment class)>>readSegmentFrom:
      12.6% {141ms} NSSegmentStream>>readObject
        12.6% {141ms} SmartRefStream>>nextAndClose
          12.6% {141ms} SmartRefStream>>next
            12.3% {138ms} SmartRefStream(ReferenceStream)>>next
              12.3% {138ms} SmartRefStream(DataStream)>>next
                10.6% {119ms} CBImageSegment(ImageSegment)>>comeFullyUpOnReload:
                  |10.6% {119ms} CBImageSegment(NSImageSegment)>>restoreEndiannessAndRehash
                  |  5.5% {62ms} Dictionary>>rehash
                  |    |2.8% {31ms} Dictionary>>associationsDo:
                  |    |  |2.2% {25ms} Array(SequenceableCollection)>>do:
                  |    |1.7% {19ms} Dictionary>>noCheckAdd:
                  |    |  1.7% {19ms} Dictionary(HashedCollection)>>findElementOrNil:
                  |    |    1.2% {13ms} Dictionary>>scanFor:
                  |  4.5% {51ms} primitives
                1.2% {13ms} SmartRefStream(DataStream)>>readArray
                  1.2% {13ms} SmartRefStream>>next
                    1.2% {13ms} SmartRefStream(ReferenceStream)>>next
                      1.2% {13ms} SmartRefStream(DataStream)>>next
**Leaves**

**Memory**
old +94,031,228 bytes
young -9,207,660 bytes
used +84,823,568 bytes
free +90,024,824 bytes

**GCs**
full 1 totalling 85ms (8.0% uptime), avg 85.0ms
incr 15 totalling 271ms (24.0% uptime), avg 18.0ms
tenures 10 (avg 1 GCs/tenure)
root table 0 overflows



--
===========================================================================
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
===========================================================================





Reply | Threaded
Open this post in threaded view
|

Re: Is it possible to suspend the garbage collector?

Max Leske
In reply to this post by Max Leske
 

On 12 Jan 2016, at 19:48, [hidden email] wrote:


On 12-01-2016, at 8:39 AM, Max Leske <[hidden email]> wrote:

Why do I see this behaviour? In Cog I see the expected outcome (free space big, young space small) even without triggering an incremental GC manually.

Not sure about the details, but do remember that Spur uses a fairly different object space arrangement, so if that’s what you’re using John’s advice may no longer be very relevant.

I’m aware of that. My problem (for the moment at least) only concerns V3 (Squeak VM 4.0.3).




2. There’s also still the question about where those 500ms disappear to (see last e-mail), which are unaccounted for in the MessageTally output.

The tally code has never been particularly good at handling long running primitives. I *think* there is better profiling code somewhere that Andreas developed for Qwaq but IIRC it is not part of the current system.

Interesting.

I wondered about the primitives and went to give MessageTally a closer look. I noticed that I was not taking into account other running processes and voilà:

Process: other processes
--------------------------------
84.5% {2389ms} InputEventPollingFetcher(InputEventFetcher)>>eventLoop
  84.5% {2389ms} InputEventPollingFetcher>>waitForInput
    84.5% {2389ms} Delay>>wait

That probably explains where those 500ms I was talking about went (I was loading a larger segment here, so the process took longer to run).

I’ve “fixed” that for now by running the loader in a higher priority process.
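
Concretely, something along these lines (just a sketch; loadSnapshot is a placeholder for the actual segment load, and I haven’t settled on the exact priority yet):

| done |
done := Semaphore new.
[[ self loadSnapshot "placeholder for the actual load" ] ensure: [ done signal ]]
	forkAt: Processor userInterruptPriority. "or whatever level turns out to be high enough"
done wait. "block the calling process until the load has finished"

Alternatively I could raise the priority of the running process itself and restore it afterwards, but the fork keeps the change nicely contained.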

Thanks for the hint.

Cheers,
Max


tim