Hi, I have a rather annoying problem. I'm running a time-critical piece of code that reads a big (~90MB) image segment from a file. I've optimized loading as far as possible and now GC takes far longer than the loading itself (see the MessageTally output below). I'm wondering if there's any possibility to defer garbage collection during the load.

For completeness, here's the use case: the process is socket-activated, which means that the first request coming in will start the process. When the image starts, it loads the segment to restore the last state of the application and, once that's done, serves the request. The critical time includes VM startup, image startup, starting the server in the image, and loading the snapshot. With a big snapshot, the loading time of the snapshot is the most significant contributor.

Maybe I could preallocate the needed memory to prevent the garbage collector from running? I'd appreciate any ideas you have.

Cheers,
Max

PS: This needs to run on a Squeak 4.0.3 VM (no JIT)

Output from MessageTally:
- 1624 tallies, 1624 msec.

**Tree**
--------------------------------
Process: (40s) 592969728: nil
--------------------------------
4.4% {72ms} CBImageSegment class(NSImageSegment class)>>basicSnapshot:from:do:
4.4% {72ms} CBImageSegment class(NSImageSegment class)>>installSegmentFrom:andDo:
4.4% {72ms} CBImageSegment class(NSImageSegment class)>>readSegmentFrom:
4.4% {72ms} NSSegmentStream>>readObject
4.4% {72ms} SmartRefStream>>nextAndClose
4.4% {72ms} SmartRefStream>>next
4.3% {70ms} SmartRefStream(ReferenceStream)>>next
4.3% {70ms} SmartRefStream(DataStream)>>next
3.2% {52ms} NSImageSegment(ImageSegment)>>comeFullyUpOnReload:
3.2% {52ms} NSImageSegment>>restoreEndiannessAndRehash

**Leaves**
3.2% {52ms} NSImageSegment>>restoreEndiannessAndRehash

**Memory**
	old		+92,704,656 bytes
	young		-8,008,252 bytes
	used		+84,696,404 bytes
	free		+1,287,768 bytes

**GCs**
	full		2 totalling 954ms (59.0% uptime), avg 477.0ms
	incr		5 totalling 165ms (10.0% uptime), avg 33.0ms
	tenures		1 (avg 5 GCs/tenure)
	root table	0 overflows
Hi Max,

pre-Spur, to avoid GC one has to a) grow memory by enough to do all the processing you're going to do and b) change the shrinkage parameter so the VM won't shrink the heap back down before the processing is complete.

To do b) I suggest you modify setGCParameters. VM parameter 24 sets the shrinkage threshold; see the comment in #vmParameterAt:put:, "24 memory threshold above which to shrink object memory (read-write)".

To grow memory: hmm, I had thought that there's a growMemoryBy: primitive in V3, but it appears there isn't. So simply allocate a ByteArray of the desired size and then GC to get rid of it. That should leave that much free space, and then your load should proceed without needing to GC.

Anyway, it's worth a try.

On Sat, Jan 9, 2016 at 3:03 AM, Max Leske <[hidden email]> wrote:
_,,,^..^,,,_
best, Eliot
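A minimal sketch of Eliot's recipe, assuming roughly 100MB of headroom is wanted; the sizes here are placeholders and would need tuning against the real segment:

Smalltalk vmParameterAt: 24 put: 200*1024*1024.	"raise the shrink threshold so the VM keeps the grown heap"
ByteArray new: 100*1024*1024.	"grow the heap by allocating a throwaway buffer"
Smalltalk garbageCollect.	"collect the buffer, leaving that much space free for the segment load"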
In reply to this post by Max Leske
Thanks Eliot. Setting the memory threshold helped. I'm still seeing one full GC, which I'm trying to avoid.

I've experimented with #setGCBiasToGrow: and #setGCBiasToGrowGCLimit: but I don't fully understand what they do. #setGCBiasToGrow: seems to turn memory growth on and off. But if this is turned off, how can the VM then allocate more memory? #setGCBiasToGrowGCLimit: seems to control whether growth should trigger a full GC, which seems pretty much like what I need. Unfortunately, while setting these options seems to have an influence, I can't quite see the pattern, and that one full GC is still there. Maybe you could explain how these options work exactly?

One other question: the MessageTally output seems to be missing about 50% of the running time. Summing full GC, incremental GC, and the time spent in the tree leaves about 500ms unaccounted for. Do you have any idea where that half second goes missing?

Here's the code I use experimentally:

MessageTally spyOn: [
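The experimental block above is cut off; a purely hypothetical reconstruction of such a harness, with the loader call as a placeholder, might look like this:

MessageTally spyOn: [
	Smalltalk vmParameterAt: 24 put: 150*1024*1024.	"shrink threshold, per Eliot's suggestion"
	ByteArray new: 70*1024*1024.	"preallocate to grow the heap"
	Smalltalk garbageCollect.
	self loadSnapshot ]	"loadSnapshot is a placeholder for the actual segment load"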
Cheers,
Max

Output from current MessageTally:
- 1123 tallies, 1125 msec.

**Tree**
--------------------------------
Process: (40s) 123994112: nil
--------------------------------
12.7% {143ms} CBImageSegment class(NSImageSegment class)>>basicSnapshot:from:do:
12.6% {141ms} CBImageSegment class(NSImageSegment class)>>installSegmentFrom:andDo:
12.6% {141ms} CBImageSegment class(NSImageSegment class)>>readSegmentFrom:
12.6% {141ms} NSSegmentStream>>readObject
12.6% {141ms} SmartRefStream>>nextAndClose
12.6% {141ms} SmartRefStream>>next
12.3% {138ms} SmartRefStream(ReferenceStream)>>next
12.3% {138ms} SmartRefStream(DataStream)>>next
10.6% {119ms} CBImageSegment(ImageSegment)>>comeFullyUpOnReload:
|10.6% {119ms} CBImageSegment(NSImageSegment)>>restoreEndiannessAndRehash
| 5.5% {62ms} Dictionary>>rehash
| |2.8% {31ms} Dictionary>>associationsDo:
| | |2.2% {25ms} Array(SequenceableCollection)>>do:
| |1.7% {19ms} Dictionary>>noCheckAdd:
| | 1.7% {19ms} Dictionary(HashedCollection)>>findElementOrNil:
| | 1.2% {13ms} Dictionary>>scanFor:
| 4.5% {51ms} primitives
1.2% {13ms} SmartRefStream(DataStream)>>readArray
1.2% {13ms} SmartRefStream>>next
1.2% {13ms} SmartRefStream(ReferenceStream)>>next
1.2% {13ms} SmartRefStream(DataStream)>>next

**Leaves**

**Memory**
	old		+94,031,228 bytes
	young		-9,207,660 bytes
	used		+84,823,568 bytes
	free		+90,024,824 bytes

**GCs**
	full		1 totalling 85ms (8.0% uptime), avg 85.0ms
	incr		15 totalling 271ms (24.0% uptime), avg 18.0ms
	tenures		10 (avg 1 GCs/tenure)
	root table	0 overflows
OK, as the author of setGCBiasToGrowGCLimit: and setGCBiasToGrow: I was looking for a paper I gave at the OOPSLA 2005 Squeakfest on this, but can't find it. However, the original note from January 18th, 2005 (for the Mac 3.8.6b4 VM) explains:

Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024. "Set growth limit before full GC to 16MB"
Smalltalk setGCBiasToGrow: 1. "Set bias to grow up to the GC limit"

This can avoid a problem where we attempt to avoid growing but end up with thousands of incremental GC events as we approach a knee in the curve of space used versus the growth/compaction decision.

Plus I added this: "A VM change will consider that, after a tenure, if the young space is less than 4MB then growth will happen to make young space greater than 4MB plus a calculated slack. Then, after we've tenured N MB, we will do a full GC (versus doing a full GC on every grow operation); this will trigger a shrink if required. For example, we'll tenure at 75% and be biased to grow to 16MB before doing a full GC."
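A hedged sketch of how these settings might be combined with the shrink-threshold suggestion from earlier in the thread; the 16MB limit is the example value from the 2005 note and the other size is a placeholder:

Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024.	"allow up to 16MB of growth before forcing a full GC"
Smalltalk setGCBiasToGrow: 1.	"bias the VM toward growing young space instead of repeated incremental GCs"
Smalltalk vmParameterAt: 24 put: 150*1024*1024.	"keep the VM from shrinking the heap back down"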
On Mon, Jan 11, 2016 at 3:21 AM, Max Leske <[hidden email]> wrote:
===========================================================================
John M. McIntosh. Corporate Smalltalk Consulting Ltd
https://www.linkedin.com/in/smalltalk
===========================================================================
Hi John,

Thanks for the explanation. As far as I understand, these settings won't help in my particular case (I experimented with turning the bias on and off before allocating the ByteArray but didn't see any influence). Still good to know about them.

I still have a few questions regarding garbage collection.

1. When I allocate a ByteArray, as Eliot suggested, like so:

Smalltalk vmParameterAt: 24 put: 150*1024*1024.
ByteArray new: 70*1024*1024.

then I see that the young space has grown a lot, as expected, and free space is low:

before allocation:
	memory	57,032,988 bytes
	old	44,259,392 bytes (77.60000000000001%)
	young	5,824 bytes (0.0%)
	used	44,265,216 bytes (77.60000000000001%)
	free	12,767,772 bytes (22.400000000000002%)

after allocation:
	memory	117,463,180 bytes
	old	39,666,076 bytes (33.800000000000004%)
	young	76,523,392 bytes (65.10000000000001%)
	used	116,189,468 bytes (98.9%)
	free	1,273,712 bytes (1.1%)

If I now force an incremental GC with #garbageCollectMost then *nothing changes*, i.e. young space is still full and free space is low. What I would expect, since incremental GC, as I understand it, marks and sweeps young space, is that young space becomes small and free space becomes large (the allocated memory of the ByteArray moves from young to free space):

after #garbageCollectMost:
	memory	117,463,180 bytes
	old	39,666,076 bytes (33.800000000000004%)
	young	75,339,048 bytes (64.10000000000001%)
	used	115,005,124 bytes (97.9%)
	free	2,458,056 bytes (2.1%)

Only once I do a full #garbageCollect does the ByteArray get collected:

	memory	121,025,356 bytes
	old	39,703,404 bytes (32.800000000000004%)
	young	3,265,428 bytes (2.7%)
	used	42,968,832 bytes (35.5%)
	free	78,056,524 bytes (64.5%)

Why do I see this behaviour? In Cog I see the expected outcome (free space big, young space small) even without triggering an incremental GC manually.

2. There's also still the question about where those 500ms disappear to (see my last e-mail), which are unaccounted for in the MessageTally output.

Cheers,
Max
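A hedged probe of the behaviour described above, reusing the sizes from the experiment; #garbageCollectMost runs an incremental collection and #garbageCollect a full one, and both answer the number of bytes then available:

Smalltalk vmParameterAt: 24 put: 150*1024*1024.
ByteArray new: 70*1024*1024.	"unreferenced, so it is garbage immediately"
Transcript show: 'free after incremental GC: ', Smalltalk garbageCollectMost printString; cr.
Transcript show: 'free after full GC: ', Smalltalk garbageCollect printString; cr.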
> On 12-01-2016, at 8:39 AM, Max Leske <[hidden email]> wrote:
>
> Why do I see this behaviour? In Cog I see the expected outcome (free space big, young space small) even without triggering an incremental GC manually.

Not sure about details, but do remember that Spur uses a fairly different object space arrangement, so if that's what you're using John's advice may no longer be very relevant.

>
> 2. There's also still the question about where those 500ms disappear to (see last e-mail), which are unaccounted for in the MessageTally output.

The tally code has never been particularly good at handling long running primitives. I *think* there is better profiling code somewhere that Andreas developed for Qwaq but IIRC it is not part of the current system.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
<-------- The information went data way -------->
In reply to this post by Max Leske
Lots of questions. I had a memory policy that would gather data; you can find it here. It also sets up targets, like trying to get an incremental GC to happen within N milliseconds, by adjusting the parameters. If you search the archive for SophieMemoryPolicy or MemoryPolicy you'll find more discussion.

On Tue, Jan 12, 2016 at 8:39 AM, Max Leske <[hidden email]> wrote:
===========================================================================
John M. McIntosh. Corporate Smalltalk Consulting Ltd
https://www.linkedin.com/in/smalltalk
===========================================================================
In reply to this post by timrowledge
On Tue, Jan 12, 2016 at 10:23:25AM -0800, tim Rowledge wrote:
>
> > On 12-01-2016, at 8:39 AM, Max Leske <[hidden email]> wrote:
> >
> > Why do I see this behaviour? In Cog I see the expected outcome (free space big, young space small) even without triggering an incremental GC manually.
>
> Not sure about details but do remember that Spur uses a fairly different object space arrangement, so if that's what you're using John's advice may no longer be very relevant.
>
> >
> > 2. There's also still the question about where those 500ms disappear to (see last e-mail), which are unaccounted for in the MessageTally output.
>
> The tally code has never been particularly good at handling long running primitives. I *think* there is better profiling code somewhere that Andreas developed for Qwaq but IIRC it is not part of the current system.
>

Andreas' profiler is definitely available:

http://lists.squeakfoundation.org/pipermail/squeak-dev/2013-January/168233.html

The package is at http://ss3.gemtalksystems.com/ss/AndreasSystemProfiler/ and you can load it in Squeak from SqueakMap. The necessary support is in Cog/Spur. I made an attempt to get it working for the base interpreter but was not successful.

Dave
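A hedged usage sketch, assuming AndreasSystemProfiler mirrors MessageTally's spyOn: interface (and keeping in mind that it needs the Cog/Spur support Dave mentions, so it would not help on the 4.0.3 interpreter VM):

AndreasSystemProfiler spyOn: [ self loadSnapshot ]	"loadSnapshot is a placeholder for the real work"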
In reply to this post by johnmci
Cool, thanks. I’ve already played around with that a bit. I didn’t see anything special in the stats yet but I’ll need some more time.
In reply to this post by Max Leske
I’m aware of that. My problem (for the moment at least) only concerns V3 (Squeak VM 4.0.3).
Interesting. I wondered about the primitives and went to give MessageTally a closer look. I noticed that I was not taking into account other running processes, and voilà:

Process: other processes
--------------------------------
84.5% {2389ms} InputEventPollingFetcher(InputEventFetcher)>>eventLoop
84.5% {2389ms} InputEventPollingFetcher>>waitForInput
84.5% {2389ms} Delay>>wait

That probably explains where those 500ms I was talking about went (I was loading a larger segment here, so the process took longer to run). I've "fixed" that for now by running the loader in a higher-priority process.

Thanks for the hint.

Cheers,
Max
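A hedged sketch of that workaround; the loader call and the chosen priority are placeholders, the idea being only that the load runs above the normal user priority so it is not preempted while being profiled:

| done |
done := Semaphore new.
[ self loadSnapshot.	"placeholder for the actual segment load"
  done signal ] forkAt: Processor userSchedulingPriority + 1.
done wait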