Too many full GCs in Cog/Spur ... while just drawing GUI ...

marcel.taeumel

Too many full GCs in Cog/Spur ... while just drawing GUI ...

Hi, there!

Bert and I discovered a possibly problematic behavior of the Spur GC. In 4.6 and before, there was never a full GC when calling "ActiveWorld imageForm" (as an extreme re-drawing benchmark) in an endless loop. With Spur, full GCs occur regularly in that loop.

See this picture as an example: http://forum.world.st/file/n4888823/squeak-spur-gc-problem.png

While the user may not notice this behavior during graphical updates, it is annoying when it causes sound glitches. On my system, with a 120 ms buffer in SoundPlayer, full GCs took 160 ms on average, so there were sound glitches.

There is something going on here with the new GC in Cog/Spur. Just take this re-drawing example/benchmark as an indication of a configuration issue with respect to today's image sizes, memory sizes, CPU speeds, etc.

I just want my sound to have no glitches while resizing a system window. (Note that putting the sound player at a higher priority than the WeakArray finalizer does not help.)

Best,
Marcel

marcel.taeumel

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

This results in 20 FPS (5.0) vs. 29 FPS (4.6) when fully re-drawing the things as shown in my last email.

... and only 16 FPS in the recent trunk image with the recent VM ... :'(

[ActiveWorld imageForm] bench.

Best,
Marcel

marcel.taeumel

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

As for the number of full-GCs tested with:

DisplayScreen depth: 32 width: 1024 height: 768 fullscreen: false.
{String. Morph. Socket. Rectangle. Form} do: [:ea | ea browse].

x := Smalltalk vmParameterAt: 7. "num full GC since startup"
[ActiveWorld imageForm] benchFor: 5 seconds.
(Smalltalk vmParameterAt: 7) - x

Here are the results:

5.0 All-in-One --> 22 full GCs
4.6 All-in-One --> 0 full GCs
Trunk --> 14 full GCs

I do not claim that these performance issues are solely due to GC.
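
To also see how much time those full GCs take, the measurement above can be extended along the following lines. This is only a minimal sketch: it assumes that VM parameter 8 reports the total milliseconds spent in full GCs since startup (as described in the comment of SmalltalkImage>>vmParameterAt:), and the variable names are made up for illustration.

```smalltalk
| fullGCsBefore fullGCMsBefore fullGCs fullGCMs |
"Read the counters before the run. Parameter 8 is assumed to be
 'total milliseconds in full GCs since startup'."
fullGCsBefore := Smalltalk vmParameterAt: 7. "num full GC since startup"
fullGCMsBefore := Smalltalk vmParameterAt: 8.
[ActiveWorld imageForm] benchFor: 5 seconds.
fullGCs := (Smalltalk vmParameterAt: 7) - fullGCsBefore.
fullGCMs := (Smalltalk vmParameterAt: 8) - fullGCMsBefore.
"Answer {number of full GCs. total ms in full GCs. average ms per full GC}"
{ fullGCs.
  fullGCMs.
  fullGCs = 0 ifTrue: [0] ifFalse: [fullGCMs / fullGCs] }
```

An average above the SoundPlayer buffer length (120 ms in my setup) would be consistent with the glitches described earlier.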

Best,
Marcel

marcel.taeumel

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

Hi, there!

Jens found a memory visualization, which he did in 2006:
GCStats-jl.mcz

See what it shows for our problem here. Red lines indicate full GCs:

For the non-morph version of the graph, do this:

GcGraph enable.
[ [ (Delay forMilliseconds: 100) wait. GcGraph update ] repeat ] forkAt: 41.

Then do some endless loop for re-drawing:
[ActiveWorld imageForm] repeat.

Stop it via [CMD]+Dot.

Best,
Marcel

Levente Uzonyi

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by marcel.taeumel

Hi Marcel,

I only get 0-2 full GCs using the latest VM and Trunk images.
What's the size of your window? Which VM do you use?

Levente

On Thu, 7 Apr 2016, marcel.taeumel wrote:

>
> As for the number of full-GCs tested with:
>
> DisplayScreen depth: 32 width: 1024 height: 768 fullscreen: false.
> {String. Morph. Socket. Rectangle. Form} do: [:ea | ea browse].
>
> x := Smalltalk vmParameterAt: 7. "num full GC since startup"
> [ActiveWorld imageForm] benchFor: 5 seconds.
> (Smalltalk vmParameterAt: 7) - x
>
> Here are the results:
>
> 5.0 All-in-One --> 22 full GCs
> 4.6 All-in-One --> 0 full GCs
> Trunk --> 14 full GCs
>
> I do not claim that these performance issues are solely due to GC.
>
> Best,
> Marcel

marcel.taeumel

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

Hi Levente,

it's in the code: 1024 x 768.

These tools are opened: {String. Morph. Socket. Rectangle. Form} do: [:ea | ea browse].

And a workspace.

Best,
Marcel

marcel.taeumel

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by Levente Uzonyi

Hi Levente,

here are the VMs:

4.6 -> #3397 (non-spur)
5.0 -> #3397
Trunk -> #3663

Windows 10.

Best,
Marcel

marcel.taeumel

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by Levente Uzonyi

Hi Levente,

the size of the "time window" was 5 seconds. ;-)

Best,
Marcel

Eliot Miranda-2

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by marcel.taeumel

_,,,^..^,,,_ (phone)

On Apr 7, 2016, at 4:04 AM, marcel.taeumel <[hidden email]> wrote:

> Hi, there!
>
> Bert and I discovered a maybe problematic behavior of the Spur-GC. In 4.6
> and before, there was never a full GC when calling "ActiveWorld imageForm"
> (as an extreme re-drawing benchmark) in an endless loop.
>
> See this picture as an example:
> <http://forum.world.st/file/n4888823/squeak-spur-gc-problem.png>
>
> While the user may not notice that behavior for graphical updates, it is
> annoying for sound glitches. In my system with a 120 ms buffer in
> SoundPlayer, full GCs took 160 ms on average. And there were sound glitches.
>
> There is something going on here with the new GC in Cog/Spur. Just take this
> re-drawing example/benchmark as an indication for a configuration issue wrt.
> today's image sizes, memory sizes, CPU speeds, etc.
>
> I just want that my sound has no glitches while resizing a system window.
> (Note that putting the sound player at a higher priority than the WeakArray
> finalizer does not help.)

From http://www.mirandabanda.org/cogblog/cog-projects/:

"Spur is released as Squeak 5.0, Newspeak and Pharo 5, and 64-bit Squeak 5.0 is in use, but green. There are still important missing components:

– an incremental global mark-sweep-compact collector that avoids long pause times running a stop-the-world scan-mark-compact during interactive use.

– an improved per-segment compaction algorithm to replace the second cut compaction algorithm (pig compact) that is slow and reorders objects."

Volunteers welcome...

> Best,
> Marcel

Levente Uzonyi

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by marcel.taeumel

My bad, vertical reading...

So, with 75 open windows, main window larger than 1024x768, I still get
0-2 as result. Perhaps it's a difference between unix and windows vms.

Levente

On Thu, 7 Apr 2016, marcel.taeumel wrote:

>
> Hi Levente,
>
> it's in the code: 1024 x 768.
>
> These tools are opened: {String. Morph. Socket. Rectangle. Form} do: [:ea |
> ea browse].
>
> And a workspace.
>
> Best,
> Marcel

marcel.taeumel

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by Eliot Miranda-2

Hi Eliot,

where should volunteers start to read and understand the GC-related parts in the VM?

Best,
Marcel

Eliot Miranda-2

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

Hi Marcel,

On Apr 7, 2016, at 8:11 AM, marcel.taeumel <[hidden email]> wrote:

> Hi Eliot,
>
> where should volunteers start to read and understand the GC-related parts in
> the VM?

1. build a VMMaker image following the instructions in http://www.mirandabanda.org/cogblog/build-image/

2. read the class comment for SpurMemoryManager

3. guided by the message protocols in SpurMemoryManager, drill down into the compaction algorithm, free space lists, etc

4. Bring questions to vm-dev

There are good papers on concurrent mark-sweep. I have two on my desk (references soon).

Clément has an idea for a compaction scheme that makes sense. We would need to discuss the idea here.

I would prefer volunteers who have already completed a master's, or have equivalent experience. But note that the Spur GC is (AFAIA) unique in supporting "lemming" debugging, the scheme of running the GC in a copy of the entire heap, so that if there are bugs one simply takes another copy and debugs using that copy, instead of either trying to construct a reproducible case or trying to work backwards from a corrupted heap. There's also a full leak checker, many assertions, etc., so the development experience is pretty good.

> Best,
> Marcel

_,,,^..^,,,_ (phone)

Levente Uzonyi

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by Levente Uzonyi

No, it's not a difference between platforms.
With fewer open windows, the number of garbage collections increases.
This is because rendering happens more often. When the VM runs out of
memory, primitive 71 in #basicNew: will fail and it'll send
#handleFailingBasicNew:. That method will retry primitive 71, which will
fail again, and then it'll do a full GC to make room for the new object.
Only when it's not possible to make room for the new object by collecting
the garbage will the VM allocate more memory.
Forcing the VM to allocate more memory, by tweaking parameter 25, will
decrease the number of GCs, though I wasn't able to get it down to 0,
because the code became way too snappy and it quickly reached the new
memory limit.
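
The effect of parameter 25 can be tried out with a sketch like the one below. It rests on assumptions: parameter 25 is taken to be the read-write "memory headroom when growing object memory" value in bytes (that is how the comment of SmalltalkImage>>vmParameterAt: describes it), and the 64 MB figure is an arbitrary number for illustration, not a recommendation.

```smalltalk
| oldHeadroom fullGCsBefore |
"Remember the current setting so it can be restored afterwards."
oldHeadroom := Smalltalk vmParameterAt: 25.
"Assumed meaning: growth headroom in bytes. 64 MB is an arbitrary choice."
Smalltalk vmParameterAt: 25 put: 64 * 1024 * 1024.
fullGCsBefore := Smalltalk vmParameterAt: 7. "num full GC since startup"
[ActiveWorld imageForm] benchFor: 5 seconds.
Transcript
	show: 'full GCs: ', ((Smalltalk vmParameterAt: 7) - fullGCsBefore) printString;
	cr.
"Put the previous headroom back."
Smalltalk vmParameterAt: 25 put: oldHeadroom.
```

Comparing the printed count with and without the changed headroom shows whether the setting makes a difference on a given machine.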

Levente

On Thu, 7 Apr 2016, Levente Uzonyi wrote:

>
> My bad, vertical reading...
>
> So, with 75 open windows, main window larger than 1024x768, I still get 0-2
> as result. Perhaps it's a difference between unix and windows vms.
>
> Levente
>
> On Thu, 7 Apr 2016, marcel.taeumel wrote:
>
>>
>> Hi Levente,
>>
>> it's in the code: 1024 x 768.
>>
>> These tools are opened: {String. Morph. Socket. Rectangle. Form} do: [:ea |
>> ea browse].
>>
>> And a workspace.
>>
>> Best,
>> Marcel

Bert Freudenberg

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

On 08.04.2016, at 00:34, Levente Uzonyi <[hidden email]> wrote:
>
> When the VM runs out of memory primitive 71 in #basicNew: will fail and it'll send
> #handleFailingBasicNew:. That method will retry primitive 71, which will fail again, and then it'll do a full gc to make room for the new object.

I couldn’t believe regular GC would go through primitive failure code and rely on the image to invoke a full GC.

And in fact, it doesn’t. I put a “BasicNewFailures := BasicNewFailures + 1.” in basicNew(:) and it never increased. So I think this is invoked only in severe conditions.

> Only when it's not possible to make room for the new object by collecting the garbage will the VM allocate more memory.

I see that code in Behavior (calling growMemoryByAtLeast:), yes, but in my tests it never got invoked.

> Forcing the VM to allocate more memory, by tweaking parameter 25, will decrease the number of GCs, though I wasn't able to get it down to 0, because the code became way too snappy and it quickly reached the new memory limit.

Yes, the number of full GCs goes down. But why not to zero? I cannot understand why we cannot find a large-enough “new space size” so that the working set is fully kept in there. Redrawing a couple windows should not produce any long-lived objects that trigger a full GC every second.

- Bert -


Levente Uzonyi

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

On Fri, 8 Apr 2016, Bert Freudenberg wrote:
>
> On 08.04.2016, at 00:34, Levente Uzonyi <[hidden email]> wrote:
> >
> > When the VM runs out of memory primitive 71 in #basicNew: will fail and it'll send
> > #handleFailingBasicNew:. That method will retry primitive 71, which will fail again, and then it'll do a full gc to make room for the new
> object.
>
> I couldn’t believe regular GC would go through primitive failure code and rely on the image to invoke a full GC.

It does. Perhaps it's worth trying #garbageCollectMost before doing a full
GC. It significantly decreases the number of full GCs in cases where larger
objects are allocated and thrown away frequently.

I had replaced

    Smalltalk garbageCollect < bytesRequested ifTrue:
        [Smalltalk growMemoryByAtLeast: bytesRequested].

with

    Smalltalk garbageCollectMost < bytesRequested ifTrue:
        [Smalltalk garbageCollect < bytesRequested ifTrue:
            [Smalltalk growMemoryByAtLeast: bytesRequested]].

in #handleFailingBasicNew: and the number of full GCs has never gone above
4 when running this snippet:

Smalltalk garbageCollect.
x := Smalltalk vmParameterAt: 7. "num full GC since startup"
{ [ActiveWorld imageForm] bench.
(Smalltalk vmParameterAt: 7) - x }

Doing the same trick in #handleFailingBasicNew has taken the number down
to 2.
So I suggest we should use both changes if it has no negative side
effects. Eliot? :)

>
> And in fact, it doesn’t. I put a "BasicNewFailures := BasicNewFailures + 1.” in basicNew(:) and it never increased. So I think this is invoked
> only in severe conditions.

Try the snippet above. The way I found this was that I also had the #bench
send wrapped in #timeProfile, so it appeared in the profiler. Make sure
you have only a few windows open, otherwise drawing will take way too
long and there won't be enough allocations to trigger the failure.

>
> > Only when it's not possible to make room for the new object by collecting the garbage will the VM allocate more memory.
>
> I see that code in Behavior (calling growMemoryByAtLeast:), yes, but in my tests it never got invoked.
>
> > Forcing the VM to allocate more memory, by tweaking parameter 25, will decrease the number of GCs, though I wasn't able to get it down to 0,
> because the code became way too snappy and it quickly reached the new memory limit.
>
> Yes, the number of full GCs goes down. But why not to zero? I cannot understand why we cannot find a large-enough “new space size” so that the
> working set is fully kept in there. Redrawing a couple windows should not produce any long-lived objects that trigger a full GC every second.
>

If you replace #bench with #benchFor: 1 seconds, then the number of full
GCs will stay zero (using the #garbageCollectMost change I suggested
above).

Levente

Levente Uzonyi

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

I've uploaded Kernel-ul.1014 to the Squeak Inbox with the suggested changes.

Levente

timrowledge

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by Bert Freudenberg

> On 08-04-2016, at 7:33 AM, Bert Freudenberg <[hidden email]> wrote:
>
> Yes, the number of full GCs goes down. But why not to zero? I cannot understand why we cannot find a large-enough “new space size” so that the working set is fully kept in there. Redrawing a couple windows should not produce any long-lived objects that trigger a full GC every second.

Redrawing seems likely to be using quite a lot of assorted bitmaps (no, I didn’t dig around to be sure so may be wrong) and since they’re large non-pointer objects I suspect they’d go straight to hell. Ah, old-space. Thus requiring the full gc to get rid of them.

And as a small extra datum, Marcel’s original example code answered ‘3’ on my Pi 3. However, since it has about 10% of the performance of a typical modern laptop, I’m guessing we only run the test 10% as many times. So, trying with 'benchFor: 50' instead answers 30/42/45, which indicates a roughly linear relationship to the number of #imageForm executions.

Running a TimeProfileBrowser on it shows an astonishing 21,000 tenures and 247 incremental collections, with around 50% of total run time going to gc. Fully half the remaining time appears to go on drawing the WindowButtonMorph, whatever that is.
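
The same kind of counters can also be read straight from the VM without the profiler. The sketch below assumes that parameter 9 counts incremental GCs (scavenges in Spur) and parameter 11 counts tenures since startup, which is how the comment of SmalltalkImage>>vmParameterAt: lists them; the 5-second duration is simply taken from Marcel's snippet.

```smalltalk
| indices before after |
"7 = full GCs; 9 = incremental GCs/scavenges (assumed); 11 = tenures (assumed)"
indices := #(7 9 11).
before := indices collect: [:index | Smalltalk vmParameterAt: index].
[ActiveWorld imageForm] benchFor: 5 seconds.
after := indices collect: [:index | Smalltalk vmParameterAt: index].
"Answer {full GCs. incremental GCs. tenures} that happened during the run"
after with: before collect: [:new :old | new - old]
```

A large tenure count next to a small scavenge count would point the same way as the profiler output above.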

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Common sense – so rare it’s a goddam superpower

Eliot Miranda-2

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by Levente Uzonyi

On Fri, Apr 8, 2016 at 9:29 AM, Levente Uzonyi <[hidden email]> wrote:

> On Fri, 8 Apr 2016, Bert Freudenberg wrote:
>
>> On 08.04.2016, at 00:34, Levente Uzonyi <[hidden email]> wrote:
>>
>>> When the VM runs out of memory primitive 71 in #basicNew: will fail and it'll send
>>> #handleFailingBasicNew:. That method will retry primitive 71, which will fail again, and then it'll do a full gc to make room for the new
>>> object.
>>
>> I couldn’t believe regular GC would go through primitive failure code and rely on the image to invoke a full GC.
>
> It does. Perhaps it's worth trying #garbageCollectMost before doing a full GC. It decreases the full GC significantly in cases when larger objects are allocated and thrown away frequently.
>
> I had replaced
>
>     Smalltalk garbageCollect < bytesRequested ifTrue:
>         [Smalltalk growMemoryByAtLeast: bytesRequested].
>
> with
>
>     Smalltalk garbageCollectMost < bytesRequested ifTrue:
>         [Smalltalk garbageCollect < bytesRequested ifTrue:
>             [Smalltalk growMemoryByAtLeast: bytesRequested]].
>
> in #handleFailingBasicNew: and the number of full GCs has never gone above 4 when running this snippet:
>
>     Smalltalk garbageCollect.
>     x := Smalltalk vmParameterAt: 7. "num full GC since startup"
>     { [ActiveWorld imageForm] bench.
>     (Smalltalk vmParameterAt: 7) - x }
>
> Doing the same trick in #handleFailingBasicNew has taken the number down to 2.
> So I suggest we should use both changes if it has no negative side effects. Eliot? :)

+1. I had hoped that scavenging would be run automatically, but this won't happen with huge allocations. For small allocations, when eden is full, the machine code new primitive will set the "needs scavenge" flag; when #handleFailingBasicNew: runs, the scavenger will run, so there is no need to do Smalltalk garbageCollectMost, because that has happened implicitly. But for huge allocations I think the code doesn't set the scavenge flag; it merely fails the primitive. But I need to check this.

What do we prefer: having the machine code for new always set the "needs scavenge" flag if an allocation failed because there was no room, or having #handleFailingBasicNew: et al. call Smalltalk garbageCollectMost explicitly?

The wrinkle here is that Spur will only allocate objects with 64k slots or fewer in newSpace (I apologise; this is not documented at the image level):

    maxSlotsForNewSpaceAlloc
        "Almost entirely arbitrary, but we dont want 1Mb bitmaps allocated in eden.
         But this choice means no check for numSlots > maxSlotsForNewSpaceAlloc
         for non-variable allocations."
        ^self fixedFieldsOfClassFormatMask

    fixedFieldsOfClassFormatMask
        <api>
        ^1 << self fixedFieldsFieldWidth - 1

    fixedFieldsFieldWidth
        <api>
        ^16

So the "needs scavenge" flag only gets set for "small" allocations, and even more confusingly, "small" is different in 32 and 64 bits, since 64k slots is 256k bytes in 32 bits but 512k bytes in 64 bits. So I am leaning towards having #handleFailingBasicNew: call garbageCollectMost explicitly.
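
To make the arithmetic concrete, here is a small worked example that can be evaluated in a workspace. It is only an illustration: the slot sizes of 4 and 8 bytes are the usual word sizes for 32-bit and 64-bit images, and the variable name is made up.

```smalltalk
| maxSlots |
"Same value as '1 << self fixedFieldsFieldWidth - 1' with a field width of 16.
 Smalltalk evaluates binary operators left to right, so this is (1 << 16) - 1."
maxSlots := (1 << 16) - 1.
{ maxSlots.       "65535 slots"
  maxSlots * 4.   "262140 bytes with 4-byte slots (32-bit), i.e. just under 256k"
  maxSlots * 8 }  "524280 bytes with 8-byte slots (64-bit), i.e. just under 512k"
```

Anything with more slots than that is not allocated in newSpace, which is why large bitmaps end up in old space directly.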

>> And in fact, it doesn’t. I put a "BasicNewFailures := BasicNewFailures + 1.” in basicNew(:) and it never increased. So I think this is invoked only in severe conditions.
>
> Try the snippet above. The way I found this was that I also had the #bench send wrapped in #timeProfile, so it appeared in the profiler. Make sure you have only a few windows open, otherwise drawing will take way too long and there won't be enough allocations to trigger the failure.

When I saw Bert's message above, denying that basicNew: ever fails, I immediately repeated Bert's experiment, /knowing/ that basicNew /does/ fail. But to my surprise my experiment revealed Bert's result, that basicNew does not appear to fail :-). Of course the gotcha is that the backward branch in the loop over the allocation checks for events and runs the scavenger, so a simple loop never shows failures. It's difficult to make it fail, but fail it will :-)

>>> Only when it's not possible to make room for the new object by collecting the garbage will the VM allocate more memory.
>>
>> I see that code in Behavior (calling growMemoryByAtLeast:), yes, but in my tests it never got invoked.
>>
>>> Forcing the VM to allocate more memory, by tweaking parameter 25, will decrease the number of GCs, though I wasn't able to get it down to 0,
>>> because the code became way too snappy and it quickly reached the new memory limit.
>>
>> Yes, the number of full GCs goes down. But why not to zero? I cannot understand why we cannot find a large-enough “new space size” so that the
>> working set is fully kept in there. Redrawing a couple windows should not produce any long-lived objects that trigger a full GC every second.
>
> If you replace #bench with #benchFor: 1 seconds, then the number of full GCs will stay zero (using the #garbageCollectMost change I suggested above).
>
> Levente

_,,,^..^,,,_

best, Eliot

Bert Freudenberg

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by Levente Uzonyi

On 08.04.2016, at 18:29, Levente Uzonyi <[hidden email]> wrote:

>
> On Fri, 8 Apr 2016, Bert Freudenberg wrote:
>>
>> On 08.04.2016, at 00:34, Levente Uzonyi <[hidden email]> wrote:
>> >
>> > When the VM runs out of memory primitive 71 in #basicNew: will fail and it'll send
>> > #handleFailingBasicNew:. That method will retry primitive 71, which will fail again, and then it'll do a full gc to make room for the new
>> object.
>>
>> I couldn’t believe regular GC would go through primitive failure code and rely on the image to invoke a full GC.
>
> It does. [...] Make sure you have only a few windows open, otherwise drawing will take way too long and there won't be enough allocations to trigger the failure.

You’re right, it does get triggered with just one window.

>> > Only when it's not possible to make room for the new object by collecting the garbage will the VM allocate more memory.
>>
>> I see that code in Behavior (calling growMemoryByAtLeast:), yes, but in my tests it never got invoked.
>>
>> > Forcing the VM to allocate more memory, by tweaking parameter 25, will decrease the number of GCs, though I wasn't able to get it down to 0,
>> because the code became way too snappy and it quickly reached the new memory limit.
>>
>> Yes, the number of full GCs goes down. But why not to zero? I cannot understand why we cannot find a large-enough “new space size” so that the
>> working set is fully kept in there. Redrawing a couple windows should not produce any long-lived objects that trigger a full GC every second.
>>
>
> If you replace #bench with #benchFor: 1 seconds, then the number of full GCs will stay zero (using the #garbageCollectMost change I suggested above).

Well we should be able to run for 5 seconds ;)

But it’s even worse: After loading Kernel-ul.1014 and running your snippet, I get an OutOfMemory error immediately. So we are somehow pushing the VM to its limit.

If I open more windows, it works, but I verified that there were no more basicNew failures. It apparently has to do with big allocations. E.g. this triggers OOM with your change:

    10 timesRepeat: [Bitmap new: 10000000]

... but works fine (albeit slowly) without.

- Bert -


Bert Freudenberg

Re: Too many full GCs in Cog/Spur ... while just drawing GUI ...

In reply to this post by Eliot Miranda-2

On 08.04.2016, at 20:00, Eliot Miranda <[hidden email]> wrote:

> On Fri, Apr 8, 2016 at 9:29 AM, Levente Uzonyi <[hidden email]> wrote:
>> So I suggest we should use both changes if it has no negative side effects. Eliot? :)
>
> +1. I had hoped that scavenging would be run automatically, but this won't happen with huge allocations. For small allocations, when eden is full, the machine code new primitive will set the "needs scavenge" flag, when #handleFailingBasicNew: runs the scavenger will run, and so there is no need to do Smalltalk garbageCollectMost, because that has happened implicitly. But for huge allocations I think the code doesn't set the scavenge flag, it merely fails the primitive. But I need to check this.
>
> What do we prefer, having the machine code for new always set the "needs scavenge" flag if an allocation failed because there was no room, or have #handleFailingBasicNew: et al calls Smalltalk garbageCollectMost explicitly?

IMHO there’s no advantage in relying on the image to do that. I’d prefer handling it in the VM.

> When I saw Bert's message above, denying that basicNew: ever fails, I immediately repeated Bert's experiment, /knowing/ that basicNew /does/ fail. But to my surprise my experiment revealed Bert's result, that basicNew does not appear to fail :-).

Now that’s a relief ;)

> Of course the gotcher is that the backward branch in the loop over allocation checks for events, and runs the scavenger, so a simple loop never shows failures. It's difficult to make it fail, but fail it will :-)

Yeah. So more small allocations will not make it fail, but a few bigger ones do.

When exactly is a full GC triggered in the current scheme?

- Bert -
