Hi Clément, Hi All,
I answered a question on Quora on GC recently and that same question was answered by Jon Harrop, who has taken a good look at GC performance. If you're not on Quora yet, let me encourage you. This blog post is a good read:
And in Jon's answer to the Quora question he mentions Staccato, A Parallel and Concurrent Real-time Compacting Garbage Collector for Multiprocessors https://pdfs.semanticscholar.org/9770/fc9baf0f2b6c7521f00958973657bf03337d.pdf
That's my reading for today.
I would consider building a reference counting GC in a language with mixed high level and low level pointers. It's difficult to use a tracing GC when you can have pointers targeting the middle of an object (How to find which object is pointed to efficiently? Go supports a tracing GC with such pointers though with memory management heavy hacks). In Smalltalk all pointers to objects at GC time are to the base header (aside from IP, but that's not a problem), so there's no real point in using reference counting IMO. For C++, smart pointers and reference counting might be relevant. A tracing GC might be faster in C++, but for sure it's going to be hell to implement. There's performance and there's complexity. The blog post does not talk about that, though he says that since Apple has more skills in reference counting it makes sense for them to use it.
Richard Jones said at the VM summer school that he did not like reference counting GC since it was hard to make it efficient in a multi threaded environment. Just pointing that out.
Whatever. Reference counting vs tracing GC is an old time conflict and people will argue about it for the next century, if not more. And don't talk to me about inline caches versus virtual tables, a paper a few years ago showed that v-tables are faster and now so many people are trying to convince me to implement that for our VM. Even if it is faster, and I am not saying it is, rewriting the runtime to use v-table over inline caches requires a crazy amount of engineering time and complexity.
On Thu, Apr 26, 2018 at 5:09 PM, Eliot Miranda <[hidden email]> wrote:
In reply to this post by Eliot Miranda-2
On 26 April 2018 at 23:09, Eliot Miranda <[hidden email]> wrote:
What is the significance of Minimum Mutator Utilisation
and is a comparative value known for our compactors?
I thought this was interesting about the mark phase...
> each thread maintains its own thread-local fragment of the [mark-stack].
Do I understand correctly that the mark-stack is equivalent to the grey-marked collection?
What support does Slang have for concurrent programming?
On Fri, Apr 27, 2018 at 9:26 AM, Ben Coman <[hidden email]> wrote:
There is a Thread model for ThreadedFFI which is used in CogMT.
There are in the C code CAS operations that I would like some day to leverage in the language...
In reply to this post by Eliot Miranda-2
You might be interested in
"Very Concurrent Mark and Sweep Garbage Collection without Fine-Grain Synchronization"
I suspect the Smalltalk allocation rate is much higher than Limbo, but the algorithm is simple -- and in a multicore system..
> On Apr 27, 2018, at 7:35 AM, Ken.Dickey <[hidden email]> wrote:
> You might be interested in
> "Very Concurrent Mark and Sweep Garbage Collection without Fine-Grain Synchronization"
> I suspect the Smalltalk allocation rate is much higher than Limbo, but the algorithm is simple -- and in a multicore system..
Yes, thanks. This is the paper that Clément and I have read and take as a model for the incremental talk mark-sweep we're planning, and which Clément's recent compactor work supports moving towards.
In reply to this post by Ben Coman
On Apr 27, 2018, at 12:26 AM, Ben Coman <[hidden email]> wrote:
It's an important metric, especially in real-time GC (I got this yesterday from reading the Metronome paper (POPL 03) and Staccato tech report (2008).
Up until Cheng's work, Baker had introduced the notion that real time GCs were those that had bounded maximum pause times. But Chang pointed out that if you have many small pauses within some interval then the mutator may not be because my delayed for very long, but its overall share of the processor could fall to insufficient levels. So Cheng introduced the Minimum Mutator Utilization (MMU) metric as the percentage processor utilization by the mutator during some interval. So for example, achieving 85% utilization over a 20ms period implies both a maximum pause time of 3ms and at least 17ms of runtime, whereas one could imagine (and indeed people have measured in such gcs) a pause of 1ms happening ten times in the same interval which would mean 50% processor utilization. So MMU is a much better metric that maximum pause time.
So the metric doesn't apply to compaction per se. It applies to GC overhead and latency.
During incremental collection, yes. In our case the store check will be extended so that if one stores into a black object, it will be set to grey and pushed on the mark stack.
Nothing. But in the VMMaker package, as Clément mentioned, there's a multi-process simulation of the support for the threaded FFI which is built upon some atomic instruction support the JIT makes available when it generates its run-time at startup. So we can use CAS to switch ownership of the VM between threads.
Eliot Miranda gave a great explanation about the MMU metric for GC.
An important part of Urs Hölzle's PhD thesis about his Self 3
implementation was the development of a similar metric for evaluating
pauses caused by adaptive compilation. Craig Chambers had previously
done a simple implementation (Self 1) and then a much more aggressive
one (Self 2) that had very impressive benchmarks (half the speed of
optimized C on numeric code while offering lots os safe features, like
overflow handling, that C doesn't have). Unfortunately Self 2 was not as
usable interactively as Self 1. When you pressed the mouse button you
would have to hold it for a few seconds before the menu popped up. But
the response to the second mouse click was extremely fast.
See chapter 9.
|Free forum by Nabble||Edit this page|