Perhaps the solution resides in settings you can tweak from within Smalltalk, something like the MemoryPolicy you have in VisualWorks for memory and garbage collection management? I don't think hard-wiring the behavior of memory management and garbage collection (marking, sweeping, etc.) in the VM is the way to go. There are an awful lot of different needs for different applications out there!
-----------------
Benoît St-Jean
Yahoo! Messenger: bstjean
Twitter: @BenLeChialeux
Pinterest: benoitstjean
Instagram: Chef_Benito
IRC: lamneth
Blog: endormitoire.wordpress.com
"A standpoint is an intellectual horizon of radius zero". (A. Einstein)

From: Clément Bera <[hidden email]>
To: Squeak Virtual Machine Development Discussion <[hidden email]>
Cc: Pharo Development List <[hidden email]>
Sent: Monday, December 4, 2017 2:49 AM
Subject: Re: [Pharo-dev] [Vm-dev] Garbage Collection (was Re: Discussing the roadmap)

Hum... The mail is very long so I did not read all of it.

Here are some ideas/things to say off the top of my head:
- Changing an object to a forwarding object is non-atomic (we need to maintain at least the stack invariant).
- To decrease the pauses in full GC I have 2 plans:
-- incremental marking (split the mark pause into multiple small pauses): not implemented right now.
-- selective compaction (compacts only part of the heap instead of all the heap and sweeps the rest, similar to G1, but uses forwarders instead of lots of card-marking metadata): I implemented SpurSweeper, which only sweeps but works very well.
- Currently the marking phase removes all forwarders and I would like incremental marking to maintain the same invariant (forwarders are always white).
- In general, concurrent marking and sweeping have been implemented everywhere, but not concurrent compaction. For compaction you can make it selective (compact only part of the heap, the part which needs it the most) like I suggest and like in G1, which considerably decreases compaction pause time. Work on concurrent compaction is state of the art and not in production everywhere, see for example
And I will watch a talk on this topic tomorrow for the Android GC.
- Some runtimes, especially now with small servers being rented, are running on single-core machines. So we need the low-pause GC to work incrementally as well as concurrently. So step 1 is incremental GC. Step 2 is concurrent marking and sweeping with low pauses for scavenge/compaction.

No more time right now.

On Sun, Dec 3, 2017 at 6:33 AM, Ben Coman <[hidden email]> wrote:
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
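The "incremental marking" plan mentioned above (splitting the mark pause into multiple small pauses) can be illustrated with a minimal C sketch. This is not the Spur implementation: the `Obj` layout, the fixed-size `MarkStack`, and the names `shade` and `mark_step` are all hypothetical, chosen only for illustration. It also assumes the mutator does not change the object graph between steps; a real incremental marker would need a write barrier, and would have to deal with forwarders, neither of which is modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_OBJS = 64, MAX_SLOTS = 4 };    /* hypothetical limits for the sketch */

typedef enum { WHITE, GREY, BLACK } Color; /* classic tri-color marking states */

typedef struct Obj {
    Color color;
    struct Obj *slots[MAX_SLOTS];          /* NULL means an empty slot */
} Obj;

typedef struct {
    Obj *stack[MAX_OBJS];                  /* grey objects waiting to be scanned */
    size_t top;
} MarkStack;

/* Turn a white object grey and remember it for later scanning. */
void shade(MarkStack *ms, Obj *o) {
    if (o != NULL && o->color == WHITE) {
        assert(ms->top < MAX_OBJS);
        o->color = GREY;
        ms->stack[ms->top++] = o;
    }
}

/* Scan at most `budget` grey objects, then return to the mutator.
   Returns true once no grey objects remain, i.e. marking is complete. */
bool mark_step(MarkStack *ms, size_t budget) {
    while (ms->top > 0 && budget > 0) {
        budget--;
        Obj *o = ms->stack[--ms->top];
        for (size_t i = 0; i < MAX_SLOTS; i++) {
            shade(ms, o->slots[i]);
        }
        o->color = BLACK;                  /* all of its children are now grey or black */
    }
    return ms->top == 0;
}
```

A caller would shade the roots once and then call `mark_step` repeatedly with a small budget, interleaved with normal execution, until it returns true. The budget bounds the length of each pause, which is the point of splitting the mark phase.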
Hi Ben,
> On Dec 4, 2017, at 6:38 AM, Ben Coman <[hidden email]> wrote:
>
>> On 4 December 2017 at 15:47, Clément Bera <[hidden email]> wrote:
>>
>> Here are some ideas/things to say off the top of my head:
>> - Changing an object to a forwarding object is non-atomic (we need to maintain at least the stack invariant)
>
> That's because the whole multiword object has to be copied.
> What about the reverse, flattening a forwarding-object back into the
> real-object?
> I presume only the one word of the object-pointer is updated.
> btw, are we safe if competing threads write the *same* data to a slot,

Alas, a forwarding object has two fields that need to be set in separate words. The class index in the header must be the forwarder class index, and the first word of the object body must point to the target object. Reverting obviously requires two writes too. So atomic (un)forwarding is not possible.

>
>
> Anyway my idea was the GC-thread would only *identify* objects to be
> moved, and queue them
> for the Main-thread to chip away at, so only one thread is converting
> objects to forwarding objects.
>
>
>> - To decrease the pauses in full GC I have 2 plans:
>> -- incremental marking (split the mark pause into multiple small pauses): not implemented right now.
>> -- selective compaction (compacts only part of the heap instead of all the heap and sweeps the rest, similar to G1, but uses forwarders instead of lots of card-marking metadata): I implemented SpurSweeper, which only sweeps but works very well.
>> - Currently the marking phase removes all forwarders and I would like incremental marking to maintain the same invariant (forwarders are always white).
>
> A concurrent-marking thread could essentially do the same.
> i. From shared memory load forwarder F from object-slot
>      F <== object-slot
>
> ii. Follow forwarder to real-object, store into temporary R
>      R <== flattened/followed pointer
>
> iii. Atomic compare-and-swap R back into object-slot,
>      object-slot <== if F then R
>
> When (iii.) fails
> * If I'm the Main-thread, then an other-thread already did what I wanted,
>   and since that's the *only* mutation other-threads can do to an object-slot,
>   I am certain... "object-slot == R", so since I'm handling a
>   failed forwarding-check, just continue with the normal retry.
> * If I'm an other-thread, no hurry. Read the object-slot again and it
>   "should" be a real-object, otherwise just keep trying until I get one.
>
> So the question is... When using forwarders for compaction,
> how often would the forwarder-check fail...
> * if there was one thread on its own; versus
> * in the Main-thread if an other-thread had already flattened many of them
>
>
>> - In general, concurrent marking and sweeping have been implemented everywhere, but not concurrent compaction. For compaction you can make it selective (compact only part of the heap, the part which needs it the most) like I suggest and like in G1, which considerably decreases compaction pause time. Work on concurrent compaction is state of the art and not in production everywhere, see for example
>
> IIUC not many other languages use forwarding pointers like we do, and
> these seem like a real advantage
> to compact incrementally and concurrently.
>
>
>> No more time right now.
>
> I recognise it wasn't a great format.
> I really appreciate the time you could spare.
>
> cheers -ben
>
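Eliot's point that forwarding needs two separate writes can be illustrated with a small C model. This is only a sketch under stated assumptions: the `Object` layout, the `CLASS_INDEX_MASK` width, the `FORWARDER_CLASS_INDEX` value and all function names are hypothetical and are not taken from the VM sources. The order of the two stores is also an assumption; either order leaves a window in which another thread could observe a half-converted object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header layout: the class index occupies the low bits. */
#define CLASS_INDEX_MASK      0x3FFFFFu
#define FORWARDER_CLASS_INDEX 8u          /* assumed value, for illustration only */

typedef struct Object {
    uint64_t  header;                     /* word 1: holds the class index */
    uintptr_t slots[1];                   /* word 2: first word of the object body */
} Object;

bool is_forwarder(const Object *o) {
    return (o->header & CLASS_INDEX_MASK) == FORWARDER_CLASS_INDEX;
}

/* Write 1: the first body word must point to the target object. */
void forward_write_target(Object *o, Object *target) {
    o->slots[0] = (uintptr_t)target;
}

/* Write 2: the header must carry the forwarder class index. */
void forward_write_header(Object *o) {
    o->header = (o->header & ~(uint64_t)CLASS_INDEX_MASK) | FORWARDER_CLASS_INDEX;
}

/* Two distinct stores to two distinct words: not one atomic operation. */
void forward_object(Object *o, Object *target) {
    forward_write_target(o, target);
    /* A concurrent reader here sees an ordinary object whose first slot
       has already been overwritten with the target pointer. */
    forward_write_header(o);
}

/* Follow a chain of forwarders to the real object. */
Object *follow(Object *o) {
    while (is_forwarder(o)) {
        o = (Object *)o->slots[0];
    }
    return o;
}
```

Between the two stores the object still reports its old class index while its first slot no longer holds its original value, which is why the thread treats converting an object to a forwarder as something only one thread should do.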
Hi Eliot,
On 4 December 2017 at 23:44, Eliot Miranda <[hidden email]> wrote:
>
> Alas, a forwarding object has two fields that need to be set in separate words.
> The class index in the header must be the forwarder class index,
> and the first word of the object body must point to the target object.

Good to better understand that. Just restating after condensing most of the thread: only the Main-thread would convert objects to forwarders.

> Reverting obviously requires two writes too. So atomic (un)forwarding is not possible.

Sorry, just double-checking since I realised I erred in my example below, which may have misled your response. IIUC (now)... while "creating" a forwarder involves writing separately to object-header and object-body, wouldn't "reverting" only involve updating an object-pointer in a slot to the real-object's new location? And that would be a single-word operation on both 32-bit and 64-bit platforms?

> >> - Currently the marking phase removes all forwarders and I would like incremental marking to maintain the same invariant (forwarders are always white).
> >
> > A concurrent-marking thread could essentially do the same.
> > i. From shared memory load forward header Fh from object-slot
> >      Fheader <== object-slot
> > ii. Follow forwarder to real-object, store into local temporary
> >      Rheader <== flattened/followed pointer
> >
> > iii. Atomic compare-and-swap R back into object-slot,
> >      object-slot <== if Fheader then Rheader

Whoops, corrected...

i. From shared memory, concurrent marking-scan has loaded an object-pointer from object-slot. If it points to a forwarding object ...
     Opointer <== object-slot     (as part of normal marking scan)
     if *Opointer isForwardingObject {
         Fpointer <== Opointer     (redundant, just for clarity)

ii. Follow that to real-object, store into local temporary
         Rpointer <== followed Fpointer forwarding chain to real object.

iii. Atomic compare-and-swap Rpointer back into object-slot
         object-slot <== if still Fpointer then Rpointer
     }

When (iii.) fails
* If I'm the Main-thread, I'm in the middle of handling a failed forwarder-check and can infer the GC-thread already did what I was going to do. I can be certain that... "object-slot == Rpointer" (since that's the only object-slot mutation GC-threads can do), so it's fine to continue to my usual post-flattening retry.
* If I'm a GC-thread, either:
   * the Main-thread flattened the forwarder, or
   * the Main-thread changed which object the slot holds
  In either case, just re-read the slot.

A race scenario to consider is where the Main-thread converts an object to a forwarder simultaneously with a GC-thread flattening one of its slots. The object copy in its new location may miss that update. But in terms of the object graph that seems not to be a problem, since the copy without that update is still consistent. Infrequently, just a little bit of work is lost.

I'm contemplating other race scenarios around marking, but won't distract with them for now.

cheers -ben
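The corrected steps i. to iii. above can be sketched in C11 with `<stdatomic.h>`. This is a minimal model, not VM code: the `Object` struct with an `is_forwarder` flag, the `Slot` type, and the functions `try_flatten` and `flatten_slot` are hypothetical names introduced for illustration. The sketch only covers the slot update; it assumes that reading `is_forwarder` and `target` is safe, so it does not address the race where the Main-thread is converting the pointed-to object at the same moment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object {
    bool is_forwarder;            /* stands in for the forwarder class index check */
    struct Object *target;        /* where a forwarder points; unused otherwise */
} Object;

typedef _Atomic(Object *) Slot;   /* one pointer-sized object-slot */

/* Step ii: follow the forwarding chain to the real object. */
Object *follow(Object *o) {
    while (o != NULL && o->is_forwarder) {
        o = o->target;
    }
    return o;
}

/* Steps ii and iii, given the pointer `seen` that step i loaded earlier.
   Returns true only if this call replaced the forwarder in the slot.
   On failure the slot is left exactly as the other thread wrote it. */
bool try_flatten(Slot *slot, Object *seen) {
    if (seen == NULL || !seen->is_forwarder) {
        return false;             /* nothing to flatten */
    }
    Object *real = follow(seen);
    /* Installs `real` only if the slot still holds `seen`. */
    return atomic_compare_exchange_strong(slot, &seen, real);
}

/* Step i plus the rest: load the slot, then attempt the flattening. */
bool flatten_slot(Slot *slot) {
    return try_flatten(slot, atomic_load(slot));
}
```

Splitting out `try_flatten` makes the failure case easy to exercise: load the slot, let another thread (or a test) store a different pointer, then call `try_flatten` with the stale value. The compare-and-swap fails, the newer value survives, and the caller simply re-reads the slot, as described for the GC-thread case.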