Hi list,
With Pavel and Christophe we spent some time these last weeks chasing the memory leaks we were seeing lately. It is a long story to tell, so this mail is divided in four parts:

1) A brief intro to weak structures and finalization in Pharo, for those that do not know,
2) A bit of history to explain what happened in pre-Spur and post-Spur,
3) The actual cause of the memory leak today,
4) How to avoid them in your application, and what we are going to do to prevent this in the future.

For those that need/want/prefer just the practical explanation, you can jump over 2) and just read 1) and 3).

========================================================================
1. A weak explanation
========================================================================

To clean up objects upon garbage collection, Pharo and Squeak use a finalization mechanism based on a Weak Registry. That is, if you want to execute some cleanup (like closing a file) when an object is about to be collected, you have to put your object inside the weak registry with the corresponding executor/finalizer object. The object you want to 'track' is held weakly by this weak registry, i.e., if the only reference to the object is from the weak registry, it will be chosen for garbage collection. When this object is collected, a special process in the Pharo image will send #finalize to your executor object, where you implement your cleanup.

To interact with the weak registry, there are two main subscription messages:

- #add:executor:

  Will add an object to the registry with the executor that is sent as argument.

- #add:

  Will add an object to the registry, and use as executor a 'shallow copy' of the object.

Some conclusions to be made from this:
1) If the executor points strongly to the object that we want to collect, it will never be collected. That is why the #add: message creates a copy of the object.
2) If we do not provide an explicit executor, the registered object should already contain all information required for the finalization (like file handles or external pointers). If not, the shallow copy will not be able to finalize correctly.

Also:
- Using weak objects/references does not guarantee that #finalize will be called; you need to put your object inside the registry!
- Using weak objects/references does not guarantee that your object will be magically collected. You can still cause memory leaks!

========================================================================
2. A weak story
========================================================================

Pharo and Squeak have historically used the weak registry mentioned above. Because of the limitations that we mentioned, a different kind of weak structure called Ephemerons is required/more useful. To overcome some of these limitations, Igor (Hi Igor! maybe you're reading :)) implemented a couple of years ago a new finalization mechanism that, IIANM, worked as follows:

- Some weak objects could have a first instance variable with a special linked list
- When the object was about to be collected, it was instead removed from the weak structure and put into its container's linked list
- On the image side, a special process iterated over all special linked lists and executed #finalize on the weak objects

This mechanism was called NewFinalization, in contrast with what was called LegacyFinalization. Of course these names are context dependent, since today's Pharo is back to the so-called legacy one ;). NewFinalization was implemented as the default finalization mechanism in Pharo, both on the VM and the image side. But the VM changes remained in the Pharo branch of development. After some discussions, I remember Igor and Eliot agreed that what they actually needed were Ephemerons, and since Eliot had started working on Spur at that time, he said he would provide Ephemeric classes with the new object format.

Basically, for those interested, an ephemeron is an association

  weak key -> strong value

with the special quality that upon garbage collection all references to the weak key that are computed from the strong value (directly or indirectly) are taken as weak. This allows the collection of the weak key even if the strong value points to it, but requires some more machinery in the GC/VM. You can read more here [1].

Until a couple of months/weeks ago, Pharo was using the NewFinalization mechanism with its special image and VM support. And Squeak was using the 'Legacy' one. And then Spur arrived.

So Spur arrived, and Eliot and Esteban made a lot of effort to simplify the VM's maintenance, and they merged both branches. As a consequence, the Pharo Spur VM no longer supported NewFinalization. This provoked at first some leaks because objects were not being finalized. A couple of weeks ago, we migrated the image code back to the 'Legacy' mechanism, see issue 17537 [2].

And then finalization was not working either. Neither was #finalize being called on executors, nor were objects in the weak registry collected. As a symptom, opening any tool caused 30 new everlasting registrations in the weak registry, and no tools were collected.

========================================================================
3. The cause
========================================================================

After lots of digging, we finally found the particular issue that kept objects in the weak registry from being collected. In short, it is caused by the common belief that "weak objects are magical", which led to weak references and finalizers being spread all over the system with no proper care, particularly in the usage of announcements.

To explain better, I made some pictures for you :)

***First, imagine you have a morph with its own local announcer. You subscribe to two events, and the graph will look like this.

<strong.png>

- the announcer knows two strong subscriptions
- the subscriptions know the announcer to be able to unregister
- the subscriptions know the registered object to send the message in case the event happens

This forms a closed graph that will be collected. No problem so far.

***Second, let's see what happens if we use weak subscriptions:

<weak.png>

- the announcer knows two weak subscriptions
- these weak subscriptions know the announcer strongly to be able to unregister
- they also know the subscriber object, but weakly
- THE difference is made by the weak registry: a global object that manages when and how objects are finalized. In the case of announcers, the weak registry will store the subscriber morph weakly, and the weak announcer subscription strongly.

So far so good also: the references to the morph are weak. When the morph is collected, the weak registry will execute finalize on the announcement subscriptions. The subscriptions will unregister from the announcer.

***The really problematic case is the third one: mixing weak and strong subscriptions in the same announcer.

<both.png>

The object graph is just a mixture of the two other ones. One weak subscription and one strong subscription. BUT:

- there is a strong path from a global object (the weak registry) to the subscriber (the morph)
- then the morph is never collected
- the weak registry never finalizes the weak announcement subscription
- the graph remains there forever.

And these are the simple cases that show the problem. Imagine that you can have this same configuration but in cycles/chains among different morphs/announcements. Plus this is aggravated by evil globals (e.g., the theme and the HandMorph remember the last focused morph, the system window class remembers the last top window even if it was closed...).

========================================================================
4. The solution?
========================================================================

Our solution for the moment is simple.
We would like to enforce the following two rules for announcements:

- announcers local to a morph should only be used strongly. YES, this may cause small hiccups and leaks, for example if you register a morph A to the announcer of another morph B. But in the long term, these two will form a closed graph and will be collected.

- announcers used globally, such as the System announcer, should be used only and uniquely in a weak manner. That way we ensure that they are loosely coupled for real.

So, please, please, do not use weak announcements unless you're really sure of what you're doing. At least, until we have ephemerons and we are sure everything works as expected. Ephemerons would solve this in a more natural way: if we model the weak registry subscription as an ephemeron, any reference to the weak #key that arrives from the #value will be treated as weak also.

Other action points we are working on:
- fixing tools to follow the rules above
- writing tests to check that tools (gt*, Nautilus, Rubric, FT) do not leak
- chasing other small memory leaks created by stepping, focus global variables...

  ((fogbugz allIssues select: [ :each | each relatedToLeak ])
      flatCollect: [ :each | each participants ])
          do: #thanks

[1] https://en.wikipedia.org/wiki/Ephemeron
[2] https://pharo.fogbugz.com/f/cases/17537/SystemAnnouncer-has-far-too-many-subscriptions |
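For readers who prefer code to pictures, the two rules from section 4 could be sketched roughly as below. This is only an illustration under assumptions, not code taken from the fixes themselves: `log` is a hypothetical stand-in for a morph or tool, and `Announcement` and `ClassAdded` are used merely as sample event classes.

```smalltalk
"Hypothetical sketch: log stands in for a morph or tool that subscribes."
| localAnnouncer log |
log := OrderedCollection new.

"Rule 1: an announcer that is local to the object graph gets strong
 subscriptions only. Announcer, subscriptions and subscriber form a
 closed graph, so they can be collected together."
localAnnouncer := Announcer new.
localAnnouncer when: Announcement send: #add: to: log.

"Rule 2: a global announcer gets weak subscriptions only, so the global
 does not keep the subscriber alive."
SystemAnnouncer uniqueInstance weak
	when: ClassAdded send: #add: to: log.

"Whatever the kind of subscription, unsubscribing explicitly when the
 subscriber goes away avoids relying on finalization at all."
SystemAnnouncer uniqueInstance unsubscribe: log.
localAnnouncer unsubscribe: log.
```

What the sketch deliberately avoids is the third case from section 3: sending both `when:send:to:` and `weak when:send:to:` to the same local announcer for the same subscriber.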
On 12/04/2016 11:41, Guille Polito wrote:
> ...snip...
the announcements.
-- Cyril Ferlicot http://www.synectique.eu 165 Avenue Bretagne Lille 59000 France |
In reply to this post by Guillermo Polito
Hi Guille,
forgive me for not responding with code immediately. Proper Ephemeron support is already in the Spur VM, plus a "proper" finalization queue, which allows us to drop the weak registries, which have a scaling problem. The proper Ephemeron support required ClassBuilder changes. I can provide Squeak code soon, but not until the end of the week; Clément and I have two presentations to prepare today and tomorrow, and it's 4am...

Replacing the weak registry with the proper finalization queue, in which appear both triggered ephemerons and weak collections that have lost references, means that individual ephemerons and weak collections can mourn, instead of the system having to scan all weak collections in all weak registries whenever a single weak collection loses a referent.

My own Ephemeron story is that when I tried to replace the weak registry with the proper finalization queue in Squeak last year, the mysterious symptom I had was the system running out of file descriptors and source file access ceasing to work. I haven't had time (or a collaborator or two) to dig further. So if there is a brave soul or two interested in getting ephemerons released and tested, I'd love to get in touch and get this done. We need ephemerons and I expect we want a scalable weak mourning scheme.

_,,,^..^,,,_ (phone)
|
In reply to this post by Guillermo Polito
Thanks Guille, Pavel and Christophe!
I think the essence of this e-mail should serve as documentation and should be retained somewhere. (Go Ephemerons!) Max
|
In reply to this post by Guillermo Polito
Great explanation indeed. I could not try the fogbugz snippet, but it hopefully should lead to this one: http://forum.world.st/Some-Memory-Leak-tp4814779p4814906.html
2016-04-12 11:41 GMT+02:00 Guille Polito <[hidden email]>:
...snip... |
In reply to this post by Guillermo Polito
Super cool mail!
Thanks for the time you spent writing it. I will publish it on Pharo Weekly. Stef
On 12/4/16 11:41, Guille Polito wrote:
Hi list, |
In reply to this post by Eliot Miranda-2
On Tue, Apr 12, 2016 at 6:46 AM, Levente Uzonyi <[hidden email]> wrote:
The answer is yes. Find attached an Ephemeron definition (but with no methods) and the new finalisation scheme. To use the scheme one must evaluate Smalltalk supportsQueueingFinalization: true. Ephemeron needs a suitable mourn method, which would send finalise to its key, but also remove the ephemeron from whatever registry we keep ephemerons in. As I say, though, the first order of business is to find out why turning on the scheme causes file access to fail very soon after. Levente _,,,^..^,,,_ best, Eliot |
In reply to this post by Guillermo Polito
Hi,
Great work. Thank you very much for this summary. Cheers, Doru
> On Apr 12, 2016, at 2:41 AM, Guille Polito <[hidden email]> wrote:
> ...snip...
-- www.tudorgirba.com www.feenk.com "Things happen when they happen, not when you talk about them happening." |