Fwd: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Fwd: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Hi all,

Thanks for all the hints, I finally got that thing flying!

Building with MacPorts clang-6.0 and editing flags finally worked when I found all the places (OFLAGS and LDFLAGS took a while to figure out). Building with Apple clang gave me "leak detection not supported", so that one is probably too old for what I want to do. Now I still have to trigger the leak I am after. I will provide a bug report in the leaking component and I will report the diff on flags I had to make, they might be useful for others. It was pretty dirty, e.g. I had to symlink some objects in the MacPorts AddressSanitizer installation as some binaries were linked with wrong paths, ew.

Again, thanks for encouraging me by telling me that I should be possible, I was already close to giving up. This Xcode env is a beast I try to avoid, too opaque to me.

Below is an example report, with the symbolizer it even tells you the source location, pretty handy. I guess this is not really a leak, but expected behavior for FFI.

Cheers,
Manuel


=================================================================
==12156==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 9080 byte(s) in 15 object(s) allocated from:
    #0 0x10a2cb2cc in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x592cc)
    #1 0x10a091410 in primitiveFFIAllocate (Pharo:x86_64+0x100315410)
    #2 0x1137369b7  (<unknown module>)
    #3 0x109d7e56c in interpret gcc3x-cointerp.c:2754
    #4 0x109f7dc29 in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #5 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #6 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #7 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #8 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #9 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #10 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #16 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #17 0x109f6e553 in main main.m:52
    #18 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)
...

Begin forwarded message:

From: John M McIntosh <[hidden email]>
Subject: Re: [Vm-dev] [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer
Date: 26 January 2019 at 01:04:30 GMT+1
To: Manuel Leuenberger <[hidden email]>
Reply-To: John M McIntosh <[hidden email]>

There is a branch in the source tree where I have a Xcode build for deployment of the vm on iOS and OS X.  Change the vm source  group folder to compile the cog stack or vm.

Although a year or two old you might be able to git merge it into the  current OS X build. You can then build the vm with Xcode settings and run instruments etc.

As Eliot points out you can also go in and alter the current make options for the cod line build 

....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk




On Fri, Jan 25, 2019 at 06:04, Manuel Leuenberger <[hidden email]> wrote:
Hi Ben,

Thanks for the pointer, I subscribed to those two mailing lists as well.

Cheers,
Manuel

On 25 Jan 2019, at 14:25, Ben Coman <[hidden email]> wrote:

Hi Manuel,

This level question would be better handled on pharo-dev and vm-dev mail lists (forwarded a copy).  

On Fri, 25 Jan 2019 at 16:09, Manuel Leuenberger <[hidden email]> wrote:
Hi,

Anybody has experience building the VM (pharo.cog.spur) on macOS with custom compiler and flags?

I am observing a memory leak on macOS Sierra/Pharo 7/GToolkit. I cannot precisely locate the issue, e.g. I cannot reliably reproduce the problem. So I tried to use AddressSanitizer/LeakSanitizer with custom debug builds of the VM and libMoz2D (used by GToolkit). As building on Linux is easier than on macOS, I got myself some help from a colleague and we managed to build those two artifacts on Linux by sprinkling "-g3 -O0 -fsanitize=leak -fsanitize-memory-track-origins" into the CFLAGS wherever they appear or are picked up. This works fine on Debian 9 with "CC=clang-6.0 CXX=clang++-6.0 ./mvm -d" and some on-the-fly patches for other breaking things.

Unfortunately, I cannot reproduce the problem on Linux, so it seems to be platform specific. Even more so, I seem to be the only one observing the issue. Other GToolkit users do not report this issue. As these kind of issues are freaking me out, I cannot let go and need to learn what causes them. So, here I am, trying to build the VM on macOS Sierra and miserably failing to include the sanitizer tags with Clang and giving me those tasty leak reports on VM shutdown.

The leak is not observable in the image memory, i.e. the image size reported by the VM is not growing, I am suspecting FFI, possibly in Moz2D, to leak.
The FFI registry in the VM also does not report many or large surviving objects, so I would assume the problem to be either on the plugin or VM side.

My two concrete questions are:
1. Is it even feasible to detect leaks with this approach? E.g., how does the VM behave on shutdown? Will the leaks show up in the report or will they not be covered, as the VM actually frees everything before shutting down?
2. How do I swap compiler and inject flags on macOS builds? On Linux it was fairly easily done by replacing some variables. But on macOS, Xcode seems to like to mess with me. I got myself Clang-6.0 on macOS from MacPorts, the same as version as I took on Linux. I can build the VM with this, but it there is no leak report on VM shutdown. Anybody has an idea how to debug the build?

Bonus question: Can the VM provide an inspectable view of all the allocated memory to the image? That would be perfect. I think Luke Gorrie does some magic like this in his Studio for core dumps etc.

Cheers,
Manuel






Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Hm, something is still not working. I compiled pharo.cog.spur and libMoz2D with clang-7.0 and '-fsanitize=address -fsanitize-recover=address -mllvm -asan-stack=0 -mllvm -asan-globals=0 -fno-omit-frame-pointer -g3' or '-fsanitize=leak -fno-omit-frame-pointer -g3', and then run it with 'ASAN_OPTIONS=detect_leaks=1 PharoDebug.app/Contents/MacOS/Pharo Pharo.image'. This gives me the leak reports on shutdown, but I cannot view any Bloc element, I always get a primitive failed error in 

BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
<primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
self primitiveFailed

How can I debug this?

I do not get stderr that would give me any hints what failed exactly. So, did the primitive fail during execution or was it not found at all? I looked into the TravisCI build and compiled with 'ARCH="macos64x64" FLAVOR="pharo.cog.spur" ./mvm -d', but that did not change anything. I noticed that the SqueakFFIPrims plugin defines primitiveCreateManualSurface on all architectures, but how do I ensure and check that the built VM actually includes this primitive? Am I missing another flag or env variable?

Cheers,
Manuel

On 26 Jan 2019, at 18:09, Manuel Leuenberger <[hidden email]> wrote:

Hi all,

Thanks for all the hints, I finally got that thing flying!

Building with MacPorts clang-6.0 and editing flags finally worked when I found all the places (OFLAGS and LDFLAGS took a while to figure out). Building with Apple clang gave me "leak detection not supported", so that one is probably too old for what I want to do. Now I still have to trigger the leak I am after. I will provide a bug report in the leaking component and I will report the diff on flags I had to make, they might be useful for others. It was pretty dirty, e.g. I had to symlink some objects in the MacPorts AddressSanitizer installation as some binaries were linked with wrong paths, ew.

Again, thanks for encouraging me by telling me that I should be possible, I was already close to giving up. This Xcode env is a beast I try to avoid, too opaque to me.

Below is an example report, with the symbolizer it even tells you the source location, pretty handy. I guess this is not really a leak, but expected behavior for FFI.

Cheers,
Manuel


=================================================================
==12156==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 9080 byte(s) in 15 object(s) allocated from:
    #0 0x10a2cb2cc in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x592cc)
    #1 0x10a091410 in primitiveFFIAllocate (Pharo:x86_64+0x100315410)
    #2 0x1137369b7  (<unknown module>)
    #3 0x109d7e56c in interpret gcc3x-cointerp.c:2754
    #4 0x109f7dc29 in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #5 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #6 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #7 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #8 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #9 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #10 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #16 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #17 0x109f6e553 in main main.m:52
    #18 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)
...

Begin forwarded message:

From: John M McIntosh <[hidden email]>
Subject: Re: [Vm-dev] [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer
Date: 26 January 2019 at 01:04:30 GMT+1
To: Manuel Leuenberger <[hidden email]>
Reply-To: John M McIntosh <[hidden email]>

There is a branch in the source tree where I have a Xcode build for deployment of the vm on iOS and OS X.  Change the vm source  group folder to compile the cog stack or vm.

Although a year or two old you might be able to git merge it into the  current OS X build. You can then build the vm with Xcode settings and run instruments etc.

As Eliot points out you can also go in and alter the current make options for the cod line build 

....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk




On Fri, Jan 25, 2019 at 06:04, Manuel Leuenberger <[hidden email]> wrote:
Hi Ben,

Thanks for the pointer, I subscribed to those two mailing lists as well.

Cheers,
Manuel

On 25 Jan 2019, at 14:25, Ben Coman <[hidden email]> wrote:

Hi Manuel,

This level question would be better handled on pharo-dev and vm-dev mail lists (forwarded a copy).  

On Fri, 25 Jan 2019 at 16:09, Manuel Leuenberger <[hidden email]> wrote:
Hi,

Anybody has experience building the VM (pharo.cog.spur) on macOS with custom compiler and flags?

I am observing a memory leak on macOS Sierra/Pharo 7/GToolkit. I cannot precisely locate the issue, e.g. I cannot reliably reproduce the problem. So I tried to use AddressSanitizer/LeakSanitizer with custom debug builds of the VM and libMoz2D (used by GToolkit). As building on Linux is easier than on macOS, I got myself some help from a colleague and we managed to build those two artifacts on Linux by sprinkling "-g3 -O0 -fsanitize=leak -fsanitize-memory-track-origins" into the CFLAGS wherever they appear or are picked up. This works fine on Debian 9 with "CC=clang-6.0 CXX=clang++-6.0 ./mvm -d" and some on-the-fly patches for other breaking things.

Unfortunately, I cannot reproduce the problem on Linux, so it seems to be platform specific. Even more so, I seem to be the only one observing the issue. Other GToolkit users do not report this issue. As these kind of issues are freaking me out, I cannot let go and need to learn what causes them. So, here I am, trying to build the VM on macOS Sierra and miserably failing to include the sanitizer tags with Clang and giving me those tasty leak reports on VM shutdown.

The leak is not observable in the image memory, i.e. the image size reported by the VM is not growing, I am suspecting FFI, possibly in Moz2D, to leak.
The FFI registry in the VM also does not report many or large surviving objects, so I would assume the problem to be either on the plugin or VM side.

My two concrete questions are:
1. Is it even feasible to detect leaks with this approach? E.g., how does the VM behave on shutdown? Will the leaks show up in the report or will they not be covered, as the VM actually frees everything before shutting down?
2. How do I swap compiler and inject flags on macOS builds? On Linux it was fairly easily done by replacing some variables. But on macOS, Xcode seems to like to mess with me. I got myself Clang-6.0 on macOS from MacPorts, the same as version as I took on Linux. I can build the VM with this, but it there is no leak report on VM shutdown. Anybody has an idea how to debug the build?

Bonus question: Can the VM provide an inspectable view of all the allocated memory to the image? That would be perfect. I think Luke Gorrie does some magic like this in his Studio for core dumps etc.

Cheers,
Manuel







Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

alistairgrant
 
Hi Manuel,

On Mon, 28 Jan 2019 at 11:51, Manuel Leuenberger
<[hidden email]> wrote:
>
> BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
> <primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
> self primitiveFailed
>
> How can I debug this?

My first suggestion would be to see if the primitive is returning a
descriptive error.  The method would become (untested):

BlExternalForm >> primCreateManualSurfaceWidth: width height: height
rowPitch: rowPitch depth: depth isMSB: isMSB
    <primitive: 'primitiveCreateManualSurface' module:
'SqueakFFIPrims' error: error>
    self primitiveFailed

When the primitive fails, open the debugger, inspect this method in
the call stack and check the value of error.  If the primitive hasn't
been found, error will be something like #'not found' (from memory).
If it's something else other than nil, report back with the value.

If error is nil it will be a matter of diving in to the Slang / C code
and figuring it out :-).  If you've successfully built a debug version
of the VM, maybe the easiest next step is to set a breakpoint in
primitiveCreateManualSurface and step through?

Cheers,
Alistair
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Hi Alistair,

Thanks for the hint, the primitive error is indeed nil. And it seems to my sanitizer flags that cause the trouble, because I cannot reproduce the error when I build the VM without them. I tried different version of clang (Apple clang 9, MacPorts clang 6, MacPorts clang 7, Chromium clang 9) with different sanitizer flags, but could either not even start up the VM properly due to a reported stack-buffer-underflow, or the primitive failed.

Debugging revealed that createManualSurface returns -1 because registerSurface is not initialized. Halting in initSurfacePluginFunctionPointers then shows that findOrLoadModule("SurfacePlugin", 0) is 0, so there seems no plugin to be found. I need to continue digging.

Cheers,
Manuel

> On 28 Jan 2019, at 19:29, Alistair Grant <[hidden email]> wrote:
>
>
> Hi Manuel,
>
> On Mon, 28 Jan 2019 at 11:51, Manuel Leuenberger
> <[hidden email]> wrote:
>>
>> BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
>> <primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
>> self primitiveFailed
>>
>> How can I debug this?
>
> My first suggestion would be to see if the primitive is returning a
> descriptive error.  The method would become (untested):
>
> BlExternalForm >> primCreateManualSurfaceWidth: width height: height
> rowPitch: rowPitch depth: depth isMSB: isMSB
>    <primitive: 'primitiveCreateManualSurface' module:
> 'SqueakFFIPrims' error: error>
>    self primitiveFailed
>
> When the primitive fails, open the debugger, inspect this method in
> the call stack and check the value of error.  If the primitive hasn't
> been found, error will be something like #'not found' (from memory).
> If it's something else other than nil, report back with the value.
>
> If error is nil it will be a matter of diving in to the Slang / C code
> and figuring it out :-).  If you've successfully built a debug version
> of the VM, maybe the easiest next step is to set a breakpoint in
> primitiveCreateManualSurface and step through?
>
> Cheers,
> Alistair

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
As it turns out, I am just a special kind of an idiot.

After Alistair's hint at just stepping through the VM, I threw all my fears of gdb etc. over board an stepped around in lldb. The failed primitive was indeed due to the missing SurfacePlugin, which I had in the standard build, but not in my leak build. So I looked at my git diff again and found that I disabled the CroquetPlugin, because it caused some trouble a while ago. It did not seem too harmful, as most stuff worked, and I thought I do not need this as it seemed related to the Croquet project for multi-user applications. What could possibly go wrong? Well, apparently I couldn't really draw without it. It's funny how initially seemingly unrelated decisions lead to side-effects that seem unexplainable because I forget that I made this decision.

Anyway, now I have those finger-licking good leak reports from ASan for GToolkit builds including resolved symbols and source line numbers for both the VM and Moz2D. Time for some leak hunting now, I'll keep you posted if I find something in the VM that needs attention.

Sorry for the spam and that you had to observe an idiot trying to hammer the ball into the square hole. Maybe it was at least entertaining for you.

Thanks for all the help, you managed to take away my fear of the VM, it no longer intimidates me.

Cheers,
Manuel

Direct leak of 184 byte(s) in 1 object(s) allocated from:
    #0 0x10c58748c in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5d48c)
    #1 0x12599889d in moz_xmalloc mozalloc.cpp:83
    #2 0x123ce2895 in mozilla::gfx::Factory::CreateDrawTargetForData(mozilla::gfx::BackendType, unsigned char*, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, int, mozilla::gfx::SurfaceFormat, bool) mozalloc.h:194
    #3 0x126282c4b in moz2d_draw_target_create_for_data_type draw_target.cpp:35
    #4 0x10b76eca6 in primitiveCalloutWithArgs (Pharo:x86_64+0x10035cca6)
    #5 0x10b47f472 in primitiveExternalCall gcc3x-cointerp.c:76887
    #6 0x10b4765ed in executeNewMethod gcc3x-cointerp.c:22341
    #7 0x10b47b7fd in ceSendsupertonumArgs gcc3x-cointerp.c:16540
    #8 0x1151a8134  (<unknown module>)
    #9 0x10b414ac9 in interpret gcc3x-cointerp.c:2754
    #10 0x10b672afa in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #11 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #12 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #13 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #14 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #15 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #16 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #17 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #18 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #19 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #20 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #21 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #22 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #23 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)

On 29 Jan 2019, at 00:18, Manuel Leuenberger <[hidden email]> wrote:


Hi Alistair,

Thanks for the hint, the primitive error is indeed nil. And it seems to my sanitizer flags that cause the trouble, because I cannot reproduce the error when I build the VM without them. I tried different version of clang (Apple clang 9, MacPorts clang 6, MacPorts clang 7, Chromium clang 9) with different sanitizer flags, but could either not even start up the VM properly due to a reported stack-buffer-underflow, or the primitive failed.

Debugging revealed that createManualSurface returns -1 because registerSurface is not initialized. Halting in initSurfacePluginFunctionPointers then shows that findOrLoadModule("SurfacePlugin", 0) is 0, so there seems no plugin to be found. I need to continue digging.

Cheers,
Manuel

On 28 Jan 2019, at 19:29, Alistair Grant <[hidden email]> wrote:


Hi Manuel,

On Mon, 28 Jan 2019 at 11:51, Manuel Leuenberger
<[hidden email]> wrote:

BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
<primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
self primitiveFailed

How can I debug this?

My first suggestion would be to see if the primitive is returning a
descriptive error.  The method would become (untested):

BlExternalForm >> primCreateManualSurfaceWidth: width height: height
rowPitch: rowPitch depth: depth isMSB: isMSB
  <primitive: 'primitiveCreateManualSurface' module:
'SqueakFFIPrims' error: error>
  self primitiveFailed

When the primitive fails, open the debugger, inspect this method in
the call stack and check the value of error.  If the primitive hasn't
been found, error will be something like #'not found' (from memory).
If it's something else other than nil, report back with the value.

If error is nil it will be a matter of diving in to the Slang / C code
and figuring it out :-).  If you've successfully built a debug version
of the VM, maybe the easiest next step is to set a breakpoint in
primitiveCreateManualSurface and step through?

Cheers,
Alistair


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Eliot Miranda-2
 
Hi Manuel,

On Jan 29, 2019, at 3:16 AM, Manuel Leuenberger <[hidden email]> wrote:

As it turns out, I am just a special kind of an idiot.

After Alistair's hint at just stepping through the VM, I threw all my fears of gdb etc. over board an stepped around in lldb. The failed primitive was indeed due to the missing SurfacePlugin, which I had in the standard build, but not in my leak build. So I looked at my git diff again and found that I disabled the CroquetPlugin, because it caused some trouble a while ago. It did not seem too harmful, as most stuff worked, and I thought I do not need this as it seemed related to the Croquet project for multi-user applications. What could possibly go wrong? Well, apparently I couldn't really draw without it. It's funny how initially seemingly unrelated decisions lead to side-effects that seem unexplainable because I forget that I made this decision.

Anyway, now I have those finger-licking good leak reports from ASan for GToolkit builds including resolved symbols and source line numbers for both the VM and Moz2D. Time for some leak hunting now, I'll keep you posted if I find something in the VM that needs attention.

Sorry for the spam and that you had to observe an idiot trying to hammer the ball into the square hole. Maybe it was at least entertaining for you.

Thanks for all the help, you managed to take away my fear of the VM, it no longer intimidates me.

Wonderful!!!! Welcome! But see below.

Cheers,
Manuel

Direct leak of 184 byte(s) in 1 object(s) allocated from:
    #0 0x10c58748c in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5d48c)
    #1 0x12599889d in moz_xmalloc mozalloc.cpp:83
    #2 0x123ce2895 in mozilla::gfx::Factory::CreateDrawTargetForData(mozilla::gfx::BackendType, unsigned char*, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, int, mozilla::gfx::SurfaceFormat, bool) mozalloc.h:194
    #3 0x126282c4b in moz2d_draw_target_create_for_data_type draw_target.cpp:35
    #4 0x10b76eca6 in primitiveCalloutWithArgs (Pharo:x86_64+0x10035cca6)
    #5 0x10b47f472 in primitiveExternalCall gcc3x-cointerp.c:76887
    #6 0x10b4765ed in executeNewMethod gcc3x-cointerp.c:22341
    #7 0x10b47b7fd in ceSendsupertonumArgs gcc3x-cointerp.c:16540
    #8 0x1151a8134  (<unknown module>)
    #9 0x10b414ac9 in interpret gcc3x-cointerp.c:2754
    #10 0x10b672afa in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #11 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #12 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #13 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #14 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #15 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #16 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #17 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #18 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #19 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #20 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #21 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #22 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #23 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.  What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.  

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?


On 29 Jan 2019, at 00:18, Manuel Leuenberger <[hidden email]> wrote:


Hi Alistair,

Thanks for the hint, the primitive error is indeed nil. And it seems to my sanitizer flags that cause the trouble, because I cannot reproduce the error when I build the VM without them. I tried different version of clang (Apple clang 9, MacPorts clang 6, MacPorts clang 7, Chromium clang 9) with different sanitizer flags, but could either not even start up the VM properly due to a reported stack-buffer-underflow, or the primitive failed.

Debugging revealed that createManualSurface returns -1 because registerSurface is not initialized. Halting in initSurfacePluginFunctionPointers then shows that findOrLoadModule("SurfacePlugin", 0) is 0, so there seems no plugin to be found. I need to continue digging.

Cheers,
Manuel

On 28 Jan 2019, at 19:29, Alistair Grant <[hidden email]> wrote:


Hi Manuel,

On Mon, 28 Jan 2019 at 11:51, Manuel Leuenberger
<[hidden email]> wrote:

BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
<primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
self primitiveFailed

How can I debug this?

My first suggestion would be to see if the primitive is returning a
descriptive error.  The method would become (untested):

BlExternalForm >> primCreateManualSurfaceWidth: width height: height
rowPitch: rowPitch depth: depth isMSB: isMSB
  <primitive: 'primitiveCreateManualSurface' module:
'SqueakFFIPrims' error: error>
  self primitiveFailed

When the primitive fails, open the debugger, inspect this method in
the call stack and check the value of error.  If the primitive hasn't
been found, error will be something like #'not found' (from memory).
If it's something else other than nil, report back with the value.

If error is nil it will be a matter of diving in to the Slang / C code
and figuring it out :-).  If you've successfully built a debug version
of the VM, maybe the easiest next step is to set a breakpoint in
primitiveCreateManualSurface and step through?

Cheers,
Alistair

_,,,^..^,,,_ (phone)
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Ben Coman
 


On Tue, 29 Jan 2019 at 22:32, Eliot Miranda <[hidden email]> wrote:
 
Hi Manuel,

On Jan 29, 2019, at 3:16 AM, Manuel Leuenberger <[hidden email]> wrote:

As it turns out, I am just a special kind of an idiot.

After Alistair's hint at just stepping through the VM, I threw all my fears of gdb etc. over board an stepped around in lldb. The failed primitive was indeed due to the missing SurfacePlugin, which I had in the standard build, but not in my leak build. So I looked at my git diff again and found that I disabled the CroquetPlugin, because it caused some trouble a while ago. It did not seem too harmful, as most stuff worked, and I thought I do not need this as it seemed related to the Croquet project for multi-user applications. What could possibly go wrong? Well, apparently I couldn't really draw without it. It's funny how initially seemingly unrelated decisions lead to side-effects that seem unexplainable because I forget that I made this decision.

Anyway, now I have those finger-licking good leak reports from ASan for GToolkit builds including resolved symbols and source line numbers for both the VM and Moz2D. Time for some leak hunting now, I'll keep you posted if I find something in the VM that needs attention.

Sorry for the spam and that you had to observe an idiot trying to hammer the ball into the square hole. Maybe it was at least entertaining for you.

Thanks for all the help, you managed to take away my fear of the VM, it no longer intimidates me.

Wonderful!!!! Welcome! But see below.

Cheers,
Manuel

Direct leak of 184 byte(s) in 1 object(s) allocated from:
    #0 0x10c58748c in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5d48c)
    #1 0x12599889d in moz_xmalloc mozalloc.cpp:83
    #2 0x123ce2895 in mozilla::gfx::Factory::CreateDrawTargetForData(mozilla::gfx::BackendType, unsigned char*, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, int, mozilla::gfx::SurfaceFormat, bool) mozalloc.h:194
    #3 0x126282c4b in moz2d_draw_target_create_for_data_type draw_target.cpp:35
    #4 0x10b76eca6 in primitiveCalloutWithArgs (Pharo:x86_64+0x10035cca6)
    #5 0x10b47f472 in primitiveExternalCall gcc3x-cointerp.c:76887
    #6 0x10b4765ed in executeNewMethod gcc3x-cointerp.c:22341
    #7 0x10b47b7fd in ceSendsupertonumArgs gcc3x-cointerp.c:16540
    #8 0x1151a8134  (<unknown module>)
    #9 0x10b414ac9 in interpret gcc3x-cointerp.c:2754
    #10 0x10b672afa in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #11 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #12 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #13 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #14 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #15 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #16 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #17 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #18 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #19 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #20 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #21 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #22 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #23 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.  What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.  

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

 Just brainstorming, a markdown page here...

cheers -ben
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

timrowledge
In reply to this post by Manuel Leuenberger
 


> On 2019-01-29, at 3:16 AM, Manuel Leuenberger <[hidden email]> wrote:
>
>
> Thanks for all the help, you managed to take away my fear of the VM, it no longer intimidates me.

It's just luring you into a very false sense of security. You remember 'Alien'? That was based on the life of a VM developer..

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Oxymorons: Political science


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

timrowledge
In reply to this post by Eliot Miranda-2
 


> On 2019-01-29, at 6:32 AM, Eliot Miranda <[hidden email]> wrote:
>
>
> What do you think?  What to others think?  How do we best capture this for posterity?

Mail list posts are one form of immortality but can easily get lost in the growing cruft as time passes. It's also difficult to establish the currency of the info a year (or ten!) later. Blog posts and forum posts have the same problems.


Pages/books on github or similar have the virtue of being hosted in a reasonably secure manner and carry date /version metadata with them, so that could be helpful. Linking to the doc in the list, in blogs or tweets spreads the signal. Linking to the (hopefully) large and growing tree of well written and well cared for doc from the sources would be very useful.

More important than any of the details is simply getting the doc produced and reviewed. Collectively we (software dweebs) are *useless* at doc, even though it constantly bites us.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: PFM: Pray For Miracle


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

alistairgrant
In reply to this post by Eliot Miranda-2
 
On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:

>
>
> ...
>
> So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
> What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.
>
> The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.
>
> What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
    #21 0x119a75794  (<unknown module>)
    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)



On 29 Jan 2019, at 20:19, Alistair Grant <[hidden email]> wrote:


On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:


...

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Alright, back in the game, this time using gperftools.

I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:

NECController reset.
FreeTypeCache current removeAll.
ASTCache reset.
Smalltalk garbageCollect.

These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With

SpaceTally printSpaceAnalysis.

I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this

instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
ReferenceFinder findPathTo: instances atRandom.

With

FFIExternalResourceManager uniqueInstance instVarNamed: #registry

I also made sure that external objects are clean.

Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.

Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.


I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.

My current hypothesis is:

The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.

Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.

As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.

Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.

Cheers,
Manuel

On 30 Jan 2019, at 18:40, Manuel Leuenberger <[hidden email]> wrote:

I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

<Screen Shot 2019-01-30 at 18.27.22.png>
Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
    #21 0x119a75794  (<unknown module>)
    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)



On 29 Jan 2019, at 20:19, Alistair Grant <[hidden email]> wrote:


On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:


...

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Not sure if I found something, but looking at sqSqueakOSXApplication+events.m I find 

- (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
       NSEvent *event;
       NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
       while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
                             untilDate:nil
                             inMode:NSEventTrackingRunLoopMode
                             dequeue:YES])) {
         // If the event is not a system event or an event of *this* window, queue the event
         // Otherwise treat the event normally and send it to the app
         if (event.window && event.window != gDelegateApp.window){
           [alienEventQueue addObject: event];
         }else{
           [NSApp sendEvent: event];
           if (signal) {
               interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
           }
         }
     }

     // Put back in the queue all events that did not belong to this window
     // They will be managed by other windowing systems
     // (or by a subsequent invocation to this function)
     while ((event = [alienEventQueue firstObject])) {
       [NSApp postEvent: event atStart: NO];
       [alienEventQueue removeObject: event];
     }
}

I have no clue about Objective-C, but I think 

 NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];

makes me own the queue on the heap, isn't there a

[alienEventQueue release];

missing at the end?

Cheers,
Manuel

On 26 Feb 2019, at 23:31, Manuel Leuenberger <[hidden email]> wrote:

Alright, back in the game, this time using gperftools.

I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:

NECController reset.
FreeTypeCache current removeAll.
ASTCache reset.
Smalltalk garbageCollect.

These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With

SpaceTally printSpaceAnalysis.

I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this

instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
ReferenceFinder findPathTo: instances atRandom.

With

FFIExternalResourceManager uniqueInstance instVarNamed: #registry

I also made sure that external objects are clean.

Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.

Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.


I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.

My current hypothesis is:

The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.

Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.

As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.

Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.

Cheers,
Manuel

On 30 Jan 2019, at 18:40, Manuel Leuenberger <[hidden email]> wrote:

I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

<Screen Shot 2019-01-30 at 18.27.22.png>
Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
    #21 0x119a75794  (<unknown module>)
    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)



On 29 Jan 2019, at 20:19, Alistair Grant <[hidden email]> wrote:


On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:


...

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair



Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Guillermo Polito
 
Hi Manuel,

Disclaimer: I'm not an objectiveC expert either :) 

When I did that (and compiled, and tested), I remember having seen that it was using the automatic reference counting from objectiveC.
Actually, putting the [release] was producing for me a segmentation fault.
Did you try it yourself and did it fix the problem for you?


On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <[hidden email]> wrote:
 
Not sure if I found something, but looking at sqSqueakOSXApplication+events.m I find 

- (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
       NSEvent *event;
       NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
       while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
                             untilDate:nil
                             inMode:NSEventTrackingRunLoopMode
                             dequeue:YES])) {
         // If the event is not a system event or an event of *this* window, queue the event
         // Otherwise treat the event normally and send it to the app
         if (event.window && event.window != gDelegateApp.window){
           [alienEventQueue addObject: event];
         }else{
           [NSApp sendEvent: event];
           if (signal) {
               interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
           }
         }
     }

     // Put back in the queue all events that did not belong to this window
     // They will be managed by other windowing systems
     // (or by a subsequent invocation to this function)
     while ((event = [alienEventQueue firstObject])) {
       [NSApp postEvent: event atStart: NO];
       [alienEventQueue removeObject: event];
     }
}

I have no clue about Objective-C, but I think 

 NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];

makes me own the queue on the heap, isn't there a

[alienEventQueue release];

missing at the end?

Cheers,
Manuel

On 26 Feb 2019, at 23:31, Manuel Leuenberger <[hidden email]> wrote:

Alright, back in the game, this time using gperftools.

I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:

NECController reset.
FreeTypeCache current removeAll.
ASTCache reset.
Smalltalk garbageCollect.

These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With

SpaceTally printSpaceAnalysis.

I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this

instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
ReferenceFinder findPathTo: instances atRandom.

With

FFIExternalResourceManager uniqueInstance instVarNamed: #registry

I also made sure that external objects are clean.

Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.

Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.


I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.

My current hypothesis is:

The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.

Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.

As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.

Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.

Cheers,
Manuel

On 30 Jan 2019, at 18:40, Manuel Leuenberger <[hidden email]> wrote:

I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

<Screen Shot 2019-01-30 at 18.27.22.png>
Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
    #21 0x119a75794  (<unknown module>)
    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)



On 29 Jan 2019, at 20:19, Alistair Grant <[hidden email]> wrote:


On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:


...

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair





--

   

Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr


Web: http://guillep.github.io

Phone: +33 06 52 70 66 13

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

johnmci
 
Hi, so if you point your git tool to the squeak vm tree at commit 
fae3d7eb403158f177c306f668cb461c35c99419
The you can download the JMM/IOS64 branch which should let you build a OSX VM from xCode and do address sanitation, leak detection etc, from the comforts of xCode and Instruments. (check Instruments out) allocations & generations etc. 

if you turn on address sanitizer, detect use of stack after return, undefined behavior sanitizer, (pause on  issues), Main Thread checker (Pause on issues)  Likely there is 'stuff' there that could be fixed...


As for the alloc/init  the vm can be build as arc or non-arc. And or a mixture used at a file level, plus of course bridging from core foundation to swift and indicating who has ownership of the object. 



....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, February 27, 2019 3:05 AM, Guillermo Polito <[hidden email]> wrote:

Hi Manuel,

Disclaimer: I'm not an objectiveC expert either :) 

When I did that (and compiled, and tested), I remember having seen that it was using the automatic reference counting from objectiveC.
Actually, putting the [release] was producing for me a segmentation fault.
Did you try it yourself and did it fix the problem for you?


On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <[hidden email]> wrote:
 
Not sure if I found something, but looking at sqSqueakOSXApplication+events.m I find 

- (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
       NSEvent *event;
       NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
       while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
                             untilDate:nil
                             inMode:NSEventTrackingRunLoopMode
                             dequeue:YES])) {
         // If the event is not a system event or an event of *this* window, queue the event
         // Otherwise treat the event normally and send it to the app
         if (event.window && event.window != gDelegateApp.window){
           [alienEventQueue addObject: event];
         }else{
           [NSApp sendEvent: event];
           if (signal) {
               interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
           }
         }
     }

     // Put back in the queue all events that did not belong to this window
     // They will be managed by other windowing systems
     // (or by a subsequent invocation to this function)
     while ((event = [alienEventQueue firstObject])) {
       [NSApp postEvent: event atStart: NO];
       [alienEventQueue removeObject: event];
     }
}

I have no clue about Objective-C, but I think 

 NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];

makes me own the queue on the heap, isn't there a

[alienEventQueue release];

missing at the end?

Cheers,
Manuel

On 26 Feb 2019, at 23:31, Manuel Leuenberger <[hidden email]> wrote:

Alright, back in the game, this time using gperftools.

I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:

NECController reset.
FreeTypeCache current removeAll.
ASTCache reset.
Smalltalk garbageCollect.

These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With

SpaceTally printSpaceAnalysis.

I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this

instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
ReferenceFinder findPathTo: instances atRandom.

With

FFIExternalResourceManager uniqueInstance instVarNamed: #registry

I also made sure that external objects are clean.

Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.

Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.


I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.

My current hypothesis is:

The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.

Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.

As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.

Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.

Cheers,
Manuel

On 30 Jan 2019, at 18:40, Manuel Leuenberger <[hidden email]> wrote:

I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

<Screen Shot 2019-01-30 at 18.27.22.png>
Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
    #21 0x119a75794  (<unknown module>)
    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)



On 29 Jan 2019, at 20:19, Alistair Grant <[hidden email]> wrote:


On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:


...

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair





--

   

Guille Polito

Research Engineer


Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr


Web: http://guillep.github.io

Phone: +33 06 52 70 66 13


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Found it, reproduced it, fixed it.


I think memory consumption should be a point in the review process, and there should be memory performance tests (are there any already?).

Took me two months to write the necessary two lines of code.

Cheers,
Manuel

On 28 Feb 2019, at 02:52, John M McIntosh <[hidden email]> wrote:

Hi, so if you point your git tool to the squeak vm tree at commit 
fae3d7eb403158f177c306f668cb461c35c99419
The you can download the JMM/IOS64 branch which should let you build a OSX VM from xCode and do address sanitation, leak detection etc, from the comforts of xCode and Instruments. (check Instruments out) allocations & generations etc. 

if you turn on address sanitizer, detect use of stack after return, undefined behavior sanitizer, (pause on  issues), Main Thread checker (Pause on issues)  Likely there is 'stuff' there that could be fixed...


As for the alloc/init  the vm can be build as arc or non-arc. And or a mixture used at a file level, plus of course bridging from core foundation to swift and indicating who has ownership of the object. 



....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, February 27, 2019 3:05 AM, Guillermo Polito <[hidden email]> wrote:

Hi Manuel,

Disclaimer: I'm not an objectiveC expert either :) 

When I did that (and compiled, and tested), I remember having seen that it was using the automatic reference counting from objectiveC.
Actually, putting the [release] was producing for me a segmentation fault.
Did you try it yourself and did it fix the problem for you?


On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <[hidden email]> wrote:
 
Not sure if I found something, but looking at sqSqueakOSXApplication+events.m I find 

- (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
       NSEvent *event;
       NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
       while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
                             untilDate:nil
                             inMode:NSEventTrackingRunLoopMode
                             dequeue:YES])) {
         // If the event is not a system event or an event of *this* window, queue the event
         // Otherwise treat the event normally and send it to the app
         if (event.window && event.window != gDelegateApp.window){
           [alienEventQueue addObject: event];
         }else{
           [NSApp sendEvent: event];
           if (signal) {
               interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
           }
         }
     }

     // Put back in the queue all events that did not belong to this window
     // They will be managed by other windowing systems
     // (or by a subsequent invocation to this function)
     while ((event = [alienEventQueue firstObject])) {
       [NSApp postEvent: event atStart: NO];
       [alienEventQueue removeObject: event];
     }
}

I have no clue about Objective-C, but I think 

 NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];

makes me own the queue on the heap, isn't there a

[alienEventQueue release];

missing at the end?

Cheers,
Manuel

On 26 Feb 2019, at 23:31, Manuel Leuenberger <[hidden email]> wrote:

Alright, back in the game, this time using gperftools.

I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:

NECController reset.
FreeTypeCache current removeAll.
ASTCache reset.
Smalltalk garbageCollect.

These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With

SpaceTally printSpaceAnalysis.

I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this

instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
ReferenceFinder findPathTo: instances atRandom.

With

FFIExternalResourceManager uniqueInstance instVarNamed: #registry

I also made sure that external objects are clean.

Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.

Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.


I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.

My current hypothesis is:

The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.

Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.

As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.

Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.

Cheers,
Manuel

On 30 Jan 2019, at 18:40, Manuel Leuenberger <[hidden email]> wrote:

I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

<Screen Shot 2019-01-30 at 18.27.22.png>
Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
    #21 0x119a75794  (<unknown module>)
    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)



On 29 Jan 2019, at 20:19, Alistair Grant <[hidden email]> wrote:


On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:


...

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair





--
   
Guille Polito
Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille
CRIStAL - UMR 9189
French National Center for Scientific Research - http://www.cnrs.fr

Phone: +33 06 52 70 66 13


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Tobias Pape
 
Hi,

> On 28.02.2019, at 11:57, Manuel Leuenberger <[hidden email]> wrote:
>
> Found it, reproduced it, fixed it.
>
> https://twitter.com/maenuleu/status/1101068937750634496
> https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/373
>
> I think memory consumption should be a point in the review process, and there should be memory performance tests (are there any already?).
>
> Took me two months to write the necessary two lines of code.

Read it. Makes sense.
I first had the suspicion that the thing allocated ehre may be sent/processed somewhere else, too. But I read around and it looks good.

Have added a few comments to the PR. Am fine with the PR.

Best regards
        -tObias


>
> Cheers,
> Manuel
>
>> On 28 Feb 2019, at 02:52, John M McIntosh <[hidden email]> wrote:
>>
>> Hi, so if you point your git tool to the squeak vm tree at commit
>> fae3d7eb403158f177c306f668cb461c35c99419
>> The you can download the JMM/IOS64 branch which should let you build a OSX VM from xCode and do address sanitation, leak detection etc, from the comforts of xCode and Instruments. (check Instruments out) allocations & generations etc.
>>
>> if you turn on address sanitizer, detect use of stack after return, undefined behavior sanitizer, (pause on  issues), Main Thread checker (Pause on issues)  Likely there is 'stuff' there that could be fixed...
>>
>>
>> As for the alloc/init  the vm can be build as arc or non-arc. And or a mixture used at a file level, plus of course bridging from core foundation to swift and indicating who has ownership of the object.
>>
>>
>>
>> ....
>> John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
>>
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Wednesday, February 27, 2019 3:05 AM, Guillermo Polito <[hidden email]> wrote:
>>
>>> Hi Manuel,
>>>
>>> Disclaimer: I'm not an objectiveC expert either :)
>>>
>>> When I did that (and compiled, and tested), I remember having seen that it was using the automatic reference counting from objectiveC.
>>> Actually, putting the [release] was producing for me a segmentation fault.
>>> Did you try it yourself and did it fix the problem for you?
>>>
>>>
>>> On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <[hidden email]> wrote:
>>>  
>>> Not sure if I found something, but looking at sqSqueakOSXApplication+events.m I find
>>>
>>> - (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
>>>        NSEvent *event;
>>>        NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
>>>        while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
>>>                              untilDate:nil
>>>                              inMode:NSEventTrackingRunLoopMode
>>>                              dequeue:YES])) {
>>>          // If the event is not a system event or an event of *this* window, queue the event
>>>          // Otherwise treat the event normally and send it to the app
>>>          if (event.window && event.window != gDelegateApp.window){
>>>            [alienEventQueue addObject: event];
>>>          }else{
>>>            [NSApp sendEvent: event];
>>>            if (signal) {
>>>                interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
>>>            }
>>>          }
>>>      }
>>>
>>>      // Put back in the queue all events that did not belong to this window
>>>      // They will be managed by other windowing systems
>>>      // (or by a subsequent invocation to this function)
>>>      while ((event = [alienEventQueue firstObject])) {
>>>        [NSApp postEvent: event atStart: NO];
>>>        [alienEventQueue removeObject: event];
>>>      }
>>> }
>>>
>>> I have no clue about Objective-C, but I think
>>>
>>>  NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
>>>
>>> makes me own the queue on the heap, isn't there a
>>>
>>> [alienEventQueue release];
>>>
>>> missing at the end?
>>>
>>> Cheers,
>>> Manuel
>>>
>>>> On 26 Feb 2019, at 23:31, Manuel Leuenberger <[hidden email]> wrote:
>>>>
>>>> Alright, back in the game, this time using gperftools.
>>>>
>>>> I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:
>>>>
>>>> NECController reset.
>>>> FreeTypeCache current removeAll.
>>>> ASTCache reset.
>>>> Smalltalk garbageCollect.
>>>>
>>>> These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With
>>>>
>>>> SpaceTally printSpaceAnalysis.
>>>>
>>>> I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this
>>>>
>>>> instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
>>>> ReferenceFinder findPathTo: instances atRandom.
>>>>
>>>> With
>>>>
>>>> FFIExternalResourceManager uniqueInstance instVarNamed: #registry
>>>>
>>>> I also made sure that external objects are clean.
>>>>
>>>> Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.
>>>>
>>>> Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.
>>>>
>>>> https://www.dropbox.com/s/zusnfn5mhyw3rmp/memory-analysis.zip?dl=0
>>>>
>>>> I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.
>>>>
>>>> My current hypothesis is:
>>>>
>>>> The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.
>>>>
>>>> Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.
>>>>
>>>> As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.
>>>>
>>>> Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.
>>>>
>>>> Cheers,
>>>> Manuel
>>>>
>>>>> On 30 Jan 2019, at 18:40, Manuel Leuenberger <[hidden email]> wrote:
>>>>>
>>>>> I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.
>>>>>
>>>>> I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.
>>>>>
>>>>> Cheers,
>>>>> Manuel
>>>>>
>>>>> <Screen Shot 2019-01-30 at 18.27.22.png>
>>>>> Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
>>>>>     #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
>>>>>     #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
>>>>>     #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
>>>>>     #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
>>>>>     #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
>>>>>     #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
>>>>>     #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
>>>>>     #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
>>>>>     #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
>>>>>     #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
>>>>>     #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
>>>>>     #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
>>>>>     #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
>>>>>     #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
>>>>>     #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
>>>>>     #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
>>>>>     #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
>>>>>     #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
>>>>>     #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
>>>>>     #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
>>>>>     #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
>>>>>     #21 0x119a75794  (<unknown module>)
>>>>>     #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
>>>>>     #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
>>>>>     #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
>>>>>     #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
>>>>>     #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
>>>>>     #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
>>>>>     #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
>>>>>     #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>>    
>>> Guille Polito
>>> Research Engineer
>>>
>>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>> CRIStAL - UMR 9189
>>> French National Center for Scientific Research - http://www.cnrs.fr
>>>
>>> Web: http://guillep.github.io
>>> Phone: +33 06 52 70 66 13
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Yeah, I am not sure if my fix only has the effects wanted to release the alienEventQueue, we need expert judgment here.

On 28 Feb 2019, at 12:19, Tobias Pape <[hidden email]> wrote:


Hi,

On 28.02.2019, at 11:57, Manuel Leuenberger <[hidden email]> wrote:

Found it, reproduced it, fixed it.

https://twitter.com/maenuleu/status/1101068937750634496
https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/373

I think memory consumption should be a point in the review process, and there should be memory performance tests (are there any already?).

Took me two months to write the necessary two lines of code.

Read it. Makes sense.
I first had the suspicion that the thing allocated ehre may be sent/processed somewhere else, too. But I read around and it looks good.

Have added a few comments to the PR. Am fine with the PR.

Best regards
-tObias



Cheers,
Manuel

On 28 Feb 2019, at 02:52, John M McIntosh <[hidden email]> wrote:

Hi, so if you point your git tool to the squeak vm tree at commit 
fae3d7eb403158f177c306f668cb461c35c99419
The you can download the JMM/IOS64 branch which should let you build a OSX VM from xCode and do address sanitation, leak detection etc, from the comforts of xCode and Instruments. (check Instruments out) allocations & generations etc. 

if you turn on address sanitizer, detect use of stack after return, undefined behavior sanitizer, (pause on  issues), Main Thread checker (Pause on issues)  Likely there is 'stuff' there that could be fixed...


As for the alloc/init  the vm can be build as arc or non-arc. And or a mixture used at a file level, plus of course bridging from core foundation to swift and indicating who has ownership of the object. 



....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, February 27, 2019 3:05 AM, Guillermo Polito <[hidden email]> wrote:

Hi Manuel,

Disclaimer: I'm not an objectiveC expert either :) 

When I did that (and compiled, and tested), I remember having seen that it was using the automatic reference counting from objectiveC.
Actually, putting the [release] was producing for me a segmentation fault.
Did you try it yourself and did it fix the problem for you?


On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <[hidden email]> wrote:

Not sure if I found something, but looking at sqSqueakOSXApplication+events.m I find 

- (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
      NSEvent *event;
      NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
      while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
                            untilDate:nil
                            inMode:NSEventTrackingRunLoopMode
                            dequeue:YES])) {
        // If the event is not a system event or an event of *this* window, queue the event
        // Otherwise treat the event normally and send it to the app
        if (event.window && event.window != gDelegateApp.window){
          [alienEventQueue addObject: event];
        }else{
          [NSApp sendEvent: event];
          if (signal) {
              interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
          }
        }
    }

    // Put back in the queue all events that did not belong to this window
    // They will be managed by other windowing systems
    // (or by a subsequent invocation to this function)
    while ((event = [alienEventQueue firstObject])) {
      [NSApp postEvent: event atStart: NO];
      [alienEventQueue removeObject: event];
    }
}

I have no clue about Objective-C, but I think 

NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];

makes me own the queue on the heap, isn't there a

[alienEventQueue release];

missing at the end?

Cheers,
Manuel

On 26 Feb 2019, at 23:31, Manuel Leuenberger <[hidden email]> wrote:

Alright, back in the game, this time using gperftools.

I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:

NECController reset.
FreeTypeCache current removeAll.
ASTCache reset.
Smalltalk garbageCollect.

These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With

SpaceTally printSpaceAnalysis.

I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this

instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
ReferenceFinder findPathTo: instances atRandom.

With

FFIExternalResourceManager uniqueInstance instVarNamed: #registry

I also made sure that external objects are clean.

Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.

Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.

https://www.dropbox.com/s/zusnfn5mhyw3rmp/memory-analysis.zip?dl=0

I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.

My current hypothesis is:

The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.

Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.

As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.

Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.

Cheers,
Manuel

On 30 Jan 2019, at 18:40, Manuel Leuenberger <[hidden email]> wrote:

I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

<Screen Shot 2019-01-30 at 18.27.22.png>
Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
   #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
   #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
   #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
   #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
   #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
   #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
   #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
   #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
   #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
   #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
   #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
   #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
   #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
   #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
   #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
   #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
   #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
   #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
   #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
   #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
   #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
   #21 0x119a75794  (<unknown module>)
   #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
   #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
   #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
   #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
   #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
   #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
   #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
   #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)









-- 

Guille Polito
Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille
CRIStAL - UMR 9189
French National Center for Scientific Research - http://www.cnrs.fr

Web: http://guillep.github.io
Phone: +33 06 52 70 66 13

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Ben Coman
In reply to this post by Manuel Leuenberger
 
Great work Manuel. Thanks for your perseverance.

cheers -ben

On Thu, 28 Feb 2019 at 18:57, Manuel Leuenberger <[hidden email]> wrote:
 
Found it, reproduced it, fixed it.


I think memory consumption should be a point in the review process, and there should be memory performance tests (are there any already?).

Took me two months to write the necessary two lines of code.

Cheers,
Manuel

On 28 Feb 2019, at 02:52, John M McIntosh <[hidden email]> wrote:

Hi, so if you point your git tool to the squeak vm tree at commit 
fae3d7eb403158f177c306f668cb461c35c99419
The you can download the JMM/IOS64 branch which should let you build a OSX VM from xCode and do address sanitation, leak detection etc, from the comforts of xCode and Instruments. (check Instruments out) allocations & generations etc. 

if you turn on address sanitizer, detect use of stack after return, undefined behavior sanitizer, (pause on  issues), Main Thread checker (Pause on issues)  Likely there is 'stuff' there that could be fixed...


As for the alloc/init  the vm can be build as arc or non-arc. And or a mixture used at a file level, plus of course bridging from core foundation to swift and indicating who has ownership of the object. 



....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, February 27, 2019 3:05 AM, Guillermo Polito <[hidden email]> wrote:

Hi Manuel,

Disclaimer: I'm not an objectiveC expert either :) 

When I did that (and compiled, and tested), I remember having seen that it was using the automatic reference counting from objectiveC.
Actually, putting the [release] was producing for me a segmentation fault.
Did you try it yourself and did it fix the problem for you?


On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <[hidden email]> wrote:
 
Not sure if I found something, but looking at sqSqueakOSXApplication+events.m I find 

- (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
       NSEvent *event;
       NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
       while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
                             untilDate:nil
                             inMode:NSEventTrackingRunLoopMode
                             dequeue:YES])) {
         // If the event is not a system event or an event of *this* window, queue the event
         // Otherwise treat the event normally and send it to the app
         if (event.window && event.window != gDelegateApp.window){
           [alienEventQueue addObject: event];
         }else{
           [NSApp sendEvent: event];
           if (signal) {
               interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
           }
         }
     }

     // Put back in the queue all events that did not belong to this window
     // They will be managed by other windowing systems
     // (or by a subsequent invocation to this function)
     while ((event = [alienEventQueue firstObject])) {
       [NSApp postEvent: event atStart: NO];
       [alienEventQueue removeObject: event];
     }
}

I have no clue about Objective-C, but I think 

 NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];

makes me own the queue on the heap, isn't there a

[alienEventQueue release];

missing at the end?

Cheers,
Manuel

On 26 Feb 2019, at 23:31, Manuel Leuenberger <[hidden email]> wrote:

Alright, back in the game, this time using gperftools.

I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:

NECController reset.
FreeTypeCache current removeAll.
ASTCache reset.
Smalltalk garbageCollect.

These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With

SpaceTally printSpaceAnalysis.

I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this

instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
ReferenceFinder findPathTo: instances atRandom.

With

FFIExternalResourceManager uniqueInstance instVarNamed: #registry

I also made sure that external objects are clean.

Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.

Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.


I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.

My current hypothesis is:

The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.

Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.

As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.

Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.

Cheers,
Manuel

On 30 Jan 2019, at 18:40, Manuel Leuenberger <[hidden email]> wrote:

I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

<Screen Shot 2019-01-30 at 18.27.22.png>
Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
    #21 0x119a75794  (<unknown module>)
    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)



On 29 Jan 2019, at 20:19, Alistair Grant <[hidden email]> wrote:


On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:


...

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair





--
   
Guille Polito
Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille
CRIStAL - UMR 9189
French National Center for Scientific Research - http://www.cnrs.fr

Phone: +33 06 52 70 66 13


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

johnmci
 
Ok, as @krono pointed out
NSMutableArray *alienEventQueue = AUTORELEASEOBJ([[NSMutableArray alloc] init]);
with the autoreleasepool.

I'm not sure anything is build without ARC anymore, but it should be there to be correct for people who need to build on older platforms.

AUTORELEASEOBJ with ARC does nothing, and the autoreleasepool wrapper then is freeing autorelease objects in the scope of this routine. Since it is pumped from the VM processing requests there isn't a wrapping autorelease pool higher up in the stack. Easy test is to do invoke the "About Menu"

-----------------------
Years back I would fire up Instruments and go looking, but the transition away from xCode made that harder, but as pointed out it is possible to do all this checking via the cmd line if someone (helpfully) puts in the time to make it happen. 

Thanks. 

....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, February 28, 2019 5:48 AM, Ben Coman <[hidden email]> wrote:

Great work Manuel. Thanks for your perseverance.

cheers -ben

On Thu, 28 Feb 2019 at 18:57, Manuel Leuenberger <[hidden email]> wrote:
 
Found it, reproduced it, fixed it.


I think memory consumption should be a point in the review process, and there should be memory performance tests (are there any already?).

Took me two months to write the necessary two lines of code.

Cheers,
Manuel

On 28 Feb 2019, at 02:52, John M McIntosh <[hidden email]> wrote:

Hi, so if you point your git tool to the squeak vm tree at commit 
fae3d7eb403158f177c306f668cb461c35c99419
The you can download the JMM/IOS64 branch which should let you build a OSX VM from xCode and do address sanitation, leak detection etc, from the comforts of xCode and Instruments. (check Instruments out) allocations & generations etc. 

if you turn on address sanitizer, detect use of stack after return, undefined behavior sanitizer, (pause on  issues), Main Thread checker (Pause on issues)  Likely there is 'stuff' there that could be fixed...


As for the alloc/init  the vm can be build as arc or non-arc. And or a mixture used at a file level, plus of course bridging from core foundation to swift and indicating who has ownership of the object. 



....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, February 27, 2019 3:05 AM, Guillermo Polito <[hidden email]> wrote:

Hi Manuel,

Disclaimer: I'm not an objectiveC expert either :) 

When I did that (and compiled, and tested), I remember having seen that it was using the automatic reference counting from objectiveC.
Actually, putting the [release] was producing for me a segmentation fault.
Did you try it yourself and did it fix the problem for you?


On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <[hidden email]> wrote:
 
Not sure if I found something, but looking at sqSqueakOSXApplication+events.m I find 

- (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
       NSEvent *event;
       NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
       while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
                             untilDate:nil
                             inMode:NSEventTrackingRunLoopMode
                             dequeue:YES])) {
         // If the event is not a system event or an event of *this* window, queue the event
         // Otherwise treat the event normally and send it to the app
         if (event.window && event.window != gDelegateApp.window){
           [alienEventQueue addObject: event];
         }else{
           [NSApp sendEvent: event];
           if (signal) {
               interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
           }
         }
     }

     // Put back in the queue all events that did not belong to this window
     // They will be managed by other windowing systems
     // (or by a subsequent invocation to this function)
     while ((event = [alienEventQueue firstObject])) {
       [NSApp postEvent: event atStart: NO];
       [alienEventQueue removeObject: event];
     }
}

I have no clue about Objective-C, but I think 

 NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];

makes me own the queue on the heap, isn't there a

[alienEventQueue release];

missing at the end?

Cheers,
Manuel

On 26 Feb 2019, at 23:31, Manuel Leuenberger <[hidden email]> wrote:

Alright, back in the game, this time using gperftools.

I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:

NECController reset.
FreeTypeCache current removeAll.
ASTCache reset.
Smalltalk garbageCollect.

These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With

SpaceTally printSpaceAnalysis.

I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this

instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
ReferenceFinder findPathTo: instances atRandom.

With

FFIExternalResourceManager uniqueInstance instVarNamed: #registry

I also made sure that external objects are clean.

Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.

Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.


I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.

My current hypothesis is:

The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.

Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.

As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.

Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.

Cheers,
Manuel

On 30 Jan 2019, at 18:40, Manuel Leuenberger <[hidden email]> wrote:

I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

<Screen Shot 2019-01-30 at 18.27.22.png>
Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
    #21 0x119a75794  (<unknown module>)
    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)



On 29 Jan 2019, at 20:19, Alistair Grant <[hidden email]> wrote:


On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:


...

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair





--
   
Guille Polito
Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille
CRIStAL - UMR 9189
French National Center for Scientific Research - http://www.cnrs.fr

Phone: +33 06 52 70 66 13