Fwd: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Fwd: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Hi all,

Thanks for all the hints, I finally got that thing flying!

Building with MacPorts clang-6.0 and editing flags finally worked when I found all the places (OFLAGS and LDFLAGS took a while to figure out). Building with Apple clang gave me "leak detection not supported", so that one is probably too old for what I want to do. Now I still have to trigger the leak I am after. I will provide a bug report in the leaking component and I will report the diff on flags I had to make, they might be useful for others. It was pretty dirty, e.g. I had to symlink some objects in the MacPorts AddressSanitizer installation as some binaries were linked with wrong paths, ew.

Again, thanks for encouraging me by telling me that I should be possible, I was already close to giving up. This Xcode env is a beast I try to avoid, too opaque to me.

Below is an example report, with the symbolizer it even tells you the source location, pretty handy. I guess this is not really a leak, but expected behavior for FFI.

Cheers,
Manuel


=================================================================
==12156==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 9080 byte(s) in 15 object(s) allocated from:
    #0 0x10a2cb2cc in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x592cc)
    #1 0x10a091410 in primitiveFFIAllocate (Pharo:x86_64+0x100315410)
    #2 0x1137369b7  (<unknown module>)
    #3 0x109d7e56c in interpret gcc3x-cointerp.c:2754
    #4 0x109f7dc29 in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #5 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #6 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #7 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #8 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #9 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #10 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #16 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #17 0x109f6e553 in main main.m:52
    #18 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)
...

Begin forwarded message:

From: John M McIntosh <[hidden email]>
Subject: Re: [Vm-dev] [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer
Date: 26 January 2019 at 01:04:30 GMT+1
To: Manuel Leuenberger <[hidden email]>
Reply-To: John M McIntosh <[hidden email]>

There is a branch in the source tree where I have a Xcode build for deployment of the vm on iOS and OS X.  Change the vm source  group folder to compile the cog stack or vm.

Although a year or two old you might be able to git merge it into the  current OS X build. You can then build the vm with Xcode settings and run instruments etc.

As Eliot points out you can also go in and alter the current make options for the cod line build 

....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk




On Fri, Jan 25, 2019 at 06:04, Manuel Leuenberger <[hidden email]> wrote:
Hi Ben,

Thanks for the pointer, I subscribed to those two mailing lists as well.

Cheers,
Manuel

On 25 Jan 2019, at 14:25, Ben Coman <[hidden email]> wrote:

Hi Manuel,

This level question would be better handled on pharo-dev and vm-dev mail lists (forwarded a copy).  

On Fri, 25 Jan 2019 at 16:09, Manuel Leuenberger <[hidden email]> wrote:
Hi,

Anybody has experience building the VM (pharo.cog.spur) on macOS with custom compiler and flags?

I am observing a memory leak on macOS Sierra/Pharo 7/GToolkit. I cannot precisely locate the issue, e.g. I cannot reliably reproduce the problem. So I tried to use AddressSanitizer/LeakSanitizer with custom debug builds of the VM and libMoz2D (used by GToolkit). As building on Linux is easier than on macOS, I got myself some help from a colleague and we managed to build those two artifacts on Linux by sprinkling "-g3 -O0 -fsanitize=leak -fsanitize-memory-track-origins" into the CFLAGS wherever they appear or are picked up. This works fine on Debian 9 with "CC=clang-6.0 CXX=clang++-6.0 ./mvm -d" and some on-the-fly patches for other breaking things.

Unfortunately, I cannot reproduce the problem on Linux, so it seems to be platform specific. Even more so, I seem to be the only one observing the issue. Other GToolkit users do not report this issue. As these kind of issues are freaking me out, I cannot let go and need to learn what causes them. So, here I am, trying to build the VM on macOS Sierra and miserably failing to include the sanitizer tags with Clang and giving me those tasty leak reports on VM shutdown.

The leak is not observable in the image memory, i.e. the image size reported by the VM is not growing, I am suspecting FFI, possibly in Moz2D, to leak.
The FFI registry in the VM also does not report many or large surviving objects, so I would assume the problem to be either on the plugin or VM side.

My two concrete questions are:
1. Is it even feasible to detect leaks with this approach? E.g., how does the VM behave on shutdown? Will the leaks show up in the report or will they not be covered, as the VM actually frees everything before shutting down?
2. How do I swap compiler and inject flags on macOS builds? On Linux it was fairly easily done by replacing some variables. But on macOS, Xcode seems to like to mess with me. I got myself Clang-6.0 on macOS from MacPorts, the same as version as I took on Linux. I can build the VM with this, but it there is no leak report on VM shutdown. Anybody has an idea how to debug the build?

Bonus question: Can the VM provide an inspectable view of all the allocated memory to the image? That would be perfect. I think Luke Gorrie does some magic like this in his Studio for core dumps etc.

Cheers,
Manuel






Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Hm, something is still not working. I compiled pharo.cog.spur and libMoz2D with clang-7.0 and '-fsanitize=address -fsanitize-recover=address -mllvm -asan-stack=0 -mllvm -asan-globals=0 -fno-omit-frame-pointer -g3' or '-fsanitize=leak -fno-omit-frame-pointer -g3', and then run it with 'ASAN_OPTIONS=detect_leaks=1 PharoDebug.app/Contents/MacOS/Pharo Pharo.image'. This gives me the leak reports on shutdown, but I cannot view any Bloc element, I always get a primitive failed error in 

BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
<primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
self primitiveFailed

How can I debug this?

I do not get stderr that would give me any hints what failed exactly. So, did the primitive fail during execution or was it not found at all? I looked into the TravisCI build and compiled with 'ARCH="macos64x64" FLAVOR="pharo.cog.spur" ./mvm -d', but that did not change anything. I noticed that the SqueakFFIPrims plugin defines primitiveCreateManualSurface on all architectures, but how do I ensure and check that the built VM actually includes this primitive? Am I missing another flag or env variable?

Cheers,
Manuel

On 26 Jan 2019, at 18:09, Manuel Leuenberger <[hidden email]> wrote:

Hi all,

Thanks for all the hints, I finally got that thing flying!

Building with MacPorts clang-6.0 and editing flags finally worked when I found all the places (OFLAGS and LDFLAGS took a while to figure out). Building with Apple clang gave me "leak detection not supported", so that one is probably too old for what I want to do. Now I still have to trigger the leak I am after. I will provide a bug report in the leaking component and I will report the diff on flags I had to make, they might be useful for others. It was pretty dirty, e.g. I had to symlink some objects in the MacPorts AddressSanitizer installation as some binaries were linked with wrong paths, ew.

Again, thanks for encouraging me by telling me that I should be possible, I was already close to giving up. This Xcode env is a beast I try to avoid, too opaque to me.

Below is an example report, with the symbolizer it even tells you the source location, pretty handy. I guess this is not really a leak, but expected behavior for FFI.

Cheers,
Manuel


=================================================================
==12156==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 9080 byte(s) in 15 object(s) allocated from:
    #0 0x10a2cb2cc in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x592cc)
    #1 0x10a091410 in primitiveFFIAllocate (Pharo:x86_64+0x100315410)
    #2 0x1137369b7  (<unknown module>)
    #3 0x109d7e56c in interpret gcc3x-cointerp.c:2754
    #4 0x109f7dc29 in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #5 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #6 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #7 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #8 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #9 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #10 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #16 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #17 0x109f6e553 in main main.m:52
    #18 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)
...

Begin forwarded message:

From: John M McIntosh <[hidden email]>
Subject: Re: [Vm-dev] [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer
Date: 26 January 2019 at 01:04:30 GMT+1
To: Manuel Leuenberger <[hidden email]>
Reply-To: John M McIntosh <[hidden email]>

There is a branch in the source tree where I have a Xcode build for deployment of the vm on iOS and OS X.  Change the vm source  group folder to compile the cog stack or vm.

Although a year or two old you might be able to git merge it into the  current OS X build. You can then build the vm with Xcode settings and run instruments etc.

As Eliot points out you can also go in and alter the current make options for the cod line build 

....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk




On Fri, Jan 25, 2019 at 06:04, Manuel Leuenberger <[hidden email]> wrote:
Hi Ben,

Thanks for the pointer, I subscribed to those two mailing lists as well.

Cheers,
Manuel

On 25 Jan 2019, at 14:25, Ben Coman <[hidden email]> wrote:

Hi Manuel,

This level question would be better handled on pharo-dev and vm-dev mail lists (forwarded a copy).  

On Fri, 25 Jan 2019 at 16:09, Manuel Leuenberger <[hidden email]> wrote:
Hi,

Anybody has experience building the VM (pharo.cog.spur) on macOS with custom compiler and flags?

I am observing a memory leak on macOS Sierra/Pharo 7/GToolkit. I cannot precisely locate the issue, e.g. I cannot reliably reproduce the problem. So I tried to use AddressSanitizer/LeakSanitizer with custom debug builds of the VM and libMoz2D (used by GToolkit). As building on Linux is easier than on macOS, I got myself some help from a colleague and we managed to build those two artifacts on Linux by sprinkling "-g3 -O0 -fsanitize=leak -fsanitize-memory-track-origins" into the CFLAGS wherever they appear or are picked up. This works fine on Debian 9 with "CC=clang-6.0 CXX=clang++-6.0 ./mvm -d" and some on-the-fly patches for other breaking things.

Unfortunately, I cannot reproduce the problem on Linux, so it seems to be platform specific. Even more so, I seem to be the only one observing the issue. Other GToolkit users do not report this issue. As these kind of issues are freaking me out, I cannot let go and need to learn what causes them. So, here I am, trying to build the VM on macOS Sierra and miserably failing to include the sanitizer tags with Clang and giving me those tasty leak reports on VM shutdown.

The leak is not observable in the image memory, i.e. the image size reported by the VM is not growing, I am suspecting FFI, possibly in Moz2D, to leak.
The FFI registry in the VM also does not report many or large surviving objects, so I would assume the problem to be either on the plugin or VM side.

My two concrete questions are:
1. Is it even feasible to detect leaks with this approach? E.g., how does the VM behave on shutdown? Will the leaks show up in the report or will they not be covered, as the VM actually frees everything before shutting down?
2. How do I swap compiler and inject flags on macOS builds? On Linux it was fairly easily done by replacing some variables. But on macOS, Xcode seems to like to mess with me. I got myself Clang-6.0 on macOS from MacPorts, the same as version as I took on Linux. I can build the VM with this, but it there is no leak report on VM shutdown. Anybody has an idea how to debug the build?

Bonus question: Can the VM provide an inspectable view of all the allocated memory to the image? That would be perfect. I think Luke Gorrie does some magic like this in his Studio for core dumps etc.

Cheers,
Manuel







Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Alistair Grant
 
Hi Manuel,

On Mon, 28 Jan 2019 at 11:51, Manuel Leuenberger
<[hidden email]> wrote:
>
> BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
> <primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
> self primitiveFailed
>
> How can I debug this?

My first suggestion would be to see if the primitive is returning a
descriptive error.  The method would become (untested):

BlExternalForm >> primCreateManualSurfaceWidth: width height: height
rowPitch: rowPitch depth: depth isMSB: isMSB
    <primitive: 'primitiveCreateManualSurface' module:
'SqueakFFIPrims' error: error>
    self primitiveFailed

When the primitive fails, open the debugger, inspect this method in
the call stack and check the value of error.  If the primitive hasn't
been found, error will be something like #'not found' (from memory).
If it's something else other than nil, report back with the value.

If error is nil it will be a matter of diving in to the Slang / C code
and figuring it out :-).  If you've successfully built a debug version
of the VM, maybe the easiest next step is to set a breakpoint in
primitiveCreateManualSurface and step through?

Cheers,
Alistair
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
Hi Alistair,

Thanks for the hint, the primitive error is indeed nil. And it seems to my sanitizer flags that cause the trouble, because I cannot reproduce the error when I build the VM without them. I tried different version of clang (Apple clang 9, MacPorts clang 6, MacPorts clang 7, Chromium clang 9) with different sanitizer flags, but could either not even start up the VM properly due to a reported stack-buffer-underflow, or the primitive failed.

Debugging revealed that createManualSurface returns -1 because registerSurface is not initialized. Halting in initSurfacePluginFunctionPointers then shows that findOrLoadModule("SurfacePlugin", 0) is 0, so there seems no plugin to be found. I need to continue digging.

Cheers,
Manuel

> On 28 Jan 2019, at 19:29, Alistair Grant <[hidden email]> wrote:
>
>
> Hi Manuel,
>
> On Mon, 28 Jan 2019 at 11:51, Manuel Leuenberger
> <[hidden email]> wrote:
>>
>> BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
>> <primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
>> self primitiveFailed
>>
>> How can I debug this?
>
> My first suggestion would be to see if the primitive is returning a
> descriptive error.  The method would become (untested):
>
> BlExternalForm >> primCreateManualSurfaceWidth: width height: height
> rowPitch: rowPitch depth: depth isMSB: isMSB
>    <primitive: 'primitiveCreateManualSurface' module:
> 'SqueakFFIPrims' error: error>
>    self primitiveFailed
>
> When the primitive fails, open the debugger, inspect this method in
> the call stack and check the value of error.  If the primitive hasn't
> been found, error will be something like #'not found' (from memory).
> If it's something else other than nil, report back with the value.
>
> If error is nil it will be a matter of diving in to the Slang / C code
> and figuring it out :-).  If you've successfully built a debug version
> of the VM, maybe the easiest next step is to set a breakpoint in
> primitiveCreateManualSurface and step through?
>
> Cheers,
> Alistair

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
As it turns out, I am just a special kind of an idiot.

After Alistair's hint at just stepping through the VM, I threw all my fears of gdb etc. over board an stepped around in lldb. The failed primitive was indeed due to the missing SurfacePlugin, which I had in the standard build, but not in my leak build. So I looked at my git diff again and found that I disabled the CroquetPlugin, because it caused some trouble a while ago. It did not seem too harmful, as most stuff worked, and I thought I do not need this as it seemed related to the Croquet project for multi-user applications. What could possibly go wrong? Well, apparently I couldn't really draw without it. It's funny how initially seemingly unrelated decisions lead to side-effects that seem unexplainable because I forget that I made this decision.

Anyway, now I have those finger-licking good leak reports from ASan for GToolkit builds including resolved symbols and source line numbers for both the VM and Moz2D. Time for some leak hunting now, I'll keep you posted if I find something in the VM that needs attention.

Sorry for the spam and that you had to observe an idiot trying to hammer the ball into the square hole. Maybe it was at least entertaining for you.

Thanks for all the help, you managed to take away my fear of the VM, it no longer intimidates me.

Cheers,
Manuel

Direct leak of 184 byte(s) in 1 object(s) allocated from:
    #0 0x10c58748c in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5d48c)
    #1 0x12599889d in moz_xmalloc mozalloc.cpp:83
    #2 0x123ce2895 in mozilla::gfx::Factory::CreateDrawTargetForData(mozilla::gfx::BackendType, unsigned char*, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, int, mozilla::gfx::SurfaceFormat, bool) mozalloc.h:194
    #3 0x126282c4b in moz2d_draw_target_create_for_data_type draw_target.cpp:35
    #4 0x10b76eca6 in primitiveCalloutWithArgs (Pharo:x86_64+0x10035cca6)
    #5 0x10b47f472 in primitiveExternalCall gcc3x-cointerp.c:76887
    #6 0x10b4765ed in executeNewMethod gcc3x-cointerp.c:22341
    #7 0x10b47b7fd in ceSendsupertonumArgs gcc3x-cointerp.c:16540
    #8 0x1151a8134  (<unknown module>)
    #9 0x10b414ac9 in interpret gcc3x-cointerp.c:2754
    #10 0x10b672afa in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #11 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #12 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #13 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #14 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #15 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #16 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #17 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #18 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #19 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #20 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #21 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #22 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #23 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)

On 29 Jan 2019, at 00:18, Manuel Leuenberger <[hidden email]> wrote:


Hi Alistair,

Thanks for the hint, the primitive error is indeed nil. And it seems to my sanitizer flags that cause the trouble, because I cannot reproduce the error when I build the VM without them. I tried different version of clang (Apple clang 9, MacPorts clang 6, MacPorts clang 7, Chromium clang 9) with different sanitizer flags, but could either not even start up the VM properly due to a reported stack-buffer-underflow, or the primitive failed.

Debugging revealed that createManualSurface returns -1 because registerSurface is not initialized. Halting in initSurfacePluginFunctionPointers then shows that findOrLoadModule("SurfacePlugin", 0) is 0, so there seems no plugin to be found. I need to continue digging.

Cheers,
Manuel

On 28 Jan 2019, at 19:29, Alistair Grant <[hidden email]> wrote:


Hi Manuel,

On Mon, 28 Jan 2019 at 11:51, Manuel Leuenberger
<[hidden email]> wrote:

BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
<primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
self primitiveFailed

How can I debug this?

My first suggestion would be to see if the primitive is returning a
descriptive error.  The method would become (untested):

BlExternalForm >> primCreateManualSurfaceWidth: width height: height
rowPitch: rowPitch depth: depth isMSB: isMSB
  <primitive: 'primitiveCreateManualSurface' module:
'SqueakFFIPrims' error: error>
  self primitiveFailed

When the primitive fails, open the debugger, inspect this method in
the call stack and check the value of error.  If the primitive hasn't
been found, error will be something like #'not found' (from memory).
If it's something else other than nil, report back with the value.

If error is nil it will be a matter of diving in to the Slang / C code
and figuring it out :-).  If you've successfully built a debug version
of the VM, maybe the easiest next step is to set a breakpoint in
primitiveCreateManualSurface and step through?

Cheers,
Alistair


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Eliot Miranda-2
 
Hi Manuel,

On Jan 29, 2019, at 3:16 AM, Manuel Leuenberger <[hidden email]> wrote:

As it turns out, I am just a special kind of an idiot.

After Alistair's hint at just stepping through the VM, I threw all my fears of gdb etc. over board an stepped around in lldb. The failed primitive was indeed due to the missing SurfacePlugin, which I had in the standard build, but not in my leak build. So I looked at my git diff again and found that I disabled the CroquetPlugin, because it caused some trouble a while ago. It did not seem too harmful, as most stuff worked, and I thought I do not need this as it seemed related to the Croquet project for multi-user applications. What could possibly go wrong? Well, apparently I couldn't really draw without it. It's funny how initially seemingly unrelated decisions lead to side-effects that seem unexplainable because I forget that I made this decision.

Anyway, now I have those finger-licking good leak reports from ASan for GToolkit builds including resolved symbols and source line numbers for both the VM and Moz2D. Time for some leak hunting now, I'll keep you posted if I find something in the VM that needs attention.

Sorry for the spam and that you had to observe an idiot trying to hammer the ball into the square hole. Maybe it was at least entertaining for you.

Thanks for all the help, you managed to take away my fear of the VM, it no longer intimidates me.

Wonderful!!!! Welcome! But see below.

Cheers,
Manuel

Direct leak of 184 byte(s) in 1 object(s) allocated from:
    #0 0x10c58748c in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5d48c)
    #1 0x12599889d in moz_xmalloc mozalloc.cpp:83
    #2 0x123ce2895 in mozilla::gfx::Factory::CreateDrawTargetForData(mozilla::gfx::BackendType, unsigned char*, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, int, mozilla::gfx::SurfaceFormat, bool) mozalloc.h:194
    #3 0x126282c4b in moz2d_draw_target_create_for_data_type draw_target.cpp:35
    #4 0x10b76eca6 in primitiveCalloutWithArgs (Pharo:x86_64+0x10035cca6)
    #5 0x10b47f472 in primitiveExternalCall gcc3x-cointerp.c:76887
    #6 0x10b4765ed in executeNewMethod gcc3x-cointerp.c:22341
    #7 0x10b47b7fd in ceSendsupertonumArgs gcc3x-cointerp.c:16540
    #8 0x1151a8134  (<unknown module>)
    #9 0x10b414ac9 in interpret gcc3x-cointerp.c:2754
    #10 0x10b672afa in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #11 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #12 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #13 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #14 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #15 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #16 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #17 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #18 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #19 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #20 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #21 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #22 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #23 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.  What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.  

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?


On 29 Jan 2019, at 00:18, Manuel Leuenberger <[hidden email]> wrote:


Hi Alistair,

Thanks for the hint, the primitive error is indeed nil. And it seems to my sanitizer flags that cause the trouble, because I cannot reproduce the error when I build the VM without them. I tried different version of clang (Apple clang 9, MacPorts clang 6, MacPorts clang 7, Chromium clang 9) with different sanitizer flags, but could either not even start up the VM properly due to a reported stack-buffer-underflow, or the primitive failed.

Debugging revealed that createManualSurface returns -1 because registerSurface is not initialized. Halting in initSurfacePluginFunctionPointers then shows that findOrLoadModule("SurfacePlugin", 0) is 0, so there seems no plugin to be found. I need to continue digging.

Cheers,
Manuel

On 28 Jan 2019, at 19:29, Alistair Grant <[hidden email]> wrote:


Hi Manuel,

On Mon, 28 Jan 2019 at 11:51, Manuel Leuenberger
<[hidden email]> wrote:

BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
<primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
self primitiveFailed

How can I debug this?

My first suggestion would be to see if the primitive is returning a
descriptive error.  The method would become (untested):

BlExternalForm >> primCreateManualSurfaceWidth: width height: height
rowPitch: rowPitch depth: depth isMSB: isMSB
  <primitive: 'primitiveCreateManualSurface' module:
'SqueakFFIPrims' error: error>
  self primitiveFailed

When the primitive fails, open the debugger, inspect this method in
the call stack and check the value of error.  If the primitive hasn't
been found, error will be something like #'not found' (from memory).
If it's something else other than nil, report back with the value.

If error is nil it will be a matter of diving in to the Slang / C code
and figuring it out :-).  If you've successfully built a debug version
of the VM, maybe the easiest next step is to set a breakpoint in
primitiveCreateManualSurface and step through?

Cheers,
Alistair

_,,,^..^,,,_ (phone)
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Ben Coman
 


On Tue, 29 Jan 2019 at 22:32, Eliot Miranda <[hidden email]> wrote:
 
Hi Manuel,

On Jan 29, 2019, at 3:16 AM, Manuel Leuenberger <[hidden email]> wrote:

As it turns out, I am just a special kind of an idiot.

After Alistair's hint at just stepping through the VM, I threw all my fears of gdb etc. over board an stepped around in lldb. The failed primitive was indeed due to the missing SurfacePlugin, which I had in the standard build, but not in my leak build. So I looked at my git diff again and found that I disabled the CroquetPlugin, because it caused some trouble a while ago. It did not seem too harmful, as most stuff worked, and I thought I do not need this as it seemed related to the Croquet project for multi-user applications. What could possibly go wrong? Well, apparently I couldn't really draw without it. It's funny how initially seemingly unrelated decisions lead to side-effects that seem unexplainable because I forget that I made this decision.

Anyway, now I have those finger-licking good leak reports from ASan for GToolkit builds including resolved symbols and source line numbers for both the VM and Moz2D. Time for some leak hunting now, I'll keep you posted if I find something in the VM that needs attention.

Sorry for the spam and that you had to observe an idiot trying to hammer the ball into the square hole. Maybe it was at least entertaining for you.

Thanks for all the help, you managed to take away my fear of the VM, it no longer intimidates me.

Wonderful!!!! Welcome! But see below.

Cheers,
Manuel

Direct leak of 184 byte(s) in 1 object(s) allocated from:
    #0 0x10c58748c in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5d48c)
    #1 0x12599889d in moz_xmalloc mozalloc.cpp:83
    #2 0x123ce2895 in mozilla::gfx::Factory::CreateDrawTargetForData(mozilla::gfx::BackendType, unsigned char*, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, int, mozilla::gfx::SurfaceFormat, bool) mozalloc.h:194
    #3 0x126282c4b in moz2d_draw_target_create_for_data_type draw_target.cpp:35
    #4 0x10b76eca6 in primitiveCalloutWithArgs (Pharo:x86_64+0x10035cca6)
    #5 0x10b47f472 in primitiveExternalCall gcc3x-cointerp.c:76887
    #6 0x10b4765ed in executeNewMethod gcc3x-cointerp.c:22341
    #7 0x10b47b7fd in ceSendsupertonumArgs gcc3x-cointerp.c:16540
    #8 0x1151a8134  (<unknown module>)
    #9 0x10b414ac9 in interpret gcc3x-cointerp.c:2754
    #10 0x10b672afa in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #11 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #12 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #13 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #14 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #15 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #16 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
    #17 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #18 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #19 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #20 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #21 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
    #22 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
    #23 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.  What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.  

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

 Just brainstorming, a markdown page here...

cheers -ben
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

timrowledge
In reply to this post by Manuel Leuenberger
 


> On 2019-01-29, at 3:16 AM, Manuel Leuenberger <[hidden email]> wrote:
>
>
> Thanks for all the help, you managed to take away my fear of the VM, it no longer intimidates me.

It's just luring you into a very false sense of security. You remember 'Alien'? That was based on the life of a VM developer..

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Oxymorons: Political science


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

timrowledge
In reply to this post by Eliot Miranda-2
 


> On 2019-01-29, at 6:32 AM, Eliot Miranda <[hidden email]> wrote:
>
>
> What do you think?  What to others think?  How do we best capture this for posterity?

Mail list posts are one form of immortality but can easily get lost in the growing cruft as time passes. It's also difficult to establish the currency of the info a year (or ten!) later. Blog posts and forum posts have the same problems.


Pages/books on github or similar have the virtue of being hosted in a reasonably secure manner and carry date /version metadata with them, so that could be helpful. Linking to the doc in the list, in blogs or tweets spreads the signal. Linking to the (hopefully) large and growing tree of well written and well cared for doc from the sources would be very useful.

More important than any of the details is simply getting the doc produced and reviewed. Collectively we (software dweebs) are *useless* at doc, even though it constantly bites us.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: PFM: Pray For Miracle


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Alistair Grant
In reply to this post by Eliot Miranda-2
 
On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:

>
>
> ...
>
> So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
> What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.
>
> The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.
>
> What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger
 
I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.

I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.

Cheers,
Manuel

Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
    #21 0x119a75794  (<unknown module>)
    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)



On 29 Jan 2019, at 20:19, Alistair Grant <[hidden email]> wrote:


On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <[hidden email]> wrote:


...

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.
What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide “Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...” and include links to the other messages in this thread on the mail list serve for details “Specifics on how to modify the Mac build to link in the leak checker are ...”.

What do you think?  What to others think?  How do we best capture this for posterity?

I would split this in two:

- The basic instructions on compiling with the leak checker should go
in the HowToBuild.
- The description you're suggesting above should go in
http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
OpenSmalltalk-VM repository.

Each should provide a link to the other.

I'm not a big fan of github wiki pages, but they have two advantages:

- They keep the content with the code.
- They're relatively easy to download and back-up offline (in the
event that we decide to move the code elsewhere).

My preference would be http://opensmalltalk.org/, but I'm not sure
what the access rights on it are.  Actually, I didn't even know it
existed, thanks Ben!

As Tim said, email lists are hard to search, and can't be easily
updated.  Blog posts don't really lend themselves to being updated
either.

My 2c...

Cheers,
Alistair