NativeBoost VM crash

Holger Freyther
Hi,


I have a class that calls RAND_bytes in libcrypto and I am (mis-?)using NativeBoost
for that. The function appears to work in general and I can make 1000000 calls in a
row.

I am now invoking (RAND bytes: 4) asInteger from within and then get crashes if I
insert enough (10000) objects. I’m using a Pharo3 image with today’s Pharo-vm-mac-latest
and it is easily reproducible.

I am not sure whether that happens before or after the C code has been called, but the
frame just looks invalid. I had a breakpoint in the initial call to RAND_bytes and
looked at the “trampoline”; at the time of the crash the frame is a different one and
the original is gone (it disassembles to something different).


Am I using NativeBoost correctly here? Do you have any hints? Is the code moving
or being garbage collected? Is there an easy way I could prove the theory?


kind regards
        holger


Details

* thread #1: tid = 0x120dae, 0x21a70140, queue = 'com.apple.main-thread', stop reason = signal SIGILL
    frame #0: 0x21a70140
->  0x21a70140: negl   %esp
    0x21a70142: addl   -0xc(%ebp), %esp
    0x21a70145: movl   0xada30, %eax
    0x21a7014b: movl   (%eax), %eax

(lldb) register read
General Purpose Registers:
       eax = 0x21a70110
       ebx = 0x000000e8
       ecx = 0x00000002
       edx = 0x0000000f
       edi = 0x21a7010c
       esi = 0x21a70131
       ebp = 0xbffacb38
       esp = 0x00000008
        ss = 0x00000023
    eflags = 0x00000202
       eip = 0x21a70140
        cs = 0x0000001b
        ds = 0x00000023
        es = 0x00000023
        fs = 0x00000000
        gs = 0x0000000f

(lldb) bt
* thread #1: tid = 0x120dae, 0x21a70140, queue = 'com.apple.main-thread', stop reason = signal SIGILL
  * frame #0: 0x21a70140
    frame #1: 0x0009827b Pharo`primitiveNativeCall + 107
    frame #2: 0x1f400780


(lldb) di -f
->  0x21a70140: negl   %esp
    0x21a70142: addl   -0xc(%ebp), %esp
    0x21a70145: movl   0xada30, %eax
    0x21a7014b: movl   (%eax), %eax
    0x21a7014d: movl   %esp, -0x10(%ebp)
    0x21a70150: subl   $0x4, %esp
    0x21a70153: andl   $0xf, %esp
    0x21a70156: negl   %esp
    0x21a70158: addl   -0x10(%ebp), %esp
    0x21a7015b: pushl  %eax




Object subclass: #RAND
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Hack'!
"-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- "!

RAND class
        instanceVariableNames: ''!

!RAND class methodsFor: 'as yet unclassified' stamp: 'HolgerHansPeterFreyther 7/23/2015 11:49'!
intRand: aByteArray size: aSize
        <primitive: 'primitiveNativeCall' module: 'NativeBoostPlugin'>

        ^self nbCall: #(int RAND_bytes(byte* aByteArray, int aSize)) module: 'crypto'.! !

!RAND class methodsFor: 'as yet unclassified' stamp: 'HolgerHansPeterFreyther 7/23/2015 11:48'!
rand: numberOfBytes
        | rand |
        rand := ByteArray new: numberOfBytes.
        ^(self intRand: rand size: rand size) = 1
                ifTrue: [ rand ]
                ifFalse: [ nil ]! !

Re: NativeBoost VM crash

Max Leske

> On 23 Jul 2015, at 14:42, Holger Freyther <[hidden email]> wrote:
>
> Am I using NativeBoost correctly here? Do you have any hints? Is the code moving
> or being garbage collected? Is there an easy way I could prove the theory?

Have you tried using #optMayGC? That option will put the function into a safe memory region to prevent it from being moved during GC.

Cheers,
Max




Re: NativeBoost VM crash

Holger Freyther

> On 23 Jul 2015, at 14:42, Holger Freyther <[hidden email]> wrote:
>
>
> Am I using NativeBoost correctly here? Do you have any hints? Is the code moving
> or being garbage collected? Is there an easy way I could prove the theory?

I created a dummy replacement for the C code. It does not touch the “buf” memory
and I still get the SIGILL (though more iterations are needed).

I added a breakpoint in the code that generates the native code and the code is not
re-generated. So the memory for the CompiledMethod(?) just appears to change.

It looks like the stack is getting corrupted at some point. The code that calls
my native code appears to remain valid, and something else gets corrupted inside the VM.

Any idea where to go from here?

holger


dummy code with no side-effects:

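#include <assert.h>

/* Stand-in for RAND_bytes: never writes to buf, always reports success (1). */
int myRAND_bytes(unsigned char *buf, int num)
{
        (void)num;      /* the length is intentionally unused */
        assert(buf);
        return 1;
}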

Re: NativeBoost VM crash

Max Leske

> On 23 Jul 2015, at 19:08, Holger Freyther <[hidden email]> wrote:
>
> Any idea where to go from here?
>
> holger

Did you try the #optMayGC option I suggested? It really looks like the method code may be moved by the GC.




Re: NativeBoost VM crash

Holger Freyther

> On 23 Jul 2015, at 19:15, Max Leske <[hidden email]> wrote:
>
> Did you try the #optMayGC option I suggested? It really looks like the method code may be moved by the GC.

Yes, I tried it, but it looks like optMayGC is only relevant when the C code calls back
into Smalltalk. I don’t have that case here. :(



Re: NativeBoost VM crash

Max Leske

> On 23 Jul 2015, at 19:44, Holger Freyther <[hidden email]> wrote:
>
> Yes, I tried it, but it looks like optMayGC is only relevant when the C code calls back
> into Smalltalk. I don’t have that case here. :(

Well, if the function call takes a very long time… You never know. Just thought that may be the easiest solution.


Is the ByteArray you pass being allocated by NativeBoost? NativeBoost uses mmap and maybe there’s some mismanagement such that you get a buffer overflow. You could try allocating the memory with malloc yourself and see if that fixes the problem.

Cheers,
Max