
NativeBoost VM crash

Holger Freyther

NativeBoost VM crash

Hi,

I have a class that calls RAND_bytes in libcrypto and I am (mis-?)using NativeBoost
for that. The function appears to work in general and I can make 1000000 calls in a
row.

I am now invoking (RAND bytes: 4) asInteger from within and then get crashes if I
insert enough (10000) objects. I’m using a Pharo3 image with today’s Pharo-vm-mac-latest
and it is easily reproducible.

I am not sure if that is before or after the C code has been called, but the frame just
looks invalid (hence the SIGILL). I had a breakpoint in the initial call to RAND_bytes and
looked at the “trampoline”; at the time of the crash the frame is a different one and
the original is gone (it disassembles to something different).

Am I using NativeBoost correctly here? Do you have any hints? Is the code moving
or being garbage collected? Is there an easy way I could prove the theory?

kind regards
holger

Details

* thread #1: tid = 0x120dae, 0x21a70140, queue = 'com.apple.main-thread', stop reason = signal SIGILL
frame #0: 0x21a70140
-> 0x21a70140: negl %esp
0x21a70142: addl -0xc(%ebp), %esp
0x21a70145: movl 0xada30, %eax
0x21a7014b: movl (%eax), %eax

(lldb) register read
General Purpose Registers:
eax = 0x21a70110
ebx = 0x000000e8
ecx = 0x00000002
edx = 0x0000000f
edi = 0x21a7010c
esi = 0x21a70131
ebp = 0xbffacb38
esp = 0x00000008
ss = 0x00000023
eflags = 0x00000202
eip = 0x21a70140
cs = 0x0000001b
ds = 0x00000023
es = 0x00000023
fs = 0x00000000
gs = 0x0000000f

(lldb) bt
* thread #1: tid = 0x120dae, 0x21a70140, queue = 'com.apple.main-thread', stop reason = signal SIGILL
* frame #0: 0x21a70140
frame #1: 0x0009827b Pharo`primitiveNativeCall + 107
frame #2: 0x1f400780

(lldb) di -f
-> 0x21a70140: negl %esp
0x21a70142: addl -0xc(%ebp), %esp
0x21a70145: movl 0xada30, %eax
0x21a7014b: movl (%eax), %eax
0x21a7014d: movl %esp, -0x10(%ebp)
0x21a70150: subl $0x4, %esp
0x21a70153: andl $0xf, %esp
0x21a70156: negl %esp
0x21a70158: addl -0x10(%ebp), %esp
0x21a7015b: pushl %eax

Object subclass: #RAND
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'Hack'!
"-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- "!

RAND class
instanceVariableNames: ''!

!RAND class methodsFor: 'as yet unclassified' stamp: 'HolgerHansPeterFreyther 7/23/2015 11:49'!
intRand: aByteArray size: aSize
    <primitive: 'primitiveNativeCall' module: 'NativeBoostPlugin'>

    ^self nbCall: #(int RAND_bytes(byte* aByteArray, int aSize)) module: 'crypto'.! !

!RAND class methodsFor: 'as yet unclassified' stamp: 'HolgerHansPeterFreyther 7/23/2015 11:48'!
rand: numberOfBytes
    | rand |
    rand := ByteArray new: numberOfBytes.
    ^(self intRand: rand size: rand size) = 1
        ifTrue: [ rand ]
        ifFalse: [ nil ]! !

Max Leske

Re: NativeBoost VM crash

> On 23 Jul 2015, at 14:42, Holger Freyther <[hidden email]> wrote:
>
> Am I using NativeBoost correctly here? Do you have any hints? Is the code moving
> or being garbage collected? Is there an easy way I could proof the theory?

Have you tried using #optMayGC? That option will put the function into a safe memory region to prevent it from being moved during GC.

Cheers,
Max

Holger Freyther

Re: NativeBoost VM crash

> On 23 Jul 2015, at 14:42, Holger Freyther <[hidden email]> wrote:
>
>
> Am I using NativeBoost correctly here? Do you have any hints? Is the code moving
> or being garbage collected? Is there an easy way I could proof the theory?

I created a dummy replacement for the C code. It does not touch the “buf” memory
and I still get the SIGILL (though more iterations are needed).

I added a breakpoint in the code that generates the native code and the code is not
re-generated. So the memory for the CompiledMethod(?) just appears to change.

It looks like the stack is getting corrupted at some point. The code that calls
my native code appears to remain valid, and something else inside the VM gets corrupted.

Any idea where to go from here?

holger

dummy code with no side-effects:

#include <assert.h>

int myRAND_bytes(unsigned char *buf, int num)
{
    assert(buf);
    return 1;
}
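
One way to double-check that the stub really leaves the buffer alone is a small C harness along these lines. It is only a sketch: `stub_leaves_buffer_untouched` and the 0xA5 fill pattern are made up for illustration, and the stub is repeated so the snippet compiles on its own.

```c
#include <assert.h>
#include <string.h>

/* Same stub as above, repeated so this snippet compiles on its own. */
int myRAND_bytes(unsigned char *buf, int num)
{
    (void)num;   /* unused on purpose */
    assert(buf);
    return 1;
}

/* Hypothetical helper: fill a buffer with a known pattern, call the stub
 * `calls` times, and report whether every byte still holds the pattern.
 * Returns 1 if the buffer is untouched, 0 otherwise. */
int stub_leaves_buffer_untouched(int calls)
{
    unsigned char buf[4];
    unsigned char expected[sizeof buf];
    int i;

    memset(buf, 0xA5, sizeof buf);
    memset(expected, 0xA5, sizeof expected);

    for (i = 0; i < calls; i++) {
        if (myRAND_bytes(buf, (int)sizeof buf) != 1)
            return 0;
    }
    return memcmp(buf, expected, sizeof buf) == 0;
}
```

If `stub_leaves_buffer_untouched(10000)` returns 1, the C side can reasonably be ruled out as the thing that scribbles over memory.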

Max Leske

Re: NativeBoost VM crash

> On 23 Jul 2015, at 19:08, Holger Freyther <[hidden email]> wrote:
> any idea of where to move from here?

Did you try the #optMayGC option I suggested? It really looks like the method code may be moved by the GC.

Holger Freyther

Re: NativeBoost VM crash

> On 23 Jul 2015, at 19:15, Max Leske <[hidden email]> wrote:
> Did you try the #optMayGC option I suggested? It really looks like the method code may be moved by the GC.

Yes, I tried it, but it looks like optMayGC is only relevant when the C code calls back
into Smalltalk. I don’t have this case here. :(

Max Leske

Re: NativeBoost VM crash

> On 23 Jul 2015, at 19:44, Holger Freyther <[hidden email]> wrote:
> yes, I tried it but it looks like optMayGC is relevant in case C will callback to Smalltalk.
> I don’t have this case here. :(

Well, if the function call takes a very long time… You never know. Just thought that may be the easiest solution.

Is the ByteArray you pass being allocated by NativeBoost? NativeBoost uses mmap and maybe there’s some mismanagement such that you get a buffer overflow. You could try allocating the memory with malloc yourself and see if that fixes the problem.
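
A rough C sketch of that experiment, assuming a plain malloc'd buffer with guard bytes on both sides so that an overflow shows up as a changed guard. `guarded_call`, `fill_fn`, the two toy fill functions, and the guard size and value are hypothetical names chosen for illustration; nothing here is NativeBoost API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  16      /* assumed guard size, anything > 0 works */
#define GUARD_BYTE 0xC3    /* assumed canary value */

/* Signature shared by RAND_bytes and the dummy replacement. */
typedef int (*fill_fn)(unsigned char *buf, int num);

/* Hypothetical helper: run `fn` on a malloc'd buffer of `num` bytes that is
 * surrounded by guard bytes.
 * Returns  1 if fn succeeded and the guards are intact,
 *          0 if fn reported failure,
 *         -1 if a guard byte changed (overflow or underflow),
 *         -2 if num is negative or malloc failed. */
int guarded_call(fill_fn fn, int num)
{
    unsigned char *block;
    unsigned char *buf;
    int i, result;

    assert(fn != NULL);
    if (num < 0)
        return -2;
    block = malloc((size_t)num + 2 * GUARD_LEN);
    if (block == NULL)
        return -2;

    /* Paint everything with the canary, then hand out the middle part. */
    memset(block, GUARD_BYTE, (size_t)num + 2 * GUARD_LEN);
    buf = block + GUARD_LEN;

    result = (fn(buf, num) == 1) ? 1 : 0;

    /* Check the guard before and the guard after the buffer. */
    for (i = 0; i < GUARD_LEN; i++) {
        if (block[i] != GUARD_BYTE || buf[num + i] != GUARD_BYTE) {
            result = -1;
            break;
        }
    }
    free(block);
    return result;
}

/* Toy fill functions for trying the helper out. */
int exact_fill(unsigned char *buf, int num)
{
    memset(buf, 0, (size_t)num);
    return 1;
}

int overflowing_fill(unsigned char *buf, int num)
{
    memset(buf, 0, (size_t)num + 1);   /* one byte too many, on purpose */
    return 1;
}
```

With the real library one would pass `RAND_bytes` as `fn` and link against libcrypto. A result of -1 would point at the callee writing outside the buffer, while a clean run would suggest looking at the VM side instead.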

Cheers,
Max