JIT crashes due non-executable memory

Holger Freyther
Dear Paolo,

so I wanted to see when the JIT of GST broke but given the changes in
autoconf/libtool/etc. it is quite difficult to compile stable-2.x on
a halfway modern system.

First of all I compiled gst with --enable-jit and then disabled the
generational gc with --disable-generation-gc. This means a SIGSEGV
will always lead to a crash.

Then I started gdb on .libs/lt-gst (to avoid going through
libtool --mode=...) and used "handle SIGSEGV stop" to be able to inspect
the process after the segfault.

Right now I noticed that it is already crashing in the 'trampoline'
(_gst_run_native_code) at the first instruction, and after inspecting
/proc/PID/maps it turns out that the address lies in a non-executable
segment of memory (the heap, mapped rw-p).


gdb output:
Program received signal SIGSEGV, Segmentation fault.
0x080755d0 in ?? ()
(gdb) bt
#0  0x080755d0 in ?? ()
(gdb) disassemble 0x080755d0,+1
Dump of assembler code from 0x80755d0 to 0x80755d1:
=> 0x080755d0: push   %ebp

$ cat /proc/PID/maps
08075000-08092000 rw-p 00000000 00:00 0          [heap]
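
The same check can be done from inside the process. The following is a minimal sketch, assuming Linux and its /proc/self/maps format; address_is_executable is a hypothetical helper, not something that exists in GST or lightning. It looks up the mapping that contains an address and reports whether the x bit is set in the permission column.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (Linux only).  Returns 1 if addr lies in a mapping
   with the x bit set, 0 if it lies in a mapping without it, and -1 if
   /proc/self/maps cannot be read or no mapping contains addr.  */
static int
address_is_executable (const void *addr)
{
  FILE *maps = fopen ("/proc/self/maps", "r");
  char line[512];
  int result = -1;

  if (!maps)
    return -1;

  while (fgets (line, sizeof line, maps))
    {
      unsigned long start, end;
      char perms[5];

      /* Each entry starts with "start-end perms", for example
         "08075000-08092000 rw-p".  Anything else is skipped.  */
      if (sscanf (line, "%lx-%lx %4s", &start, &end, perms) != 3)
        continue;

      if ((uintptr_t) addr >= start && (uintptr_t) addr < end)
        {
          /* perms is "rwxp"-style; index 2 is the execute bit.  */
          result = perms[2] == 'x';
          break;
        }
    }

  fclose (maps);
  return result;
}
```

Called with a pointer obtained from malloc, this would normally report 0 on a system that maps the heap rw-p, which matches the maps line above.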


So, long story short: what kind of allocator would you like to use for
the JITed code and does a newer version of lightning already provide
one?
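
For illustration, one common shape for such an allocator is a private anonymous mapping that is writable while code is emitted and is switched to executable afterwards. This is only a sketch under that assumption; the code_region_* names are hypothetical and are not part of GST or lightning.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve a region for generated code.  The region
   starts out readable and writable, but not executable.  Returns NULL on
   failure.  size should be a multiple of the page size.  */
static void *
code_region_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: after the code has been emitted, drop write access
   and allow execution.  Returns 0 on success, -1 on failure.  */
static int
code_region_seal (void *region, size_t size)
{
  return mprotect (region, size, PROT_READ | PROT_EXEC);
}

/* Hypothetical helper: give the region back to the kernel.  */
static int
code_region_free (void *region, size_t size)
{
  return munmap (region, size);
}
```

If the translator patches already emitted code in place, sealing the region read-only would not fit as is; the region would then have to be made writable again around each patch, or kept readable, writable and executable throughout.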

cheers
        holger

PS: I think the first thing I will do is to implement the GDB jit stubs
to help in debugging the jitted code.
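
For reference, the GDB JIT interface consists of two well-known symbols, __jit_debug_register_code and __jit_debug_descriptor, plus a linked list of entries that each point at an in-memory object file. The declarations below follow the GDB manual; register_symfile is a hypothetical helper showing how an entry might be linked in, and the object file it points to would have to be a real ELF image for GDB to make use of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
  JIT_NOACTION = 0,
  JIT_REGISTER_FN,
  JIT_UNREGISTER_FN
} jit_actions_t;

struct jit_code_entry
{
  struct jit_code_entry *next_entry;
  struct jit_code_entry *prev_entry;
  const char *symfile_addr;
  uint64_t symfile_size;
};

struct jit_descriptor
{
  uint32_t version;
  /* Holds a jit_actions_t value; kept as uint32_t for a fixed size.  */
  uint32_t action_flag;
  struct jit_code_entry *relevant_entry;
  struct jit_code_entry *first_entry;
};

/* GDB places a breakpoint in this function, so it must not be inlined.  */
void __attribute__ ((noinline))
__jit_debug_register_code (void)
{
}

/* The version field must be 1.  */
struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };

/* Hypothetical helper: link a new entry at the head of the list and
   notify the debugger.  */
static void
register_symfile (struct jit_code_entry *entry,
                  const char *symfile, uint64_t size)
{
  entry->symfile_addr = symfile;
  entry->symfile_size = size;
  entry->prev_entry = NULL;
  entry->next_entry = __jit_debug_descriptor.first_entry;
  if (entry->next_entry)
    entry->next_entry->prev_entry = entry;

  __jit_debug_descriptor.first_entry = entry;
  __jit_debug_descriptor.relevant_entry = entry;
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
  __jit_debug_register_code ();
}
```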

PPS: Do you know if the 'address'.. always true warnings are resolved
in lightning?



_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Re: JIT crashes due non-executable memory

Holger Freyther
On Tue, Jan 22, 2013 at 06:26:51PM +0100, Holger Hans Peter Freyther wrote:

Hi,


> So long story short? What kind of allocator would you like to use for
> the JITed code and does a newer version of lightning already provide
> one?

today I had another look at it (mostly motivated by understanding why
the lightning tests/examples do work) and I found that on x86 the call
to jit_flush_code will use mprotect on the page.

Something like this makes me move to the next error:

diff --git a/libgst/xlat.c b/libgst/xlat.c
index e555cca..1fd0325 100644
--- a/libgst/xlat.c
+++ b/libgst/xlat.c
@@ -620,6 +620,8 @@ generate_run_time_code (void)
 
   jit_movi_i (JIT_RET, 0);
   jit_ret ();
+
+  jit_flush_code(_gst_run_native_code, jit_get_ip().ptr);
 }
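
To illustrate what a flush built on mprotect has to do: mprotect works on whole pages, so the start of the range is rounded down and the end rounded up to page boundaries before the protection is changed. The sketch below is not lightning's implementation; make_range_executable and demo_flip_two_pages are hypothetical names, and the demo uses an anonymous mapping rather than the malloc heap.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: make every page touched by [start, end) readable,
   writable and executable.  Returns 0 on success, -1 on failure.  */
static int
make_range_executable (void *start, void *end)
{
  uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
  uintptr_t first = (uintptr_t) start & ~(page - 1);
  uintptr_t last = ((uintptr_t) end + page - 1) & ~(page - 1);

  if (last <= first)
    return 0;                   /* empty range, nothing to do */

  return mprotect ((void *) first, last - first,
                   PROT_READ | PROT_WRITE | PROT_EXEC);
}

/* Usage sketch: take a read/write mapping and flip a range that starts in
   the middle of the first page and ends inside the second one.  */
static int
demo_flip_two_pages (void)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  unsigned char *buf = mmap (NULL, 2 * page, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  int rc;

  if (buf == MAP_FAILED)
    return -1;

  rc = make_range_executable (buf + 16, buf + page + 16);
  munmap (buf, 2 * page);
  return rc;
}
```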


Re: JIT crashes due non-executable memory

Holger Freyther
On Sun, Jun 02, 2013 at 02:01:32PM +0200, Holger Hans Peter Freyther wrote:
> On Tue, Jan 22, 2013 at 06:26:51PM +0100, Holger Hans Peter Freyther wrote:

> +
> +  jit_flush_code(_gst_run_native_code, jit_get_ip().ptr);

I have found another issue with the bootstrap and now have some basic
JIT working and started to look into the test failures:

One of them is this:
  (Delay forMilliseconds: 100) value: [ [true] whileTrue ] onTimeoutDo: []


Object: BlockContext new: 8 "<-0x4cc86ae0>" error: Invalid index -1: index out of range
SystemExceptions.IndexOutOfRange(Exception)>>signal (ExcHandling.st:254)
SystemExceptions.IndexOutOfRange class>>signalOn:withIndex: (SysExcept.st:660)
BlockContext(Object)>>checkIndexableBounds: (Object.st:796)
BlockContext(Object)>>at: (Object.st:858)
BlockContext(ContextPart)>>at: (ContextPart.st:294)
[] in BlockClosure>>asContext: (BlkClosure.st:180)
BlockContext class>>fromClosure:parent: (BlkContext.st:68)
optimized [] in UndefinedObject>>executeStatements (a String:1)
BlockClosure>>ensure: (BlkClosure.st:270)
[] in Delay>>value:onTimeoutDo: (Delay.st:315)
BlockClosure>>on:do: (BlkClosure.st:195)
Delay>>value:onTimeoutDo: (Delay.st:316)
UndefinedObject>>executeStatements (a String:1)


this appears to come from the fact that:

An instance of BlockContext
  parent: BlockClosure>>ensure: (BlkClosure.st:270)
  nativeIP: 74241900
  ip: 0
  sp: -1
  receiver: UndefinedObject
  method: [] in UndefinedObject>>executeStatements
  outerContext: nil


while the sp for the BC is 0.


does this ring a bell?

        holger

Re: JIT crashes due non-executable memory

Holger Freyther
On Sun, Jun 02, 2013 at 05:04:38PM +0200, Holger Hans Peter Freyther wrote:

> An instance of BlockContext
>   parent: BlockClosure>>ensure: (BlkClosure.st:270)
>   nativeIP: 74241900
>   ip: 0
>   sp: -1

I am trying to figure out where the -1 is coming from and when
the context is changed but searching for sp and -1 is not
really helping. :)

Re: JIT crashes due non-executable memory

Holger Freyther
On Sun, Jun 02, 2013 at 06:13:28PM +0200, Holger Hans Peter Freyther wrote:

> I am trying to figure out where the -1 is coming from and when
> the context is changed but searching for sp and -1 is not
> really helping. :)

The BlockClosure>>#asContext: was changed in 2007 in git revision
51f4dffef9df9095e59801df57741bd1a9458fd3.


diff --git a/kernel/BlkClosure.st b/kernel/BlkClosure.st
index ec17d2b..cd07652 100644
--- a/kernel/BlkClosure.st
+++ b/kernel/BlkClosure.st
@@ -167,13 +167,15 @@ creation of Processes from blocks.'>
         Note that the block has no home, so it cannot contain returns."
 
        <category: 'private'>
+       "parent ifNotNil: [parent inspect. parent method inspect]."
+
        ^BlockContext
            fromClosure: [
                | top |
                top := parent isNil
                    ifTrue: [nil]
                    ifFalse: [
-                       parent sp == 0
+                       parent sp <= 0
                            ifTrue: [parent receiver]
                            ifFalse: [parent at: parent sp]].
                self value. top]



This works around the problem but I don't understand enough of it.
When will the sp be != 0 for the interpreter? Where does the assumption
come from that parent sp != 0 => parent is at this position? Or why
shouldn't this code be inside the context class? If the receiver is buried
in the stack, then the class should be able to find it itself?
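
A toy model of the patched condition, to make the index handling explicit. It assumes a 1-based stack index as in "parent at: parent sp", so any sp at or below zero is treated as "nothing on the stack" and the receiver is used instead. The struct is a stand-in for illustration and is not GST's context layout.

```c
#include <assert.h>

#define TOY_STACK_DEPTH 8

/* Toy stand-in for a context.  sp is the 1-based index of the top of the
   stack; 0 (or, as seen with the JIT, -1) means the stack is empty.  */
struct toy_context
{
  int sp;
  int receiver;
  int stack[TOY_STACK_DEPTH];
};

/* Mirrors "parent sp <= 0 ifTrue: [parent receiver]
                          ifFalse: [parent at: parent sp]".  */
static int
toy_top_or_receiver (const struct toy_context *ctx)
{
  if (ctx->sp <= 0)
    return ctx->receiver;

  assert (ctx->sp <= TOY_STACK_DEPTH);
  return ctx->stack[ctx->sp - 1];
}
```

With the original "sp == 0" test, an sp of -1 would fall through to the indexed access, which is the "Invalid index -1" error from the earlier backtrace.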

Re: JIT crashes due non-executable memory

Holger Freyther
On Sun, Jun 02, 2013 at 06:46:42PM +0200, Holger Hans Peter Freyther wrote:
> > I am trying to figure out where the -1 is coming from and when
> > the context is changed but searching for sp and -1 is not
> > really helping. :)
>
> The BlockClosure>>#asContext: was changed in 2007 in git revision
> 51f4dffef9df9095e59801df57741bd1a9458fd3.

Hi,

another weekend, another attempt at the JIT. I am debugging a crash
with a Magritte test:

    testCalculated [
        <category: 'testing'>
        | object dummy |
        object := MADynamicObject on: [Time millisecondClockValue].
        dummy := object yourself.
        (Delay forMilliseconds: 2) wait.
        self assert: dummy < object yourself
    ]

it is crashing inside the Delay process.. and after a lot of stepi
inside the GDB tui I am at the point where unwind_context is restoring
a wrong native_ip and it is jumping somewhere else. This means that
at some point the ic->native_ip is wrong (or I don't understand how
the ipOffset is set inside the context...). (or the IC is read from
somewhere wrong/after a GC...)

any ideas or feedback on the two patches?

cheers
        holger


Re: JIT crashes due non-executable memory

Holger Freyther
On Sat, Jun 08, 2013 at 07:32:30PM +0200, Holger Hans Peter Freyther wrote:

> it is crashing inside the Delay process.. and after a lot of stepi
> inside the GDB tui I am at the point where unwind_context is restoring
> a wrong native_ip and it is jumping somewhere else. This means that
> at some point the ic->native_ip is wrong (or I don't understand how
> the ipOffset is set inside the context...). (or the IC is read from
> somewhere wrong/after a GC...)

Hi,

what appears to happen is that the translated method is freed while it
is still referenced by the method context (I removed the xfree for the
method_entry and things started to change, sometimes even working).

I think the following could happen:

1.) The first time Delay class>>#runDelayProcess will set the
  oop->flags F_XLAT_REACHABLE (or not??)
2.) ??? (something to the oop->flags or a replacement method is installed)
3.) The method will be discarded... (two GC runs or such)
4.) The code returns to a methodOop that has not been jitted yet. If
    I can trust my printf debugging, I return to a runDelayProcess that
    has not even been jitted..


So I think that the oop will be swept when it should not? Any idea on
how to continue to debug this? Tracing all flag assignments with hw
watchpoints is a bit... difficult.


any

Re: JIT crashes due non-executable memory

Holger Freyther
On Sat, Jun 08, 2013 at 10:02:16PM +0200, Holger Hans Peter Freyther wrote:

> Hi,

Dear Paolo

> So I think that the oop will be swept when it should not? Any idea on
> how to continue to debug this? Tracing all flag assignments with hw
> watchpoints is a bit... difficult.

I added various asserts and the one below is hit first. An OOP is swept
while it is still marked as F_XLAT_REACHABLE. Now this appears to be
complicated. The native code can only set the F_XLAT_REACHABLE but it is
never cleared (two processes can enter the same method so there is no
point in setting it back).

How did you intend the garbage collection to work here?

a.) Never delete native code?
b.) Start to walk the context list(s)? As part of the native code?
c.) Allow it to be collected and re-generate
d.) Mark the OOP when it is put into the context?

But then again, why is the CompiledMethod GCed? It should be reachable
from the Method Dictionary of the method?
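
To make options b.) and d.) concrete, here is a toy model in which the collector walks the chain of active contexts and marks each context's method before sweeping, so native code is only released for methods that no context will return into. All names and flag values are hypothetical; this is not how libgst is structured.

```c
#include <assert.h>
#include <stddef.h>

enum
{
  TOY_REACHABLE = 1 << 0,       /* set during the mark phase */
  TOY_XLAT = 1 << 1             /* method has native code attached */
};

struct toy_method
{
  unsigned flags;
};

struct toy_activation
{
  struct toy_activation *parent;
  struct toy_method *method;
};

/* Mark every method that some activation on the chain will return into.  */
static void
toy_mark_activations (struct toy_activation *top)
{
  for (; top != NULL; top = top->parent)
    top->method->flags |= TOY_REACHABLE;
}

/* Release the native code of unmarked methods and clear the marks for the
   next collection.  Returns how many methods lost their native code.  */
static int
toy_sweep (struct toy_method *methods, size_t count)
{
  int released = 0;
  size_t i;

  for (i = 0; i < count; i++)
    {
      if (!(methods[i].flags & TOY_REACHABLE)
          && (methods[i].flags & TOY_XLAT))
        {
          methods[i].flags &= ~(unsigned) TOY_XLAT;
          released++;
        }
      methods[i].flags &= ~(unsigned) TOY_REACHABLE;
    }
  return released;
}
```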



diff --git a/libgst/oop.c b/libgst/oop.c
index 6b79935..2cb3cd7 100644
--- a/libgst/oop.c
+++ b/libgst/oop.c
@@ -1435,11 +1435,14 @@ _gst_sweep_oop (OOP oop)
 
 #ifdef ENABLE_JIT_TRANSLATION
   if (oop->flags & F_XLAT)
+    {
     /* Unreachable, always free the native code.  It is *not* optional
        to free the code in this case -- and I'm not talking about memory
        leaks: a different method could use the same OOP as this one and
        the old method would be executed instead of the new one! */
-    _gst_release_native_code (oop);
+      assert ((oop->flags & F_XLAT_REACHABLE) == 0);
+      _gst_release_native_code (oop);
+    }
 #endif


Re: JIT crashes due non-executable memory

Holger Freyther
On Sun, Jun 09, 2013 at 08:25:34AM +0200, Holger Hans Peter Freyther wrote:

> d.) Mark the OOP when it is put into the context?
>
> But then again, why is the CompiledMethod GCed? It should be reachable
> from the Method Dictionary of the method?

    if UNCOMMON (oop->flags & F_CONTEXT)
      {
        gst_method_context ctx;
        intptr_t methodSP;
        ctx = (gst_method_context) object;
        methodSP = TO_INT (ctx->spOffset);
        /* printf("setting up for loop on context %x, sp = %d\n",
           ctx, methodSP); */
        TAIL_MARK_OOPRANGE (&ctx->objClass,
                            ctx->contextStack + methodSP + 1);

      }

The code is already "walking" the context (if it is present). Now the
code is doing a tail recursion, and just doing ctx->method->flags |= for
the F_REACHABLE attribute did more harm than good, but I think this
is what should happen (or we put the method OOP onto the stack as well)?


what do you think?

        holger

Re: JIT crashes due non-executable memory

Holger Freyther
On Sun, Jun 09, 2013 at 08:45:56AM +0200, Holger Hans Peter Freyther wrote:

>         TAIL_MARK_OOPRANGE (&ctx->objClass,
>                             ctx->contextStack + methodSP + 1);

this should already walk over the ctx->method and mark it. I need to
look deeper into the marking and see which xlated method is swept
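
A toy version of that range marking, to show why the method slot is covered even when the stack is empty. The context is modelled as a flat array of slots whose order is made up for the example (it is not the real gst_method_context layout); the range runs from the first slot up to, but not including, stack slot sp + 1.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object
{
  int marked;
};

/* Made-up slot order: fixed fields first, then the stack.  */
enum { SLOT_CLASS, SLOT_PARENT, SLOT_METHOD, SLOT_RECEIVER, SLOT_STACK };
#define TOY_FRAME_DEPTH 4

struct toy_frame
{
  struct toy_object *slots[SLOT_STACK + TOY_FRAME_DEPTH];
};

/* Mark every non-NULL object referenced from [from, to).  */
static void
toy_mark_range (struct toy_object **from, struct toy_object **to)
{
  for (; from < to; from++)
    if (*from)
      (*from)->marked = 1;
}

/* Same shape as TAIL_MARK_OOPRANGE (&ctx->objClass,
                                     ctx->contextStack + methodSP + 1).  */
static void
toy_mark_frame (struct toy_frame *frame, long sp)
{
  toy_mark_range (&frame->slots[SLOT_CLASS],
                  &frame->slots[SLOT_STACK] + sp + 1);
}
```

With sp = -1 the range ends exactly at the first stack slot, so all fixed fields, including the method, are marked and no stack slot is.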

Re: JIT crashes due non-executable memory

Holger Freyther
On Sun, Jun 09, 2013 at 02:58:33PM +0200, Holger Hans Peter Freyther wrote:

> this should already walk over the ctx->method and mark it. I need to
> look deeper into the marking and see which xlated method is swept

And I am back to memory corruption as a cause of this. In theory valgrind
could work for GST but I don't know where to start. :}

Re: JIT crashes due non-executable memory

Holger Freyther
On Sun, Jun 09, 2013 at 07:34:05PM +0200, Holger Hans Peter Freyther wrote:
> On Sun, Jun 09, 2013 at 02:58:33PM +0200, Holger Hans Peter Freyther wrote:
>
> > this should already walk over the ctx->method and mark it. I need to
> > look deeper into the marking and see which xlated method is swept
>
> And I am back to memory corruption as a cause of this. In theory valgrind
> could work for GST but I don't know where to start. :}

Hardware watchpoint 1: -location $1->flags

Old value = 5243174
New value = 1048870
_gst_release_native_code (methodOOP=methodOOP@entry=0x4093d4a0) at xlat.c:3889
3889  if (methodOOP->flags & F_XLAT_DISCARDED)
(gdb) bt
#0  _gst_release_native_code (methodOOP=methodOOP@entry=0x4093d4a0) at xlat.c:3889
#1  0xb7f2c8a0 in maybe_release_xlat (oop=0x4093d4a0) at oop.inl:165
#2  alloc_oop (flags=262144, objData=0xb760e4ac) at oop.inl:188
#3  _gst_alloc_obj (size=20, p_oop=p_oop@entry=0xbfffea74) at oop.c:787
#4  0xb7f73788 in new_instance (p_oop=0xbfffea74, class_oop=0x408f44a8) at dict.inl:710
#5  _gst_make_block_closure (blockOOP=0x40910a88) at interp.c:1303
#6  0x081e9dd5 in ?? ()

so the idea is that if F_XLAT_REACHABLE is set the entire "maybe_release_xlat"
will trigger... I was disabling the call from alloc_oop. Do you remember why
the call is inside the alloc_oop at all (and the others)? E.g. why is it too
late in the "oop swept" routine? The jitted code is 'attached' to the method
OOP anyway?
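
As a side note on reading the watchpoint output: XOR-ing the old and new values isolates the bits that changed. The small sketch below does that for the two values gdb printed; the helper names are hypothetical. Which F_* constant the changed bit corresponds to is not stated in this thread and would have to be checked against the flag definitions in libgst.

```c
#include <assert.h>
#include <stdio.h>

/* Bits that differ between two flag words.  */
static unsigned long
changed_bits (unsigned long old_value, unsigned long new_value)
{
  return old_value ^ new_value;
}

/* Print both values and their difference in hex.  */
static void
print_flag_change (unsigned long old_value, unsigned long new_value)
{
  printf ("old = 0x%lx, new = 0x%lx, changed = 0x%lx\n",
          old_value, new_value, changed_bits (old_value, new_value));
}
```

For the values above, print_flag_change (5243174UL, 1048870UL) prints "old = 0x500126, new = 0x100126, changed = 0x400000", so exactly one bit, 0x400000, was cleared by the write.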

holger


