Dear Paolo,

I wanted to see when the JIT of GST broke, but given the changes in
autoconf/libtool/etc. it is quite difficult to compile stable-2.x on a
halfway modern system.

First of all I compiled gst with --enable-jit and then disabled the
generational GC with --disable-generation-gc. This means a SIGSEGV will
always lead to a crash. Then I started up gdb on .libs/lt-gst (to avoid
using libtool --mode=...) and used "handle SIGSEGV stop" to be able to
inspect the process after the segfault.

Right now I noticed that it is already crashing in the 'trampoline'
(_gst_run_native_code) at the first instruction, and after inspecting
/proc/PID/maps it turns out that the address is in a non-executable
segment of memory.

gdb output:

    Program received signal SIGSEGV, Segmentation fault.
    0x080755d0 in ?? ()
    (gdb) bt
    #0  0x080755d0 in ?? ()
    (gdb) disassemble 0x080755d0,+1
    Dump of assembler code from 0x80755d0 to 0x80755d1:
    => 0x080755d0:  push   %ebp

    $ cat /proc/PID/maps
    08075000-08092000 rw-p 00000000 00:00 0          [heap]

So long story short? What kind of allocator would you like to use for
the JITed code and does a newer version of lightning already provide
one?

cheers
  holger

PS: I think the first thing I will do is to implement the GDB JIT stubs
to help in debugging the jitted code.

PPS: Do you know if the 'address'.. always true warnings are resolved in
lightning?

_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk

On Tue, Jan 22, 2013 at 06:26:51PM +0100, Holger Hans Peter Freyther wrote:
Hi,

> So long story short? What kind of allocator would you like to use for
> the JITed code and does a newer version of lightning already provide
> one?

today I had another look at it (mostly motivated by understanding why
the lightning tests/examples do work) and I found that on x86 the call
to jit_flush_code will use mprotect on the page. Something like this
makes me move on to the next error:

diff --git a/libgst/xlat.c b/libgst/xlat.c
index e555cca..1fd0325 100644
--- a/libgst/xlat.c
+++ b/libgst/xlat.c
@@ -620,6 +620,8 @@ generate_run_time_code (void)
   jit_movi_i (JIT_RET, 0);
   jit_ret ();
+
+  jit_flush_code(_gst_run_native_code, jit_get_ip().ptr);
 }

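
To illustrate why the trampoline page faulted and why an mprotect call helps, here is a minimal sketch in C. It is not lightning's implementation of jit_flush_code; the helper names make_range_executable and alloc_code_buffer are hypothetical, and the sketch assumes a POSIX system where anonymous mmap and read/write/execute mappings are permitted.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of lightning or libgst): add execute
   permission to the pages covering [start, end).  mprotect requires a
   page-aligned address, so the start is rounded down to a page boundary. */
static int
make_range_executable (void *start, void *end)
{
  long page_size = sysconf (_SC_PAGESIZE);
  uintptr_t first, last;

  assert (page_size > 0);
  assert ((uintptr_t) end > (uintptr_t) start);

  first = (uintptr_t) start & ~((uintptr_t) page_size - 1);
  last = (uintptr_t) end;

  /* Returns 0 on success, -1 (with errno set) on failure. */
  return mprotect ((void *) first, (size_t) (last - first),
                   PROT_READ | PROT_WRITE | PROT_EXEC);
}

/* Hypothetical helper: obtain a private read/write buffer for generated
   code.  Like the rw-p [heap] mapping in the maps output, it is not
   executable until make_range_executable is called on it. */
static void *
alloc_code_buffer (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}
```

A caller would emit code into the buffer returned by alloc_code_buffer and then call make_range_executable on the emitted range before jumping to it. Hardened systems may refuse mappings that are writable and executable at the same time, in which case the call returns -1.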
On Sun, Jun 02, 2013 at 02:01:32PM +0200, Holger Hans Peter Freyther wrote:
> On Tue, Jan 22, 2013 at 06:26:51PM +0100, Holger Hans Peter Freyther wrote:
> +
> +  jit_flush_code(_gst_run_native_code, jit_get_ip().ptr);

I have found another issue with the bootstrap and now have some basic
JIT working, and I have started to look into the test failures. One of
them is this:

    (Delay forMilliseconds: 100) value: [ [true] whileTrue ] onTimeoutDo: []

    Object: BlockContext new: 8 "<-0x4cc86ae0>" error: Invalid index -1: index out of range
    SystemExceptions.IndexOutOfRange(Exception)>>signal (ExcHandling.st:254)
    SystemExceptions.IndexOutOfRange class>>signalOn:withIndex: (SysExcept.st:660)
    BlockContext(Object)>>checkIndexableBounds: (Object.st:796)
    BlockContext(Object)>>at: (Object.st:858)
    BlockContext(ContextPart)>>at: (ContextPart.st:294)
    [] in BlockClosure>>asContext: (BlkClosure.st:180)
    BlockContext class>>fromClosure:parent: (BlkContext.st:68)
    optimized [] in UndefinedObject>>executeStatements (a String:1)
    BlockClosure>>ensure: (BlkClosure.st:270)
    [] in Delay>>value:onTimeoutDo: (Delay.st:315)
    BlockClosure>>on:do: (BlkClosure.st:195)
    Delay>>value:onTimeoutDo: (Delay.st:316)
    UndefinedObject>>executeStatements (a String:1)

This appears to come from the fact that:

    An instance of BlockContext
      parent: BlockClosure>>ensure: (BlkClosure.st:270)
      nativeIP: 74241900
      ip: 0
      sp: -1
      receiver: UndefinedObject
      method: [] in UndefinedObject>>executeStatements
      outerContext: nil

while the sp for the BC is 0. Does this ring a bell?

holger

On Sun, Jun 02, 2013 at 05:04:38PM +0200, Holger Hans Peter Freyther wrote:
> An instance of BlockContext
>   parent: BlockClosure>>ensure: (BlkClosure.st:270)
>   nativeIP: 74241900
>   ip: 0
>   sp: -1

I am trying to figure out where the -1 is coming from and when the
context is changed, but searching for sp and -1 is not really
helping. :)

On Sun, Jun 02, 2013 at 06:13:28PM +0200, Holger Hans Peter Freyther wrote:
> I am trying to figure out where the -1 is coming from and when
> the context is changed but searching for sp and -1 is not
> really helping. :)

The BlockClosure>>#asContext: was changed in 2007 in git revision
51f4dffef9df9095e59801df57741bd1a9458fd3.

diff --git a/kernel/BlkClosure.st b/kernel/BlkClosure.st
index ec17d2b..cd07652 100644
--- a/kernel/BlkClosure.st
+++ b/kernel/BlkClosure.st
@@ -167,13 +167,15 @@ creation of Processes from blocks.'>
 	Note that the block has no home, so it cannot contain returns."
 
 	<category: 'private'>
+	"parent ifNotNil: [parent inspect. parent method inspect]."
+
 	^BlockContext fromClosure:
 		[ | top |
 		top := parent isNil
 			ifTrue: [nil]
 			ifFalse: [
-				parent sp == 0
+				parent sp <= 0
 					ifTrue: [parent receiver]
 					ifFalse: [parent at: parent sp]].
 		self value.
 		top]

This works around the problem, but I don't understand enough of it.
When will the sp be != 0 for the interpreter? Where does the assumption
come from that if parent sp != 0, the parent is at this position? And
why shouldn't this code be inside the context class? If the receiver is
buried in the stack, then the class should be able to find it itself?

On Sun, Jun 02, 2013 at 06:46:42PM +0200, Holger Hans Peter Freyther wrote:
> > I am trying to figure out where the -1 is coming from and when
> > the context is changed but searching for sp and -1 is not
> > really helping. :)
>
> The BlockClosure>>#asContext: was changed in 2007 in git revision
> 51f4dffef9df9095e59801df57741bd1a9458fd3.

Hi,

another weekend, another attempt at the JIT. I am debugging a crash
with a Magritte test:

    testCalculated [
        <category: 'testing'>
        | object dummy |
        object := MADynamicObject on: [Time millisecondClockValue].
        dummy := object yourself.
        (Delay forMilliseconds: 2) wait.
        self assert: dummy < object yourself
    ]

It is crashing inside the Delay process, and after a lot of stepi
inside the GDB TUI I am at the point where unwind_context is restoring
a wrong native_ip and it is jumping somewhere else. This means that at
some point the ic->native_ip is wrong (or I don't understand how the
ipOffset is set inside the context...), or the IC is read from
somewhere wrong/after a GC...

Any ideas or feedback on the two patches?

cheers
  holger

On Sat, Jun 08, 2013 at 07:32:30PM +0200, Holger Hans Peter Freyther wrote:
> it is crashing inside the Delay process.. and after a lot of stepi
> inside the GDB tui I am at the point where unwind_context is restoring
> a wrong native_ip and it is jumping somewhere else. This means that
> at some point the ic->native_ip is wrong (or I don't understand how
> the ipOffset is set inside the context...). (or the IC is read from
> somewhere wrong/after a GC...)

Hi,

what appears to happen is that the translated method is freed while it
is still referenced from the method context (I removed the xfree for
the method_entry and things started to change, sometimes even working).

I think the following could happen:

1.) The first time, Delay class>>#runDelayProcess will set the
    oop->flags F_XLAT_REACHABLE (or not??)
2.) ??? (something happens to the oop->flags or a replacement method is
    installed)
3.) The method will be discarded... (two GC runs or such)
4.) The code returns to a methodOop that has not been jitted yet.

If I can trust my printf debugging, I return to a runDelayProcess that
has not even been jitted.

So I think that the oop will be swept when it should not? Any idea on
how to continue to debug this? Tracing all flag assignments with hw
watchpoints is a bit... difficult.

any

On Sat, Jun 08, 2013 at 10:02:16PM +0200, Holger Hans Peter Freyther wrote:
> Hi,

Dear Paolo,

> So I think that the oop will be swept when it should not? Any idea on
> how to continue to debug this? Tracing all flag assignments with hw
> watchpoints is a bit... difficult.

I added various asserts and the one below is hit first: an OOP is swept
while it is still marked as F_XLAT_REACHABLE. Now this appears to be
complicated. The native code can only set F_XLAT_REACHABLE, but it is
never cleared (two processes can enter the same method, so there is no
point in setting it back).

How did you intend the garbage collection to work here?

a.) Never delete native code?
b.) Start to walk the context list(s)? As part of the native code?
c.) Allow it to be collected and re-generate
d.) Mark the OOP when it is put into the context?

But then again, why is the CompiledMethod GCed? It should be reachable
from the Method Dictionary of the method?

diff --git a/libgst/oop.c b/libgst/oop.c
index 6b79935..2cb3cd7 100644
--- a/libgst/oop.c
+++ b/libgst/oop.c
@@ -1435,11 +1435,14 @@ _gst_sweep_oop (OOP oop)
 #ifdef ENABLE_JIT_TRANSLATION
   if (oop->flags & F_XLAT)
+    {
     /* Unreachable, always free the native code.  It is *not* optional
        to free the code in this case -- and I'm not talking about memory
        leaks: a different method could use the same OOP as this one and
        the old method would be executed instead of the new one! */
-    _gst_release_native_code (oop);
+      assert ((oop->flags & F_XLAT_REACHABLE) == 0);
+      _gst_release_native_code (oop);
+    }
 #endif

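
The invariant that the assert in the patch checks can be modelled in isolation. The sketch below is a toy in C, not libgst code: the TOY_ flag values, toy_oop, toy_sweep_stats and toy_sweep_oop are all hypothetical, and it assumes that releasing native code clears only the "translated" bit. Instead of aborting on the first violation it counts them, which can be handier when looking for a pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the real F_XLAT and F_XLAT_REACHABLE
   constants are not given in this thread. */
enum
{
  TOY_F_XLAT = 1 << 0,            /* object has native code attached */
  TOY_F_XLAT_REACHABLE = 1 << 1   /* native code was entered at some point */
};

typedef struct { unsigned flags; } toy_oop;

typedef struct
{
  unsigned released;    /* native code blocks released by the sweep */
  unsigned violations;  /* releases that happened while still "reachable" */
} toy_sweep_stats;

/* Toy version of the sweep step: release native code for a translated
   object and record when that happens although the reachable bit is set,
   which is the condition the assert in the patch traps on. */
static void
toy_sweep_oop (toy_oop *oop, toy_sweep_stats *stats)
{
  assert (oop != NULL && stats != NULL);

  if (!(oop->flags & TOY_F_XLAT))
    return;

  if (oop->flags & TOY_F_XLAT_REACHABLE)
    stats->violations++;

  oop->flags &= ~(unsigned) TOY_F_XLAT;   /* assumed effect of the release */
  stats->released++;
}
```

Sweeping one object with only TOY_F_XLAT set and one with both bits set leaves released at 2 and violations at 1.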
On Sun, Jun 09, 2013 at 08:25:34AM +0200, Holger Hans Peter Freyther wrote:
> d.) Mark the OOP when it is put into the context?
>
> But then again, why is the CompiledMethod GCed? It should be reachable
> from the Method Dictionary of the method?

    if UNCOMMON (oop->flags & F_CONTEXT)
      {
        gst_method_context ctx;
        intptr_t methodSP;
        ctx = (gst_method_context) object;
        methodSP = TO_INT (ctx->spOffset);
        /* printf("setting up for loop on context %x, sp = %d\n", ctx, methodSP); */
        TAIL_MARK_OOPRANGE (&ctx->objClass,
                            ctx->contextStack + methodSP + 1);
      }

The code is already "walking" the context (if it is present). Now the
code is doing a tail recursion, and just doing ctx->method->flags |= for
the F_REACHABLE attribute did more harm than good, but I think this is
what should happen (or we put the method OOP onto the stack as well)?

What do you think?

holger

On Sun, Jun 09, 2013 at 08:45:56AM +0200, Holger Hans Peter Freyther wrote:
> TAIL_MARK_OOPRANGE (&ctx->objClass,
>                     ctx->contextStack + methodSP + 1);

This should already walk over the ctx->method and mark it. I need to
look deeper into the marking and see which xlated method is swept.

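
The claim that the range mark already covers ctx->method can be checked on a toy model. The C sketch below is not libgst code: the slot layout is an assumption loosely based on the fields shown by the BlockContext inspect output earlier in the thread, the names are hypothetical, and it assumes the end of the marked range is exclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed slot order of a context object (hypothetical layout). */
enum
{
  SLOT_OBJ_CLASS,
  SLOT_PARENT,
  SLOT_NATIVE_IP,
  SLOT_IP,
  SLOT_SP,
  SLOT_RECEIVER,
  SLOT_METHOD,
  SLOT_OUTER_CONTEXT,
  SLOT_STACK_BASE,                  /* first element of the context stack */
  TOY_STACK_DEPTH = 8,
  NUM_SLOTS = SLOT_STACK_BASE + TOY_STACK_DEPTH
};

typedef struct { bool marked[NUM_SLOTS]; } toy_context;

/* Mark every slot in the half-open range [first, last). */
static void
mark_slot_range (toy_context *ctx, size_t first, size_t last)
{
  assert (first <= last && last <= NUM_SLOTS);
  for (size_t i = first; i < last; i++)
    ctx->marked[i] = true;
}

/* Mirrors the shape of
     TAIL_MARK_OOPRANGE (&ctx->objClass, ctx->contextStack + methodSP + 1)
   with slot indices instead of pointers. */
static void
mark_context (toy_context *ctx, long sp)
{
  assert (sp >= -1 && sp < TOY_STACK_DEPTH);
  mark_slot_range (ctx, SLOT_OBJ_CLASS, (size_t) (SLOT_STACK_BASE + sp + 1));
}
```

Under these assumptions, even with sp = -1 the range ends exactly at the stack base, so all header slots including the method slot are marked and no stack slot is.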
On Sun, Jun 09, 2013 at 02:58:33PM +0200, Holger Hans Peter Freyther wrote:
> this should already walk over the ctx->method and mark it. I need to
> look deeper into the marking and see which xlated method is swept

And I am back to memory corruption as a cause of this. In theory
valgrind could work for GST but I don't know where to start. :}

On Sun, Jun 09, 2013 at 07:34:05PM +0200, Holger Hans Peter Freyther wrote:
> On Sun, Jun 09, 2013 at 02:58:33PM +0200, Holger Hans Peter Freyther wrote:
>
> > this should already walk over the ctx->method and mark it. I need to
> > look deeper into the marking and see which xlated method is swept
>
> And I am back to memory corruption as a cause of this. In theory valgrind
> could work for GST but I don't know where to start. :}

    Hardware watchpoint 1: -location $1->flags

    Old value = 5243174
    New value = 1048870
    _gst_release_native_code (methodOOP=methodOOP@entry=0x4093d4a0) at xlat.c:3889
    3889      if (methodOOP->flags & F_XLAT_DISCARDED)
    (gdb) bt
    #0  _gst_release_native_code (methodOOP=methodOOP@entry=0x4093d4a0) at xlat.c:3889
    #1  0xb7f2c8a0 in maybe_release_xlat (oop=0x4093d4a0) at oop.inl:165
    #2  alloc_oop (flags=262144, objData=0xb760e4ac) at oop.inl:188
    #3  _gst_alloc_obj (size=20, p_oop=p_oop@entry=0xbfffea74) at oop.c:787
    #4  0xb7f73788 in new_instance (p_oop=0xbfffea74, class_oop=0x408f44a8) at dict.inl:710
    #5  _gst_make_block_closure (blockOOP=0x40910a88) at interp.c:1303
    #6  0x081e9dd5 in ?? ()

So the idea is that if F_XLAT_REACHABLE is set, the entire
"maybe_release_xlat" will trigger... I was disabling the call from
alloc_oop.

Do you remember why the call is inside alloc_oop at all (and the
others)? E.g. why is it too late in the "oop swept" routine? The jitted
code is 'attached' to the method OOP anyway?

holger

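
The two decimal values printed by the watchpoint are easier to read as bit masks. The small C sketch below (helper names are hypothetical) computes which bits differ between the old and new flags word. Which flag name the changed bit corresponds to is not stated in the thread, so the sketch does not assume one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits that differ between two snapshots of a flags word. */
static uint32_t
changed_bits (uint32_t old_value, uint32_t new_value)
{
  return old_value ^ new_value;
}

/* True if every bit in mask was set before and is clear afterwards. */
static bool
was_cleared (uint32_t old_value, uint32_t new_value, uint32_t mask)
{
  assert (mask != 0);
  return (old_value & mask) == mask && (new_value & mask) == 0;
}
```

With the values from the gdb output, 5243174 is 0x500126 and 1048870 is 0x100126, so changed_bits yields 0x400000: exactly one bit was cleared by the write that the watchpoint caught.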