Hi all,
I created the attached torture test to get a feeling for how many processes I can create and if my planned approach would work. With about 100,000 processes I ran into a crash inside the GC. Compiling GST without support for the generational GC seems not to crash.

Is this test just hitting the limit on the number of objects that the GC can properly manage? I will now build with GC_DEBUG and see what we are hitting.

regards
holger

With the generational GC:

#2  0x0013df6e in abort () at abort.c:92
#3  0x007e1615 in oldspace_sigsegv_handler (fault_address=0x10, serious=0) at ../../libgst/oop.c:942
#4  0x008331b4 in sigsegv_handler (sig=11, sc=...) at ../../../sigsegv/src/handler-unix.c:134
#5  <signal handler called>
#6  0x007e1242 in scanned_fields_in (object=<value optimized out>, flags=<value optimized out>) at ../../libgst/oop.c:1940
#7  0x007e286d in _gst_copy_an_oop (oop=<value optimized out>) at ../../libgst/oop.c:2079
#8  0x007e2b58 in scan_grey_pages () at ../../libgst/oop.c:1847
#9  0x007e38fc in copy_oops () at ../../libgst/oop.c:1755
#10 _gst_scavenge () at ../../libgst/oop.c:1229
#11 0x007e3e5c in _gst_alloc_obj (size=20, p_oop=0xbf8abd6c) at ../../libgst/oop.c:772

Without the generational GC: The backtrace

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Attachment: ParallelTtest.st (488 bytes)
On 11/20/2010 06:30 PM, Holger Hans Peter Freyther wrote:
> I created the attached torture test to get a feeling for how many processes I
> can create and if my planned approach would work. With about 100,000 processes
> I ran into a crash inside the GC. Compiling GST without support for the
> generational GC seems not to crash.
>
> Is this test just hitting the limit on the number of objects that the GC can
> properly manage?

It's certainly a heavy stress test, but it shouldn't crash the VM. Thanks!

Paolo
On 11/20/2010 06:32 PM, Paolo Bonzini wrote:
>
> It's certainly a heavy stress test, but it shouldn't crash the VM. Thanks!
>

GC_DEBUG didn't help, so I am now running under valgrind and see some issues in dict.c. It appears that init_runtime_objects is called before _gst_init_dictionary is called, or at least before the dictionary is initialized. I am not sure what the right way to resolve this is, though.
On 11/20/2010 07:14 PM, Holger Hans Peter Freyther wrote:
> On 11/20/2010 06:32 PM, Paolo Bonzini wrote:
>
>>
>> It's certainly a heavy stress test, but it shouldn't crash the VM. Thanks!
>>
> GC_DEBUG didn't help, so I am now running under valgrind and see some issues
> in dict.c. It appears that init_runtime_objects is called before
> _gst_init_dictionary is called, or at least before the dictionary is
> initialized. I am not sure what the right way to resolve this is, though.

Okay, not quite true, but somehow it is not initialized. I will keep digging to get valgrind working on gst.
On 11/20/2010 06:30 PM, Holger Hans Peter Freyther wrote:
> I created the attached torture test to get a feeling for how many processes I
> can create and if my planned approach would work. With about 100,000 processes
> I ran into a crash inside the GC. Compiling GST without support for the
> generational GC seems not to crash.

How long does it take to crash? Does it happen even without the printNl?

Paolo
On 11/21/2010 11:48 AM, Paolo Bonzini wrote:
> On 11/20/2010 06:30 PM, Holger Hans Peter Freyther wrote:
>> I created the attached torture test to get a feeling for how many processes I
>> can create and if my planned approach would work. With about 100,000 processes
>> I ran into a crash inside the GC. Compiling GST without support for the
>> generational GC seems not to crash.
>
> How long does it take to crash? Does it happen even without the printNl?

It crashes without the printNl, but it needs the call to delay wait. I can create the Delay for each process once and it still crashes; it also needs a lot of processes to force the crash. It crashes within 30 seconds or so.

I am going to try your approach with GDB, watchpoints and continuing a couple of times, and see if I can get it to crash and have gdb right there. I might also try reverse debugging...
On 11/21/2010 01:14 PM, Holger Hans Peter Freyther wrote:
> I am going to try your approach with GDB, watchpoints and continuing a couple
> of times, and see if I can get it to crash and have gdb right there. I might
> also try reverse debugging...

Reverse debugging works better than it did in GDB 7.0 and 7.1, but it is too slow to be usable. I will let it run anyway, just to see if it can be helpful.
On 11/21/2010 01:14 PM, Holger Hans Peter Freyther wrote:
> On 11/21/2010 11:48 AM, Paolo Bonzini wrote:
>> On 11/20/2010 06:30 PM, Holger Hans Peter Freyther wrote:
>>> I created the attached torture test to get a feeling for how many processes I
>>> can create and if my planned approach would work. With about 100,000 processes
>>> I ran into a crash inside the GC. Compiling GST without support for the
>>> generational GC seems not to crash.
>>
>> How long does it take to crash? Does it happen even without the printNl?
>
> It crashes without the printNl, but it needs the call to delay wait. I can create
> the Delay for each process once and it still crashes; it also needs a lot
> of processes to force the crash. It crashes within 30 seconds or so.

Ok, reproduced. Here is a more deterministic testcase:

    Object subclass: Scheduler [
        MutexSem := Semaphore forMutualExclusion.
        TimeoutSem := Semaphore new.

        Scheduler class >> step [
            TimeoutSem wait
        ]

        Scheduler class >> kick [
            MutexSem critical: [TimeoutSem signal]
        ]
    ]

    Eval [
        [[Scheduler step] repeat] forkAt: Processor userInterruptPriority.
        1 to: 100000 do: [:thread_nr |
            [
                | id |
                id := thread_nr.
                id \\ 1000 == 0 ifTrue: [id printNl].
                20 timesRepeat: [Scheduler kick].
            ] fork.
        ].
        Semaphore new wait.
    ]

where the Scheduler class is a heavily butchered version of Delay. :)

Interestingly, inlining the two methods in the Eval makes the testcase work, so it's probably something related to contexts.

Paolo
On 11/21/2010 04:10 PM, Paolo Bonzini wrote:
> where the Scheduler class is a heavily butchered version of Delay. :)
>
> Interestingly, inlining the two methods in the Eval makes the testcase
> work, so it's probably something related to contexts.

It's a memory corruption due to running out of memory and not detecting it.

FWIW, here are my debugging steps:

1) After some fruitless attempts to get to the point of corruption with gdb, I added this patch:

diff --git a/libgst/oop.c b/libgst/oop.c
index f5b885b..4c15f57 100644
--- a/libgst/oop.c
+++ b/libgst/oop.c
@@ -1076,6 +1076,7 @@ _gst_global_gc (int next_allocation)
   int old_limit;

   _gst_mem.numGlobalGCs++;
+  _gst_mem.numScavenges = 0;

   old_limit = _gst_mem.old->heap_limit;
   _gst_mem.old->heap_limit = 0;
@@ -2032,10 +2033,10 @@ _gst_copy_an_oop (OOP oop)
   obj = OOP_TO_OBJ (oop);
   pData = (OOP *) obj;

-#if defined(GC_DEBUG_OUTPUT)
-  printf (">Copy ");
+  if (_gst_mem.numGlobalGCs == 20 && _gst_mem.numScavenges == 249) {
+  printf (">Copy %p ", ((gst_object)0x7ffff6dc87a0)->objClass);
   _gst_display_oop (oop);
-#endif
+  }

 #if defined (GC_DEBUGGING)
   if UNCOMMON (!IS_INT (obj->objSize))

I easily got the numbers (20/249/0x7ffff6dc87a0) from the breakpoints I was using in gdb. The debugging output wasn't too long and had

    >Copy 0x7fc75b361920 0x7fc75f495300 0x7ffff7268010 ...
    >Copy (nil) ...

which showed that OOP 0x7fc75f495300 was being copied at the time of the corruption.

2) I put a breakpoint on the call to _gst_display_oop, conditional on printing the OOP that I got from the debugging output.

3) At the breakpoint, I put a watchpoint on *(void **)0x7ffff6dc87a0. I remembered hardware watchpoints didn't work, so I used a software one.

HW watchpoints indeed didn't work because the corruption happened in kernel mode (due to one mmap overwriting another):

Watchpoint 3: *(void **)0x7ffff6dc87a0

Old value = (void *) 0x23
New value = (void *) 0x0
0x0000003bda0dfffa in mmap64 () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x0000003bda0dfffa in mmap64 ()
#1  0x00007ffff7d684ac in anon_mmap_commit (base=<value optimized out>, size=<value optimized out>) at ../../libgst/sysdep/posix/mem.c:227
#2  0x00007ffff7d6684b in heap_sbrk_internal (hdp=0x7fffd6d82000, size=262144) at ../../libgst/heap.c:235
#3  0x00007ffff7d66692 in _gst_heap_sbrk (hd=0x7fffd6d83000 "@", size=262144) at ../../libgst/heap.c:187
(gdb) up 3
#3  0x00007ffff7d66692 in _gst_heap_sbrk (hd=0x7fffd6d83000 "@", size=262144) at ../../libgst/heap.c:187
187       return heap_sbrk_internal (hdp, size);
(gdb) p hdp
$5 = (struct heap *) 0x7fffd6d82000
(gdb) p *$
$6 = {areasize = 536870912, base = 0x7fffd6d82000 "", breakval = 0x7ffff6dc3000 "", top = 0x7ffff6dc3000 ""}
(gdb) p hdp->breakval - hdp->base
$7 = 537137152

So the heap had overflowed: breakval - base (537137152) is 266240 bytes larger than areasize (536870912). Trivial patch follows:

diff --git a/libgst/heap.c b/libgst/heap.c
index 25d7f50..1f64fb2 100644
--- a/libgst/heap.c
+++ b/libgst/heap.c
@@ -218,6 +218,18 @@ heap_sbrk_internal (struct heap * hdp,
     }
   else if (hdp->breakval + size > hdp->top)
     {
+      if (hdp->breakval - hdp->base + size > hdp->areasize)
+        {
+          if (hdp->breakval - hdp->base == hdp->areasize);
+            {
+              /* FIXME: a library should never exit!  */
+              fprintf (stderr, "gst: out of memory allocating %d bytes\n",
+                       size);
+              exit (1);
+            }
+          size = hdp->areasize - (hdp->breakval - hdp->base);
+        }
+
       moveto = PAGE_ALIGN (hdp->breakval + size);
       mapbytes = moveto - hdp->top;
       mapto = _gst_osmem_commit (hdp->top, mapbytes);

Paolo
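A minimal sketch of the kind of bounds check the heap.c patch is about, under simplifying assumptions: `struct heap_model`, `heap_model_room` and `heap_model_sbrk` are hypothetical names that are not part of libgst, byte offsets stand in for the real `base`/`breakval` pointers, and a request that does not fit is reported through a return value instead of calling `exit`. It only illustrates the invariant that the break must never advance past `areasize`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields gdb printed from `struct heap`.
   Offsets replace the real pointers so the sketch needs no mmap.  */
struct heap_model
{
  size_t areasize;   /* bytes reserved for the whole heap area */
  size_t used;       /* models breakval - base: bytes handed out so far */
};

/* Bytes still available before the break would leave the area.  */
static size_t
heap_model_room (const struct heap_model *h)
{
  assert (h->used <= h->areasize);
  return h->areasize - h->used;
}

/* Advance the break by `size` bytes.  Returns the offset of the old
   break, or (size_t) -1 if the request does not fit in the area.
   Comparing against the remaining room avoids overflow in used + size.  */
static size_t
heap_model_sbrk (struct heap_model *h, size_t size)
{
  size_t old_break = h->used;

  if (size > heap_model_room (h))
    return (size_t) -1;

  h->used += size;
  return old_break;
}
```

With the values from the gdb session, `areasize` is 536870912 while `breakval - base` had already reached 537137152; a check of this shape would have refused the request that pushed the break past the end of the area.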
On 11/21/2010 08:14 PM, Paolo Bonzini wrote:
> On 11/21/2010 04:10 PM, Paolo Bonzini wrote:
>> where the Scheduler class is a heavily butchered version of Delay. :)
>>
>> Interestingly, inlining the two methods in the Eval makes the testcase
>> work, so it's probably something related to contexts.
>
> It's a memory corruption due to running out of memory and not detecting it.
>

Thanks, I am just back from a concert. I was suspecting OOM as well and have now created a testcase which allocates a BigObject and crashes too, but you were faster.

What do you propose as a proper resolution? Is there some kind of exception and Context we could pre-allocate and then raise? Maybe reserve some more heap for the OOM case?

z.
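One way to picture the "reserve some more heap for the OOM case" idea is the sketch below. It is purely illustrative and says nothing about how GNU Smalltalk does or should handle this: `struct reserved_heap` and `reserved_heap_alloc` are hypothetical names, and the model only tracks byte counts. Ordinary requests may not touch the last `reserve` bytes; after the first failed request the reserve becomes available, so that an error path still has some memory to run in.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a heap area with an emergency reserve.
   Assumes reserve <= areasize.  */
struct reserved_heap
{
  size_t areasize;    /* total bytes in the area */
  size_t used;        /* bytes handed out so far */
  size_t reserve;     /* bytes held back for the out-of-memory path */
  bool in_emergency;  /* set once a request has failed */
};

/* Returns true and advances `used` if `size` bytes fit under the
   current limit.  The limit excludes the reserve until a request
   fails; from then on the whole area may be used.  */
static bool
reserved_heap_alloc (struct reserved_heap *h, size_t size)
{
  size_t limit = h->in_emergency ? h->areasize
                                 : h->areasize - h->reserve;

  if (h->used > limit || size > limit - h->used)
    {
      h->in_emergency = true;
      return false;
    }

  h->used += size;
  return true;
}
```

In this model the caller that sees `false` would be the place to raise a pre-allocated exception; whether the VM can safely do that at the point of failure is exactly the open question above.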