Hi,

In one of our projects we are using Pharo4. The image gets built by gradle, which loads the Metacello project. Sometimes we see the build process hang; it just doesn't progress.

When adding local gitfiletree:// dependencies manually through Monticello, Pharo freezes after a while. It's not always the same repository, and it's not always the same number of repositories before it hangs.

I launched the image with strace, and attached gdb to the frozen process. It turns out it's waiting for a lock that never gets released.

The environment is a 64-bit Gentoo Linux with enough of everything (multiple monitors, multiple cores, enough RAM).

I hope somebody can point me to how to dig deeper into this.

# gdb
(gdb) attach [pid]
[..]
Reading symbols from /usr/lib32/libbz2.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib32/libbz2.so.1
0x0809d8bb in signalSemaphoreWithIndex ()
(gdb) backtrace
#0  0x0809d8bb in signalSemaphoreWithIndex ()
#1  0x0810868c in handleSignal ()
#2  <signal handler called>
#3  0x0809d8c8 in signalSemaphoreWithIndex ()
#4  0x0809f0af in aioPoll ()
#5  0xf76f9671 in display_ioRelinquishProcessorForMicroseconds () from /home/chous/realhome/toolbox/pharo-5.0/pharo-vm/vm-display-X11
#6  0x080a1887 in ioRelinquishProcessorForMicroseconds ()
#7  0x080767fa in primitiveRelinquishProcessor ()
#8  0xb6fc838c in ?? ()
#9  0xb6fc3700 in ?? ()
#10 0xb7952882 in ?? ()
#11 0xb6fc3648 in ?? ()
(gdb) disassemble
Dump of assembler code for function handleSignal:
   0x081085e0 <+0>:    sub    $0x9c,%esp
   0x081085e6 <+6>:    mov    %ebx,0x90(%esp)
   0x081085ed <+13>:   mov    0xa0(%esp),%ebx
   0x081085f4 <+20>:   mov    %esi,0x94(%esp)
   0x081085fb <+27>:   mov    %edi,0x98(%esp)
   0x08108602 <+34>:   movzbl 0x8168420(%ebx),%esi
   0x08108609 <+41>:   mov    %ebx,%eax
   0x0810860b <+43>:   mov    %esi,%edx
   0x0810860d <+45>:   call   0x81070d0 <forwardSignaltoSemaphoreAt>
   0x08108612 <+50>:   call   0x805aae0 <pthread_self@plt>
   0x08108617 <+55>:   mov    0x8168598,%edi
   0x0810861d <+61>:   cmp    %edi,%eax
   0x0810861f <+63>:   je     0x8108680 <handleSignal+160>
   0x08108621 <+65>:   lea    0x10(%esp),%esi
   0x08108625 <+69>:   mov    %esi,(%esp)
   0x08108628 <+72>:   call   0x805b330 <sigemptyset@plt>
   0x0810862d <+77>:   mov    %ebx,0x4(%esp)
   0x08108631 <+81>:   mov    %esi,(%esp)
   0x08108634 <+84>:   call   0x805b0c0 <sigaddset@plt>
   0x08108639 <+89>:   movl   $0x0,0x8(%esp)
   0x08108641 <+97>:   mov    %esi,0x4(%esp)
   0x08108645 <+101>:  movl   $0x0,(%esp)
   0x0810864c <+108>:  call   0x805ada0 <pthread_sigmask@plt>
   0x08108651 <+113>:  mov    %ebx,0x4(%esp)
   0x08108655 <+117>:  mov    %edi,(%esp)
   0x08108658 <+120>:  call   0x805b240 <pthread_kill@plt>
   0x0810865d <+125>:  mov    0x90(%esp),%ebx
   0x08108664 <+132>:  mov    0x94(%esp),%esi
   0x0810866b <+139>:  mov    0x98(%esp),%edi
   0x08108672 <+146>:  add    $0x9c,%esp
   0x08108678 <+152>:  ret
   0x08108679 <+153>:  lea    0x0(%esi,%eiz,1),%esi
   0x08108680 <+160>:  test   %esi,%esi
   0x08108682 <+162>:  je     0x810865d <handleSignal+125>
   0x08108684 <+164>:  mov    %esi,(%esp)
   0x08108687 <+167>:  call   0x809d8a0 <signalSemaphoreWithIndex>
=> 0x0810868c <+172>:  jmp    0x810865d <handleSignal+125>
End of assembler dump.
(gdb) up 3
(gdb) disassemble
Dump of assembler code for function signalSemaphoreWithIndex:
   0x0809d8a0 <+0>:    push   %esi
   0x0809d8a1 <+1>:    xor    %eax,%eax
   0x0809d8a3 <+3>:    push   %ebx
   0x0809d8a4 <+4>:    sub    $0x24,%esp
   0x0809d8a7 <+7>:    mov    0x30(%esp),%esi
   0x0809d8ab <+11>:   test   %esi,%esi
   0x0809d8ad <+13>:   jle    0x809d918 <signalSemaphoreWithIndex+120>
   0x0809d8af <+15>:   mov    $0x1,%edx
   0x0809d8b4 <+20>:   lea    0x0(%esi,%eiz,1),%esi
   0x0809d8b8 <+24>:   mfence
   0x0809d8bb <+27>:   mov    $0x0,%eax
   0x0809d8c0 <+32>:   lock cmpxchg %edx,0x8152d80
=> 0x0809d8c8 <+40>:   mov    %eax,0x1c(%esp)
   0x0809d8cc <+44>:   mov    0x1c(%esp),%eax
   0x0809d8d0 <+48>:   test   %eax,%eax
   0x0809d8d2 <+50>:   jne    0x809d8b8 <signalSemaphoreWithIndex+24>
   0x0809d8d4 <+52>:   mov    0x8152d84,%edx
   0x0809d8da <+58>:   cmp    $0x1ff,%edx
   0x0809d8e0 <+64>:   lea    0x1(%edx),%ebx
   0x0809d8e3 <+67>:   cmove  %eax,%ebx
   0x0809d8e6 <+70>:   mov    0x8152d88,%eax
   0x0809d8eb <+75>:   cmp    %ebx,%eax
   0x0809d8ed <+77>:   je     0x809d920 <signalSemaphoreWithIndex+128>
   0x0809d8ef <+79>:   mov    0x8152d84,%eax
   0x0809d8f4 <+84>:   mov    %esi,0x8152da0(,%eax,4)
   0x0809d8fb <+91>:   mfence
   0x0809d8fe <+94>:   mov    %ebx,0x8152d84
   0x0809d904 <+100>:  movl   $0x0,0x8152d80
   0x0809d90e <+110>:  call   0x807c2c0 <forceInterruptCheck>
   0x0809d913 <+115>:  mov    $0x1,%eax
   0x0809d918 <+120>:  add    $0x24,%esp
   0x0809d91b <+123>:  pop    %ebx
   0x0809d91c <+124>:  pop    %esi
   0x0809d91d <+125>:  ret
   0x0809d91e <+126>:  xchg   %ax,%ax
   0x0809d920 <+128>:  movl   $0x810c888,(%esp)
   0x0809d927 <+135>:  movl   $0x0,0x8152d80
   0x0809d931 <+145>:  call   0x80a3720 <error>
   0x0809d936 <+150>:  jmp    0x809d8ef <signalSemaphoreWithIndex+79>
End of assembler dump.

Meanwhile, strace gets frozen showing this:
[..]
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f63665cd9d0) = 3736
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x42a8a0, [], SA_RESTORER, 0x7f6365ba3ad0}, {SIG_DFL, [], SA_RESTORER, 0x7f6365ba3ad0}, 8) = 0
wait4(-1, 0x7ffc4ef7f7e8, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
wait4(-1,
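The backtrace above already contains a strong hint, though it is a reading of the dump rather than a confirmed diagnosis. Frame #3 shows signalSemaphoreWithIndex interrupted just after its lock cmpxchg on the word at 0x8152d80, and frames #2 to #0 show a signal handler re-entering the same function, which then spins in the cmpxchg retry loop on that same word. That is the shape of a handler spinning on a lock held by the very code it interrupted. A minimal C sketch of that failure pattern (a reconstruction from the disassembly above, not the actual Cog VM source; all names here are made up for illustration):

#include <signal.h>
#include <unistd.h>

static volatile int request_lock = 0;   /* plays the role of the word at 0x8152d80 */

/* Non-reentrant: spins until it owns request_lock, like the
   'lock cmpxchg' retry loop in the disassembly. */
static void signal_semaphore_with_index(int index)
{
    (void)index;
    while (__sync_val_compare_and_swap(&request_lock, 0, 1) != 0)
        ;                               /* frame #0 is stuck in this spin */
    /* ... append index to the pending-signal queue ... */
    request_lock = 0;                   /* release */
}

static void handle_signal(int sig)
{
    /* Runs on the same thread it interrupted (frame #2 in the backtrace). */
    signal_semaphore_with_index(sig);
}

int main(void)
{
    signal(SIGIO, handle_signal);
    for (;;) {
        /* If the signal lands between the acquire and the release inside
           this call, the handler re-enters and spins forever on a lock
           that only the interrupted call can release: a single-threaded
           self-deadlock. */
        signal_semaphore_with_index(1);
        usleep(1000);
    }
}

If something like this is what the VM hits, it would also explain why the frozen pharo process shows up later in the thread in state R+ with accumulating CPU time: it is busy spinning, not sleeping.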
Hi Jose,
yes, I've noticed that as well. It was, at one point, drastic (i.e. almost always a lock-up) on my work development laptop; it now happens far less often (but it still happens to me from time to time).

Dave Lewis, the author of OSProcess, fixed one issue which solved most of the lockups I had, but not all of them. The lockup is in the interaction between OSProcess inside Pharo and the external shell command (i.e. it concerns anything which uses OSProcess), and it looks like a missed signal. It is also machine and Linux version dependent (Ubuntu 14.10 was horrible; 14.04 and 15.04 on the same hardware are far less sensitive), and it seems to also depend on the load of the machine itself.

By the way, which version of OSProcess are you using?

Thierry

2015-06-02 11:10 GMT+02:00 Jose San Leandro <[hidden email]>:
Hi Thierry,

ConfigurationOfOSProcess-ThierryGoubier.38.mcz, which corresponds to version 4.6.2.

Another workaround that would work for me is to be able to "resume" a previous load attempt of a Metacello project. Or a custom "hook" in Metacello to save the image after every dependency is successfully loaded.

2015-06-02 11:25 GMT+02:00 Thierry Goubier <[hidden email]>:
2015-06-02 12:14 GMT+02:00 Jose San Leandro <[hidden email]>:
> ConfigurationOfOSProcess-ThierryGoubier.38.mcz, which corresponds to version 4.6.2.

Ok, then this is the latest.

> Another workaround that would work for me is to be able to "resume" a previous load attempt of a Metacello project. Or a custom "hook" in Metacello to save the image after every dependency is successfully loaded.

Yes, this would work. I'll ask Dave again if he has any idea; the bug is hard to reproduce.

Would you mind telling me the linux kernel / libc version of your gentoo box?

Thierry
No problem, of course. It's a dual-core running a custom 4.0.4-hardened-r2 kernel, hardened/linux/amd64/selinux profile (but in permissive mode), glibc version 2.20-r2, with multilib and selinux USE flags active.

I can provide more information if that helps, of course. Even ssh access to a Docker container running on it, though I fear it won't support X.

Thanks!

2015-06-02 14:34 GMT+02:00 Thierry Goubier <[hidden email]>:
2015-06-02 15:03 GMT+02:00 Jose San Leandro <[hidden email]>:
When the pharo process gets locked, can you do a kill -SIGUSR1 on the pharo process and look at the output? It will give the status inside the vm.

Thierry
In reply to this post by Thierry Goubier
On Tue, Jun 02, 2015 at 02:34:49PM +0200, Thierry Goubier wrote:
> 2015-06-02 12:14 GMT+02:00 Jose San Leandro <[hidden email]>:
> > ConfigurationOfOSProcess-ThierryGoubier.38.mcz, which corresponds to version 4.6.2.
>
> Ok, then this is the latest.
>
> > Another workaround that would work for me is to be able to "resume" a previous load attempt of a Metacello project. Or a custom "hook" in Metacello to save the image after every dependency is successfully loaded.
>
> Yes, this would work. I'll ask Dave again if he has any idea; the bug is hard to reproduce.

Hi Thierry and Jose,

I am reading this thread with interest and will help if I can.

I do have one idea that we have not tried before. I have a theory that this may be an intermittent problem caused by SIGCHLD signals (from the external OS process when it exits) being missed by the UnixOSProcessAccessor>>grimReaperProcess that handles them.

If this is happening, then I may be able to change grimReaperProcess to work around the problem.

When you see the OS deadlock condition, are you able to tell if your Pharo VM process has subprocesses in the zombie state (indicating that grimReaperProcess did not clean them up)? The unix command "ps -axf | less" will let you look at the process tree, and that may give us a clue if this is happening.

Thanks!

Dave

> Would you mind telling me the linux kernel / libc version of your gentoo box?
> [..]
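Dave's theory is easy to demonstrate in isolation: plain POSIX signals do not queue, so several children exiting close together can deliver fewer SIGCHLDs than there are children. A minimal, self-contained C sketch of the mechanism (an illustration only; it has nothing to do with the OSProcess sources):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILDREN 20

static volatile sig_atomic_t signals_seen = 0;

static void on_sigchld(int sig)
{
    (void)sig;
    signals_seen++;          /* only count deliveries; reap later */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigaction(SIGCHLD, &sa, NULL);

    for (int i = 0; i < CHILDREN; i++)
        if (fork() == 0)
            _exit(0);        /* children all exit almost at once */

    sleep(1);                /* let the exits and signals happen */

    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;

    /* Typically prints fewer signals than children: pending SIGCHLDs
       coalesce into a single delivery while one is already pending. */
    printf("signals seen: %d, children reaped: %d\n", (int)signals_seen, reaped);
    return 0;
}

A reaper that does strictly one semaphore wait per expected SIGCHLD would then block forever on the deliveries that never came, leaving defunct children behind, which is what the ps output later in the thread shows.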
Hi Dave,
On 03/06/2015 03:15, David T. Lewis wrote:
> When you see the OS deadlock condition, are you able to tell if your Pharo VM process has subprocesses in the zombie state (indicating that grimReaperProcess did not clean them up)? The unix command "ps -axf | less" will let you look at the process tree, and that may give us a clue if this is happening.

I found it very easy to reproduce, and I do have a zombie child process under the pharo process.

Interestingly enough, the lock-up happens in a very specific place: a call to git branch, which is a very short command returning just a few characters (where all other commands have longer outputs). Reducing the frequency of the calls to git branch with a bit of caching reduces the chances of a lock-up.

Thanks, Dave
On Wed, Jun 3, 2015 at 1:05 PM, Thierry Goubier
<[hidden email]> wrote:
> I found it very easy to reproduce, and I do have a zombie child process under the pharo process.
>
> Interestingly enough, the lock-up happens in a very specific place: a call to git branch, which is a very short command returning just a few characters (where all other commands have longer outputs). Reducing the frequency of the calls to git branch with a bit of caching reduces the chances of a lock-up.

As a workaround and investigation aid, can you wrap "git branch" in a script and experiment with extending the time?

#!/bin/sh
# wrapper for "git branch"; save as e.g. /usr/local/mygitbranch
# and make it executable, then call it in place of plain git branch
git branch "$@"
STATUS=$?
# sleep 1
exit $STATUS

http://www.tldp.org/LDP/abs/html/exit-status.html
http://stackoverflow.com/questions/18492443/pass-all-parameters-of-one-shell-script-to-another

cheers -ben
In reply to this post by Thierry Goubier
Hi Dave, Thierry,

Here's what I get in all recent attempts:

[..] /2.3/lib/gradle-launcher-2.3.jar org.gradle.launcher.GradleMain assemble
18620 pts/5 S+ 0:00 | \_ bash /home/chous/toolbox/pharo/pharo Pharo.image config gitfiletree:///home/chous/osoco/open-badges/game-core ConfigurationOfGameCore --install=bleedingEdge
18635 pts/5 R+ 5:07 | \_ /home/chous/toolbox/pharo/pharo-vm/pharo --nodisplay Pharo.image config gitfiletree:///home/chous/osoco/open-badges/game-core ConfigurationOfGameCore --install=bleedingEdge
32741 pts/5 Z+ 0:00 | \_ [git] <defunct>

2015-06-03 7:05 GMT+02:00 Thierry Goubier <[hidden email]>:
> Hi Dave,
> [..]
Sending the SIGUSR1 signal prints this:

Most recent primitives
0xb84e2a7c s NonInteractiveUIManager(UIManager)>defer:
0xb84e29f4 s PharoCommandLineHandler class>activateWith:
0xb84f5e08 s [] in BasicCommandLineHandler>activateSubCommand:
0xb84e2e98 s BlockClosure>on:do:
0xb84e2970 s BasicCommandLineHandler>activateSubCommand:
0xb84e2914 s BasicCommandLineHandler>handleSubcommand
0xb84f5e64 s BasicCommandLineHandler>handleArgument:
0xb84e281c s [] in BasicCommandLineHandler>activate
0xb84e2878 s BlockClosure>on:do:
0xb84e27a0 s BasicCommandLineHandler>activate
0xb84f5cf4 s [] in BasicCommandLineHandler class>startUp:
0xb84f5d50 s BlockClosure>cull:
0xb84f5dac s [] in SmalltalkImage>executeDeferredStartupActions:
0xb84e2ef4 s BlockClosure>on:do:
0xb84e0ee4 s SmalltalkImage>logStartUpErrorDuring:into:tryDebugger:
0xb84e0e1c s SmalltalkImage>executeDeferredStartupActions:
0xb84e0c1c s SmalltalkImage>startupImage:snapshotWorked:
0xb84e0170 s SmalltalkImage>snapshot:andQuit:
0xb84e04ac s [] in WorldState class>saveAndQuit
0xb84e0508 s BlockClosure>ensure:
0xb84d3274 s CursorWithMask(Cursor)>showWhile:
0xb84d3208 s WorldState class>saveAndQuit
0xb84e0564 s [] in ToggleMenuItemMorph(MenuItemMorph)>invokeWithEvent:
0xb84e05c0 s BlockClosure>ensure:
0xb84d3198 s CursorWithMask(Cursor)>showWhile:
0xb84d30c8 s ToggleMenuItemMorph(MenuItemMorph)>invokeWithEvent:
0xb84d306c s ToggleMenuItemMorph(MenuItemMorph)>mouseUp:
0xb84e061c s ToggleMenuItemMorph(MenuItemMorph)>handleMouseUp:
0xb84e0678 s MouseButtonEvent>sentTo:
0xb84e06d4 s ToggleMenuItemMorph(Morph)>handleEvent:
0xb84e0730 s MorphicEventDispatcher>dispatchDefault:with:
0xb84e078c s MorphicEventDispatcher>handleMouseUp:
0xb84e07e8 s MouseButtonEvent>sentTo:
0xb84e0844 s [] in MorphicEventDispatcher>dispatchEvent:with:
0xb84e08a0 s BlockClosure>ensure:
0xb84d2fec s MorphicEventDispatcher>dispatchEvent:with:
0xb84e08fc s ToggleMenuItemMorph(Morph)>processEvent:using:
0xb84d2f08 s MorphicEventDispatcher>dispatchDefault:with:
0xb84d2f64 s MorphicEventDispatcher>handleMouseUp:
0xb84e01cc s MouseButtonEvent>sentTo:
0xb84e0228 s [] in MorphicEventDispatcher>dispatchEvent:with:
0xb84e0284 s BlockClosure>ensure:
0xb84d2e88 s MorphicEventDispatcher>dispatchEvent:with:
0xb84e02e0 s MenuMorph(Morph)>processEvent:using:
0xb84e033c s MenuMorph(Morph)>processEvent:
0xb84e0398 s MenuMorph>handleFocusEvent:
0xb84e03f4 s [] in HandMorph>sendFocusEvent:to:clear:
0xb84e0450 s BlockClosure>on:do:
0xb84d2d88 s WorldMorph(PasteUpMorph)>becomeActiveDuring:
0xb84d2d10 s HandMorph>sendFocusEvent:to:clear:
0xb84d2e00 s HandMorph>sendEvent:focus:clear:
0xb84d2c9c s HandMorph>sendMouseEvent:
0xb84e0958 s HandMorph>handleEvent:
0xb84e09b4 s HandMorph>processEvents
0xb84e0a10 s [] in WorldState>doOneCycleNowFor:
0xb84e0a6c s Array(SequenceableCollection)>do:
0xb84e0ac8 s WorldState>handsDo:
0xb84d2ba4 s WorldState>doOneCycleNowFor:
0xb84e0b24 s WorldState>doOneCycleFor:
0xb84e0b80 s WorldMorph>doOneCycle
0xb84a0b60 s [] in MorphicUIManager>spawnNewProcess
0xb84a0adc s [] in BlockClosure>newProcess
[..]

On Wed, Jun 3, 2015 at 8:32 AM, Jose San Leandro <[hidden email]> wrote:
2015-06-03 8:32 GMT+02:00 Jose San Leandro <[hidden email]>:
In reply to this post by Thierry Goubier
On Wed, Jun 03, 2015 at 07:05:15AM +0200, Thierry Goubier wrote:
> Hi Dave,
>
> On 03/06/2015 03:15, David T. Lewis wrote:
> [..]
>
> I found it very easy to reproduce, and I do have a zombie child process under the pharo process.

Jose confirms this also (thanks).

Can you try filing in the attached UnixOSProcessAccessor>>grimReaperProcess and see if it helps? I do not know if it will make a difference, but the idea is to put a timeout on the semaphore that is waiting for signals from SIGCHLD. I am hoping that if these signals are sometimes being missed, then the timeout will allow the process to recover from the problem.

> Interestingly enough, the lock-up happens in a very specific place: a call to git branch, which is a very short command returning just a few characters (where all other commands have longer outputs). Reducing the frequency of the calls to git branch with a bit of caching reduces the chances of a lock-up.

This is a good clue, and it may indicate a different kind of problem (so maybe I am looking in the wrong place). Ben's suggestion of adding a delay to the external process sounds like a good idea to help troubleshoot it.

Dave

UnixOSProcessAccessor-grimReaperProcess.st (1K)
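Dave's attachment is not reproduced in this archive, but the idea he describes (a timeout on the semaphore that waits for SIGCHLD, so a missed signal cannot block reaping forever) is a standard POSIX pattern. A rough C sketch of the shape of that fix (names and structure here are illustrative assumptions, not the OSProcess code):

#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>

static sem_t child_exited;               /* posted from the SIGCHLD handler */

static void on_sigchld(int sig)
{
    (void)sig;
    sem_post(&child_exited);             /* async-signal-safe */
}

/* The reaper loop: wake on a SIGCHLD notification OR after a timeout,
   then reap everything that has exited, whether or not we were told. */
static void reaper_loop(void)
{
    for (;;) {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 1;            /* recover within a second of a lost signal */
        if (sem_timedwait(&child_exited, &deadline) == -1 && errno == EINTR)
            continue;                    /* interrupted: just retry */

        int status;
        pid_t pid;
        while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
            printf("reaped child %d\n", (int)pid);
    }
}

int main(void)
{
    sem_init(&child_exited, 0, 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigaction(SIGCHLD, &sa, NULL);
    /* fork children here; reaper_loop() collects them even when
       SIGCHLDs are coalesced or missed */
    reaper_loop();
    return 0;
}

In the image-side version, the same effect would come from waiting on the Smalltalk semaphore with a timeout instead of indefinitely, then checking for exited children regardless of how the wait ended.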
Unfortunately it doesn't fix it, or at least I get the same symptoms.

Sending SIGUSR1 prints this:

SIGUSR1 Wed Jun 3 16:53:50 2015

/home/chous/toolbox/pharo-4.0/pharo-vm/pharo
pharo VM version: 3.9-7 #1 Thu Apr 2 00:51:45 CEST 2015 gcc 4.6.3 [Production ITHB VM]
Built from: NBCoInterpreter NativeBoost-CogPlugin-EstebanLorenzano.21 uuid: 4d9b9bdf-2dfa-4c0b-99eb-5b110dadc697 Apr 2 2015
With: NBCogit NativeBoost-CogPlugin-EstebanLorenzano.21 uuid: 4d9b9bdf-2dfa-4c0b-99eb-5b110dadc697 Apr 2 2015
Revision: https://github.com/pharo-project/pharo-vm.git Commit: 32d18ba0f2db9bee7f3bdbf16bdb24fe4801cfc5 Date: 2015-03-24 11:08:14 +0100 By: Esteban Lorenzano <[hidden email]> Jenkins build #14904
Build host: Linux pharo-linux 3.2.0-31-generic-pae #50-Ubuntu SMP Fri Sep 7 16:39:45 UTC 2012 i686 i686 i386 GNU/Linux
plugin path: /home/chous/toolbox/pharo-4.0/pharo-vm/ [default: /home/chous/toolbox/pharo-4.0/pharo-vm/]

C stack backtrace & registers:
eax 0xff981e94 ebx 0xff981db0 ecx 0xff981e48 edx 0xff981dfc
edi 0xff981c80 esi 0xff981c80 ebp 0xff981d18 esp 0xff981d64
eip 0xff981f78
*[0xff981f78]
/home/chous/toolbox/pharo/pharo-vm/pharo[0x80a33a2]
/home/chous/toolbox/pharo/pharo-vm/pharo[0x80a3649]
linux-gate.so.1(__kernel_rt_sigreturn+0x0)[0xf773acc0]
/home/chous/toolbox/pharo/pharo-vm/pharo(signalSemaphoreWithIndex+0x28)[0x809d8c8]
/home/chous/toolbox/pharo/pharo-vm/pharo[0x810868c]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf773acb0]
/home/chous/toolbox/pharo/pharo-vm/pharo(signalSemaphoreWithIndex+0x5e)[0x809d8fe]
/home/chous/toolbox/pharo/pharo-vm/pharo(aioPoll+0x22f)[0x809f0af]
/home/chous/toolbox/pharo-4.0/pharo-vm/vm-display-X11(+0xe671)[0xf772a671]
/home/chous/toolbox/pharo/pharo-vm/pharo(ioRelinquishProcessorForMicroseconds+0x17)[0x80a1887]
/home/chous/toolbox/pharo/pharo-vm/pharo[0x80767fa]
[0xb4a2fe0c]
[0xb4a2d700]
[0xb53b9382]
[0xb4a2d648]
[0x5b]

All Smalltalk process stacks (active first):
Process 0xb6d930c4 priority 10
0xff9ad450 M ProcessorScheduler class>idleProcess 0xb4d935c0: a(n) ProcessorScheduler class
0xff9ad470 I [] in ProcessorScheduler class>startUp 0xb4d935c0: a(n) ProcessorScheduler class
0xff9ad490 I [] in BlockClosure>newProcess 0xb6d92fe8: a(n) BlockClosure

suspended processes
Process 0xb68e1984 priority 50
0xff9a6490 M WeakArray class>finalizationProcess 0xb4d93790: a(n) WeakArray class
0xb69beb68 s [] in WeakArray class>restartFinalizationProcess
0xb68e1924 s [] in BlockClosure>newProcess

Process 0xb5ced038 priority 80
0xff9af490 M DelayMicrosecondScheduler>runTimerEventLoop 0xb5bb6f9c: a(n) DelayMicrosecondScheduler
0xb6098314 s [] in DelayMicrosecondScheduler>startTimerEventLoop
0xb5cecfd8 s [] in BlockClosure>newProcess

Process 0xb68ec880 priority 40
0xff9b2478 M [] in UnixOSProcessAccessor>(nil) 0xb60dc6d0: a(n) UnixOSProcessAccessor
0xff9b2490 M BlockClosure>repeat 0xb68ef2d4: a(n) BlockClosure
0xb68ef278 s [] in UnixOSProcessAccessor>(nil)
0xb68ec820 s [] in BlockClosure>newProcess

Process 0xb6d92d78 priority 60
0xff98742c M InputEventFetcher>waitForInput 0xb5a09718: a(n) InputEventFetcher
0xff987450 M InputEventFetcher>eventLoop 0xb5a09718: a(n) InputEventFetcher
0xff987470 I [] in InputEventFetcher>installEventLoop 0xb5a09718: a(n) InputEventFetcher
0xff987490 I [] in BlockClosure>newProcess 0xb6d92c9c: a(n) BlockClosure

Process 0xb6f25f94 priority 60
0xb6f25fcc s SmalltalkImage>lowSpaceWatcher
0xb71523e4 s [] in SmalltalkImage>installLowSpaceWatcher
0xb6f25f34 s [] in BlockClosure>newProcess

Process 0xb73a4e7c priority 30
0xff99b470 M [] in AioEventHandler>handleExceptions:readEvents:writeEvents: 0xb73a49e4: a(n) AioEventHandler
0xff99b490 I [] in BlockClosure>newProcess 0xb73a4d90: a(n) BlockClosure

Process 0xb6686c88 priority 40
0xffa073d0 M [] in Delay>wait 0xb73a63fc: a(n) Delay
0xffa073f0 M BlockClosure>ifCurtailed: 0xb73a6614: a(n) BlockClosure
0xffa0740c M Delay>wait 0xb73a63fc: a(n) Delay
0xffa07428 M PipeableOSProcess(PipeJunction)>outputOn: 0xb73a0d34: a(n) PipeableOSProcess
0xffa07444 M PipeableOSProcess(PipeJunction)>output 0xb73a0d34: a(n) PipeableOSProcess
0xffa0746c M [] in MCFileTreeGitRepository class>runOSProcessGitCommand:in: 0xb611fa88: a(n) MCFileTreeGitRepository class
0xffa0748c M BlockClosure>ensure: 0xb739d9dc: a(n) BlockClosure
0xff9e538c M MCFileTreeGitRepository class>runOSProcessGitCommand:in: 0xb611fa88: a(n) MCFileTreeGitRepository class
0xff9e53ac M MCFileTreeGitRepository class>runGitCommand:in: 0xb611fa88: a(n) MCFileTreeGitRepository class
0xff9e53cc M MCFileTreeGitRepository>gitCommand:in: 0xb612926c: a(n) MCFileTreeGitRepository
0xff9e53f4 M MCFileTreeGitRepository>gitVersionsForPackage: 0xb612926c: a(n) MCFileTreeGitRepository
0xff9e543c M [] in MCFileTreeGitRepository>loadAllFileNames 0xb612926c: a(n) MCFileTreeGitRepository
0xff9e5458 M FileSystemDirectoryEntry(Object)>in: 0xb71b3fe8: a(n) FileSystemDirectoryEntry
0xff9e548c M [] in MCFileTreeGitRepository>loadAllFileNames 0xb612926c: a(n) MCFileTreeGitRepository
0xffa04310 M BlockClosure>cull: 0xb71b4894: a(n) BlockClosure
0xffa04338 I [] in Job>run 0xb71b48b4: a(n) Job
0xffa04350 M BlockClosure>on:do: 0xb71b56b8: a(n) BlockClosure
0xffa0437c I [] in Job>run 0xb71b48b4: a(n) Job
0xffa0439c M BlockClosure>ensure: 0xb71b4980: a(n) BlockClosure
0xffa043c4 I Job>run 0xb71b48b4: a(n) Job
0xffa043e4 I MorphicUIManager(UIManager)>displayProgress:from:to:during: 0xb50a8790: a(n) MorphicUIManager
0xffa04414 I ByteString(String)>displayProgressFrom:to:during: 0xb61238d8: a(n) ByteString
0xffa04444 M MCFileTreeGitRepository>loadAllFileNames 0xb612926c: a(n) MCFileTreeGitRepository
0xffa04464 I MCFileTreeGitRepository>allFileNames 0xb612926c: a(n) MCFileTreeGitRepository
0xffa0448c M MCFileTreeGitRepository>goferVersionFrom: 0xb612926c: a(n) MCFileTreeGitRepository
0xff9e238c I MetacelloCachingGoferResolvedReference(GoferResolvedReference)>version 0xb71b3134: a(n) MetacelloCachingGoferResolvedReference
0xff9e23a4 M MetacelloCachingGoferResolvedReference>version 0xb71b3134: a(n) MetacelloCachingGoferResolvedReference
0xff9e23bc M [] in MetacelloFetchingMCSpecLoader>resolveDependencies:nearest:into: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
0xff9e23e0 M OrderedCollection>do: 0xb71b3234: a(n) OrderedCollection
0xff9e240c M [] in MetacelloFetchingMCSpecLoader>resolveDependencies:nearest:into: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
0xff9e2424 M BlockClosure>on:do: 0xb71b3334: a(n) BlockClosure
0xff9e244c M MetacelloFetchingMCSpecLoader>resolveDependencies:nearest:into: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
0xff9e2490 M [] in MetacelloFetchingMCSpecLoader>linearLoadPackageSpec:gofer: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
0xff9ae318 M MetacelloPharo30Platform(MetacelloPlatform)>do:displaying: 0xb50e8b94: a(n) MetacelloPharo30Platform
0xff9ae338 M MetacelloFetchingMCSpecLoader>linearLoadPackageSpec:gofer: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
0xff9ae358 M MetacelloPackageSpec>loadUsing:gofer: 0xb706be54: a(n) MetacelloPackageSpec
0xff9ae37c M [] in MetacelloFetchingMCSpecLoader(MetacelloCommonMCSpecLoader)>linearLoadPackageSpecs:repositories: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
0xff9ae3a0 M OrderedCollection>do: 0xb70c807c: a(n) OrderedCollection
0xff9ae3c0 M MetacelloFetchingMCSpecLoader(MetacelloCommonMCSpecLoader)>linearLoadPackageSpecs:repositories: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
0xff9ae3f0 I [] in MetacelloFetchingMCSpecLoader>linearLoadPackageSpecs:repositories: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
0xff9ae410 M BlockClosure>ensure: 0xb70c813c: a(n) BlockClosure
0xff9ae438 I MetacelloLoaderPolicy>pushLoadDirective:during: 0xb706cb7c: a(n) MetacelloLoaderPolicy
0xff9ae460 I MetacelloLoaderPolicy>pushLinearLoadDirectivesDuring:for: 0xb706cb7c: a(n) MetacelloLoaderPolicy
0xff9ae488 I MetacelloFetchingMCSpecLoader>linearLoadPackageSpecs:repositories: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
0xb70c33c0 s MetacelloFetchingMCSpecLoader(MetacelloCommonMCSpecLoader)>load
0xb706d898 s MetacelloMCVersionSpecLoader>load
0xb71948d0 s MetacelloMCVersion>executeLoadFromArray:
0xb719492c s [] in MetacelloMCVersion>fetchRequiredFromArray:
0xb7194988 s [] in MetacelloPharo30Platform(MetacelloPlatform)>useStackCacheDuring:defaultDictionary:
0xb706d96c s BlockClosure>on:do:
0xb706d2d8 s MetacelloPharo30Platform(MetacelloPlatform)>useStackCacheDuring:defaultDictionary:
0xb706d258 s [] in MetacelloMCVersion>fetchRequiredFromArray:
0xb71949e4 s BlockClosure>ensure:
0xb706d15c s [] in MetacelloMCVersion>fetchRequiredFromArray:
0xb706d1e4 s MetacelloPharo30Platform(MetacelloPlatform)>do:displaying:
0xb706d0e4 s MetacelloMCVersion>fetchRequiredFromArray:
0xb706ccc0 s [] in MetacelloMCVersion>doLoadRequiredFromArray:
0xb715327c s BlockClosure>ensure:
0xb706cc34 s MetacelloMCVersion>doLoadRequiredFromArray:
0xb71532d8 s MetacelloMCVersion>load
0xb7153334 s UndefinedObject>(nil)
0xb7153390 s OpalCompiler>evaluate
0xb706ab30 s RubSmalltalkEditor>evaluate:andDo:
0xb706a7f4 s RubSmalltalkEditor>highlightEvaluateAndDo:
0xb7152edc s [] in GLMMorphicPharoPlaygroundRenderer(GLMMorphicPharoCodeRenderer)>actOnHighlightAndEvaluate:
0xb7152f38 s RubEditingArea(RubAbstractTextArea)>handleEdit:
0xb706a784 s [] in GLMMorphicPharoPlaygroundRenderer(GLMMorphicPharoCodeRenderer)>actOnHighlightAndEvaluate:
0xb7152f94 s WorldState>runStepMethodsIn:
0xb7152ff0 s WorldMorph>runStepMethods
0xb706a1cc s WorldState>doOneCycleNowFor:
0xb715304c s WorldState>doOneCycleFor:
0xb71530a8 s WorldMorph>doOneCycle
0xb6686f8c s [] in MorphicUIManager>spawnNewProcess
0xb6686c28 s [] in BlockClosure>newProcess

Most recent primitives
primCreatePipe new: at:put: at:put: basicNew basicNew: basicNew basicNew: primSQFileSetBlocking:
basicNew: basicAt:put: basicNew: basicAt:put: at:put: basicNew primSigPipeNumber basicNew wait
at:put: signal primForwardSignal:toSemaphore: wait at:put: signal primCreatePipe new: at:put:
at:put: basicNew basicNew: basicNew basicNew: primSQFileSetNonBlocking: basicNew: basicAt:put:
basicNew: basicAt:put: at:put: basicNew signal basicNew: basicAt:put: basicNew: basicAt:put:
at:put: new: basicNew new: replaceFrom:to:with:startingAt: basicNew basicNew:
primSQFileSetNonBlocking: basicNew stringHash:initialHash: primOSFileHandle: basicNew wait
at:put: signal primAioEnable:forSemaphore:externalObject: basicNew objectAt: basicNew: stackp:
basicNew primitiveResume wait wait signal wait signal
primAioHandle:exceptionEvents:readEvents:writeEvents: signal basicNew: basicAt:put:
primSQFileSetNonBlocking: basicNew: basicAt:put: basicNew: basicAt:put: at:put: basicNew
basicNew wait signal primUTCMicrosecondsClock + >= + < primSignal:atUTCMicroseconds: wait
signal wait wait relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds:
primUTCMicrosecondsClock >= signal + primSignal:atUTCMicroseconds: wait basicNew basicNew
basicNew basicNew signal basicNew signal basicNew new: wait new: at:put: at:put: at:put:
basicNew: at:put: basicNew: replaceFrom:to:with:startingAt: replaceFrom:to:with:startingAt:
basicNew new: at:put: new: basicNew: replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt: at:put: basicNew: replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt: at:put: at:put: at:put: new: replaceFrom:to:with:startingAt:
primSizeOfPointer new: at:put: at:put: at:put: primSizeOfPointer basicNew: basicNew at:put:
at:put: at:put: at:put: at:put: at:put: at:put: at:put: at:put: at:put: at:put: at:put:
at:put: at:put: at:put: at:put: replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt: replaceFrom:to:with:startingAt: new: basicNew new: at:put:
at:put: at:put: at:put: at:put: at:put: new: replaceFrom:to:with:startingAt: new: at:put:
at:put: primGetCurrentWorkingDirectory basicNew: replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt:
primForkExec:stdIn:stdOut:stdErr:argBuf:argOffsets:envBuf:envOffsets:workingDir: primGetPid
primGetPid primGetPid basicNew basicNew wait at:put: signal wait shallowCopy new:
replaceFrom:to:with:startingAt: signal wait replaceFrom:to:with:startingAt: at:put: signal
primCloseNoError: primCloseNoError: primCloseNoError: signal basicNew: basicNew basicNew
basicNew wait signal primUTCMicrosecondsClock + >= + < primSignal:atUTCMicroseconds: wait
signal wait relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds: basicNew: primRead:into:startingAt:count: basicNew signal
wait basicNew: basicNew basicNew: replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt: signal basicNew signal basicNew new: wait signal wait signal
primAioHandle:exceptionEvents:readEvents:writeEvents: signal wait
relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:

stack page bytes 4096 available headroom 3300 minimum unused headroom 2152

(SIGUSR1)

2015-06-03 14:15 GMT+02:00 David T. Lewis <[hidden email]>:
In reply to this post by Jose San Leandro
On Tue, Jun 2, 2015 at 5:10 PM, Jose San Leandro
<[hidden email]> wrote:
> In one of our projects we are using Pharo4. The image gets built by gradle, which loads the Metacello project. Sometimes we see the build process hang; it just doesn't progress.
>
> When adding local gitfiletree:// dependencies manually through Monticello, Pharo freezes after a while. It's not always the same repository, and it's not always the same number of repositories before it hangs.
>
> I launched the image with strace, and attached gdb to the frozen process. It turns out it's waiting for a lock that never gets released.

Perhaps try each of the experimental delay schedulers under World > System > Settings > System > Delay Scheduler. I have no reason to think this will help, but it's easy to try (as a shotgun approach to troubleshooting).

cheers -ben
In reply to this post by Jose San Leandro
On Wed, Jun 03, 2015 at 05:03:18PM +0200, Jose San Leandro wrote:
> Unfortunately it doesn't fix it, or at least I get the same symptoms.
>
> Sending SIGUSR1 prints this:
> [..]

Thanks for trying it. Sorry it did not help :-/

Dave
In reply to this post by Jose San Leandro
Hi Jose,
I have pushed a new version of GitFileTree (the development version for Pharo4) with a complete rewrite of the underlying OSProcess use.

Could you test it to see if it solves your deadlocks? It should also be a tad faster.

Regards,

Thierry

On 03/06/2015 17:03, Jose San Leandro wrote:
> Unfortunately it doesn't fix it, or at least I get the same symptoms.
Hi,

So far it works perfectly. I'll let you know if it happens again.

2015-06-11 23:28 GMT+02:00 Thierry Goubier <[hidden email]>:
> Hi Jose,
> [..]
2015-06-18 10:32 GMT+02:00 Jose San Leandro <[hidden email]>:
> Thanks.
You're welcome.

Just a question: which version of the vm are you using? Or which zeroconf scripts are you using to download Pharo?

I made some changes related to OSProcess in the latest vm (and they have been integrated); if, say, you're using the normal Pharo4 vm, then it would mean that your problem was solved by changing the way GitFileTree uses OSProcess.

Thierry