Hi All, responding to Andrew here because this is generally of interest to the vm-list.
On Mon, Sep 26, 2011 at 11:06 AM, Andrew Gaylard <[hidden email]> wrote:
> Hmmm. Thanks for the advice -- we now build with -O3, and all's well.

So this really surprises me, since we see exactly the same thing with gcc version 3.4.6 20060404 (Red Hat 3.4.6-3). If we compile with -O1 or -O3 we get functional Cog VMs, but -O2 crashes on start-up or soon thereafter. I'm surprised that two very different versions of gcc show the same behaviour, but I guess I shouldn't be. At some point some of us (me included) really ought to put in the effort to understand what the issue is. It could be a gcc bug, or it could be that we're generating C code with ill-defined behaviour. I have to say that I suspect the latter, given how different gcc 3.4.x and gcc 4.4.x are (BTW Andrew also sees the same issue with gcc 4.1.x).

best,
Eliot
2011/9/26 Eliot Miranda <[hidden email]>:
>
> Hi All,
> responding to Andrew here because this is generally of interest to the vm-list.
>
> On Mon, Sep 26, 2011 at 11:06 AM, Andrew Gaylard <[hidden email]> wrote:
>>
>> Hmmm. Thanks for the advice -- we now build with -O3, and all's well.
>> I've run the VM at full load (mostly compiling) for 30 hours without a
>> hiccup. Interesting that -O2 is problematic, but -O3 isn't; I assumed
>> that higher optimisations would make things less stable, not more so.
>> And we get a 17% speed increase.
>>
>> My GCC is:
>> $ gcc --version
>> gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
>
> So this really surprises me since we see exactly the same thing with gcc version 3.4.6 20060404 (Red Hat 3.4.6-3). If we compile with -O1 or -O3 we get functional Cog VMs, but -O2 crashes on start-up or soon there-after. I'm surprised that two very different versions of gcc show the same behaviour but I guess I shouldn't be. Some time some of us (me included) could really do to put the effort into understanding what the issue is. It could be a gcc bug or it could be that we're generating C code with ill-defined behaviour. I have to say that I suspect the latter given how different gcc 3.4.x and gcc 4.4.x are (BTW Andrew also sees the same issue with gcc 4.1.x).
>

Again some nasty kind of undefined shifts on signed/unsigned ints?
Or a macro expansion leading to subtle subexpression ordering (++i/++i) ?
Or one of the many dark zones here:
http://www.vmunix.com/~gabor/c/draft.html#601

Being able to enumerate such a long list without any omission is already something!
What a beautiful language !

Nicolas

>>
>> - Andrew
>>
>> On 2011.09.25 23:12:50 -0700, Eliot Miranda <[hidden email]> wrote:
>> > On Sat, Sep 24, 2011 at 9:02 AM, Andrew Gaylard <[hidden email]> wrote:
>> >
>> > > Actually, it looks like I was wrong. After rebuilding everything from
>> > > scratch, I've been unable to reproduce these crashes, except for the
>> > > one with unix-4.4.7.image.
>> > >
>> > > Sorry for the false alarm. r2495 looks pretty good, at both -O0 and
>> > > -O1. It still crashes at -O2, but that's not a huge concern.
>> > >
>> >
>> > Which gcc are you using? Here at Cadence on a much older 32-bit machine
>> > using gcc 3.4.x we see crashes at -O2 but no crashes at -O0 -O1 & -O3 :)
>> >
>> > >
>> > > On 2011.09.24 08:07:47 +0200, Andrew Gaylard <[hidden email]> wrote:
>> > > > On 2011.09.23 13:26:06 -0700, Eliot Miranda <[hidden email]> wrote:
>> > > > > Thank you, Andrew, you nailed it. I've found the bug via your stack trace
>> > > > > below. Huge relief. Thanks! New VMs and explanation to the list soon.
>> > > >
>> > > > Alas, we spoke too soon. -2495 exhibits the same symptoms; traces and
>> > > > gdb transcripts are attached.
>> > > >
>> > > > - vm-*-2495.0.txt are from our basic.image, running the test-runner.
>> > > > - vm-*-2495.1.txt are from Squeak4.2-10966.image, running the test-runner.
>> > > > - vm-*-2495.2.txt are from unix-4.4.7.image, having just started up the VM.
>> > > >
>> > > > The first two of these appear to be the same problem I encountered
>> > > > with -2493. The backtraces certainly look very similar.
>> > > >
>> > > > The third one is rather different. Looking at the stack trace, the
>> > > > 'rcvr' variable in ceSendsupertonumArgs is 17039140, which is
>> > > > de-referenced in line 10733, causing a SEGV; the handler duly confirms
>> > > > the faulting address as si_addr = 0x103ff24:
>> > > >
>> > > > $ perl -e 'print 0x103ff24'
>> > > > 17039140
>
> --
> best,
> Eliot
Hi:

On 27 Sep 2011, at 08:45, Nicolas Cellier wrote:

> Again some nasty kind of undefined shifts on signed/unsigned ints?
> Or a macro expansion leading to subtle subexpression ordering (++i/++i) ?
> Or one of the many dark zones here:
> http://www.vmunix.com/~gabor/c/draft.html#601
>
> Being able to enumerate such a long list without any omission is
> already something!
> What a beautiful language !

Similar to the problems previously mentioned in this thread, we have an optimization bug in the RoarVM codebase too. It bites me with an infinite loop when I use GCC >4.2 or the Intel compiler and enable optimization above -O1 on two specific files.

Still, I think the bug is somewhere else entirely. I tried going through the files and disabling optimization for particular functions, without any useful result: the problem just jumps around. Perhaps there are multiple places where we rely on unspecified behaviour.

Are there any tools that could be useful for finding such things?
I tried the Clang static analyzer, but without much success. It also gives a lot of warnings about potentially uninitialized variables in the primitives. It does not like the primitiveFail/successFlag checks at all.

Best regards
Stefan

--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax: +32 2 629 3525
On Tue, Sep 27, 2011 at 12:05 AM, Stefan Marr <[hidden email]> wrote:
The only way I know to do this quickly is to get a reproducible case that runs to failure from start-up without user intervention and run the two VMs side-by-side on that case. If the bug shows itself when tracing is turned on then it can be relatively easy to find the point at which the two diverge and backtrack from there.
> I tried the Clang static analyzer, but without much success.

:)

best,
Eliot
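
A minimal sketch of the side-by-side comparison described above, assuming each VM build (say, one compiled at -O1 and one at -O2) writes its trace as one event per line. The function names and the sample trace lines are hypothetical; the thread does not show the actual trace format, so treat this as an illustration of the idea rather than the tooling anyone here used.

```python
from itertools import zip_longest


def first_divergence(trace_a, trace_b):
    """Return (index, line_a, line_b) for the first differing line.

    Returns None when both traces are identical. If one trace ends
    early, the missing side is reported as None.
    """
    for index, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        if a != b:
            return index, a, b
    return None


def context_before(trace, index, count=3):
    """Return up to `count` lines preceding `index`, for backtracking."""
    return trace[max(0, index - count):index]


if __name__ == "__main__":
    # Hypothetical trace lines from a working and a failing build.
    good = ["send #foo", "send #bar", "return 3"]
    bad = ["send #foo", "send #baz", "return 3"]
    hit = first_divergence(good, bad)
    if hit is not None:
        index, a, b = hit
        print("traces diverge at line", index)
        print("  working build:", a)
        print("  failing build:", b)
        print("  preceding lines:", context_before(good, index))
```

This only works if the run is deterministic, which is why the reproducible, no-user-intervention case matters: any timing-dependent event in the trace would show up as a false divergence.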