linux build stability


linux build stability

Eliot Miranda-2
 
Hi All,

     you may already know that there have been strange stability problems with the Cog VM on Linux.  Problems with the heartbeat appear to derive from specific compilations: one compilation of a given source produces an executable that will crash, while another compilation of the same source produces one that won't.  Recent testing at Teleplace showed that an effect presumed to be due to a compiler bug (specifically the optimization level used to compile the heartbeat, with a high level causing a crash) was not repeatable.  So today, in building new production VMs for Teleplace, I decided to do three parallel Linux builds and see if all produced the same results.  While there are macros used in the source that are date dependent (use of __DATE__), AFAIA there are none apart from those in version.c/version.o that depend on time, and there are no timestamps or current directory paths in Linux objects.  So, provided different compilations of the same source are done on the same day, the results should be bit-identical.  In my experiment this turns out not to be the case, which is more than a little alarming.

What I'm seeing is different results after duplicating unixbuild/bld to unixbuild/bldb and unixbuild/bldc, doing identical configures and makes in each of the three directories, and then comparing the resulting objects.  I see this on a bare-metal laptop with local sources running CERN SLC5 and on a Parallels VM running CentOS 5.3 (both derived from RHEL).  I'm using gcc 4.1.2.  Here's a script that shows example differences:

bld$ for f in *.o vm/*.o; do echo $f;cmp $f ../bldb/$f; cmp $f ../bldc/$f; done
disabledPlugins.o
disabledPlugins.o ../bldb/disabledPlugins.o differ: byte 200, line 4
disabledPlugins.o ../bldc/disabledPlugins.o differ: byte 200, line 4
version.o
version.o ../bldb/version.o differ: byte 166, line 3
version.o ../bldc/version.o differ: byte 166, line 3
vm/aio.o
vm/cogit.o
vm/debug.o
vm/gcc3x-cointerp.o
vm/osExports.o
vm/sqExternalSemaphores.o
vm/sqHeapMap.o
vm/sqLinuxHeartbeat.o
vm/sqLinuxWatchdog.o
vm/sqLinuxWatchdog.o ../bldb/vm/sqLinuxWatchdog.o differ: byte 33, line 1
vm/sqLinuxWatchdog.o ../bldc/vm/sqLinuxWatchdog.o differ: byte 33, line 1
vm/sqNamedPrims.o
vm/sqNamedPrims.o ../bldb/vm/sqNamedPrims.o differ: byte 6346, line 30
vm/sqNamedPrims.o ../bldc/vm/sqNamedPrims.o differ: byte 6346, line 30
vm/sqTicker.o
vm/sqUnixCharConv.o
vm/sqUnixExternalPrims.o
vm/sqUnixMain.o
vm/sqUnixMain.o ../bldb/vm/sqUnixMain.o differ: byte 31415, line 170
vm/sqUnixMain.o ../bldc/vm/sqUnixMain.o differ: byte 31414, line 170
vm/sqUnixMemory.o
vm/sqUnixThreads.o
vm/sqUnixVMProfile.o
vm/sqVirtualMachine.o

Using objdump --disassemble I can see, for example, that sqLinuxWatchdog.o and sqUnixMain.o differ only in the symbol table, not in the executable code.  So perhaps this is not meaningful, and merely noise.  But it is rather worrying that, even for simple files like disabledPlugins.c, different runs produce different objects at all:

bld$ cat disabledPlugins.c
/* this should be in a header file, but it isn't.  ho hum. */
typedef struct {
  char *pluginName;
  char *primitiveName;
  void *primitiveAddress;
} sqExport;
sqExport vm_display_Quartz_exports[] = { 0, 0, 0 };
sqExport vm_display_custom_exports[] = { 0, 0, 0 };
sqExport vm_display_fbdev_exports[] = { 0, 0, 0 };
sqExport vm_sound_MacOSX_exports[] = { 0, 0, 0 };
sqExport vm_sound_NAS_exports[] = { 0, 0, 0 };
sqExport vm_sound_OSS_exports[] = { 0, 0, 0 };
sqExport vm_sound_Sun_exports[] = { 0, 0, 0 };
sqExport vm_sound_custom_exports[] = { 0, 0, 0 };
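
The objdump comparison described above can be sketched roughly as follows. This is a minimal sketch, assuming GNU binutils is installed; the function name same_code is hypothetical and does not come from the Cog build. It drops the "file format" header line that objdump prints, because that line echoes the file path and would otherwise differ between build directories.

```shell
#!/bin/sh
# same_code A.o B.o   (hypothetical helper, not part of the Cog sources)
# Succeeds when the two object files disassemble to the same instructions.
same_code() {
    a=$(mktemp) && b=$(mktemp) || return 2
    # Drop the "<name>:     file format ..." header, which echoes the path.
    objdump --disassemble "$1" | grep -v 'file format' > "$a"
    objdump --disassemble "$2" | grep -v 'file format' > "$b"
    # cmp -s is silent; only its exit status is used.
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

From bld this might be used as: same_code vm/sqUnixMain.o ../bldb/vm/sqUnixMain.o && echo "same code". Note that --disassemble only covers sections expected to contain instructions, so a data-only object such as disabledPlugins.o would likely compare equal trivially; objdump --full-contents is one way to inspect the data sections instead.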


I wonder
- do you see the same effect?
- does this happen with gcc versions other than 4.1.2?
- does it happen on non-RHEL-derived distros?
- is this a meaningful signal or just harmless noise?
- what am I doing wrong?

Clearly I need to look more carefully but I thought I'd ask y'all in order to understand and hopefully solve the build instabilities as swiftly as possible.

If you do want to try to reproduce this, simply duplicate the build directory (unixbuild/bld in the Cog VM source) twice and do three separate configures and makes, one in each of the build directories, each from the same source code.  Then run some variation of the script above to compare the object files so produced.
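
One possible variation of that script is sketched below, under the assumption that the build directories sit side by side as described. The function name compare_builds and its output format are hypothetical; unlike the one-liner, it uses cmp -s and prints only the files that differ or are missing.

```shell
#!/bin/sh
# compare_builds REF OTHER...   (hypothetical helper)
# Prints each object file in REF that is missing from, or not byte-identical
# to, its counterpart in every OTHER build directory.
compare_builds() {
    ref=$1
    shift
    for f in "$ref"/*.o "$ref"/vm/*.o; do
        [ -f "$f" ] || continue      # skip a glob that matched nothing
        rel=${f#"$ref"/}             # path relative to the build directory
        for other in "$@"; do
            # cmp -s exits non-zero when the files differ or one is missing.
            cmp -s "$f" "$other/$rel" || echo "$rel: differs in $other"
        done
    done
}
```

Run from unixbuild, this might be invoked as: compare_builds bld bldb bldc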

best
Eliot

Re: linux build stability

David T. Lewis
 
On Tue, Feb 01, 2011 at 10:17:35PM -0800, Eliot Miranda wrote:
>
> I wonder
> - do you see the same effect?
> - does this happen with gcc versions other than 4.1.2?
> - does it happen on non-RHEL-derived distros?

Yes, yes, and yes. I get similar results reproducing your tests on
SuSE 64, gcc 4.5.0 (see below).

> - is this a meaningful signal or just harmless noise?

Given that the following are all identical I would somewhat favor
the harmless noise theory:
  vm/sqTicker.o
  vm/gcc3x-cointerp.o
  vm/sqUnixMemory.o
  vm/sqUnixThreads.o
  vm/sqUnixVMProfile.o

Test results on my system:

$ uname -a
Linux linux-jh8m 2.6.34.7-0.7-desktop #1 SMP PREEMPT 2010-12-13 11:13:53 +0100 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/version
Linux version 2.6.34.7-0.7-desktop (geeko@buildhost) (gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) ) #1 SMP PREEMPT 2010-12-13 11:13:53 +0100
$  gcc --version
gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292]
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The configuration is:
../../platforms/unix/config/configure CC="gcc -m32" CFLAGS="-g -O2 -msse2 -D_GNU_SOURCE -DNDEBUG -DITIMER_HEARTBEAT=1 -DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=0" LDFLAGS="-m32" LIBS=-lpthread

I manually edited ALSA out of the Makefiles (libtool problem, probably missing the 32-bit libraries).

Generated new sources using image/VMMaker-Squeak4.1.image (from SVN).

Built three times in blda, bldb, bldc with the same configure/make.

Ran script from blda:
for f in *.o vm/*.o
do
  echo $f
  cmp $f ../bldb/$f
  cmp $f ../bldc/$f
done

Results:
disabledPlugins.o
disabledPlugins.o ../bldb/disabledPlugins.o differ: byte 197, line 4
disabledPlugins.o ../bldc/disabledPlugins.o differ: byte 197, line 4
version.o
version.o ../bldb/version.o differ: byte 158, line 2
version.o ../bldc/version.o differ: byte 158, line 2
vm/aio.o
vm/cogit.o
vm/debug.o
vm/debug.o ../bldb/vm/debug.o differ: byte 776, line 8
vm/debug.o ../bldc/vm/debug.o differ: byte 776, line 8
vm/gcc3x-cointerp.o
vm/osExports.o
vm/sqExternalSemaphores.o
vm/sqHeapMap.o
vm/sqHeapMap.o ../bldb/vm/sqHeapMap.o differ: byte 2400, line 7
vm/sqHeapMap.o ../bldc/vm/sqHeapMap.o differ: byte 803, line 5
vm/sqNamedPrims.o
vm/sqNamedPrims.o ../bldb/vm/sqNamedPrims.o differ: byte 7343, line 21
vm/sqNamedPrims.o ../bldc/vm/sqNamedPrims.o differ: byte 7343, line 21
vm/sqTicker.o
vm/sqUnixCharConv.o
vm/sqUnixCharConv.o ../bldb/vm/sqUnixCharConv.o differ: byte 2804, line 20
vm/sqUnixCharConv.o ../bldc/vm/sqUnixCharConv.o differ: byte 2804, line 20
vm/sqUnixExternalPrims.o
vm/sqUnixHeartbeat.o
vm/sqUnixMain.o
vm/sqUnixMain.o ../bldb/vm/sqUnixMain.o differ: byte 12285, line 46
vm/sqUnixMain.o ../bldc/vm/sqUnixMain.o differ: byte 12294, line 46
vm/sqUnixMemory.o
vm/sqUnixThreads.o
vm/sqUnixVMProfile.o
vm/sqVirtualMachine.o


Re: linux build stability

David T. Lewis
 
For what it's worth, some but not all of the differences go away
if the files are stripped to remove symbol tables. The remaining
differences seem to be in version.o and sqUnixMain.o.

HTH,
- Dave (enjoying a day off work with a foot of beautiful fresh new
  snow here in Michigan, a pot of good coffee, birds swarming around
  the bird feeder, and light snowflakes drifting horizontally past
  the window)

Comparing files after configure/make:

squeak
squeak ../bldb/squeak differ: byte 845880, line 1305
squeak ../bldc/squeak differ: byte 845879, line 1305
disabledPlugins.o
disabledPlugins.o ../bldb/disabledPlugins.o differ: byte 197, line 4
disabledPlugins.o ../bldc/disabledPlugins.o differ: byte 197, line 4
version.o
version.o ../bldb/version.o differ: byte 158, line 2
version.o ../bldc/version.o differ: byte 158, line 2
vm/debug.o
vm/debug.o ../bldb/vm/debug.o differ: byte 776, line 8
vm/debug.o ../bldc/vm/debug.o differ: byte 776, line 8
vm/sqHeapMap.o
vm/sqHeapMap.o ../bldb/vm/sqHeapMap.o differ: byte 2400, line 7
vm/sqHeapMap.o ../bldc/vm/sqHeapMap.o differ: byte 803, line 5
vm/sqNamedPrims.o
vm/sqNamedPrims.o ../bldb/vm/sqNamedPrims.o differ: byte 7343, line 21
vm/sqNamedPrims.o ../bldc/vm/sqNamedPrims.o differ: byte 7343, line 21
vm/sqUnixCharConv.o
vm/sqUnixCharConv.o ../bldb/vm/sqUnixCharConv.o differ: byte 2804, line 20
vm/sqUnixCharConv.o ../bldc/vm/sqUnixCharConv.o differ: byte 2804, line 20
vm/sqUnixMain.o
vm/sqUnixMain.o ../bldb/vm/sqUnixMain.o differ: byte 12285, line 46
vm/sqUnixMain.o ../bldc/vm/sqUnixMain.o differ: byte 12294, line 46

After stripping the object files and squeak executable:

squeak
squeak ../bldb/squeak differ: byte 845880, line 1305
squeak ../bldc/squeak differ: byte 845879, line 1305
disabledPlugins.o
version.o
version.o ../bldb/version.o differ: byte 84, line 2
version.o ../bldc/version.o differ: byte 83, line 2
vm/debug.o
vm/sqHeapMap.o
vm/sqNamedPrims.o
vm/sqUnixCharConv.o
vm/sqUnixMain.o
vm/sqUnixMain.o ../bldb/vm/sqUnixMain.o differ: byte 13999, line 54
vm/sqUnixMain.o ../bldc/vm/sqUnixMain.o differ: byte 13998, line 54
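
The stripped comparison can be reproduced without modifying the build outputs. The sketch below assumes GNU strip with its default behaviour (the thread does not say which strip options were used) and a hypothetical function name, same_when_stripped; it strips temporary copies and compares those.

```shell
#!/bin/sh
# same_when_stripped A.o B.o   (hypothetical helper)
# Succeeds when the two files are byte-identical once symbol tables and
# debug information have been removed from temporary copies.
same_when_stripped() {
    a=$(mktemp) && b=$(mktemp) || return 2
    # strip -o writes the stripped result to a new file and leaves the
    # original object untouched.
    strip -o "$a" "$1" && strip -o "$b" "$2" && cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

For example, from blda: same_when_stripped vm/debug.o ../bldb/vm/debug.o && echo "identical after strip"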




Re: linux build stability

David T. Lewis
 
Sorry to keep replying to myself, but one more followup. In each case
where differences exist after stripping the symbol table, the
difference is due to a date stamp string.

In the case of sqUnixMain.c, it is the VM_BUILD_STRING macro used in
getAttribute(). The version.c file also contains a date string. Both
of these date strings are created by configure. After accounting for
these and after stripping the symbol tables, I see no remaining
differences in executable code.

I can't explain why the compiler might generate different symbol
tables, but given that the resulting executables end up being identical
except for the date string macros, this does not look like an issue
that would affect run time stability.
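
One way to see which embedded strings differ between two objects is to compare the output of strings. This is only a sketch, assuming GNU binutils and coreutils; diff_strings is a hypothetical name, and the thread does not say how the date stamps were actually located.

```shell
#!/bin/sh
# diff_strings A B   (hypothetical helper)
# Prints the printable strings that appear in only one of the two files.
diff_strings() {
    a=$(mktemp) && b=$(mktemp) || return 2
    # -a scans the whole file; sorting in the C locale keeps comm happy.
    strings -a "$1" | LC_ALL=C sort -u > "$a"
    strings -a "$2" | LC_ALL=C sort -u > "$b"
    # comm -3 drops lines common to both inputs. What remains is unique to
    # the first file (column 1) or to the second (column 2, tab-indented).
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}
```

Applied to stripped copies of version.o from two build directories, the output would be expected to show little more than the differing date stamps.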

Dave


Re: linux build stability

Eliot Miranda-2
 


On Wed, Feb 2, 2011 at 11:39 AM, David T. Lewis <[hidden email]> wrote:
> Sorry to keep replying to myself, but one more followup. In each case
> where differences exist after stripping the symbol table, the
> difference is due to a date stamp string.
>
> In the case of sqUnixMain.c, it is the VM_BUILD_STRING macro used in
> getAttribute(). The version.c file also contains a date string. Both
> of these date strings are created by configure. After accounting for
> these and after stripping the symbol tables, I see no remaining
> differences in executable code.
>
> I can't explain why the compiler might generate different symbol
> tables, but given that the resulting executables end up being identical
> except for the date string macros, this does not look like an issue
> that would affect run time stability.

Agreed.  Looking more carefully this morning, using -save-temps to save the preprocessed .i and the generated .s output, I see that all the differences are valid.  Thanks David!
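
A rough sketch of that kind of check follows, assuming gcc is on the PATH and the source path is absolute. The function name diff_save_temps is hypothetical, and the flags passed in are whatever the build normally uses; -save-temps itself is the only option taken from the message above. Note that it recompiles the file in each directory, so it overwrites the corresponding .o there.

```shell
#!/bin/sh
# diff_save_temps SRC DIR_A DIR_B [gcc flags...]   (hypothetical helper)
# Compiles SRC (an absolute path) with -save-temps in two directories and
# diffs the preprocessed (.i) and assembly (.s) intermediates.
diff_save_temps() {
    src=$1 dir_a=$2 dir_b=$3
    shift 3
    base=$(basename "$src" .c)
    for d in "$dir_a" "$dir_b"; do
        # With no -o given, the .i, .s and .o files land in the current
        # directory, named after the source file.
        ( cd "$d" && gcc "$@" -save-temps -c "$src" ) || return 2
    done
    diff "$dir_a/$base.i" "$dir_b/$base.i" &&
    diff "$dir_a/$base.s" "$dir_b/$base.s"
}
```

Any lines that diff prints point at the preprocessed text or generated assembly that changed between the two builds, which makes it easier to tell date strings apart from real code differences.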



Re: linux build stability

johnmci
 
I recall reading a paper back in the early 2000s about Google's findings, as they were the first folks to build out extremely large Linux-based server farms.
Their comment was: even though we ordered, oh say, 1000 machines from vendor X, and they were technically "identical", and we ran the same bit-identical software, we found that there were always a few percent that behaved differently.  Due to issues with hard drives, memory, or the fact that the vendor changed suppliers for some particular week, karma or cosmic rays?

The upshot was that even if two machines were the "same" and the software bit-identical, behaviour might be different. Sometimes this was explainable, other times, Karma?




--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter:  squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================





Re: linux build stability

Igor Stasenko

On 2 February 2011 21:42, John M McIntosh
<[hidden email]> wrote:
>
> I recall reading a paper back in the early 2000s about Google's findings, as they were the first folks to build out extremely large Linux-based server farms.
> Their comment was: even though we ordered, oh say, 1000 machines from vendor X, and they were technically "identical", and we ran the same bit-identical software, we found that there were always a few percent that behaved differently.  Due to issues with hard drives, memory, or the fact that the vendor changed suppliers for some particular week, karma or cosmic rays?
> The upshot was that even if two machines were the "same" and the software bit-identical, behaviour might be different. Sometimes this was explainable, other times, Karma?
>

:)

I am still leaning more towards thinking that there is some unusual
bug in the Cog code, rather than in the compiler.

Or some concurrency issue, because the heartbeat runs in a separate
thread from the VM thread.


--
Best regards,
Igor Stasenko AKA sig.

Re: linux build stability

David T. Lewis
 
>
> Or some concurrency issue, because the heartbeat runs in a separate
> thread from the VM thread.

Hmmm... That's a thought. This is a complete shot in the dark, but the
following does not look thread safe to me:

  static void
  high_performance_tick_handler(int sig, struct siginfo *sig_info, void *context)
  {
      static int tickCheckInProgress;

      if (tickCheckInProgress) return;

      tickCheckInProgress = 1;
      checkHighPriorityTickees(ioUTCMicroseconds());
      tickCheckInProgress = 0;
  }

This should be using a mutex, no?

Dave


Re: linux build stability

Igor Stasenko

On 3 February 2011 00:07, David T. Lewis <[hidden email]> wrote:

>
>>
>> Or some concurrency issue, because the heartbeat runs in a separate
>> thread from the VM thread.
>
> Hmmm... That's a thought. This is a complete shot in the dark, but the
> following does not look thread safe to me:
>
>  static void
>  high_performance_tick_handler(int sig, struct siginfo *sig_info, void *context)
>  {
>  static int tickCheckInProgress;
>
>          if (tickCheckInProgress) return;
>
>          tickCheckInProgress = 1;
>          checkHighPriorityTickees(ioUTCMicroseconds());
>          tickCheckInProgress = 0;
>  }
>
> This should be using a mutex, no?
>

Well, at least make it volatile.
Also, I saw that Eliot introduced a CAS primitive; it could be used here as well.

> Dave
>
>



--
Best regards,
Igor Stasenko AKA sig.

Re: linux build stability

Eliot Miranda-2
In reply to this post by David T. Lewis
 


On Wed, Feb 2, 2011 at 3:07 PM, David T. Lewis <[hidden email]> wrote:
> >
> > Or some concurrency issue, because the heartbeat runs in a separate
> > thread from the VM thread.
>
> Hmmm... That's a thought. This is a complete shot in the dark, but the
> following does not look thread safe to me:
>
>  static void
>  high_performance_tick_handler(int sig, struct siginfo *sig_info, void *context)
>  {
>  static int tickCheckInProgress;
>
>          if (tickCheckInProgress) return;
>
>          tickCheckInProgress = 1;
>          checkHighPriorityTickees(ioUTCMicroseconds());
>          tickCheckInProgress = 0;
>  }
>
> This should be using a mutex, no?

Indeed it doesn't look thread-safe, and it isn't, but it doesn't need to be.  It is a signal handler whose sigaction does not contain SA_NODEFER, so it cannot be reentered: the OS guarantees not to deliver the signal again until any previous instance of the handler has returned.  Hence CAS is unnecessary, but a good comment isn't :)  I'll try to comment it here, in the handler, rather than only where the sigaction is established:

    ticker_handler_action.sa_sigaction = high_performance_tick_handler;
    /* N.B. We _do not_ include SA_NODEFER to specifically prevent reentrancy
     * during the heartbeat. We /must/ include SA_RESTART to avoid issues with
     * e.g. ODBC connections.
     */
    ticker_handler_action.sa_flags = SA_RESTART | SA_ONSTACK;
    sigemptyset(&ticker_handler_action.sa_mask);
    if (sigaction(TICKER_SIGNAL, &ticker_handler_action, 0)) {
        perror("ioInitHeartbeat sigaction");
        exit(1);
    }

best
Eliot

