Win64 Builds broken, slow build times?

Tom Beckmann
 
Hi everyone,

I just tried building build.win64x64/squeak.cog.spur on the Cog branch but got a segfault on startup. I then tentatively went back 10 commits (HEAD~10) and it worked again. This is the output I received in gdb:

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004016f3 in interpret () at ../../spur64src/vm/gcc3x-cointerp.c:2809
2809                    memset(theStackMemory, 0, stackPagesBytes);
(gdb) bt
#0  0x00000000004016f3 in interpret () at ../../spur64src/vm/gcc3x-cointerp.c:2809
#1  0x000000000052c34c in sqMain (argc=2, argv=0x1dd53a0) at ../../platforms/win32/vm/sqWin32Main.c:1709
#2  0x000000000052c7f2 in WinMain (hInst=0x400000, hPrevInstance=0x0, lpCmdLine=0xfc437c "../../../Squeak6.0alpha-19582-64bit-202003021730-Windows/Squeak6.0alpha-19582-64bit.image", nCmdShow=10) at ../../platforms/win32/vm/sqWin32Main.c:1802
#3  0x00000000004013c7 in __tmainCRTStartup () at /usr/src/debug/mingw64-x86_64-runtime-7.0.0-1/crt/crtexe.c:339
#4  0x00000000004014cb in WinMainCRTStartup () at /usr/src/debug/mingw64-x86_64-runtime-7.0.0-1/crt/crtexe.c:195

The main reason I'm writing, however, is build speed: I haven't done a bisect yet only because building the VM seems unusually slow compared to building on Linux, as in orders of magnitude slower. I believe I have the same setup as we do on AppVeyor on Windows, using cygwin64. Incremental builds seem to recompile a lot of files, and there appear to be race conditions when building with multiple jobs (-j8). Are these known limitations of the Windows build, or do I just have the wrong setup?

Thank you for any pointers!
Tom

Re: Win64 Builds broken, slow build times?

Eliot Miranda-2
 
Hi Tom,


> On May 18, 2020, at 1:44 PM, Tom Beckmann <[hidden email]> wrote:
>
> The main reason I'm writing, however, is that I only haven't done a bisect yet because building the VM appears unusually slow, when compared to building on Linux, as in, orders of magnitude slower. I believe I have the same setup as we do on appveyor on windows using cygwin64. Incremental builds seem to recompile a lot of files and it appears there are race conditions when building with multiple threads (-j8). Are these known limitations of the Windows build or am I potentially just having the wrong setup?

I hope it is simply a wrong setup.  I have been making these commits in recent weeks in the context of getting 64-bit Terf working.  Terf is 3D ICC’s Croquet-derived business communications tool, which was formerly known as Teleplace and Qwaq Forums and was the context in which OpenSmalltalk-vm was conceived.

I am building 64-bit using Clang 10 and MSVC, and I assure you this works.  See HowToBuild for how to build using this configuration.

Your configuration may be obsolete or it may be valid, and if valid we should fix it.  Can you list exactly which versions of software (Cygwin or mingw, gcc, clang) you’re using for your build?


Eliot
_,,,^..^,,,_ (phone)

Re: Win64 Builds broken, slow build times?

marcel.taeumel
 
Hi Eliot, hi Tom,

I reported this issue about a week ago:

Bintray version squeak.cog.spur_win64x64_202005170205 is still broken. Segfaults on startup.

Best,
Marcel

On 19.05.2020 at 01:02:06, Eliot Miranda <[hidden email]> wrote:



Re: Win64 Builds broken, slow build times?

Eliot Miranda-2
 
Hi Tom, Hi Marcel,

I also see that the mingw32/Cygwin/clang build is broken but the MSVC/Clang build is not.

If you use gdb to find out where the mingw32/Cygwin/clang build breaks, you will see that it is in the zeroing of the stack zone memory after the initial alloca of the stack zone.  The stack pointer gets set to a lower value by the alloca, as expected, but the stack memory is not committed, so when the memset starts writing to the memory pointed to by the stack pointer it segfaults.
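
As an illustration of the failure mode described above, here is a toy Python model of a downward-growing stack with a single guard page. It is only a sketch: the class `ToyStack`, the helper names, the 4 KB page size, and the stack sizes are assumptions made up for the example, not values taken from the VM or from Windows. It shows why moving the stack pointer down by many pages and then writing from the lowest address upward, as a memset does, can land on a page that is reserved but not committed.

```python
PAGE = 4096  # assumed page size in bytes


class ToyStack:
    """Toy model of a downward-growing stack with one guard page."""

    def __init__(self, top, reserved_pages, committed_pages):
        self.top = top                                      # one past the highest stack address
        self.limit = top - reserved_pages * PAGE            # lowest reserved address
        self.committed_low = top - committed_pages * PAGE   # lowest committed address

    def touch(self, addr):
        """Simulate one memory access: succeed, grow the stack, or fault."""
        if not (self.limit <= addr < self.top):
            raise MemoryError("access outside the reserved stack")
        if addr >= self.committed_low:
            return                                          # page already committed
        if addr >= self.committed_low - PAGE:
            self.committed_low -= PAGE                      # guard page hit: commit it, guard moves down
            return
        raise MemoryError("page is reserved but not committed")


def memset_upward(stack, start, size):
    """Touch each page of [start, start + size) from the lowest address up."""
    for addr in range(start, start + size, PAGE):
        stack.touch(addr)


def survives(committed_pages, reserved_pages=512, alloca_pages=256):
    """Model alloca (no probing) followed by memset; return True if no fault."""
    top = 0x200000
    stack = ToyStack(top, reserved_pages, committed_pages)
    sp = top - alloca_pages * PAGE      # alloca only moves the stack pointer
    try:
        memset_upward(stack, sp, alloca_pages * PAGE)
    except MemoryError:
        return False
    return True


print(survives(committed_pages=1))      # False: the first write skips past the guard page
print(survives(committed_pages=512))    # True: the whole stack is committed up front
```

In this model a stack that is fully committed from the start never faults, which is the effect a linker-level stack commit size is meant to have.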

I initially had the same problem with the MSVC/Clang build, but at a different point: the JIT.  The JIT stack-allocates the memory it uses for generating abstract instructions, etc., when generating machine code.  It can stack-allocate over a megabyte.  To fix the crash I used the linker’s /STACK:size,committed flag to give the executable a 2 MB fully committed stack, and this fixed the crashes.

I am using the MSVC/Clang build for Terf and we have had no problems with the core VM since.  However, in looking at SoundPlugin issues I did try the mingw32/Cygwin/clang build last week and saw the stack zone alloca crash that I expect is the cause of the breakage you observe.  I did have time to add a --stack size,committed flag to attempt to give the executable a 2 MB fully committed stack, but this did not work and did not fix the crash, which remains in the same place.  I conclude that the way I tried to add the --stack size,committed flag is incorrect, although the linker did not produce any error messages.

I wish I had time to look at this but I don’t.  If anyone does have time, then my suggestion is to do an MSVC/Clang build alongside the mingw32/Cygwin/clang one, and find out how to introspect the executable to list its stack allocation parameters.  I know that MSVC’s editbin can be used to set these parameters but don’t know of a program to list them.  One obvious test is to use editbin to set the stack allocation parameters of the mingw32/Cygwin/clang build.  If my hypothesis is correct then it should produce a VM that starts up, and then the fix to attempt is simple: find out how to get the mingw32/Cygwin/clang linker to set the stack allocation parameters correctly.
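
One way to list the stack allocation parameters is to read them directly from the executable's PE header, where they are stored as the `SizeOfStackReserve` and `SizeOfStackCommit` fields at offset 72 of the optional header. The following is a minimal sketch of that in Python; the function names are made up for the example, and `fake_pe64` only builds enough of a header to exercise the reader, not a valid executable. As far as I know, binutils' `objdump -p` and MSVC's `dumpbin /headers` print the same two fields, so either should also work for comparing the two builds.

```python
import struct


def read_stack_sizes(data):
    """Return (reserve, commit) stack sizes from the bytes of a PE executable."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)   # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    opt = pe_offset + 4 + 20          # optional header follows the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x20B:                # PE32+ (64-bit): two 8-byte fields
        return struct.unpack_from("<QQ", data, opt + 72)
    if magic == 0x10B:                # PE32 (32-bit): two 4-byte fields
        return struct.unpack_from("<II", data, opt + 72)
    raise ValueError("unknown optional header magic")


def stack_sizes_of_file(path):
    """Convenience wrapper: read the header fields from a file on disk."""
    with open(path, "rb") as f:
        return read_stack_sizes(f.read())


def fake_pe64(reserve, commit):
    """Build a fake PE32+ header for demonstration (hypothetical, not a valid exe)."""
    data = bytearray(0x40 + 4 + 20 + 112)
    data[0:2] = b"MZ"
    struct.pack_into("<I", data, 0x3C, 0x40)
    data[0x40:0x44] = b"PE\0\0"
    opt = 0x40 + 4 + 20
    struct.pack_into("<H", data, opt, 0x20B)
    struct.pack_into("<QQ", data, opt + 72, reserve, commit)
    return bytes(data)


# Example values only: a 2 MB stack that is fully committed.
reserve, commit = read_stack_sizes(fake_pe64(0x200000, 0x200000))
print("reserve=%#x commit=%#x" % (reserve, commit))
```

Running `stack_sizes_of_file` on both the MSVC/Clang executable and the mingw32/Cygwin/clang one would show whether the stack flag actually changed the committed size in each.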

HTH

On May 18, 2020, at 11:40 PM, Marcel Taeumel <[hidden email]> wrote:

Hi Eliot, hi Tom,

I reported this issue about a week ago:

Bintray version squeak.cog.spur_win64x64_202005170205 is still broken. Segfaults on startup.

Best,
Marcel

Eliot
_,,,^..^,,,_ (phone)

Re: Win64 Builds broken, slow build times?

Nicolas Cellier
 
Hi Eliot,
removing the -mno-stack-arg-probe option from the makefile fixes the Cygwin build.
Maybe we can just make the option specific to the MSVC build as a temporary workaround...

On Wed, 12 Aug 2020 at 16:47, Eliot Miranda <[hidden email]> wrote:
 

Re: Win64 Builds broken, slow build times?

Nicolas Cellier
 
From https://archive.is/J01oT, my understanding is that if we disable stack probing, we are no longer protected against a stack overflow when allocating more than one page on the stack...
Does that help?
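
To make the role of stack probing concrete, here is a small Python sketch under simplifying assumptions (a 4 KB page, a single guard page just below the committed region, and made-up addresses and function names). It is not the real probe routine; it only models the idea that a probing prologue touches one address in every page on the way down, so each access hits the guard page and the stack grows one page at a time, whereas a single access far below the guard page faults.

```python
PAGE = 4096  # assumed page size in bytes


def probe_addresses(sp, size):
    """Addresses a probing prologue might touch: one per page, highest first."""
    addrs = list(range(sp - PAGE, sp - size, -PAGE))
    addrs.append(sp - size)              # finally touch the new stack pointer itself
    return addrs


def lowest_committed(committed_low, accesses):
    """Replay accesses against a one-guard-page stack; return the new committed floor."""
    for addr in accesses:
        if addr >= committed_low:
            continue                     # already committed
        if addr >= committed_low - PAGE:
            committed_low -= PAGE        # guard page hit: stack grows by one page
        else:
            raise MemoryError("guard page skipped at %#x" % addr)
    return committed_low


sp = 0x200000
committed_low = sp - PAGE                # one committed page to start with
size = 8 * PAGE

# With probing: every page is touched in order, so the stack grows page by page.
print(hex(lowest_committed(committed_low, probe_addresses(sp, size))))

# Without probing: the first access is at the new stack pointer, far below the guard page.
try:
    lowest_committed(committed_low, [sp - size])
except MemoryError as e:
    print("fault:", e)
```

This is why building without probes is only safe in this model when the stack is already committed for the whole allocation.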

On Wed, 12 Aug 2020 at 17:06, Nicolas Cellier <[hidden email]> wrote:

Re: Win64 Builds broken, slow build times?

Nicolas Cellier
 
So the question is: why is -mno-stack-arg-probe required, apart from optimization?
If it serves no purpose other than optimization, we had better remove it from the Cygwin build until we find out how to reserve (committed) stack space...

On Wed, 12 Aug 2020 at 17:42, Nicolas Cellier <[hidden email]> wrote:

Re: Win64 Builds broken, slow build times?

Nicolas Cellier
 
Sorry to fragment the thread like this, but maybe this helps too:


On Wed, 12 Aug 2020 at 18:32, Nicolas Cellier <[hidden email]> wrote: