Hi Nicolas, can you give a short status report on Cog for Win64? What is not working for the JIT? Given that the stack spur vm works we should be close. So what still needs fixing? _,,,^..^,,,_ (phone)
|
Hi Eliot, I've not worked on it for a month or so, but last time I tried, the VM failed during image startup. Either add a -DWIN64ABI compilation flag somewhere, or add these 3 lines to the cogit.c preamble:
#if WIN64
# define WIN64ABI 1
#endif
2017-05-15 16:24 GMT+02:00 Eliot Miranda <[hidden email]>:
|
Hi Nicolas,
On Mon, May 15, 2017 at 7:37 AM, Nicolas Cellier <[hidden email]> wrote:
I was thinking it would be natural to add it to the build.win64x64 makefiles. But I've hacked something up to add the above to cogit.c.
_,,,^..^,,,_ best, Eliot |
2017-05-15 17:39 GMT+02:00 Eliot Miranda <[hidden email]>:
Yes thanks, if it can work automagically then it's better.

About Win64 status, maybe you remember the thread opened on vm-dev in March 2017, where I mentioned the error:

(gdb) ...snip...
Continuing.
Breakpoint 5, enterCogCodePopReceiver () at ../../spur64src/vm/cogitX64WIN
5171 realCEEnterCogCodePopReceiverR
(gdb)
Continuing.
Breakpoint 6, 0x000007fefd4ce547 in msvcrt!longjmp () from /cygdrive/c/Windows/system32/m
(gdb) call printCallStack()
0xf0b138 I Set class(HashedCollection class)>new 0xb9f8ee8: a(n) Set class
0xf0b168 M FFICallbackThunk class>startUp: 0xd06b718: a(n) FFICallbackThunk class
0xf0b1c0 M [] in SmalltalkImage>send:toClassesN
0xf0b210 I OrderedCollection>do: 0xbda81d8: a(n) OrderedCollection
0xf0b260 I SmalltalkImage>send:toClassesN
0xf0b2b8 I SmalltalkImage>processStartUpL
0xf0b310 I SmalltalkImage>snapshot:andQui
0xc6187b0 s SmalltalkImage>snapshot:andQui
0xbc9ee20 s SmalltalkImage>snapshot:andQui
(gdb) print reenterInterpreter
$26 = {{Part = {15766176, 4294967295}}, {Part = {15766176, 15975296}},
{Part = {65001, 0}}, {Part = {4294967295, 8}}, {Part = {1998004096, 15989488}},
{Part = {4356103, 3843995738016}}, {Part = {0, 0}}, {Part = {0, 0}},
{Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}},
{Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}}}
(gdb) cont
Continuing.
[Thread 1912.0x4e4 exited with code 0]
gdb: unknown target exception 0xc0000028 at 0x77438078
Program received signal ?, Unknown signal.
0x0000000077438078 in ntdll!RtlRaiseStatus () from /cygdrive/c/Windows/SYSTEM32/n
(gdb) where
#0 0x0000000077438078 in ntdll!RtlRaiseStatus () from /cygdrive/c/Windows/SYSTEM32/n
#1 0x00000000773d7eb6 in ntdll!TpAlpcRegisterCompletion
#2 0x000007fefd4ce5a3 in msvcrt!longjmp () from /cygdrive/c/Windows/system32/m
#3 0x00000000004314f9 in returnToExecutivepostContextSw at ../../spur64src/vm/gcc3x-coint
#4 0x000000000043a110 in activateNewMethod () at ../../spur64src/vm/gcc3x-coint
#5 0x000000000043c6e2 in interpretMethodFromMachineCode () at ../../spur64src/vm/gcc3x-coint
#6 0x0000000000442e19 in ceSendsupertonumArgs (selector=192910456, superNormalBar=0, rcvr=195006184, numArgs=0) at ../../spur64src/vm/gcc3x-coint
#7 0x000000000ac000ba in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
----------------------

Executing the VM step by step took a long time: several setjmp/longjmp calls succeeded before the failing one, I had no idea which bug I was looking for, and I was not practiced enough to decipher the C stack and/or the Smalltalk stack (which variables they are stored in, etc.).

So I tried a different angle of attack at the end of March: trace the VM actions, then trace back from the failing point. Maybe it would give me a clue. My idea was to exploit something like: http://stackoverflow.com/quest

Of course, cygwin is a strange linux, so I tried with dumpbin, but the addresses were wrong. Finally it worked OK with objdump, see the final make_trace attached.

Unfortunately, the debug VM traced by gdb blocks before encountering the error (despite a few grep -v attempts to avoid catching the heartbeat). My crazy ideas do not seem to work... I also managed to get a gdb.exe.stackdump when asking gdb to trace too many functions. I attach a gdb.log though for the curious.

Since then I have had no time to work on it. It requires a more intimate understanding of stack organization and other VM details. I'm not ready yet.
|
Hi Nicolas,
On Tue, May 16, 2017 at 7:10 AM, Nicolas Cellier <[hidden email]> wrote:
Ah, right. I've reread the thread. Thank you. So given that the alignment assert never fails the next likely suspect is unwinding the stack on longjmp. Since we would like to avoid the cost of unwinding the stack, and because we know there is nothing to run as we unwind the stack we should see if there is a form of longjmp that does not try to unwind the stack. Alternatively we might have to implement our own longjmp in JIT code for this specific purpose. Ben, what have you discovered about unwinding the stack during longjmp on win64?
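The kind of jump being discussed can be pictured with a minimal sketch in standard C. All names here (reenter_interpreter, return_to_executive, activate_new_method, run_interpreter_once) are hypothetical stand-ins and not the VM's actual code; the sketch only assumes plain setjmp/longjmp. A deep call abandons its frames in one step and lands back at the interpreter entry, and none of the abandoned frames has anything that needs to run on the way out. The reason code is passed through a variable and longjmp always delivers the literal 1.

```c
#include <setjmp.h>

/* Hypothetical stand-in for the VM's jump buffer used to re-enter the interpreter. */
static jmp_buf reenter_interpreter;

/* Reason code handed over through a variable, so that longjmp itself
   only ever needs to deliver the literal value 1. */
static volatile int last_reason = 0;

/* Counts frames that returned normally; stays 0 in this sketch. */
static int normal_returns = 0;

/* Deepest frame: drop every frame above the interpreter entry at once. */
static void return_to_executive(int reason)
{
    last_reason = reason;
    longjmp(reenter_interpreter, 1);
}

/* Intermediate frame with no cleanup of its own to perform. */
static void activate_new_method(int reason)
{
    return_to_executive(reason);
    normal_returns++;               /* not reached: longjmp does not return */
}

/* One interpreter entry: returns the reason recorded before the jump. */
int run_interpreter_once(int reason)
{
    if (setjmp(reenter_interpreter) == 0) {
        activate_new_method(reason); /* first pass: dive into the call chain */
        return -1;                   /* not reached */
    }
    return last_reason;              /* second pass: arrived here via longjmp */
}

int count_normal_returns(void)
{
    return normal_returns;
}
```

Because the intermediate frames own nothing that must be released, a jump that merely restores the saved registers and stack pointer would be sufficient for this pattern; any unwinding work done by the C runtime is pure overhead here.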
_,,,^..^,,,_ best, Eliot |
2017-05-16 17:00 GMT+02:00 Eliot Miranda <[hidden email]>:
Hmm, if I stepi/nexti into longjmp then I have things like this:

(gdb)
0x0000000077c07c34 in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c07c37 in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c07c39 in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c07c3c in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c57eac in ntdll!TpAlpcRegisterCompletionList () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c57eb1 in ntdll!TpAlpcRegisterCompletionList () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
gdb: unknown target exception 0xc0000028 at 0x77cb8078
Program received signal ?, Unknown signal.
0x0000000077cb8078 in ntdll!RtlRaiseStatus () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)

Is that what you don't want to see? I found this:
http://www.agardner.me/golang/windows/cgo/64-bit/setjmp/longjmp/2016/02/29/go-windows-setjmp-x86.html
https://sourceforge.net/p/mingw-w64/bugs/465/

I will try the builtin...
|
2017-05-18 0:13 GMT+02:00 Nicolas Cellier <[hidden email]>:
And with the __builtin_sjlj hardcoded directly in cointerp.c, the execution goes a bit further:

Program received signal SIGSEGV, Segmentation fault.
0x00000000000008d4 in ?? ()
(gdb) bt
#0 0x00000000000008d4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) call printCallStack()
0xefaf40 M FilePath class(Behavior)>new 0x4511150: a(n) FilePath class
0xefaf70 M FilePath class>pathName:isEncoded: 0x4511150: a(n) FilePath class
0xefafc0 I FilePath class>pathName: 0x4511150: a(n) FilePath class
0xefb010 I FileDirectory class>setDefaultDirectory: 0x44faaa0: a(n) FileDirectory class
0xefb058 I FileDirectory class>startUp 0x44faaa0: a(n) FileDirectory class
0xefb088 M FileDirectory class(Behavior)>startUp: 0x44faaa0: a(n) FileDirectory class
0xefb0e0 M [] in SmalltalkImage>send:toClassesNamedIn:with: 0x4553ae0: a(n) SmalltalkImage
0xefb130 I OrderedCollection>do: 0x48a3808: a(n) OrderedCollection
0xefb180 I SmalltalkImage>send:toClassesNamedIn:with: 0x4553ae0: a(n) SmalltalkImage
0xefb1d8 I SmalltalkImage>processStartUpList: 0x4553ae0: a(n) SmalltalkImage
0xefb230 I SmalltalkImage>snapshot:andQuit:withExitCode:embedded: 0x4553ae0: a(n) SmalltalkImage
0x69349f0 s SmalltalkImage>snapshot:andQuit:embedded:
0x6943690 s SmalltalkImage>snapshot:andQuit:
...
|
Hi Nicolas,
So the thing to try to find is a setjmp/longjmp implementation that does not attempt to unwind the stack, and to use it for the reenter-interpreter and return-to-callback longjmps.
|
2017-05-18 1:03 GMT+02:00 Eliot Miranda <[hidden email]>:
So in mingw-w64, unless USE_NO_MINGW_SETJMP_TWO_ARGS has been defined (it is not defined by any default include file), it is possible to pass a 2nd parameter to _setjmp, and if this 2nd argument is NULL, there will be no context unwinding.

http://mingw-w64-public.narkive.com/1mUoWEfG/setjmp-longjmp-crashes-second-setjmp-argument
https://patchwork.ozlabs.org/patch/437794/

grep -r USE_NO_MINGW_SETJMP_TWO_ARGS /usr/x86_64-w64-mingw32/sys-root/mingw/include

Consequently, we need to amend the sigsetjmp macros for WIN64 in the cointerp.c header, with something like:

#undef sigsetjmp
#undef siglongjmp
#if _WIN64
# define sigsetjmp(jb,ssmf) _setjmp(jb,NULL)
# define siglongjmp(jb,v) longjmp(jb,v)
#elif _WIN32
# define sigsetjmp(jb,ssmf) setjmp(jb)
# define siglongjmp(jb,v) longjmp(jb,v)
...

It's less intrusive than __builtin_setjmp/__builtin_longjmp (because the latter only works with a literal return value). After applying this patch, the startup list is processed a bit further but then crashes with a SEGV, so there is yet another problem to investigate...
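For anyone wanting to try the macro selection outside the VM sources, here is a small self-contained sketch. The names vm_setjmp, vm_longjmp, jump_back and jumped_back are hypothetical, chosen so the sketch does not redefine sigsetjmp/siglongjmp from the system headers. The extra __MINGW32__ and USE_NO_MINGW_SETJMP_TWO_ARGS guards are an assumption about how one might restrict the two-argument form to mingw-w64 builds; they are not part of the patch above.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical wrapper macros; the real patch redefines sigsetjmp/siglongjmp. */
#if defined(_WIN64) && defined(__MINGW32__) && !defined(USE_NO_MINGW_SETJMP_TWO_ARGS)
   /* mingw-w64 on Win64: a NULL second argument requests no context unwinding. */
#  define vm_setjmp(jb)     _setjmp((jb), NULL)
#  define vm_longjmp(jb, v) longjmp((jb), (v))
#else
   /* Every other platform: plain setjmp/longjmp. */
#  define vm_setjmp(jb)     setjmp(jb)
#  define vm_longjmp(jb, v) longjmp((jb), (v))
#endif

static jmp_buf jb;

/* Called from a deeper frame; control resumes at the vm_setjmp site. */
static void jump_back(void)
{
    vm_longjmp(jb, 1);
}

/* Returns 1 if control came back through vm_longjmp, 0 otherwise. */
int jumped_back(void)
{
    if (vm_setjmp(jb) == 0) {
        jump_back();
        return 0;   /* not reached */
    }
    return 1;
}
```

On a non-Windows host this compiles down to ordinary setjmp/longjmp, so it only checks that the wrappers are well formed; whether the NULL-context form avoids the 0xc0000028 exception still has to be verified on the Win64 build itself.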
|