Re: [OpenSmalltalk/opensmalltalk-vm] try appveyor (bed86c5)


Eliot Miranda-2
 
Hi Nicolas,

    can you give a short status report on Cog for Win64?  What is not working for the JIT?  Given that the stack spur vm works we should be close.  So what still needs fixing?

_,,,^..^,,,_ (phone)

On May 15, 2017, at 7:13 AM, Nicolas Cellier <[hidden email]> wrote:

Hi Esteban, currently this fails because we don't build pharo.cog.spur for win64 yet.
Could you retry with "${ROOT_DIR}/build.${ARCH}/pharo.stack.spur/build/vm"?
At least, this should turn the appveyor status green again.




Nicolas Cellier
 
Hi Eliot,
I haven't worked on it for a month or so, but the last time I tried, the VM failed during image startup.
AFAIR, it had time to generate and execute some JIT code, but I can't say more until I have access to my laptop this evening.
I've tried some crazy things, like tracing every VM function through a gdb hack; unfortunately, it doesn't scale (it's slow) and it also perturbs the VM.
I'll try to report ASAP.

There's one thing needed before testing it:
either add a -DWIN64ABI compilation flag somewhere, or add these 3 lines to the cogit.c preamble:

#if WIN64
# define WIN64ABI 1
#endif
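For readers wondering why the flag matters: the Win64 and System V x86-64 calling conventions pass integer/pointer arguments in different registers (RCX, RDX, R8, R9 versus RDI, RSI, RDX, RCX, R8, R9), and Win64 additionally requires the caller to reserve 32 bytes of stack (home/shadow space) for the callee. A JIT that emits calls into C has to match the convention of the compiler it links with. The sketch below is illustrative only: `int_arg_register`, `shadow_space_bytes` and `default_abi_is_win64` are hypothetical names, not Cogit functions, and the only thing taken from the thread is that `WIN64ABI` selects the Win64 convention.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: hypothetical helpers, not part of the Cogit sources. */
#if defined(WIN64ABI) && WIN64ABI
# define DEFAULT_ABI_IS_WIN64 1
#else
# define DEFAULT_ABI_IS_WIN64 0
#endif

/* Integer/pointer argument registers, in argument order. */
static const char *const win64_int_args[] = { "rcx", "rdx", "r8", "r9" };
static const char *const sysv_int_args[]  = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* For a call whose arguments are all integers or pointers, returns the
   register holding argument `index`, or NULL if it goes on the stack. */
const char *int_arg_register(int abi_is_win64, size_t index)
{
    assert(abi_is_win64 == 0 || abi_is_win64 == 1);
    if (abi_is_win64)
        return index < 4 ? win64_int_args[index] : NULL;
    return index < 6 ? sysv_int_args[index] : NULL;
}

/* Bytes of home/shadow space the caller must reserve for the callee. */
size_t shadow_space_bytes(int abi_is_win64)
{
    return abi_is_win64 ? 32 : 0;
}

/* Convention this build would default to, driven by WIN64ABI. */
int default_abi_is_win64(void)
{
    return DEFAULT_ABI_IS_WIN64;
}
```

Compiling the same file with and without `-DWIN64ABI` flips only the default; the per-call answers stay explicit so both conventions can be compared side by side.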



Eliot Miranda-2
 
Hi Nicolas,

On Mon, May 15, 2017 at 7:37 AM, Nicolas Cellier <[hidden email]> wrote:
 
There's one thing needed before testing it:
either add a -DWIN64ABI compilation flag somewhere, or add these 3 lines to the cogit.c preamble:

#if WIN64
# define WIN64ABI 1
#endif

I was thinking it would be natural to add it to the build.win64x64 makefiles.  But I've hacked something up to add the above to cogit.c.
 
--
_,,,^..^,,,_
best, Eliot

Nicolas Cellier
 


2017-05-15 17:39 GMT+02:00 Eliot Miranda <[hidden email]>:
 
I was thinking it would be natural to add it to the build.win64x64 makefiles.  But I've hacked something up to add the above to cogit.c.
 

Yes, thanks; if it can work automagically, that's better.
About the Win64 status: maybe you remember the thread opened on vm-dev in March 2017, where I mentioned the error:

(gdb)
Continuing.

Breakpoint 5, enterCogCodePopReceiver () at ../../spur64src/vm/cogitX64WIN64.c:5171
5171            realCEEnterCogCodePopReceiverReg();
(gdb)
Continuing.

Breakpoint 6, 0x000007fefd4ce547 in msvcrt!longjmp () from /cygdrive/c/Windows/system32/msvcrt.dll
(gdb) call printCallStack()

          0xf0b138 I Set class(HashedCollection class)>new 0xb9f8ee8: a(n) Set class
          0xf0b168 M FFICallbackThunk class>startUp: 0xd06b718: a(n) FFICallbackThunk class
          0xf0b1c0 M [] in SmalltalkImage>send:toClassesNamedIn:with: 0xba53d18: a(n) SmalltalkImage
          0xf0b210 I OrderedCollection>do: 0xbda81d8: a(n) OrderedCollection
          0xf0b260 I SmalltalkImage>send:toClassesNamedIn:with: 0xba53d18: a(n) SmalltalkImage
          0xf0b2b8 I SmalltalkImage>processStartUpList: 0xba53d18: a(n) SmalltalkImage
          0xf0b310 I SmalltalkImage>snapshot:andQuit:withExitCode:embedded: 0xba53d18: a(n) SmalltalkImage
         0xc6187b0 s SmalltalkImage>snapshot:andQuit:embedded:
         0xbc9ee20 s SmalltalkImage>snapshot:andQuit:
...snip...

(gdb) print reenterInterpreter
$26 = {{Part = {15766176, 4294967295}}, {Part = {15766176, 15975296}}, {Part = {65001, 0}}, {Part = {4294967295, 8}}, {Part = {
      1998004096, 15989488}}, {Part = {4356103, 3843995738016}}, {Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}},
  {Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}}, {Part = {0, 0}}}
(gdb) cont
Continuing.
[Thread 1912.0x4e4 exited with code 0]
gdb: unknown target exception 0xc0000028 at 0x77438078

Program received signal ?, Unknown signal.
0x0000000077438078 in ntdll!RtlRaiseStatus () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb) where
#0  0x0000000077438078 in ntdll!RtlRaiseStatus () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
#1  0x00000000773d7eb6 in ntdll!TpAlpcRegisterCompletionList () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
#2  0x000007fefd4ce5a3 in msvcrt!longjmp () from /cygdrive/c/Windows/system32/msvcrt.dll
#3  0x00000000004314f9 in returnToExecutivepostContextSwitch (inInterpreter=0, switchedContext=0)
    at ../../spur64src/vm/gcc3x-cointerp.c:22130
#4  0x000000000043a110 in activateNewMethod () at ../../spur64src/vm/gcc3x-cointerp.c:15045
#5  0x000000000043c6e2 in interpretMethodFromMachineCode () at ../../spur64src/vm/gcc3x-cointerp.c:19204
#6  0x0000000000442e19 in ceSendsupertonumArgs (selector=192910456, superNormalBar=0, rcvr=195006184, numArgs=0)
    at ../../spur64src/vm/gcc3x-cointerp.c:17228
#7  0x000000000ac000ba in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

----------------------

Stepping through the VM took a long time: several setjmp/longjmp pairs succeeded before the failing one, I had no idea which bug I was looking for, and I wasn't practiced enough to decipher the C stack and/or the Smalltalk stack (which variables they are stored in, etc.).

So at the end of March I tried a different angle of attack: trace the VM's actions, then work backwards from the failing point. Maybe that would give me a clue.
My idea was to exploit something like:

http://stackoverflow.com/questions/311840/tool-to-trace-local-function-calls-in-linux#311912

Of course, Cygwin is a strange Linux, so I tried dumpbin, but the addresses were wrong.
It finally worked with objdump; see the final make_trace attached.
Unfortunately, the debug VM traced by gdb blocks before reaching the error (despite a few grep -v attempts to avoid tracing the heartbeat).
My crazy ideas do not seem to work...
I also got a gdb.exe.stackdump when asking it to trace too many functions.
I've attached a gdb.log anyway, for the curious.
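The tracing idea above, assuming it amounts to one gdb breakpoint per function that logs the name and continues, can be sketched in C. This is not the attached make_trace: for brevity it parses `nm`-style lines (`address type name`) rather than objdump output, and `nm_line_to_gdb` plus its `skip` filter (a stand-in for the `grep -v` heartbeat step) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the make_trace script attached to the thread.
   Turns one nm-style line ("<hex address> <type> <name>") into a gdb
   command block that prints the function name and continues.
   Returns 1 if commands were written, 0 if the line was skipped. */
int nm_line_to_gdb(const char *line, const char *skip, char *out, size_t outsz)
{
    unsigned long long addr;
    char type;
    char name[256];
    int n;

    assert(line != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    if (sscanf(line, "%llx %c %255s", &addr, &type, name) != 3)
        return 0;                       /* e.g. undefined "U" symbols */
    if (type != 'T' && type != 't')
        return 0;                       /* keep only code (text) symbols */
    if (skip != NULL && *skip != '\0' && strstr(name, skip) != NULL)
        return 0;                       /* e.g. drop heartbeat functions */

    n = snprintf(out, outsz,
                 "break *0x%llx\n"
                 "commands\n"
                 "silent\n"
                 "printf \"%s\\n\"\n"
                 "continue\n"
                 "end\n",
                 addr, name);
    return n > 0 && (size_t)n < outsz;  /* 0 if the output was truncated */
}
```

Concatenating the output for every line of the symbol listing gives a command file for gdb's `-x` option. As reported above, tracing every function this way is very slow and perturbs timing-sensitive code, so filtering matters.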

Since then I haven't had time to work on it. It requires a more intimate understanding of the stack organization and other VM details, and I'm not there yet.

 
make_trace (1K)
gdb.log (785K)

Nicolas Cellier
In reply to this post by Eliot Miranda-2
 


make_trace (1K)
gdb.log.zip (50K)

Eliot Miranda-2
 
Hi Nicolas,

On Tue, May 16, 2017 at 7:10 AM, Nicolas Cellier <[hidden email]> wrote:
 


About the Win64 status: maybe you remember the thread opened on vm-dev in March 2017, where I mentioned the error:

Ah, right.  I've reread the thread.  Thank you.  So, given that the alignment assert never fails, the next likely suspect is stack unwinding in longjmp.  Since we would like to avoid the cost of unwinding the stack, and because we know there is nothing to run as we unwind it, we should see if there is a form of longjmp that does not try to unwind the stack.  Alternatively, we might have to implement our own longjmp in JIT code for this specific purpose.

Ben, what have you discovered about unwinding the stack during longjmp on win64?
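One candidate for a longjmp that does not try to unwind the stack is the GCC/Clang builtin pair `__builtin_setjmp`/`__builtin_longjmp`, which restores the saved frame and stack pointers and jumps directly instead of going through the OS unwinder that msvcrt's `longjmp` reaches on Win64 (RtlUnwindEx in the traces in this thread). The sketch only illustrates the primitive; `reenter_buf`, `leave_machine_code` and `run_once` are hypothetical names, and whether this cures the Win64 crash is exactly what needs testing. The GCC manual documents the restrictions relied on here: the buffer is five words, the value passed to `__builtin_longjmp` must be the constant 1, and `__builtin_longjmp` must not be called from the function that called `__builtin_setjmp`. The same manual steers general user code toward the `<setjmp.h>` functions, so this is a trade-off rather than a blessed API.

```c
#include <assert.h>

/* Hypothetical stand-in for the VM's reenterInterpreter jump buffer.
   __builtin_setjmp needs five pointer-sized words. */
static void *reenter_buf[5];
static int reentries;

/* Plays the role of machine code bailing back to the interpreter.
   noinline keeps the longjmp out of the function that did the setjmp. */
static __attribute__((noinline)) void leave_machine_code(void)
{
    reentries++;
    __builtin_longjmp(reenter_buf, 1);   /* second argument must be 1 */
}

/* Returns the number of times control came back through reenter_buf. */
int run_once(void)
{
    reentries = 0;
    if (__builtin_setjmp(reenter_buf) == 0) {
        leave_machine_code();            /* does not return normally */
        return -1;                       /* not reached */
    }
    return reentries;                    /* __builtin_setjmp returned 1 */
}
```

`reentries` has static storage, so its value survives the jump without needing `volatile`.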


--
_,,,^..^,,,_
best, Eliot

Nicolas Cellier
 


2017-05-16 17:00 GMT+02:00 Eliot Miranda <[hidden email]>:
 
Ah, right.  I've reread the thread.  Thank you.  So given that the alignment assert never fails the next likely suspect is unwinding the stack on longjmp.  Since we would like to avoid the cost of unwinding the stack, and because we know there is nothing to run as we unwind the stack we should see if there is a form of longjmp that does not try to unwind the stack.  Alternatively we might have to implement our own longjmp in JIT code for this specific purpose.

Ben, what have you discovered about unwinding the stack during longjmp on win64?


Hmm, if I stepi/nexti into longjmp, I see things like this:

(gdb)
0x0000000077c07c34 in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c07c37 in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c07c39 in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c07c3c in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c57eac in ntdll!TpAlpcRegisterCompletionList () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c57eb1 in ntdll!TpAlpcRegisterCompletionList () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
gdb: unknown target exception 0xc0000028 at 0x77cb8078

Program received signal ?, Unknown signal.
0x0000000077cb8078 in ntdll!RtlRaiseStatus () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)


Is this what you don't want to see?

I will try the builtin...
Reply | Threaded
Open this post in threaded view
|

Re: [OpenSmalltalk/opensmalltalk-vm] try appveyor (bed86c5)

Nicolas Cellier
 


2017-05-18 0:13 GMT+02:00 Nicolas Cellier <[hidden email]>:


I will try the builtin...

And with the __builtin_sjlj hardcoded directly in cointerp.c, execution goes a bit further:

Program received signal SIGSEGV, Segmentation fault.
0x00000000000008d4 in ?? ()
(gdb) bt
#0  0x00000000000008d4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) call printCallStack()

          0xefaf40 M FilePath class(Behavior)>new 0x4511150: a(n) FilePath class
          0xefaf70 M FilePath class>pathName:isEncoded: 0x4511150: a(n) FilePath class
          0xefafc0 I FilePath class>pathName: 0x4511150: a(n) FilePath class
          0xefb010 I FileDirectory class>setDefaultDirectory: 0x44faaa0: a(n) FileDirectory class
          0xefb058 I FileDirectory class>startUp 0x44faaa0: a(n) FileDirectory class
          0xefb088 M FileDirectory class(Behavior)>startUp: 0x44faaa0: a(n) FileDirectory class
          0xefb0e0 M [] in SmalltalkImage>send:toClassesNamedIn:with: 0x4553ae0: a(n) SmalltalkImage
          0xefb130 I OrderedCollection>do: 0x48a3808: a(n) OrderedCollection
          0xefb180 I SmalltalkImage>send:toClassesNamedIn:with: 0x4553ae0: a(n) SmalltalkImage
          0xefb1d8 I SmalltalkImage>processStartUpList: 0x4553ae0: a(n) SmalltalkImage
          0xefb230 I SmalltalkImage>snapshot:andQuit:withExitCode:embedded: 0x4553ae0: a(n) SmalltalkImage
         0x69349f0 s SmalltalkImage>snapshot:andQuit:embedded:
         0x6943690 s SmalltalkImage>snapshot:andQuit:
...

Eliot Miranda-2
In reply to this post by Nicolas Cellier
 
Hi Nicolas,

On May 17, 2017, at 3:13 PM, Nicolas Cellier <[hidden email]> wrote:



2017-05-16 17:00 GMT+02:00 Eliot Miranda <[hidden email]>:
 
Hi Nicolas,

On Tue, May 16, 2017 at 7:10 AM, Nicolas Cellier <[hidden email]> wrote:
 


2017-05-15 17:39 GMT+02:00 Eliot Miranda <[hidden email]>:
 
Hi Nicolas,

On Mon, May 15, 2017 at 7:37 AM, Nicolas Cellier <[hidden email]> wrote:
 
Hi Eliot,
I've not worked on it for a month or so, but last time I tried, the VM failed during image startup.
AFAIR, it had time to produce and execute some JIT, but I can't say more until I have access to my laptop this evening,
I've tried some crazy things like tracing every VM function thru gdb hack, unfortunately, it does not scale (slow) and also perturbates the VM.
I'll try to report ASAP.

There's a first thing needed before testing it:
either add some -DWIN64ABI compilation flag somewhere, or add these 3 lines in cogit.c preamble

#if WIN64
# define WIN64ABI 1
#endif

I was thinking it would be natural to add it to the build.win64x64 makefiles.  But I've hacked something up to add the above to cogit.c.
 
Yes thanks, if it can work automagically then it's better.
About Win64 status, maybe you remember the thread opened on vm-dev in March 2017, where I mentionned the error:

Ah, right.  I've reread the thread.  Thank you.  So given that the alignment assert never fails the next likely suspect is unwinding the stack on longjmp.  Since we would like to avoid the cost of unwinding the stack, and because we know there is nothing to run as we unwind the stack we should see if there is a form of longjmp that does not try to unwind the stack.  Alternatively we might have to implement our own longjmp in JIT code for this specific purpose.

Ben, what have you discovered about unwinding the stack during longjmp on win64?


Hmm, if I stepi/nexti into longjmp, I get things like this:

(gdb)
0x0000000077c07c34 in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c07c37 in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c07c39 in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c07c3c in ntdll!RtlUnwindEx () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c57eac in ntdll!TpAlpcRegisterCompletionList () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
0x0000000077c57eb1 in ntdll!TpAlpcRegisterCompletionList () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)
gdb: unknown target exception 0xc0000028 at 0x77cb8078

Program received signal ?, Unknown signal.
0x0000000077cb8078 in ntdll!RtlRaiseStatus () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
(gdb)


Is this what you don't want to see?

Right.  No unwinding is necessary since the vm is merely reentering the interpreter at the same stack level from which it entered machine code.

So the thing to try to find is a setjmp/longjmp implementation that does not attempt to unwind the stack, and to use it for the reenter-interpreter and return-to-callback longjmps.
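The control flow in question: the interpreter calls setjmp once at its own stack level, and machine code running deeper on the C stack longjmps back to that point instead of returning. Below is a minimal portable sketch of that pattern using plain C setjmp/longjmp; reenterInterpreter, machineCode and interpret are hypothetical stand-ins, not the actual Cog VM code. On most platforms the longjmp just restores registers, whereas the gdb trace above shows the Win64 build going through ntdll!RtlUnwindEx, which is the unwinding this thread wants to avoid.

```c
#include <setjmp.h>

/* Hypothetical: jump buffer set once at the interpreter's stack level. */
static jmp_buf reenterInterpreter;
static volatile int reentries;   /* volatile: modified between setjmp/longjmp */

/* Stands in for JIT-ed machine code running deeper on the C stack.
 * It never returns normally: it jumps straight back to the interpreter,
 * abandoning the intermediate frames.  With the default mingw-w64
 * setjmp on Win64, this longjmp is where an SEH unwind would occur. */
static void machineCode(int depth)
{
    if (depth > 0) {
        machineCode(depth - 1);  /* simulate nested native frames */
        return;
    }
    longjmp(reenterInterpreter, 1);
}

/* Enters "machine code" `times` times; every exit lands back at the
 * same setjmp point, i.e. the same stack level the interpreter entered from. */
int interpret(int times)
{
    reentries = 0;
    if (setjmp(reenterInterpreter) != 0) {
        reentries++;             /* re-entered via longjmp */
    }
    if (reentries < times) {
        machineCode(3);
    }
    return reentries;
}
```

Since the setjmp frame (interpret) stays live the whole time, repeated longjmps to the same buffer are valid, and nothing in the abandoned frames needs to run.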

Re: [OpenSmalltalk/opensmalltalk-vm] try appveyor (bed86c5)

Nicolas Cellier
 


2017-05-18 1:03 GMT+02:00 Eliot Miranda <[hidden email]>:
 
Right.  No unwinding is necessary since the vm is merely reentering the interpreter at the same stack level from which it entered machine code.

So the thing to try to find is a setjmp/longjmp implementation that does not attempt to unwind the stack, and to use it for the reenter-interpreter and return-to-callback longjmps.


So, in mingw-w64, unless USE_NO_MINGW_SETJMP_TWO_ARGS has been defined (no default include file defines it),
_setjmp accepts a 2nd parameter, and if this 2nd argument is NULL, there will be no context unwinding.

http://mingw-w64-public.narkive.com/1mUoWEfG/setjmp-longjmp-crashes-second-setjmp-argument
https://patchwork.ozlabs.org/patch/437794/
grep -r USE_NO_MINGW_SETJMP_TWO_ARGS /usr/x86_64-w64-mingw32/sys-root/mingw/include

Consequently, we need to amend the sigsetjmp macros for WIN64 in the cointerp.c header,
something like:

#undef sigsetjmp
#undef siglongjmp
#if _WIN64
# define sigsetjmp(jb,ssmf) _setjmp(jb,NULL)
# define siglongjmp(jb,v) longjmp(jb,v)
#elif _WIN32
# define sigsetjmp(jb,ssmf) setjmp(jb)
# define siglongjmp(jb,v) longjmp(jb,v)
...

It's less intrusive than __builtin_setjmp/__builtin_longjmp (because the latter only works with a literal return value).

After applying this patch, the startup list is processed a bit further, but then the VM crashes with a SEGV, so there is yet another problem to investigate...
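For experimenting with the macro selection described above outside the VM sources, here is a self-contained sketch. It assumes mingw-w64 headers in which _setjmp takes the second argument (USE_NO_MINGW_SETJMP_TWO_ARGS not defined), as described above. To avoid clobbering the POSIX sigsetjmp macro on non-Windows hosts, it uses hypothetical vm_setjmp/vm_longjmp names rather than redefining sigsetjmp/siglongjmp as the cointerp.c patch does. demoJump is likewise hypothetical.

```c
#include <setjmp.h>
#include <stddef.h>

/* Assumption: on mingw-w64 x86_64, _setjmp(jb, ctx) is declared unless
 * USE_NO_MINGW_SETJMP_TWO_ARGS is defined; ctx == NULL means a later
 * longjmp restores registers without unwinding the intervening frames. */
#if defined(_WIN64) && defined(__MINGW64__) && !defined(USE_NO_MINGW_SETJMP_TWO_ARGS)
# define vm_setjmp(jb)     _setjmp((jb), NULL)
#else
# define vm_setjmp(jb)     setjmp(jb)
#endif
#define vm_longjmp(jb, v)  longjmp((jb), (v))

static jmp_buf demoBuf;          /* hypothetical, for illustration only */
static volatile int demoDepth;

/* Recurses to create intervening frames, then jumps straight back. */
static void jumpOut(int depth)
{
    if (depth > 0) {
        jumpOut(depth - 1);
        return;
    }
    vm_longjmp(demoBuf, 1);
}

/* Returns `depth` after arriving back at the vm_setjmp point. */
int demoJump(int depth)
{
    demoDepth = depth;
    if (vm_setjmp(demoBuf) != 0)
        return demoDepth;        /* arrived here via vm_longjmp */
    jumpOut(depth);
    return -1;                   /* not reached */
}
```

On non-mingw hosts this falls back to plain setjmp/longjmp, so only the Win64 branch changes behavior.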