VM crash investigations

alistairgrant
 
Hi All,

At the moment I'm spending pretty much all of my working time tracking
down VM crashes.  It sounds like there may be others working on the same
issue (Guille? Pablo?); if so, it would be good to exchange notes and,
hopefully, reach a resolution a little sooner.

So my status...

I'm currently focusing on two corruptions that I've seen:

1. Frame Pointers aren't being updated when the receiver or rcvr/clsr
they point to is moved during scavenging / compaction.
2. The current frame pointer (framePointer) contains an address that is
in a free stack page.


I'm doing all the investigation using a Pharo minimal image, so there's
no FreeType and, from what I've seen, FFI isn't being used.

The script I'm using to reproduce the crash is at:
https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/444#issuecomment-555001612
(the good news is that even the memory addresses are consistent across
runs, so the crash is highly reproducible).


For the Frame Pointers not being updated, what I'm seeing is that after
a scavenge has finished copying all the referenced objects, but before
the survivor spaces are exchanged, the call stack looks like:

(gdb) call printCallStack()
    0x7ffffffe3f70 I SessionManager>launchSnapshot:andQuit: 0x1508860: a(n) SessionManager
    0x7ffffffe3fe0 I [] in SessionManager>snapshot:andQuit: 0x1508860: a(n) SessionManager
    0x7ffffffe4020 I [] in INVALID RECEIVER>newProcess 0x118a728
  0x118a728 is a forwarded object to          0x488fca0 of slot size 7 hdr8 .....

Once the survivor spaces have been exchanged:

(gdb) call printCallStack()
    0x7ffffffe3f70 I SessionManager>launchSnapshot:andQuit: 0x1508860: a(n) SessionManager
    0x7ffffffe3fe0 I [] in SessionManager>snapshot:andQuit: 0x1508860: a(n) SessionManager
    0x7ffffffe4020 I [] in INVALID RECEIVER>newProcess 0x118a728 is in new space


For the framePointer containing an address in a free stack page: a check
I added during the scavenge shows that the framePointer is already in a
free page at that point.

I'm assuming that it is never valid for the framePointer to be in a free
stack page, and that the receiver and rcvr/clsr should never be in new
space.  If my assumptions are wrong, please let me know.

If you'd like any more information, please let me know.

Thanks,
Alistair