Re: VM crash investigations

Stéphane Ducasse
Hi Alistair

Have you read this paper? https://hal.inria.fr/hal-01152610
It may give you some ideas about possible problems.

Pablo is on vacation until next Monday. Guille is working Wednesday, Thursday and Friday afternoons.

S

On 25 Nov 2019, at 10:18, Alistair Grant <[hidden email]> wrote:

Hi All,

At the moment I'm spending pretty much all of my working time tracking
down VM crashes.  It sounds like there may be others working on the same
issue (Guille? Pablo?).  If so, it would be good to exchange notes and
hopefully reach a resolution a little earlier.

So my status...

I'm currently focusing on two corruptions that I've seen:

1. Frame Pointers aren't being updated when the receiver or rcvr/clsr
they point to is moved during scavenging / compaction.
2. The current frame pointer (framePointer) contains an address that is
in a free stack page.


I'm doing all the investigation using a Pharo minimal image, so there's
no FreeType, and from what I've seen FFI isn't being used.

The script I'm using to reproduce the crash is at:
https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/444#issuecomment-555001612
(the good part about this is that even the memory addresses are
consistent across runs, so it is highly reproducible).


For the Frame Pointers not being updated, what I'm seeing is that after
a scavenge has finished copying all the referenced objects, but before
the survivor spaces are exchanged, the call stack looks like:

(gdb) call printCallStack()
   0x7ffffffe3f70 I SessionManager>launchSnapshot:andQuit: 0x1508860: a(n) SessionManager
   0x7ffffffe3fe0 I [] in SessionManager>snapshot:andQuit: 0x1508860: a(n) SessionManager
   0x7ffffffe4020 I [] in INVALID RECEIVER>newProcess 0x118a728
 0x118a728 is a forwarded object to 0x488fca0 of slot size 7 hdr8 .....

Once the survivor spaces have been exchanged:

(gdb) call printCallStack()
   0x7ffffffe3f70 I SessionManager>launchSnapshot:andQuit: 0x1508860: a(n) SessionManager
   0x7ffffffe3fe0 I [] in SessionManager>snapshot:andQuit: 0x1508860: a(n) SessionManager
   0x7ffffffe4020 I [] in INVALID RECEIVER>newProcess 0x118a728 is in new space
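
To make the first symptom concrete, here is a minimal C sketch of the step that seems to be missing: after objects have been moved, every frame whose receiver was forwarded should have its receiver slot rewritten to point at the new copy. This is a toy model, not the VM's code. ToyObject, ToyFrame, toyFollowForwarded and toyUpdateFrameReceivers are hypothetical names, and the forwarding representation (a flag plus a target pointer) is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy object: either a normal object or a forwarder
 * that points at the object's new location. Not the VM's real layout. */
typedef struct ToyObject {
    bool              isForwarded;  /* true if this object has been moved */
    struct ToyObject *forwardedTo;  /* new location when isForwarded is true */
} ToyObject;

/* Hypothetical toy frame holding only the receiver slot. */
typedef struct {
    ToyObject *receiver;
} ToyFrame;

/* Follow a chain of forwarders until a non-forwarded object is reached. */
static ToyObject *
toyFollowForwarded(ToyObject *obj)
{
    while (obj != NULL && obj->isForwarded) {
        assert(obj->forwardedTo != NULL);  /* a forwarder must have a target */
        obj = obj->forwardedTo;
    }
    return obj;
}

/* Rewrite every frame whose receiver is a forwarder so that it points
 * at the moved object. Returns the number of frames that were updated. */
static size_t
toyUpdateFrameReceivers(ToyFrame *frames, size_t numFrames)
{
    size_t updated = 0;
    for (size_t i = 0; i < numFrames; i++) {
        ToyObject *target = toyFollowForwarded(frames[i].receiver);
        if (target != frames[i].receiver) {
            frames[i].receiver = target;
            updated++;
        }
    }
    return updated;
}
```

In this model, the third frame in the output above corresponds to a ToyFrame whose receiver is still the forwarder after the move. A second pass over already-updated frames should report zero updates.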


For the framePointer containing an address in a free stack page: adding
a check during the scavenge shows that the framePointer is in a free
page.

I'm assuming it is never valid for the framePointer to be in a free
stack page, and that the receiver and rcvr/clsr should never be in new
space.  If my assumptions are wrong please let me know.
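
For reference, the kind of check meant for the frame pointer can be sketched in C as below. It is only a simplified model of the assumption stated above (a frame pointer must lie inside a stack page that is in use). ToyStackPage and both function names are hypothetical and do not correspond to the VM's actual structures or to the check that was really added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a stack page; not the VM's real struct. */
typedef struct {
    uintptr_t lowAddress;   /* lowest address in the page (inclusive) */
    uintptr_t highAddress;  /* one past the highest address (exclusive) */
    bool      isFree;       /* true if the page currently holds no frames */
} ToyStackPage;

/* Return the page containing addr, or NULL if addr lies in no page. */
static const ToyStackPage *
toyPageContaining(const ToyStackPage *pages, size_t numPages, uintptr_t addr)
{
    assert(pages != NULL || numPages == 0);
    for (size_t i = 0; i < numPages; i++) {
        if (addr >= pages[i].lowAddress && addr < pages[i].highAddress)
            return &pages[i];
    }
    return NULL;
}

/* Assumed invariant: the frame pointer is inside a page that is in use.
 * An address outside every page, or inside a free page, fails the check. */
static bool
toyFramePointerIsValid(const ToyStackPage *pages, size_t numPages, uintptr_t fp)
{
    const ToyStackPage *page = toyPageContaining(pages, numPages, fp);
    return page != NULL && !page->isFree;
}
```

With one in-use page and one free page, an address in the first passes and an address in the second fails, which is the failure reported above.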

If you'd like any more information, please let me know.

Thanks,
Alistair


--------------------------------------------
Stéphane Ducasse
03 59 35 87 52
Assistant: Julie Jonas 
FAX 03 59 57 78 50
TEL 03 59 35 86 16
S. Ducasse - Inria
40, avenue Halley, 
Parc Scientifique de la Haute Borne, Bât.A, Park Plaza
Villeneuve d'Ascq 59650
France

Re: VM crash investigations

alistairgrant
Hi Stef,

On Mon, 25 Nov 2019 at 21:36, Stéphane Ducasse
<[hidden email]> wrote:
>
> Have you read this paper? https://hal.inria.fr/hal-01152610
> It may give you some ideas about possible problems.

That, and Eliot's blog posts, and class comments of SpurMemoryManager,
SpurPlanningCompactor, etc. (not that I understood it all)  :-)


> Pablo is on vacation until next Monday. Guille is working Wednesday, Thursday and Friday afternoons.

OK, thanks.

Cheers,
Alistair

Re: VM crash investigations

ducasse


> On 26 Nov 2019, at 10:06, Alistair Grant <[hidden email]> wrote:
>
> Hi Stef,
>
> On Mon, 25 Nov 2019 at 21:36, Stéphane Ducasse
> <[hidden email]> wrote:
>>
>> Have you read this paper? https://hal.inria.fr/hal-01152610
>> It may give you some ideas about possible problems.
>
> That, and Eliot's blog posts, and class comments of SpurMemoryManager,
> SpurPlanningCompactor, etc. (not that I understood it all)  :-)

Same here :)

>
>
>> Pablo is on vacation until next Monday. Guille is working Wednesday, Thursday and Friday afternoons.
>
> OK, thanks.
>
> Cheers,
> Alistair
>