Crashes on snapshot with the new compactor

Eliot Miranda-2
 
Hi All,

    a number of people are being affected by crashes when snapshotting the image, the worst possible time for a crash.  There is a bug in the new compactor that unfortunately bites when saving.  The compactor is invoked as part of a full garbage collect, after the garbage collector has freed unreachable objects.  Normally the new compactor makes only a single pass through the heap, which may not move all the objects that it is possible to move.  (The number of objects that can be moved in a single pass is limited by available free space.)  But on snapshot the compactor makes as many passes as are necessary to slide all movable objects down as far as possible.  Unfortunately there is a bug in this second pass.

Fixing this bug is now my priority.  I have an example image from Esteban Lorenzano to test.  If anyone else has an image that reliably crashes when saved, please make the image and changes files available to me for testing if possible.

In the meantime one may be able to work around the problem by doing a full garbage collect before snapshot.  This should do a GC with a single compaction pass, which should not fail, and should make it much more likely that the GC during snapshot also needs only a single compaction pass, since fewer objects should be mobile after the single-pass compaction in the explicit GC.

To do this in Pharo I would put a full GC here:

SessionManager>>snapshot: save andQuit: quit
| isImageStarting snapshotResult |
ChangesLog default logSnapshot: save andQuit: quit.

>> SmalltalkImage current primitiveGarbageCollect.

self currentSession stop: quit. "Image not usable from here until the session is restarted!"
...

In Squeak I would put a full GC here:

snapshot: save andQuit: quit withExitCode: exitCode embedded: embeddedFlag
"Mark the changes file and close all files as part of #processShutdownList.
If save is true, save the current state of this Smalltalk in the image file.
If quit is true, then exit to the outer OS shell.
If exitCode is not nil, then use it as exit code.
The latter part of this method runs when resuming a previously saved image. This resume logic checks for a document file to process when starting up."

| resuming msg |
Object flushDependents.
Object flushEvents.

...
Smalltalk processShutDownList: quit.
>> SmalltalkImage current primitiveGarbageCollect.
Cursor write show.
save ifTrue: [resuming := embeddedFlag 
ifTrue: [self snapshotEmbeddedPrimitive] 
ifFalse: [self snapshotPrimitive]]  "<-- PC frozen here on image file"
ifFalse: [resuming := false].
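
For a one-off save without editing the snapshot method, the same idea can be tried from a workspace.  This is only a sketch and not part of the original suggestion: it assumes the image understands Smalltalk garbageCollect and Smalltalk snapshot:andQuit:, and it runs the explicit GC earlier than the placements shown above (before the shutdown processing rather than after it), so it may help less.

```smalltalk
"Workspace sketch; an assumption, not code from the original post."
"Run an explicit full GC first.  Per the explanation above, this should use a single compaction pass."
Smalltalk garbageCollect.
"Then save without quitting.  The snapshot runs its own GC, which should now find fewer movable objects."
Smalltalk snapshot: true andQuit: false.
```

Evaluate both statements together as one do-it, so that as little as possible is allocated between the explicit GC and the save.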

I do apologise for the bug.  I hope it will be fixed within a few days.

_,,,^..^,,,_
best, Eliot

Re: [Cuis-dev] Crashes on snapshot with the new compactor

Juan Vuletich-3
 
Hi Eliot,

Nobody has reported crashes on image save on Cuis. I never experienced one. So, I guess it is ok to wait for the VM fix, as the extra GC as a workaround doesn't seem needed in Cuis.

Thanks,

On 25/03/2017 05:27 p.m., Eliot Miranda via Cuis-dev wrote:
_______________________________________________ Cuis-dev mailing list [hidden email] http://cuis-smalltalk.org/mailman/listinfo/cuis-dev_cuis-smalltalk.org


-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich

Re: Crashes on snapshot with the new compactor

Eliot Miranda-2
In reply to this post by Eliot Miranda-2
 
Hi All,

    I have fixed a bug in the compactor that accounts for the two cases I've analysed and the two fairly repeatable crashes I have at hand (three cases in all).  I hope that all those who have been experiencing crashes can start using the latest build asap.

It is fixed in these commits:

Name: VMMaker.oscog-eem.2187
Author: eem
Time: 27 March 2017, 3:00:06.676146 pm
UUID: 2259d299-65a4-42d0-a01b-4b25f5a89745
Ancestors: VMMaker.oscog-rsf.2186

SpurPlanningCompactor:
Fix a bug in resetting the free chunk used for the firstUnusedFieldsSpace after non-final passes (i.e. on snapshot).  The old code didn't check to see if a free chunk was actually found(!!).

and

Branch: refs/heads/Cog
 Home:   https://github.com/OpenSmalltalk/opensmalltalk-vm
 Commit: 4ceff23323bcd0f2d3d0a4a43c2995f43d09c98a
     https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/4ceff23323bcd0f2d3d0a4a43c2995f43d09c98a
 Author: Eliot Miranda <[hidden email]>
 Date:   2017-03-27 (Mon, 27 Mar 2017)

The bintray files are here:


Re: [Cuis-dev] Crashes on snapshot with the new compactor

Juan Vuletich-3
 
Thanks Eliot!

On 27/03/2017 11:16 p.m., Eliot Miranda via Cuis-dev wrote: