Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Eliot Miranda-2
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 

d) ?

I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Nicolai Hess-3-2


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 

d) ?

Wouldn't it be possible to just pause the GC (scavange) when entering a primitive ?


 

I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Nicolai Hess-3-2


2017-03-14 16:56 GMT+01:00 Nicolai Hess <[hidden email]>:


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 

d) ?

Wouldn't it be possible to just pause the GC (scavange) when entering a primitive ?

Wouldn't it be possible to just *disable* the GC (scavange) when entering a primitive ?
 


 

I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot



Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Clément Bera-4


On Tue, Mar 14, 2017 at 8:57 AM, Nicolai Hess <[hidden email]> wrote:
 


2017-03-14 16:56 GMT+01:00 Nicolai Hess <[hidden email]>:


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 


That's clearly the best solution unless someone figures out a better d) solution.
 
d) ?

Wouldn't it be possible to just pause the GC (scavange) when entering a primitive ?

Wouldn't it be possible to just *disable* the GC (scavange) when entering a primitive ?

If you disable only the scavenges and some of the objects are in old space and a fullGC happens, then the same problem happens.

During the call-backs, an infinite amount of objects can be allocated. Where do you allocate such objects if you disable the GC ?

 


 

I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot





Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Eliot Miranda-2
In reply to this post by Nicolai Hess-3-2


On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <[hidden email]> wrote:
 


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 

d) ?

Wouldn't it be possible to just pause the GC (scavange) when entering a primitive ?

I don't think so.  There is a callback occurring.  If the computation executed by the callback requires a GC the application will abort if a GC cannot be done.  Right?  This is the case here.


I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot






--
_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

philippeback


Le 14 mars 2017 17:34, "Eliot Miranda" <[hidden email]> a écrit :


On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <[hidden email]> wrote:
 


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 

d) ?

Wouldn't it be possible to just pause the GC (scavange) when entering a primitive ?

I don't think so.  There is a callback occurring.  If the computation executed by the callback requires a GC the application will abort if a GC cannot be done.  Right?  This is the case here.


I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]

There is also the stacked callbacks story.
So a callback is triggered from a callback etc.

I think that pinning the objects one knows shouldn't move is the best "control freak" version. At least when something fails one knows why.
Not callback related but the FT2Handle bug comes to mind...

Maybe this moves such objects to old space today. But it doesn't mean that this is the end of it GC wise. Why not have a special space for pinned objects? After all they aren't old nor new. Some external code needs them so, do not touch unless so told.

Ending up in half baked state like in the mentioned Display example is really bad for having confidencd in the system.

With UFFI and callbacks we have a very powerful mechanism to take in lots of interesting features.

As a library wrapping developer I want to know if the thing bombs because of me or because of the external library and exclude the VM code from that. Because I want to trust the VM and GC to do the right thing and not a "mostly works a lot of the times but sometimes I'll just blow in your face mate" version.

If there are leaks I want to see pinned objects space grow. And then look into that with tools. And get it fixed.

This concern is really a key to our future.
That is in a sense what made nodejs so valued.

Let's not make it crappy and focused on the Athens/Form story only. Let's solve it for good. Our future selves are going to thank us.
 
Phil


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot






--
_,,,^..^,,,_
best, Eliot

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Igor Stasenko
In reply to this post by Eliot Miranda-2
Here's my d)
implement callback functions in C, or in native form => no need for entering the smalltalk execution => no risk of GC => nothing to worry about.

I guess nobody will like it (and will be right, of course ;) , but it is how it was originally done. I used NativeBoost to implement those callback functions and they're won't cause any GC problems.

That, of course, gave me solution in this concrete case, but not in general.. i.e. : if you have another callback that cannot be implemented na(t)ively, then 
you facing similar problems, mainly: how to work around the problem, that primitive(s) that using callbacks may capture state, that are subject of GC activity.

In general , then, i think such primitive should be (re)written in such way , that it won't get puzzled by GC.. and addGCRoot(s), IMO then best way, from general interfacing/implementation standpoint.
I would just add extra interface for using it especially in primitives, so that it 
1) won't punish primitive writer with too much coding
2) automatically handle primitive/callback nesting e.g. 
primitive1 -> adds roots1 -> calls fn -> callback -> st code -> primitive2 -> adds roots2 -> calls fn2 -> callback2 ... 


something like this:

static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots

or, since we have green threading, then maybe better will be in this form: 

rootsId := static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots: rootsId.



On 14 March 2017 at 18:33, Eliot Miranda <[hidden email]> wrote:
 


On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <[hidden email]> wrote:
 


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 

d) ?

Wouldn't it be possible to just pause the GC (scavange) when entering a primitive ?

I don't think so.  There is a callback occurring.  If the computation executed by the callback requires a GC the application will abort if a GC cannot be done.  Right?  This is the case here.


I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot






--
_,,,^..^,,,_
best, Eliot




--
Best regards,
Igor Stasenko.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Igor Stasenko


On 15 March 2017 at 11:53, Igor Stasenko <[hidden email]> wrote:
Here's my d)
implement callback functions in C, or in native form => no need for entering the smalltalk execution => no risk of GC => nothing to worry about.

I guess nobody will like it (and will be right, of course ;) , but it is how it was originally done. I used NativeBoost to implement those callback functions and they're won't cause any GC problems.

That, of course, gave me solution in this concrete case, but not in general.. i.e. : if you have another callback that cannot be implemented na(t)ively, then 
you facing similar problems, mainly: how to work around the problem, that primitive(s) that using callbacks may capture state, that are subject of GC activity.

In general , then, i think such primitive should be (re)written in such way , that it won't get puzzled by GC.. and addGCRoot(s), IMO then best way, from general interfacing/implementation standpoint.
I would just add extra interface for using it especially in primitives, so that it 
1) won't punish primitive writer with too much coding
2) automatically handle primitive/callback nesting e.g. 
primitive1 -> adds roots1 -> calls fn -> callback -> st code -> primitive2 -> adds roots2 -> calls fn2 -> callback2 ... 


something like this:

static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots

or, since we have green threading, then maybe better will be in this form: 

rootsId := static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots: rootsId.

oops, i meant:

static initialized once myprimooptable = [ a,b,c].
rootsId := vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots: rootsId.
 
the idea is mainly, that VM answers you some key, using which you can later identify, what to release.


On 14 March 2017 at 18:33, Eliot Miranda <[hidden email]> wrote:
 


On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <[hidden email]> wrote:
 


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 

d) ?

Wouldn't it be possible to just pause the GC (scavange) when entering a primitive ?

I don't think so.  There is a callback occurring.  If the computation executed by the callback requires a GC the application will abort if a GC cannot be done.  Right?  This is the case here.


I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot






--
_,,,^..^,,,_
best, Eliot




--
Best regards,
Igor Stasenko.



--
Best regards,
Igor Stasenko.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Eliot Miranda-2
In reply to this post by Igor Stasenko
Hi Igor,

On Wed, Mar 15, 2017 at 2:53 AM, Igor Stasenko <[hidden email]> wrote:
 
Here's my d)
implement callback functions in C, or in native form => no need for entering the smalltalk execution => no risk of GC => nothing to worry about.

I guess nobody will like it (and will be right, of course ;) , but it is how it was originally done. I used NativeBoost to implement those callback functions and they're won't cause any GC problems.

yes, I like this.  I was wondering why the callbacks solution was used at all yesterday.  All they do is redirect to the cairo library.  What are the reasons?  Tedious to write and maintain the necessary simple plugin?

Clément pointed out a really ugly problem with the current implementation.  If one calls back into Pharo from the BitBlt primitives and then reinvokes BitBlt, say by innocently putting a halt in those callbacks, then the original BitBlt's state will get overwritten by the BitBlt invocations in the callback's dynamic exert.  At least with my changes the BitBlt primitive will abort, rather than continue with the invalid state.


That, of course, gave me solution in this concrete case, but not in general.. i.e. : if you have another callback that cannot be implemented na(t)ively, then 
you facing similar problems, mainly: how to work around the problem, that primitive(s) that using callbacks may capture state, that are subject of GC activity.

In general , then, i think such primitive should be (re)written in such way , that it won't get puzzled by GC.. and addGCRoot(s), IMO then best way, from general interfacing/implementation standpoint.
I would just add extra interface for using it especially in primitives, so that it 
1) won't punish primitive writer with too much coding
2) automatically handle primitive/callback nesting e.g. 
primitive1 -> adds roots1 -> calls fn -> callback -> st code -> primitive2 -> adds roots2 -> calls fn2 -> callback2 ... 


something like this:

static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots

or, since we have green threading, then maybe better will be in this form: 

rootsId := static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots: rootsId.

We kind of have this with the addGCRoot: interface.  But I think it's much better to design the system so that the primitive fails and can be retried.  The problem there is having to have the primitive failure code check and roll back.  For example in the copyBits primitive one sees

((sourceForm isForm) and: [sourceForm unhibernate])
ifTrue: [^ self copyBits].
((destForm isForm) and: [destForm unhibernate])
ifTrue: [^ self copyBits].
((halftoneForm isForm) and: [halftoneForm unhibernate])
ifTrue: [^ self copyBits].

This is really scruffy because...GrafPort implements copyBits, so this ends up not just retrying the primitive but running a lot more besides.  One way to write it is


((sourceForm isForm) and: [sourceForm unhibernate])
ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass: BitBlt].
((destForm isForm) and: [destForm unhibernate])
ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass: thisContext method methodClass].
((halftoneForm isForm) and: [halftoneForm unhibernate])
ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass: thisContext methodClass].

but that's ugly.

A mechanism that was in the VM would be nice.  The state for the invocation is saved on the stack.  So there could be a special failure path for this kind of recursive invocation problem; another send-back such as doesNotUnderstand: attemptToReturn:through:.  Note (I'm sure you know this Igor)  that there are primitives such as the ThreadedFFIPlugin's call-out primitive that very much expect to be invoked recursively and have no problem with it.




On 14 March 2017 at 18:33, Eliot Miranda <[hidden email]> wrote:
 


On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <[hidden email]> wrote:
 


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 

d) ?

Wouldn't it be possible to just pause the GC (scavange) when entering a primitive ?

I don't think so.  There is a callback occurring.  If the computation executed by the callback requires a GC the application will abort if a GC cannot be done.  Right?  This is the case here.


I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot






--
_,,,^..^,,,_
best, Eliot




--
Best regards,
Igor Stasenko.




--
_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

philippeback
There is more that Cairo callbacks in userland. I am using libstrophe for Xmpp and there are tons of callbacks and not being able to use them is putting Pharo out of the equation b/c the main loop is basically invoking callbacks for each evenr type.

Phil

Le 16 mars 2017 00:44, "Eliot Miranda" <[hidden email]> a écrit :
 
Hi Igor,

On Wed, Mar 15, 2017 at 2:53 AM, Igor Stasenko <[hidden email]> wrote:
 
Here's my d)
implement callback functions in C, or in native form => no need for entering the smalltalk execution => no risk of GC => nothing to worry about.

I guess nobody will like it (and will be right, of course ;) , but it is how it was originally done. I used NativeBoost to implement those callback functions and they're won't cause any GC problems.

yes, I like this.  I was wondering why the callbacks solution was used at all yesterday.  All they do is redirect to the cairo library.  What are the reasons?  Tedious to write and maintain the necessary simple plugin?

Clément pointed out a really ugly problem with the current implementation.  If one calls back into Pharo from the BitBlt primitives and then reinvokes BitBlt, say by innocently putting a halt in those callbacks, then the original BitBlt's state will get overwritten by the BitBlt invocations in the callback's dynamic exert.  At least with my changes the BitBlt primitive will abort, rather than continue with the invalid state.


That, of course, gave me solution in this concrete case, but not in general.. i.e. : if you have another callback that cannot be implemented na(t)ively, then 
you facing similar problems, mainly: how to work around the problem, that primitive(s) that using callbacks may capture state, that are subject of GC activity.

In general , then, i think such primitive should be (re)written in such way , that it won't get puzzled by GC.. and addGCRoot(s), IMO then best way, from general interfacing/implementation standpoint.
I would just add extra interface for using it especially in primitives, so that it 
1) won't punish primitive writer with too much coding
2) automatically handle primitive/callback nesting e.g. 
primitive1 -> adds roots1 -> calls fn -> callback -> st code -> primitive2 -> adds roots2 -> calls fn2 -> callback2 ... 


something like this:

static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots

or, since we have green threading, then maybe better will be in this form: 

rootsId := static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots: rootsId.

We kind of have this with the addGCRoot: interface.  But I think it's much better to design the system so that the primitive fails and can be retried.  The problem there is having to have the primitive failure code check and roll back.  For example in the copyBits primitive one sees

((sourceForm isForm) and: [sourceForm unhibernate])
ifTrue: [^ self copyBits].
((destForm isForm) and: [destForm unhibernate])
ifTrue: [^ self copyBits].
((halftoneForm isForm) and: [halftoneForm unhibernate])
ifTrue: [^ self copyBits].

This is really scruffy because...GrafPort implements copyBits, so this ends up not just retrying the primitive but running a lot more besides.  One way to write it is


((sourceForm isForm) and: [sourceForm unhibernate])
ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass: BitBlt].
((destForm isForm) and: [destForm unhibernate])
ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass: thisContext method methodClass].
((halftoneForm isForm) and: [halftoneForm unhibernate])
ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass: thisContext methodClass].

but that's ugly.

A mechanism that was in the VM would be nice.  The state for the invocation is saved on the stack.  So there could be a special failure path for this kind of recursive invocation problem; another send-back such as doesNotUnderstand: attemptToReturn:through:.  Note (I'm sure you know this Igor)  that there are primitives such as the ThreadedFFIPlugin's call-out primitive that very much expect to be invoked recursively and have no problem with it.




On 14 March 2017 at 18:33, Eliot Miranda <[hidden email]> wrote:
 


On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <[hidden email]> wrote:
 


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m tumbling into an error in Pharo, because we use callbacks intensively, in Athens(cairo)-to-World conversion in particular, and people is sending always their crash reports… we made the whole conversion a lot more robust since problems started to arise, but now I hit a wall I cannot solve: I think problem is in something in callbacks.

And problem is showing very easy on 64bits (while in 32bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
<inline: #always>
self cppIf: SPURVM & false
ifTrue:
[| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
[bitBltOop := interpreterProxy pinObject: bitBltOop].
(destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
[destForm := interpreterProxy pinObject: destForm].
(sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
[sourceForm := interpreterProxy pinObject: sourceForm].
(halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
[halftoneForm := interpreterProxy pinObject: halftoneForm].
aBlock value.
 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks and would fail the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried. 

d) ?

Wouldn't it be possible to just pause the GC (scavange) when entering a primitive ?

I don't think so.  There is a callback occurring.  If the computation executed by the callback requires a GC the application will abort if a GC cannot be done.  Right?  This is the case here.


I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* cop[yBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried as the first, falling pass, would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the queryDurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot






--
_,,,^..^,,,_
best, Eliot




--
Best regards,
Igor Stasenko.




--
_,,,^..^,,,_
best, Eliot

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Igor Stasenko
In reply to this post by Eliot Miranda-2


On 16 March 2017 at 01:44, Eliot Miranda <[hidden email]> wrote:
 
Hi Igor,

On Wed, Mar 15, 2017 at 2:53 AM, Igor Stasenko <[hidden email]> wrote:
 
Here's my d)
implement callback functions in C, or in native form => no need for entering the smalltalk execution => no risk of GC => nothing to worry about.

I guess nobody will like it (and will be right, of course ;) , but it is how it was originally done. I used NativeBoost to implement those callback functions and they're won't cause any GC problems.

yes, I like this.  I was wondering why the callbacks solution was used at all yesterday.  All they do is redirect to the cairo library.  What are the reasons?  Tedious to write and maintain the necessary simple plugin?

Clément pointed out a really ugly problem with the current implementation.  If one calls back into Pharo from the BitBlt primitives and then reinvokes BitBlt, say by innocently putting a halt in those callbacks, then the original BitBlt's state will get overwritten by the BitBlt invocations in the callback's dynamic exert.  At least with my changes the BitBlt primitive will abort, rather than continue with the invalid state.


That, of course, gave me solution in this concrete case, but not in general.. i.e. : if you have another callback that cannot be implemented na(t)ively, then 
you facing similar problems, mainly: how to work around the problem, that primitive(s) that using callbacks may capture state, that are subject of GC activity.

In general , then, i think such primitive should be (re)written in such way , that it won't get puzzled by GC.. and addGCRoot(s), IMO then best way, from general interfacing/implementation standpoint.
I would just add extra interface for using it especially in primitives, so that it 
1) won't punish primitive writer with too much coding
2) automatically handle primitive/callback nesting e.g. 
primitive1 -> adds roots1 -> calls fn -> callback -> st code -> primitive2 -> adds roots2 -> calls fn2 -> callback2 ... 


something like this:

static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots

or, since we have green threading, then maybe better will be in this form: 

rootsId := static initialized once myprimooptable = [ a,b,c].
vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots: rootsId.

We kind of have this with the addGCRoot: interface.  But I think it's much better to design the system so that the primitive fails and can be retried.  The problem there is having to have the primitive failure code check and roll back.  For example in the copyBits primitive one sees

((sourceForm isForm) and: [sourceForm unhibernate])
ifTrue: [^ self copyBits].
((destForm isForm) and: [destForm unhibernate])
ifTrue: [^ self copyBits].
((halftoneForm isForm) and: [halftoneForm unhibernate])
ifTrue: [^ self copyBits].

This is really scruffy because...GrafPort implements copyBits, so this ends up not just retrying the primitive but running a lot more besides.  One way to write it is


((sourceForm isForm) and: [sourceForm unhibernate])
ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass: BitBlt].
((destForm isForm) and: [destForm unhibernate])
ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass: thisContext method methodClass].
((halftoneForm isForm) and: [halftoneForm unhibernate])
ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass: thisContext methodClass].

but that's ugly.

A mechanism that was in the VM would be nice.  The state for the invocation is saved on the stack.  So there could be a special failure path for this kind of recursive invocation problem; another send-back such as doesNotUnderstand: attemptToReturn:through:.  Note (I'm sure you know this Igor)  that there are primitives such as the ThreadedFFIPlugin's call-out primitive that very much expect to be invoked recursively and have no problem with it.


Yes, i know it. But keep in mind, that reentrant FFI callout mechanism means *just* reentrant FFI callout code, it doesn't means that things, it will be calling, automagically become reentrant as well.
And the above note about "Clément pointed out a really ugly problem" is good example of it :)
And since you cannot predict/prevent/pretend/protect every single piece of code, that FFI users are going to call, there's no solution to that, unless users understand what they do and be aware of pitfalls and consequences.

It is quite easy to say "meh.. your FFI don't works.. go away, come year later.. i will use other (put other language/software system there)".. mostly because of ignorance, that there's a limits on what FFI can do and what can't.

--
Best regards,
Igor Stasenko.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Stephane Ducasse-3
Hi igor

but this is also important that if the user is a nice guy and the
library is acting well we do not have to write a plugin for each
callback, no?
To me forcing people to write a plugin is not nice.
Having solution that would work on normal use case is really nice to have.

Stef


On Thu, Mar 16, 2017 at 12:04 PM, Igor Stasenko <[hidden email]> wrote:

>
>
> On 16 March 2017 at 01:44, Eliot Miranda <[hidden email]> wrote:
>>
>>
>> Hi Igor,
>>
>> On Wed, Mar 15, 2017 at 2:53 AM, Igor Stasenko <[hidden email]> wrote:
>>>
>>>
>>> Here's my d)
>>> implement callback functions in C, or in native form => no need for
>>> entering the smalltalk execution => no risk of GC => nothing to worry about.
>>>
>>> I guess nobody will like it (and will be right, of course ;) , but it is
>>> how it was originally done. I used NativeBoost to implement those callback
>>> functions and they're won't cause any GC problems.
>>
>>
>> yes, I like this.  I was wondering why the callbacks solution was used at
>> all yesterday.  All they do is redirect to the cairo library.  What are the
>> reasons?  Tedious to write and maintain the necessary simple plugin?
>>
>> Clément pointed out a really ugly problem with the current implementation.
>> If one calls back into Pharo from the BitBlt primitives and then reinvokes
>> BitBlt, say by innocently putting a halt in those callbacks, then the
>> original BitBlt's state will get overwritten by the BitBlt invocations in
>> the callback's dynamic exert.  At least with my changes the BitBlt primitive
>> will abort, rather than continue with the invalid state.
>>
>>>
>>> That, of course, gave me solution in this concrete case, but not in
>>> general.. i.e. : if you have another callback that cannot be implemented
>>> na(t)ively, then
>>> you facing similar problems, mainly: how to work around the problem, that
>>> primitive(s) that using callbacks may capture state, that are subject of GC
>>> activity.
>>>
>>> In general , then, i think such primitive should be (re)written in such
>>> way , that it won't get puzzled by GC.. and addGCRoot(s), IMO then best way,
>>> from general interfacing/implementation standpoint.
>>> I would just add extra interface for using it especially in primitives,
>>> so that it
>>> 1) won't punish primitive writer with too much coding
>>> 2) automatically handle primitive/callback nesting e.g.
>>> primitive1 -> adds roots1 -> calls fn -> callback -> st code ->
>>> primitive2 -> adds roots2 -> calls fn2 -> callback2 ...
>>>
>>>
>>> something like this:
>>>
>>> static initialized once myprimooptable = [ a,b,c].
>>> vm pushPrimRoots: myooptable.
>>> self do things primitive does.
>>> vm popPrimRoots
>>>
>>> or, since we have green threading, then maybe better will be in this
>>> form:
>>>
>>> rootsId := static initialized once myprimooptable = [ a,b,c].
>>> vm pushPrimRoots: myooptable.
>>> self do things primitive does.
>>> vm popPrimRoots: rootsId.
>>
>>
>> We kind of have this with the addGCRoot: interface.  But I think it's much
>> better to design the system so that the primitive fails and can be retried.
>> The problem there is having to have the primitive failure code check and
>> roll back.  For example in the copyBits primitive one sees
>>
>> ((sourceForm isForm) and: [sourceForm unhibernate])
>> ifTrue: [^ self copyBits].
>> ((destForm isForm) and: [destForm unhibernate])
>> ifTrue: [^ self copyBits].
>> ((halftoneForm isForm) and: [halftoneForm unhibernate])
>> ifTrue: [^ self copyBits].
>>
>> This is really scruffy because...GrafPort implements copyBits, so this
>> ends up not just retrying the primitive but running a lot more besides.  One
>> way to write it is
>>
>>
>> ((sourceForm isForm) and: [sourceForm unhibernate])
>> ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass:
>> BitBlt].
>> ((destForm isForm) and: [destForm unhibernate])
>> ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass:
>> thisContext method methodClass].
>> ((halftoneForm isForm) and: [halftoneForm unhibernate])
>> ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass:
>> thisContext methodClass].
>>
>> but that's ugly.
>>
>> A mechanism that was in the VM would be nice.  The state for the
>> invocation is saved on the stack.  So there could be a special failure path
>> for this kind of recursive invocation problem; another send-back such as
>> doesNotUnderstand: attemptToReturn:through:.  Note (I'm sure you know this
>> Igor)  that there are primitives such as the ThreadedFFIPlugin's call-out
>> primitive that very much expect to be invoked recursively and have no
>> problem with it.
>>
>
> Yes, i know it. But keep in mind, that reentrant FFI callout mechanism means
> *just* reentrant FFI callout code, it doesn't means that things, it will be
> calling, automagically become reentrant as well.
> And the above note about "Clément pointed out a really ugly problem" is good
> example of it :)
> And since you cannot predict/prevent/pretend/protect every single piece of
> code, that FFI users are going to call, there's no solution to that, unless
> users understand what they do and be aware of pitfalls and consequences.
>
> It is quite easy to say "meh.. your FFI don't works.. go away, come year
> later.. i will use other (put other language/software system there)"..
> mostly because of ignorance, that there's a limits on what FFI can do and
> what can't.
>
> --
> Best regards,
> Igor Stasenko.

Loading...