BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

EstebanLM
 
Hi,

I keep running into an error in Pharo. We use callbacks intensively, in the Athens (Cairo)-to-World conversion in particular, and people keep sending their crash reports. We have made the whole conversion a lot more robust since the problems started to arise, but now I have hit a wall I cannot get past: I think the problem is somewhere in the callbacks.

The problem shows up very easily on 64 bits (while on 32 bits it takes longer and is more random).

Here is the easiest way to reproduce it (on Mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

Eventually (some 5-6 seconds later, if not immediately) you will get a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban

LawsonEnglish
 
I mentioned to Eliot ages ago that I saw what may have been a similar bug when I accidentally called the 32-bit version of OpenGL using 64-bit parameter variables. Nothing crashed, but things became *very* strange, with entire bits of functionality in Squeak itself simply ceasing to function.

I’ve been quite sick for the past few years, so I never followed up (sorry, Eliot) by trying to figure out what exactly I had done to trigger the erratic behavior.

L


Eliot Miranda-2
In reply to this post by EstebanLM
 
Hi Esteban,

    First of all, thanks so much for the instructions.  It's so nice to just run your commands and reproduce the problem immediately instead of having to root around through build servers, following links.  It saves me a huge amount of time, and I really appreciate the effort.

Second, I find I can make the primitive fail much sooner if I pick up the main window and move it.  I was running the demo on one machine and wanted to run it on another with a different VM, but the main window was in the way of the command.  So I moved the main window so I could read the command line and, lo and behold, it failed immediately.  So if I start the demo and then pick up the window and move it, the primitive fails within a few seconds.  That could be a clue.

--
_,,,^..^,,,_
best, Eliot

Eliot Miranda-2
In reply to this post by LawsonEnglish
 
Hi Lawson,

On Fri, Mar 10, 2017 at 7:52 AM, LEnglish <[hidden email]> wrote:

> I mentioned to Eliot ages ago that I saw what may have been a similar bug when I accidentally called the 32-bit version of OpenGL using 64-bit parameter variables. Nothing crashed, but things became *very* strange, with entire bits of functionality in Squeak itself simply ceasing to function.
>
> I’ve been quite sick for the past few years, so I never followed up (sorry, Eliot) by trying to figure out what exactly I had done to trigger the erratic behavior.

No need to apologize!  I hope you find frequent respite within your illness, and that it doesn't irk you unbearably.  That's a very interesting failure mode!  Maybe we'll find time to look at it one day.

_,,,^..^,,,_
best, Eliot

Ben Coman
In reply to this post by Eliot Miranda-2
 
Maybe related?
http://forum.world.st/image-not-opening-td4938134.html


Eliot Miranda-2
In reply to this post by EstebanLM
 
Hi Esteban,

    It turns out it is a Slang bug affecting the Cogit, specifically pc-mapping.  The generateMapAt:start: method was being wrongly generated with the following types:

    unsigned char annotation;
    usqIntptr_t delta;
    sqInt i;
    AbstractInstruction *instruction;
    sqInt length;
    usqIntptr_t location;
    usqIntptr_t mapEntry;
    sqInt maxDelta;
    usqIntptr_t mcpc;

whereas they should be

    unsigned char annotation;
    sqInt delta;
    sqInt i;
    AbstractInstruction *instruction;
    sqInt length;
    usqInt location;
    sqInt mapEntry;
    sqInt maxDelta;
    usqIntptr_t mcpc;

Specifically, Slang was getting the type of unsigned - unsigned wrong, answering unsigned instead of signed.  This is the operative expression:

      delta := mcpc - location / backEnd codeGranularity

where both mcpc and location are unsigned.  I'm fixing this, but because it is in Slang, and hence will very probably cause pervasive changes throughout the generated source, I need to take care and not rush to a fix.  I'll keep you posted.
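
To see why the signedness of the difference matters, here is a small C sketch of the arithmetic. It is an illustration under stated assumptions, not the VM's generated code: sq_int and usq_int are stand-ins for the VM's integer typedefs (assumed 64-bit here), and the function names are made up. Smalltalk evaluates binary operators left to right, so the expression reads as (mcpc - location) / codeGranularity.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the VM's signed/unsigned integer typedefs (assumed 64-bit). */
typedef int64_t  sq_int;
typedef uint64_t usq_int;

/* Difference kept unsigned: it wraps around when location > mcpc. */
usq_int delta_unsigned(usq_int mcpc, usq_int location, usq_int granularity)
{
    usq_int delta = (mcpc - location) / granularity;
    return delta;
}

/* Difference converted to signed before dividing: negative when
   location > mcpc (relies on the usual two's-complement conversion). */
sq_int delta_signed(usq_int mcpc, usq_int location, sq_int granularity)
{
    sq_int delta = (sq_int)(mcpc - location) / granularity;
    return delta;
}

/* Small self-check; call it from your own main(). */
void check_delta_examples(void)
{
    /* mcpc ahead of location: both versions agree. */
    assert(delta_unsigned(108, 100, 4) == 2);
    assert(delta_signed(108, 100, 4) == 2);

    /* location ahead of mcpc: the unsigned version wraps. */
    assert(delta_unsigned(100, 108, 4) == 0x3FFFFFFFFFFFFFFEULL);
    assert(delta_signed(100, 108, 4) == -2);
}
```

With location ahead of mcpc by 8 bytes and a granularity of 4, the unsigned version answers 0x3FFFFFFFFFFFFFFE, while the signed version answers -2.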



--
_,,,^..^,,,_
best, Eliot

Eliot Miranda-2
In reply to this post by EstebanLM
 
Hi Esteban,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

> Hi,
>
> I keep running into an error in Pharo. We use callbacks intensively, in the Athens (Cairo)-to-World conversion in particular, and people keep sending their crash reports. We have made the whole conversion a lot more robust since the problems started to arise, but now I have hit a wall I cannot get past: I think the problem is somewhere in the callbacks.

My original theory is wrong.  As you suspected, it has something to do with the callback in primitiveCopyBits via lockSurfaces & unlockSurfaces.  Can you tell me what the callback is and what code installs it into lockSurfaceFn and unlockSurfaceFn?


--
_,,,^..^,,,_
best, Eliot

EstebanLM
 
Hi, 

This is lockSurfaceFn:

createLockSurfaceFn
    "Answer the callback that stores the stride for the handle into pitch and answers the handle's data pointer."
    ^ FFICallback
        signature: #(void * (void *handle, int *pitch, int x, int y, int w, int h))
        block: [ :handle :pitch :x :y :w :h |
            pitch signedLongAt: 1 put: (self get_stride: handle).
            self get_data: handle ]

and

createUnlockSurfaceFn
    "Answer the unlock callback; it does nothing and answers 0."
    ^ FFICallback
        signature: #(int (void *handle, int x, int y, int w, int h))
        block: [ :handle :x :y :w :h | 0 "Do nothing" ]
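
For readers more used to C, the two signatures above correspond to function-pointer types like the ones below. This is only a sketch: demo_surface, demo_lock, demo_unlock and check_demo_callbacks are hypothetical names invented here, and the real callbacks run Smalltalk blocks rather than C functions.

```c
#include <assert.h>

/* Function-pointer types spelling out the two FFICallback signatures. */
typedef void *(*lock_surface_fn)(void *handle, int *pitch, int x, int y, int w, int h);
typedef int (*unlock_surface_fn)(void *handle, int x, int y, int w, int h);

/* Hypothetical surface record, invented for this sketch only. */
typedef struct {
    unsigned char *data; /* start of the pixel buffer */
    int stride;          /* bytes per row */
} demo_surface;

/* Plays the role of the lock block: report the stride, answer the data. */
void *demo_lock(void *handle, int *pitch, int x, int y, int w, int h)
{
    demo_surface *s = (demo_surface *)handle;
    (void)x; (void)y; (void)w; (void)h;  /* rectangle unused, as in the block */
    *pitch = s->stride;
    return s->data;
}

/* Plays the role of the unlock block: do nothing and answer 0. */
int demo_unlock(void *handle, int x, int y, int w, int h)
{
    (void)handle; (void)x; (void)y; (void)w; (void)h;
    return 0;
}

/* Exercise both functions through the pointer types. */
void check_demo_callbacks(void)
{
    unsigned char pixels[16] = {0};
    demo_surface s = { pixels, 8 };
    lock_surface_fn lock = demo_lock;
    unlock_surface_fn unlock = demo_unlock;
    int pitch = -1;

    assert(lock(&s, &pitch, 0, 0, 2, 2) == (void *)pixels);
    assert(pitch == 8);
    assert(unlock(&s, 0, 0, 2, 2) == 0);
}
```

Note that pitch is an int *, so on the usual platforms where int is 32 bits the callee writes a 32-bit value, which is what signedLongAt:put: does in the Smalltalk block.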

cheers!
Esteban


Nicolas Cellier
 
Couldn't it be that some Smalltalk memory has been relocated? (I'm thinking of the DisplayScreen bits)


Nicolai Hess-3-2

2017-03-12 13:36 GMT+01:00 Nicolas Cellier <[hidden email]>:

> Couldn't it be that some Smalltalk memory has been relocated? (I'm thinking of the DisplayScreen bits)

2017-03-12 12:53 GMT+01:00 Esteban Lorenzano <[hidden email]>:
 
Hi, 

this is lockSurfaceFn: 

createLockSurfaceFn
^ FFICallback 
signature: #(void * (void *handle, int *pitch, int x, int y, int w, int h))
block: [ :handle :pitch :x :y :w :h |
pitch signedLongAt: 1 put: (self get_stride: handle).
self get_data: handle ]

and

createUnlockSurfaceFn
^ FFICallback 
signature: #(int (void *handle, int x, int y, int w, int h))
block: [ :handle :x :y :w :h | 0 "Do nothing" ]

cheers!
Esteban

On 12 Mar 2017, at 03:34, Eliot Miranda <[hidden email]> wrote:

Hi Esteban,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I'm running into an error in Pharo, because we use callbacks intensively, in the Athens (Cairo)-to-World conversion in particular, and people keep sending their crash reports… We made the whole conversion a lot more robust once problems started to arise, but now I have hit a wall I cannot get past: I think the problem is something in callbacks.

My original theory is wrong.  As you suspected it is something to do with the callback in primitiveCopyBits via lockSurfaces & unlockSurfaces.  Can you tell me what the callback is and what code installs it into the lockSurfaceFn and unlockSurfaceFn?





-- 
_,,,^..^,,,_
best, Eliot






Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Nicolas Cellier
 


2017-03-12 13:43 GMT+01:00 Nicolai Hess <[hidden email]>:
 


2017-03-12 13:36 GMT+01:00 Nicolas Cellier <[hidden email]>:
 
Couldn't it be that some Smalltalk memory has been relocated? (I'm thinking of the DisplayScreen bits)

2017-03-12 12:53 GMT+01:00 Esteban Lorenzano <[hidden email]>:
 
Hi, 

this is lockSurfaceFn: 

createLockSurfaceFn
^ FFICallback 
signature: #(void * (void *handle, int *pitch, int x, int y, int w, int h))
block: [ :handle :pitch :x :y :w :h |
pitch signedLongAt: 1 put: (self get_stride: handle).
self get_data: handle ]

and

createUnlockSurfaceFn
^ FFICallback 
signature: #(int (void *handle, int x, int y, int w, int h))
block: [ :handle :x :y :w :h | 0 "Do nothing" ]

cheers!
Esteban

On 12 Mar 2017, at 03:34, Eliot Miranda <[hidden email]> wrote:

Hi Esteban,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I'm running into an error in Pharo, because we use callbacks intensively, in the Athens (Cairo)-to-World conversion in particular, and people keep sending their crash reports… We made the whole conversion a lot more robust once problems started to arise, but now I have hit a wall I cannot get past: I think the problem is something in callbacks.

My original theory is wrong.  As you suspected it is something to do with the callback in primitiveCopyBits via lockSurfaces & unlockSurfaces.  Can you tell me what the callback is and what code installs it into the lockSurfaceFn and unlockSurfaceFn?





-- 
_,,,^..^,,,_
best, Eliot








Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Nicolas Cellier
In reply to this post by Nicolai Hess-3-2
 
Ah, and the example does not work better in 32bits pharo, so no need to invoke a 64 bits problem.

Also, at image startup, there's a gray background. And last time I restarted the 32bits image, here is the kind of artefacts I got at first menu click:

[Embedded image 1]

IMO, all your image startup problems (like the crash at startup) are related.

2017-03-12 13:43 GMT+01:00 Nicolai Hess <[hidden email]>:
 


2017-03-12 13:36 GMT+01:00 Nicolas Cellier <[hidden email]>:
 
Couldn't it be that some Smalltalk memory has been relocated? (I'm thinking of the DisplayScreen bits)

2017-03-12 12:53 GMT+01:00 Esteban Lorenzano <[hidden email]>:
 
Hi, 

this is lockSurfaceFn: 

createLockSurfaceFn
^ FFICallback 
signature: #(void * (void *handle, int *pitch, int x, int y, int w, int h))
block: [ :handle :pitch :x :y :w :h |
pitch signedLongAt: 1 put: (self get_stride: handle).
self get_data: handle ]

and

createUnlockSurfaceFn
^ FFICallback 
signature: #(int (void *handle, int x, int y, int w, int h))
block: [ :handle :x :y :w :h | 0 "Do nothing" ]

cheers!
Esteban

On 12 Mar 2017, at 03:34, Eliot Miranda <[hidden email]> wrote:

Hi Esteban,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I'm running into an error in Pharo, because we use callbacks intensively, in the Athens (Cairo)-to-World conversion in particular, and people keep sending their crash reports… We made the whole conversion a lot more robust once problems started to arise, but now I have hit a wall I cannot get past: I think the problem is something in callbacks.

My original theory is wrong.  As you suspected it is something to do with the callback in primitiveCopyBits via lockSurfaces & unlockSurfaces.  Can you tell me what the callback is and what code installs it into the lockSurfaceFn and unlockSurfaceFn?





-- 
_,,,^..^,,,_
best, Eliot








Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Eliot Miranda-2
 
Hi Nicolas, Hi Esteban,

On Sun, Mar 12, 2017 at 6:31 AM, Nicolas Cellier <[hidden email]> wrote:
 
Ah, and the example does not work better in 32bits pharo, so no need to invoke a 64 bits problem.

Also, at image startup, there's a gray background. And last time I restarted the 32bits image, here is the kind of artefacts I got at first menu click:

[Embedded image 1]

IMO, all your image startup problems (like the crash at startup) are related.

[At the end of my message is something important for the author and maintainers of AthensCairoSurface class; if you're that person please read the end of my message, point c) below].


I agree.  I think that callbacks are very important.  Yesterday I reimplemented ownVM: and disownVM: for the non-threaded standard configuration (our current VMs) to preserve the argumentCount variable across the callback.  argumentCount is used by many primitives (via interpreterProxy methodArgumentCount) to cut back the arguments on returning from the primitive (other primitives use a constant if they know what their argument count is).  Without doing this the argumentCount variable will be changed by the interpreter sends or interpreter primitives executed during a callback, and on return may be completely different from the value it had in the primitive that executed the callback(s).  Hence when the primitive that executes a callback tries to return it may pop the wrong number of values off of the stack.

I fixed this so that ownVM, called when a callback is initiated, remembers the argumentCount and answers it encoded as the result of ownVM.  disownVM is called when the callback returns to C and is given the result of ownVM as an argument.  So disownVM restores argumentCount to its value before the callback, which is presumably the right value for the primitive that is executing the callback.  [Note that in the threaded VM the argumentCount /and/ newMethod are remembered in per-thread data and restored when a thread takes ownership of the VM; i.e. in the threaded VM it does the right thing (tm)].

Fixing this had several effects.  First, the screen was a normal size, with a black background except in the VGTigerDemo screen area.  Second, the garbage pixels at the bottom of the screen disappeared.  Finally, the crash now happens regularly on the third complete revolution of the tiger's head.  Here's the VM code (I'll commit soon):

StackInterpreter>>ownVM: threadIndexAndFlags
    <api>
    <inline: false>
    "This is the entry-point for plugins and primitives that wish to reacquire the VM after having
     released it via disownVM or callbacks that want to acquire it without knowing their ownership
     status.  While this exists for the threaded FFI VM we use it to reset the argumentCount after a callback.

     Answer the argumentCount encoded as a SmallInteger if the current thread is the VM thread.
     Answer -1 if the current thread is unknown to the VM and fails to take ownership."
    | amInVMThread |
    <var: 'amInVMThread' declareC: 'extern sqInt amInVMThread(void)'>
    self cCode: [] inSmalltalk: [amInVMThread := 1. amInVMThread class].
    self amInVMThread ifFalse:
        [^-1].
    self assert: primFailCode = 0.
    ^objectMemory integerObjectOf: argumentCount

StackInterpreter>>disownVM: flags
    <api>
    <inline: false>
    "Release the VM to other threads and answer the current thread's index.

     This is the entry-point for plugins and primitives that wish to release the VM while
     performing some operation that may potentially block, and for callbacks returning
     back to some blocking operation.  While this exists for the threaded FFI VM we use
     it to reset the argumentCount after a callback."
    self assert: ((objectMemory isIntegerObject: flags)
                  and: [(objectMemory integerValueOf: flags)
                            between: 0
                            and: (self argumentCountOfMethodHeader: -1)]).
    self assert: primFailCode = 0.
    argumentCount := objectMemory integerValueOf: flags.
    ^0

and this is the code in thunkEntry (the callback entry-point) that calls ownVM: and then disownVM:

long
thunkEntry(void *thunkp, sqIntptr_t *stackp)
{
    VMCallbackContext vmcc;
    VMCallbackContext *previousCallbackContext;
    int flags, returnType;
...
    if ((flags = interpreterProxy->ownVM(0)) < 0) {
        fprintf(stderr,"Warning; callback failed to own the VM\n");
        return -1;
    }

    if (!(returnType = setjmp(vmcc.trampoline))) {
        previousCallbackContext = getRMCC();
        setRMCC(&vmcc);
        vmcc.thunkp = thunkp;
        vmcc.stackp = stackp + 2; /* skip address of retpc & retpc (thunk) */
        vmcc.intregargsp = 0;
        vmcc.floatregargsp = 0;
        interpreterProxy->sendInvokeCallbackContext(&vmcc);
...
    }
    setRMCC(previousCallbackContext);
    interpreterProxy->disownVM(flags);

    switch (returnType) {

    case retword:   return vmcc.rvs.valword;
...
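
To make the effect of saving and restoring argumentCount concrete, here is a toy model in C.  It is only a sketch under stated assumptions: argument_count, stack_depth, own_vm, disown_vm and the rest are hypothetical stand-ins, not the VM's real variables or entry points, and the "stack" is just a counter.

```c
#include <assert.h>

/* Toy model only: hypothetical names, not the VM's real data structures. */
static int argument_count;   /* stands in for the interpreter's argumentCount */
static int stack_depth;      /* number of values on a pretend Smalltalk stack */

/* A callback runs arbitrary Smalltalk, which overwrites argument_count. */
static void run_callback(void) { argument_count = 5; }

/* Models ownVM: answer the current argument count so the caller can keep it. */
static int own_vm(void) { return argument_count; }

/* Models disownVM: put back the value that was saved when the callback began. */
static void disown_vm(int saved) { argument_count = saved; }

/* A primitive with nargs arguments that makes one callback and then pops its
   arguments using argument_count.  Answers the resulting stack depth. */
static int primitive_with_callback(int nargs, int restore)
{
    int saved;

    assert(nargs >= 0 && nargs <= 10);
    stack_depth = 10;
    argument_count = nargs;

    saved = own_vm();            /* callback entry */
    run_callback();
    if (restore)
        disown_vm(saved);        /* callback exit */

    stack_depth -= argument_count;   /* pop the arguments */
    return stack_depth;
}
```

With restore set, a two-argument primitive leaves the pretend stack at depth 8; without it, the callback's leftover count of 5 is used and the stack ends up at depth 5, which is the "pops the wrong number of values" failure in miniature.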

So while I have no fix for the bug,

a) I understand that there is a serious issue with callbacks not restoring sufficient state on return from a callback.  Ideally both argumentCount /and/ newMethod should be preserved iff a primitive invoking a callback can fail after invoking a callback, but failing at that point is a /really/ bad idea; primitives should only fail if they have no effects, so this isn't an important issue.  Restoring argumentCount is essential and my fix does that.

b) the crash is now 100% repeatable and happens always on the third rotation of the tiger's head, so I'm optimistic I can understand the bug

c) the code that creates the callbacks wastes cycles installing showSurface and unlockSurface callbacks that do nothing.  Instead both createUnlockSurfaceFn and createShowSurfaceFn should simply answer 0.  These callbacks don't need to happen, and callbacks are complex enough that avoiding two of the three callbacks involved in a bitblt could speed things up.
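
Point c) presumes that the calling side treats a null function pointer as "nothing to call".  A minimal C sketch of that kind of dispatch follows.  The SurfaceFns struct, unlock_surface and the counting helper are hypothetical names for illustration, not the SurfacePlugin's actual layout; only the two function signatures are taken from the FFICallback signatures quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified table of surface functions. */
typedef struct {
    void *(*lock)(void *handle, int *pitch, int x, int y, int w, int h);
    int   (*unlock)(void *handle, int x, int y, int w, int h);
} SurfaceFns;

static int unlock_calls;   /* counts how often the callback really ran */

/* Stands in for a do-nothing callback that still costs a full round trip. */
static int counting_unlock(void *handle, int x, int y, int w, int h)
{
    (void)handle; (void)x; (void)y; (void)w; (void)h;
    unlock_calls++;
    return 0;
}

/* Call unlock only if one was installed; a null entry means "nothing to do". */
static int unlock_surface(const SurfaceFns *fns, void *handle,
                          int x, int y, int w, int h)
{
    assert(fns != NULL);
    if (fns->unlock == NULL)
        return 0;            /* same answer the do-nothing callback gave */
    return fns->unlock(handle, x, y, w, h);
}

/* Convenience for trying the sketch: how many callbacks did one unlock cost? */
static int unlock_calls_for(int install_callback)
{
    SurfaceFns fns = { NULL, NULL };

    if (install_callback)
        fns.unlock = counting_unlock;
    unlock_calls = 0;
    unlock_surface(&fns, NULL, 0, 0, 16, 16);
    return unlock_calls;
}
```

unlock_calls_for(1) answers 1 and unlock_calls_for(0) answers 0: with the entry left null the result is the same but no callback is made at all.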

More info as I have it.


2017-03-12 13:43 GMT+01:00 Nicolai Hess <[hidden email]>:
 


2017-03-12 13:36 GMT+01:00 Nicolas Cellier <[hidden email]>:
 
Couldn't it be that some Smalltalk memory has been relocated? (I'm thinking of the DisplayScreen bits)

2017-03-12 12:53 GMT+01:00 Esteban Lorenzano <[hidden email]>:
 
Hi, 

this is lockSurfaceFn: 

createLockSurfaceFn
^ FFICallback 
signature: #(void * (void *handle, int *pitch, int x, int y, int w, int h))
block: [ :handle :pitch :x :y :w :h |
pitch signedLongAt: 1 put: (self get_stride: handle).
self get_data: handle ]

and

createUnlockSurfaceFn
^ FFICallback 
signature: #(int (void *handle, int x, int y, int w, int h))
block: [ :handle :x :y :w :h | 0 "Do nothing" ]

cheers!
Esteban

On 12 Mar 2017, at 03:34, Eliot Miranda <[hidden email]> wrote:

Hi Esteban,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I'm running into an error in Pharo, because we use callbacks intensively, in the Athens (Cairo)-to-World conversion in particular, and people keep sending their crash reports… We made the whole conversion a lot more robust once problems started to arise, but now I have hit a wall I cannot get past: I think the problem is something in callbacks.

My original theory is wrong.  As you suspected it is something to do with the callback in primitiveCopyBits via lockSurfaces & unlockSurfaces.  Can you tell me what the callback is and what code installs it into the lockSurfaceFn and unlockSurfaceFn?





-- 
_,,,^..^,,,_
best, Eliot











--
_,,,^..^,,,_
best, Eliot

Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Eliot Miranda-2
In reply to this post by Nicolas Cellier
 
Hi Nicolas, Hi Esteban, Hi Igor,

On Sun, Mar 12, 2017 at 5:36 AM, Nicolas Cellier <[hidden email]> wrote:
 
Couldn't it be that some Smalltalk memory has been relocated? (I'm thinking of the DisplayScreen bits)

That's indeed the main bug.  There are issues with callbacks that confuse the error reporting and make it look like the primSignal:andReturnAs:fromContext: primitive fails when in fact it is the copyBits primitive that is failing.  The underlying problem is that the copyBits primitive calls back through the lockSurfaces and unlockSurfaces callbacks.  When a scavenge happens during one of those callbacks it moves destForm or sourceForm; the primitive then accesses them after the callback, reads bad data, and fails.

I have fixed lockSurfaces and unlockSurfaces to not access destForm and sourceForm after the lockSurfaces and unlockSurfaces calls (by making sourceHandle and destHandle inst vars of BitBltSimulation) but it's not fully fixed yet.  I think querySurfaces is still involved and then there's the showDisplayBits:Left:Top:Right:Bottom: at the end.

I think I should commit the fixes I have so far and we should work towards removing all stale accesses from BitBltSimulation>>copyBits and BitBltSimulation>>warpBits.

These fixes include changing  platforms/Cross/plugins/SurfacePlugin/SurfacePlugin.c so that the callback functions can be optional, which reduces the number of callbacks and ups the frame rate a bit.
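
The hazard, and the "copy the handle out first" style of fix, can be shown with a toy moving heap in C.  This is an illustrative sketch only: Form, the two static "spaces", scavenge and the callback are hypothetical stand-ins, not BitBltSimulation or Spur code.

```c
#include <assert.h>
#include <string.h>

/* A pretend form whose only field is a surface handle. */
typedef struct { long handle; } Form;

static Form space_a, space_b;
static Form *current_form = &space_a;   /* where the form lives right now */

/* Models a scavenge: copy the form to the other space, trash the old copy. */
static void scavenge(void)
{
    Form *to;

    assert(current_form == &space_a || current_form == &space_b);
    to = (current_form == &space_a) ? &space_b : &space_a;
    *to = *current_form;
    memset(current_form, 0xFF, sizeof *current_form);
    current_form = to;
}

/* Models a lockSurface callback during which a scavenge happens. */
static void lock_surface_callback(void) { scavenge(); }

static void place_form(long handle)
{
    space_a.handle = handle;
    current_form = &space_a;
}

/* Unsafe: keeps a raw pointer across the callback, then reads through it. */
static long handle_read_after_callback(long initial)
{
    Form *form;

    place_form(initial);
    form = current_form;
    lock_surface_callback();
    return form->handle;      /* stale: the object has moved */
}

/* Safer: copy the handle out before any callback can run. */
static long handle_cached_before_callback(long initial)
{
    long handle;

    place_form(initial);
    handle = current_form->handle;
    lock_surface_callback();
    return handle;
}
```

handle_cached_before_callback(42) answers 42, while handle_read_after_callback(42) answers whatever the trashed old copy now contains.  In this toy the stale read is harmless; in the VM it is the source of the bad data described above.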

2017-03-12 12:53 GMT+01:00 Esteban Lorenzano <[hidden email]>:
 
Hi, 

this is lockSurfaceFn: 

createLockSurfaceFn
^ FFICallback 
signature: #(void * (void *handle, int *pitch, int x, int y, int w, int h))
block: [ :handle :pitch :x :y :w :h |
pitch signedLongAt: 1 put: (self get_stride: handle).
self get_data: handle ]

and

createUnlockSurfaceFn
^ FFICallback 
signature: #(int (void *handle, int x, int y, int w, int h))
block: [ :handle :x :y :w :h | 0 "Do nothing" ]

cheers!
Esteban

On 12 Mar 2017, at 03:34, Eliot Miranda <[hidden email]> wrote:

Hi Esteban,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I'm running into an error in Pharo, because we use callbacks intensively, in the Athens (Cairo)-to-World conversion in particular, and people keep sending their crash reports… We made the whole conversion a lot more robust once problems started to arise, but now I have hit a wall I cannot get past: I think the problem is something in callbacks.

My original theory is wrong.  As you suspected it is something to do with the callback in primitiveCopyBits via lockSurfaces & unlockSurfaces.  Can you tell me what the callback is and what code installs it into the lockSurfaceFn and unlockSurfaceFn?





-- 
_,,,^..^,,,_
best, Eliot







--
_,,,^..^,,,_
best, Eliot

Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Eliot Miranda-2
In reply to this post by EstebanLM
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I'm running into an error in Pharo, because we use callbacks intensively, in the Athens (Cairo)-to-World conversion in particular, and people keep sending their crash reports… We made the whole conversion a lot more robust once problems started to arise, but now I have hit a wall I cannot get past: I think the problem is something in callbacks.

And the problem shows up very easily on 64 bits (while on 32 bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedFormsDo: operation for the copyBits primitive looks like

    withPinnedFormsDo: aBlock
        <inline: #always>
        self cppIf: SPURVM & false
            ifTrue:
                [| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
                 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
                    [bitBltOop := interpreterProxy pinObject: bitBltOop].
                 (destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
                    [destForm := interpreterProxy pinObject: destForm].
                 (sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
                    [sourceForm := interpreterProxy pinObject: sourceForm].
                 (halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
                    [halftoneForm := interpreterProxy pinObject: halftoneForm].
                 aBlock value.
                 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
                 destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
                 sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
                 halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
            ifFalse: [aBlock value]

and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect whether a GC occurred during one of the callbacks and, if so, fails the primitive [if it detected that a GC had occurred in any of the surface functions].  The primitive would then simply be retried.

d) ?
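
For option a), the extra work is that every pointer derived from a rooted object has to be recomputed once a callback may have moved it.  The following C toy sketches that pattern.  It is an assumption-laden illustration: add_gc_root and remove_gc_root here are small local functions that only imitate the idea of InterpreterProxy>>addGCRoot: and removeGCRoot:, the collector is a stub that supports one root per object, and Form is a hypothetical four-word object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { int bits[4]; } Form;   /* hypothetical tiny form */

static Form space_a, space_b;      /* two pretend semi-spaces, one Form each */
static Form **roots[4];            /* addresses of variables the collector updates */
static size_t root_count;

static int add_gc_root(Form **var)
{
    if (root_count == sizeof roots / sizeof roots[0])
        return 0;
    roots[root_count++] = var;
    return 1;
}

static void remove_gc_root(Form **var)
{
    size_t i;

    for (i = 0; i < root_count; i++)
        if (roots[i] == var) {
            roots[i] = roots[--root_count];
            return;
        }
}

/* Stub scavenge: move each rooted object and update the rooted variable. */
static void scavenge(void)
{
    size_t i;

    for (i = 0; i < root_count; i++) {
        Form *from = *roots[i];
        Form *to = (from == &space_a) ? &space_b : &space_a;

        *to = *from;
        memset(from, 0, sizeof *from);
        *roots[i] = to;
    }
}

/* Write a value, let a "callback" scavenge, then read the value back. */
static int read_after_scavenge(int index, int value)
{
    Form *dest = &space_a;
    int *bits = dest->bits;        /* derived pointer */
    int result;

    assert(index >= 0 && index < 4);
    bits[index] = value;
    if (!add_gc_root(&dest))
        return -1;
    scavenge();                    /* stands in for a callback that triggers a GC */
    bits = dest->bits;             /* recompute: dest was updated by the collector */
    result = bits[index];
    remove_gc_root(&dest);
    return result;
}
```

The line that recomputes bits is the "pain" referred to in a): in the real primitive the same step would be needed for each derived pointer, even though most callbacks do not move anything.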

I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* copyBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried, as the first, failing pass would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the querySurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.
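
The shape of approach c) can be sketched in Python as a toy model. Everything here is an assumption made for illustration: `ToyVM`, `ToySurface`, `copy_bits` and `copy_bits_with_retry` are hypothetical names, not the VM's or the plugin's API, and the exception stands in for a failure code such as #'object may move'.

```python
class ObjectMayMove(Exception):
    """Stands in for a primitive failure code such as #'object may move'."""


class ToyVM:
    """Holds the two GC counters the primitive would sample."""

    def __init__(self):
        self.scavenge_count = 0
        self.full_gc_count = 0

    def gc_epoch(self):
        return (self.scavenge_count, self.full_gc_count)


class ToySurface:
    """Hypothetical surface whose first lock callback triggers a scavenge."""

    def __init__(self, vm):
        self.vm = vm
        self.lock_calls = 0
        self.bits_moved = 0
        self.unlocked_with = []

    def query(self):
        return {"width": 4, "height": 4}

    def lock(self):
        self.lock_calls += 1
        if self.lock_calls == 1:
            self.vm.scavenge_count += 1  # a GC happened inside the callback
        return "surface-handle"

    def unlock(self, handle):
        self.unlocked_with.append(handle)


def copy_bits(vm, surface):
    epoch = vm.gc_epoch()
    surface.query()
    handle = surface.lock()
    if vm.gc_epoch() != epoch:
        # Objects may have moved, and no bits have been touched yet, so failing is safe.
        # (Releasing the lock on this path is left out of the sketch.)
        raise ObjectMayMove()
    cached_handle = handle        # everything unlock needs lives in locals from here on
    surface.bits_moved += 1       # stands in for the actual bit moving
    surface.unlock(cached_handle)


def copy_bits_with_retry(vm, surface, max_attempts=3):
    """Retry only when the failure says that objects may have moved."""
    for attempt in range(1, max_attempts + 1):
        try:
            copy_bits(vm, surface)
            return attempt
        except ObjectMayMove:
            continue
    raise RuntimeError("primitive kept failing")


vm = ToyVM()
surface = ToySurface(vm)
attempts = copy_bits_with_retry(vm, surface)
print(attempts, surface.bits_moved)
```

In this model the first attempt fails before any bits move, and the second attempt completes, so the bits are moved exactly once.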


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]





--
_,,,^..^,,,_
best, Eliot

Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Nicolai Hess-3-2
 


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,


 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedFormsDo: operation for the copyBits primitive looks like

    withPinnedFormsDo: aBlock
        "Pin the BitBlt object and its forms around aBlock, then unpin only those that were not already pinned."
        <inline: #always>
        self cppIf: SPURVM & false
            ifTrue:
                [| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
                 (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
                    [bitBltOop := interpreterProxy pinObject: bitBltOop].
                 (destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
                    [destForm := interpreterProxy pinObject: destForm].
                 (sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
                    [sourceForm := interpreterProxy pinObject: sourceForm].
                 (halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
                    [halftoneForm := interpreterProxy pinObject: halftoneForm].
                 aBlock value.
                 bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
                 destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
                 sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
                 halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
            ifFalse: [aBlock value]

   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks, and fails the primitive [if it detected that a GC had occurred in any of the surface functions]. The primitive would then simply be retried.

d) ?

Wouldn't it be possible to just pause the GC (scavenge) when entering a primitive?


 

I like c) as it's very lightweight, but it has issues. It is fine to use for callbacks *before* copyBits and warpBits move any bits (the lockSurface and querySurface functions). But it's potentially erroneous after the unlockSurface primitive. For example, a primitive which does an xor with the screen can't simply be retried, as the first, failing pass would have updated the destination bits but not displayed them via unlockSurface. But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?). So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables. This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the querySurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]





Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Nicolas Cellier
In reply to this post by Eliot Miranda-2
 
Hi Eliot,
For the DisplayScreen bits, since they are usually rather large, I would be in favour of having them pinned in old space, and this could be done from the image side.
For every other use case, I don't know. b) looks simpler to me, but you seem to say it's a question of performance.
From what I understand, this could happen through any kind of callback, and also in threaded primitives/FFI.
I don't have enough neurons available to think of d).
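
As a rough illustration of the trade-off behind pinning, here is a toy Python model. It assumes only what option b) states: an object in new space cannot be pinned in place, so pinning first moves it to old space and answers the new location. `ToyHeap` and the addresses are invented for the sketch and are not the real memory manager.

```python
import itertools


class ToyHeap:
    """Toy two-space heap: scavenges move new-space objects, old space stays put."""

    def __init__(self):
        self._addresses = itertools.count(0x1000, 0x10)
        self.new_space = {}   # name -> current address
        self.old_space = {}   # name -> current address
        self.pinned = set()

    def allocate(self, name):
        self.new_space[name] = next(self._addresses)
        return self.new_space[name]

    def address_of(self, name):
        if name in self.new_space:
            return self.new_space[name]
        return self.old_space[name]

    def pin(self, name):
        # Pinning a new-space object tenures it: it moves once, to old space.
        if name in self.new_space:
            del self.new_space[name]
            self.old_space[name] = next(self._addresses)
        self.pinned.add(name)
        return self.old_space[name]

    def scavenge(self):
        # Every surviving new-space object gets a fresh address.
        for name in list(self.new_space):
            self.new_space[name] = next(self._addresses)


heap = ToyHeap()
display_before = heap.allocate("displayBits")
bitblt_before = heap.allocate("bitBlt")

display_pinned = heap.pin("displayBits")   # done once, e.g. from the image side
heap.scavenge()                            # a GC during a callback

print(display_pinned != display_before)                  # pinning itself moved the object once
print(heap.address_of("displayBits") == display_pinned)  # but the scavenge did not
print(heap.address_of("bitBlt") != bitblt_before)        # the unpinned object did move
```

The one-off move is cheap for a long-lived object like the display bits; the cost Eliot describes applies to short-lived objects that get tenured only to be pinned.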




Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Nicolai Hess-3-2
In reply to this post by Nicolai Hess-3-2
 


2017-03-14 16:56 GMT+01:00 Nicolai Hess <[hidden email]>:


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 

d) ?

Wouldn't it be possible to just pause the GC (scavenge) when entering a primitive?

Wouldn't it be possible to just *disable* the GC (scavenge) when entering a primitive?
 


 





Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Clément Béra
 


On Tue, Mar 14, 2017 at 8:57 AM, Nicolai Hess <[hidden email]> wrote:
 


2017-03-14 16:56 GMT+01:00 Nicolai Hess <[hidden email]>:


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect if a GC occurred during one of the callbacks, and fails the primitive [if it detected that a GC had occurred in any of the surface functions]. The primitive would then simply be retried.


That's clearly the best solution unless someone figures out a better d) solution.
 
d) ?

Wouldn't it be possible to just pause the GC (scavenge) when entering a primitive ?

Wouldn't it be possible to just *disable* the GC (scavenge) when entering a primitive ?

If you disable only the scavenges and some of the objects are in old space and a fullGC happens, then the same problem happens.

During the callbacks, an unbounded number of objects can be allocated. Where do you allocate such objects if you disable the GC ?

 


 

I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* copyBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried, as the first, failing pass would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the querySurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]


Here is the easiest way to reproduce it (in mac):

wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
wget files.pharo.org/get-files/60/pharo64.zip
wget files.pharo.org/get-files/60/sources.zip
unzip pharo64-mac-latest.zip
unzip pharo64.zip
unzip sources.zip
./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval "VGTigerDemo runDemo"

eventually (like 5-6 seconds after, if not immediately), you will have a stack like this:

SmallInteger(Object)>>primitiveFailed:
SmallInteger(Object)>>primitiveFailed
SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
GrafPort>>copyBits
GrafPort>>image:at:sourceRect:rule:
FormCanvas>>image:at:sourceRect:rule:
FormCanvas(Canvas)>>drawImage:at:sourceRect:
FormCanvas(Canvas)>>drawImage:at:
VGTigerDemo>>runDemo
VGTigerDemo class>>runDemo
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
[ result := Smalltalk compiler evaluate: aStream.
self hasSessionChanged
        ifFalse: [ self stdout
                        print: result;
                        lf ] ] in EvaluateCommandLineHandler>>evaluate: in Block: [ result := Smalltalk compiler evaluate: aStream....
BlockClosure>>on:do:
EvaluateCommandLineHandler>>evaluate:
EvaluateCommandLineHandler>>evaluateArguments
EvaluateCommandLineHandler>>activate
EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ aCommandLinehandler activateWith: commandLine ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
[ self
        handleArgument:
                (self arguments
                        ifEmpty: [ '' ]
                        ifNotEmpty: [ :arguments | arguments first ]) ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
[ super activateWith: aCommandLine ] in PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]

Any idea?

thanks!
Esteban



--
_,,,^..^,,,_
best, Eliot





Re: BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Eliot Miranda-2
In reply to this post by Nicolai Hess-3-2
 


On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <[hidden email]> wrote:
 


2017-03-14 16:46 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi Esteban, Hi Igor, Hi All,

On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <[hidden email]> wrote:

Hi,

I’m running into an error in Pharo: we use callbacks intensively, in the Athens (cairo)-to-World conversion in particular, and people keep sending their crash reports. We have made the whole conversion a lot more robust since the problems started to arise, but now I have hit a wall I cannot get past: I think the problem is somewhere in callbacks.

And the problem shows up very easily on 64 bits (while on 32 bits it takes time and is more random).

 I responded in the "image not opening" thread, but it's the same problem.  I really want to hear what y'all think because I'll happily implement a fix, but I want to know which one y'all think is a good idea.  Here's my reply (edits between [ & ] to add information):


On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <[hidden email]> wrote:

I'm pretty confident [I know] this is to do with bugs in the Athens surface code which assumes that callbacks can be made in the existing copyBits and warpBits primitive.  They can't do this safely because a GC (scavenge) can happen during a callback, which then causes chaos when the copyBits primitive tries to access objects that have been moved under its feet.

I've done work to fix callbacks so that when there is a failure it is the copyBits primitive that fails, instead of apparently the callback return primitive.  One of the apparent effects of this fix is to stop the screen opening up too small; another is getting the background colour right, and yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more work is required to fix the copyBits and warpBits primitives.  There are a few approaches one might take:

a)  fixing the primitive so that it saves and restores oops around the callbacks using the external oop table [InterpreterProxy>>addGCRoot: & removeGCRoot:].  That's a pain but possible. [It's a pain because all the derived pointers (the start of the destForm, sourceForm, halftoneForm and colorMapTable) must be recomputed also, and of course most of the time the objects don't move; we only scavenge about once every 2 seconds in normal running]

b) fixing the primitive so that it pins the objects it needs before ever invoking a callback [this is a pain because pinning an object causes it to be tenured to old space if it is in new space; objects can't be pinned in new space, so instead the pin operation forwards the new space object to an old space copy if required and answers its location in old space, so a putative withPinnedObjectsDo: operation for the copyBits primitive looks like
withPinnedFormsDo: aBlock
    <inline: #always>
    self cppIf: SPURVM & false
        ifTrue:
            [| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
             (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
                [bitBltOop := interpreterProxy pinObject: bitBltOop].
             (destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
                [destForm := interpreterProxy pinObject: destForm].
             (sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
                [sourceForm := interpreterProxy pinObject: sourceForm].
             (halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
                [halftoneForm := interpreterProxy pinObject: halftoneForm].
             aBlock value.
             bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
             destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
             sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
             halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
        ifFalse: [aBlock value]
   and tenuring objects to old space is not ideal because they are only collected by a full GC, so doing this would at least tenure the bitBltOop which is very likely to be in new space]

c) fixing the primitive so that it uses the scavenge and fullGC counters in the VM to detect whether a GC occurred during one of the callbacks, failing the primitive [if it detected that a GC had occurred in any of the surface functions].   The primitive would then simply be retried.

d) ?

Wouldn't it be possible to just pause the GC (scavenge) when entering a primitive ?

I don't think so.  There is a callback occurring.  If the computation executed by the callback requires a GC the application will abort if a GC cannot be done.  Right?  This is the case here.


I like c) as it's very lightweight, but it has issues.  It is fine to use for callbacks *before* copyBits and warpBits move any bits (the lockSurface and querySurface functions).  But it's potentially erroneous after the unlockSurface primitive.  For example, a primitive which does an xor with the screen can't simply be retried, as the first, failing pass would have updated the destination bits but not displayed them via unlockSurface.  But I think it could be arranged that no objects are accessed after unlockSurface, which should naturally be the last call in the primitive (or do I mean showSurface?).  So the approach would be to check for GCs occurring during querySurface and lockSurface, failing if so, and then caching any and all state needed by unlockSurface and showSurface in local variables.  This way no object state is accessed to make the unlockSurface and showSurface calls, and no bits are moved before the querySurface and lockSurface calls.

If we used a failure code such as #'object may move' then the primitives could answer this when a GC during callbacks is detected and then the primitive could be retried only when required.


[Come on folks, please comment.  I want to know which idea you like best.  We could fix this quickly.  But right now it feels like I'm talking to myself.]



--
_,,,^..^,,,_
best, Eliot