A bug due to doing the right thing

Bryce Kampjes

I decided to move the native instruction pointer into the same slot as
the interpreted instruction pointer as this appeared to be the right
thing to do.

The advantage of maintaining the interpreted instruction pointer was
that a natively compiled context could at any time, except while it
was executing, be executed by the interpreter. That was a great help
because it made integration much easier back before blocks, as I only
needed to handle the cases that I wanted to. Unfortunately, natively
compiled blocks cannot use the same format as interpreted blocks
because there's no spare slot to use for the natively compiled
instruction pointer.
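To make the trade-off concrete, here is a minimal sketch of the two
layouts. The class names, field names and the is_native flag are all
hypothetical, chosen for illustration; they are not the real Squeak or
Exupery context formats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwoSlotContext:
    """Hypothetical old layout: the interpreted pointer is always kept."""
    interpreted_ip: int              # bytecode offset, valid whenever suspended
    native_ip: Optional[int] = None  # machine-code address in a spare slot

    def interpreter_can_resume(self) -> bool:
        # The bytecode offset is maintained, so the interpreter can take over.
        return True


@dataclass
class SharedSlotContext:
    """Hypothetical new layout: one slot holds either kind of pointer."""
    ip: int
    is_native: bool  # assumption: some way exists to tell the two apart

    def interpreter_can_resume(self) -> bool:
        # A native address means nothing to the interpreter.
        return not self.is_native
```

With the shared slot, a suspended native context can no longer be
quietly picked up by the interpreter, so any code path that relied on
that fallback now has to handle native contexts explicitly.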

One of the reasons that moving the instruction pointer is the correct
thing to do is that it will not introduce new bugs, but it will make
rare bugs more common. A bug that would previously only occur for a
native block will now occur for native methods as well. This is a good
thing as it makes the bugs easier to find. Another reason is that I'll
need to change the native method context format to introduce full
method inlining.

One of the remaining bugs that showed up after moving the native
instruction pointers was caused by the integration code for process
switching. Originally there had been no such code; however, because a
native method context could also be interpreted, some problems were
hidden.


I think I've just figured out this bug. There's one compiled method,
Delay>>wait. To trigger the bug it's also necessary to run the
GraphVizBaseTests. The system seems to survive fine executing normally
even though Delay>>wait is called often. Here's a log of the failure:



  7ba84330 Entering exuperyReturn Delay>wait
  7ba84330 Entering primitiveWait Delay>wait
  7ba84330 entering transferTo: from Delay>wait
  7bf95d1c Entering exuperyReturn Delay>wait
  7bf95d1c Exiting exuperyReturn [] in
     UnixOSProcessAccessor>grimReaperProcess
  7bf95d1c transfered to [] in
     UnixOSProcessAccessor>grimReaperProcess
  exiting transferTo:
  7bf95d1c Exiting primitiveWait [] in
     UnixOSProcessAccessor>grimReaperProcess
  7bf95d1c Exiting exuperyReturn BlockContext>repeat
  7bf95d1c primitiveValue failed in BlockContext>repeat
  7bf95d1c primitiveValue failed in BlockContext>repeat

The first number is the address of the process object.
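Since the process address leads each line, the point where control
changes hands can be picked out mechanically. The following is a small
sketch, not part of Exupery: the function names are made up, and the
sample is the first five log entries with the wrapped line joined.

```python
import re

LOG = """\
7ba84330 Entering exuperyReturn Delay>wait
7ba84330 Entering primitiveWait Delay>wait
7ba84330 entering transferTo: from Delay>wait
7bf95d1c Entering exuperyReturn Delay>wait
7bf95d1c Exiting exuperyReturn [] in UnixOSProcessAccessor>grimReaperProcess
"""

# An entry starts with the 8-hex-digit address of the process object.
ADDRESS = re.compile(r"^([0-9a-f]{8})\s+(.*)$")


def split_by_process(log_text):
    """Return (address, message) pairs for lines that start with an address."""
    events = []
    for line in log_text.splitlines():
        match = ADDRESS.match(line.strip())
        if match:
            events.append((match.group(1), match.group(2)))
    return events


def switch_points(events):
    """Indices of events whose process differs from the previous event's."""
    return [i for i in range(1, len(events))
            if events[i][0] != events[i - 1][0]]
```

On this sample the only switch is at the fourth entry, right after
"entering transferTo:", which is where the second process starts
running native code.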

What's happening is that the first process returns into a native
Delay>>wait, which calls Semaphore>>wait (the primitiveWait in the
log). The primitiveWait transfers execution to the new process, then
tries to execute native code (so that an interpreted process can
transfer control to a natively compiled process). It executes the
return-from-method code natively, moving control into the block in
grimReaperProcess.

primitiveWait then returns back into the native code which called
it. This native code, now running in the second process, then natively
performs a method return, moving execution up to BlockContext>>repeat.

BlockContext>>repeat now tries to re-execute the block, which fails
because the block has never returned and so appears to be still
executing.

The bug arises because the simplistic way I tried to introduce basic
support for returning into compiled methods doesn't handle the case
where the transferring method was compiled as well as the returning
method.
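The sequence described above can be reproduced in a toy model. This is
only a sketch under heavy assumptions: processes are plain stacks of
context names, the class and method names are invented, and the
check_switch guard is just one conceivable way to avoid the double
return, not the fix the integration code will actually use.

```python
class Process:
    """A process modelled as a stack of context names, innermost last."""
    def __init__(self, name, frames):
        self.name = name
        self.frames = list(frames)


class ToyVM:
    def __init__(self, active, waiting):
        self.active = active
        self.waiting = waiting

    def native_return(self):
        # Pops the innermost context of whichever process is active *now*.
        return self.active.frames.pop()

    def primitive_wait(self):
        # Switch processes, then let the resumed process (itself suspended
        # in native Delay>>wait) run its native return-from-method code.
        self.active, self.waiting = self.waiting, self.active
        self.native_return()

    def native_delay_wait(self, check_switch):
        caller = self.active
        self.primitive_wait()
        if check_switch and self.active is not caller:
            return  # hypothetical guard: the caller is suspended, stop here
        self.native_return()  # unguarded, this pops the other process's block


def run(check_switch):
    first = Process("first", ["sender", "Delay>>wait"])
    second = Process("second", ["BlockContext>>repeat",
                                "[] in grimReaperProcess",
                                "Delay>>wait"])
    vm = ToyVM(active=first, waiting=second)
    vm.native_delay_wait(check_switch)
    return first.frames, second.frames
```

Calling run(False) leaves the second process with only
BlockContext>>repeat on its stack: the block was returned from by the
first process's leftover native code, not by the block itself. With
run(True) the block is still the innermost context of the second
process, and the first process still has its Delay>>wait to return
from when it is resumed.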

Up until now, I've been able to ignore how process switching happens
because it always happens on the other side of a message send. I'm
sure that there have been a few subtle unrepeatable bugs caused by my
ignoring it.

The solution is to redo the integration code so that it correctly
handles both leaving native code and leaving interpreted code. It
shouldn't be hard, but it will require a bit more understanding than I
had when I tried the first time.


Bryce
_______________________________________________
Exupery mailing list
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery