An event driven Squeak VM


An event driven Squeak VM

Andreas.Raab
 
Folks -

I had an interesting thought today that I'd like to run by you because I
think it might just work. I have been thinking for a long time about how to
make the Squeak VM "truly event driven", that is, invoke it in response
to OS or other events instead of having the VM poll. There are lots of
good reasons for this, ranging from not blocking when popping up an OS
context menu or standard dialog, to being able to embed the VM into
other apps (browser plugin etc.), to properly dealing with
suspend/resume. There are various problems that could be dealt with more
easily if the VM were truly event driven.

Today it occurred to me that there might be a relatively simple way to
deal with that problem merely by having interpret() run until "there is
no more work to do" and return from interpret() when it's all said and
done. The trick is that instead of running the idle loop, the VM would
determine that it has no more work to do when there is no runnable
process; when it finds that there is no runnable process it would
return from interpret() saying "my work's done here, there is no more code
to run at this point, ask me again when an external event comes in".

The changes would be fairly straightforward: First, nuke the idle loop
and allow wakeHighestPriority to return nil when there's no runnable
process. Second, have transferTo: do a longjmp to the registered
vmExitBuf to leave interpret(). Third, have interpret register the
vmExitBuf and wake up the highest priority process like here:

interpret
     "install jmpbuf for main interpreter"
     (self setjmp: vmExitBuf) == 0 ifTrue:[
         self checkForInterrupts. "timers etc"
         "transferTo: longjmps if arg is nil so no need to check"
         self transferTo: self wakeHighestPriority.

         "this is the current interpret() implementation"
         self internalizeIPandSP.
         self fetchNextBytecode.
         [true] whileTrue: [self dispatchOn: currentBytecode in: BytecodeTable].

     ].

At this point we can write a client loop that effectively looks like:

   /* run the interpreter */
   while(!done) {
     /* check for new events */
     ioProcessEvents();
     /* run processes resulting from the events */
     interpret();
   }

Now, obviously this is inefficient; we'd want to replace the
ioProcessEvents() call with something more elaborate that reacts to the
incoming OS events, takes the next scheduled delay into account, checks
for socket handles etc. But I'm sure you're getting the idea. Instead of
wasting our time in the idleProcess, we just return when there's no more
work to do and it's up to the caller to run interpret() as often or as
rarely as desired.
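
To make this concrete, here is a minimal sketch of such a driver loop in C,
assuming the modified interpret() above; ioWaitForEventOrTimeout() and
ioMsecsToNextScheduledDelay() are hypothetical placeholders for the "more
elaborate" wait, not existing VM functions, and signatures are simplified:

   /* hypothetical driver: interpret() returns when no process is runnable */
   extern void interpret(void);
   extern void ioProcessEvents(void);
   extern long ioMsecsToNextScheduledDelay(void);   /* hypothetical: -1 if no Delay pending */
   extern void ioWaitForEventOrTimeout(long msecs); /* hypothetical: blocks on OS events/sockets */

   void runSqueak(void)
   {
     for (;;) {
       ioProcessEvents();     /* turn pending OS events into Squeak events */
       interpret();           /* run image processes until none is runnable */
       /* nothing to do: sleep until an event arrives or the next Delay is due */
       ioWaitForEventOrTimeout(ioMsecsToNextScheduledDelay());
     }
   }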

I also think that this scheme could be made backwards compatible by
ensuring that we never call interpret() recursively. In this case an
"old" image with the idle process would run the way it does today, and a
"new" image without the idle process would live in the shiny new event
driven world and return as needed.

What do you think? Any reasons why this wouldn't work?

Cheers,
   - Andreas


Re: An event driven Squeak VM

johnmci
 

On 2009-11-10, at 12:00 AM, Andreas Raab wrote:
> Now, obviously this is inefficient, we'd want to replace the
> ioProcessEvents() call with something more elaborate that reacts to
> the incoming OS events, takes the next scheduled delay into account,
> checks for socket handles etc. But I'm sure you're getting the idea.
> Instead of wasting our time in the idleProcess, we just return when
> there's no more work to do and it's up to the caller to run
> interpret() as often or as rarely as desired.

Well I'm not sure what exactly you are trying to fix or optimize, but  
let me review what happens for the rest of the platforms.

When ioRelinquishProcessorForMicroseconds is called and given a bogus
magic wait value, on OS X we calculate the wait time based on
getNextWakeupTick() and the current time, taking into account whether
getNextWakeupTick() is zero or in the past.

getNextWakeupTick() is the time in ms that the Delay logic thinks it
has to wait for the earliest of the pending delays (sorted by time) to expire.
I note that this value could be zero on non-Morphic systems, but in
Morphic it's always about 16-20 ms because the Morphic stepper logic
wakes up every 1/50th of a second.

I believe on Unix systems it just uses the bogus wait value, which is
1000 microseconds.

We then schedule a wait using either a somewhat correct wait value or,
in the Unix/Linux case, 1000 microseconds.

How long the wait actually takes is determined by the wait logic and the
flavor of Unix; it could be 1 ms, or it could be 10, or 100 ms.
Apple attempts to ensure the time you ask to wait will be the time
given. Typically Unix/Linux users see high CPU usage for idle Squeak
images since the requested wait is only 1 ms, although the actual wait
might be 10 ms.
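
Roughly, the OS X calculation described above might look like the following
sketch; the clamping policy and the helper waitOnEventsOrTimeout() are
approximations, not the actual Mac platform code, and signatures are
simplified:

   extern long ioMSecs(void);                     /* millisecond clock */
   extern long getNextWakeupTick(void);           /* when the earliest Delay expires */
   extern void waitOnEventsOrTimeout(long usecs); /* hypothetical platform wait */

   void ioRelinquishProcessorForMicroseconds(int bogusMicroseconds)
   {
     long now  = ioMSecs();
     long wake = getNextWakeupTick();
     long usecs;

     if (wake == 0)                    /* no Delay pending: fall back to the default */
       usecs = bogusMicroseconds;
     else if (wake <= now)             /* a Delay is already due: don't sleep at all */
       return;
     else
       usecs = (wake - now) * 1000;    /* sleep until the next Delay is due */

     /* the wait ends early if an i/o, socket or UI interrupt arrives */
     waitOnEventsOrTimeout(usecs);
   }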

Now the wait can be terminated early because an interrupt event (I/O,
sockets, UI) happens. If so, any async file I/O or socket interrupts are
serviced immediately, setting flags/semaphores for the Squeak VM to
service within the next few ms.

Issues:
If an external pthread signals a Squeak semaphore, this does not wake
the sleeping Squeak VM pthread.

also see
http://isqueak.org/ioRelinquishProcessorForMicroseconds


Now as for the proposed changes, we would just call the
ioRelinquishProcessorForMicroseconds() logic after interpret() ends.
On register-rich machines (ancient PowerPC machines) this change would
be technically more expensive, since it loads up some 20 registers
on entry to interpret(), but given its call rate of 50 times a
second no one will notice.


--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================





Re: An event driven Squeak VM

Andreas.Raab
 
John M McIntosh wrote:
> Well I'm not sure what exactly you are trying to fix or optimize

Sorry for not being clear. I'm trying to address several concrete problems:

1) Dragging a Squeak main window by its label currently blocks the VM.
Sound stops playing, animations stop animating, etc.

2) Opening OS file dialogs or context menus block the VM. Same issues.
You've had this problem in Sophie, we've added a specific solution for
Teleplace and I'd like to generalize this into a generic solution.

3) Make it easy to embed a Squeak VM by making it possible to deliver an
event and run the resulting code. Allowing for example to run the VM as
a browser plugin by calling interpret() in response to the
browser-delivered events.

Abstractly speaking, I am trying to turn around the way interpret() is
being run. Instead of calling interpret() and it then calling
ioProcessEvents() I want to call interpret in response to events
generated elsewhere and have it return when it's done handling the
events up to this point. This addresses all the issues mentioned above.

> Now as for the proposed changes, we would just call the
> ioRelinquishProcessorForMicroseconds() logic after the interpreter() ends.
> On register rich machines (ancient powerpc machines this change would be
> technically more expensive since it loads up 20 some registers
> on the entry to interpret() but give it's call rate of 50 times a second
> no-one will notice.

For example. I am *not* proposing to implement the "trivial client loop"
that I was talking about earlier. Rather, what I would do on Windows is
call interpret() from the appropriate WNDPROC in response to a UI event.
The main loop would be a "standard windows event loop" without any
references to interpret(). I would expect that the other platforms do
whatever is appropriate, the trivial client loop was intended purely as
an example.
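
For illustration only, a hedged sketch of what "call interpret() from the
WNDPROC" could look like; recordSqueakEventFrom() is a hypothetical stand-in
for the VM's actual event recording code, and the message selection is
arbitrary:

   #include <windows.h>

   extern void interpret(void);   /* returns when no process is runnable */
   extern void recordSqueakEventFrom(HWND, UINT, WPARAM, LPARAM); /* hypothetical */

   LRESULT CALLBACK SqueakWndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
   {
     switch (msg) {
     case WM_MOUSEMOVE:
     case WM_LBUTTONDOWN:
     case WM_LBUTTONUP:
     case WM_KEYDOWN:
     case WM_KEYUP:
     case WM_CHAR:
       recordSqueakEventFrom(hwnd, msg, wParam, lParam);
       interpret();       /* run whatever image processes this event woke up */
       return 0;
     default:
       return DefWindowProc(hwnd, msg, wParam, lParam);
     }
   }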

As a consequence of the changes, you would be able to have other main
loops (via MFC, Cocoa, wxWidgets etc. you name it) and call interpret()
in response to incoming events.

Cheers,
   - Andreas



Re: An event driven Squeak VM

Igor Stasenko
In reply to this post by Andreas.Raab

2009/11/10 Andreas Raab <[hidden email]>:

>
> Folks -
>
> I had an interesting thought today that I'd like to run by you because I
> think it might just work. I have been thinking for a long time how to make
> the Squeak VM be "truly event driven" that is invoke it in response to OS or
> other events instead of having the VM poll. There are lots of good reasons
> for this starting from not blocking when popping up an OS context menu or
> standard dialog, over being able to embed the VM into other apps (browser
> plugin etc), up to properly dealing with suspend/resume. There are various
> problems that could be dealt with more easily if the VM would be truly event
> driven.
>
> Today it occurred to me that there might be a relatively simple way to deal
> with that problem merely by having interpret() run until "there is no more
> work to do" and return from interpret() when it's all said and done. The
> trick is that instead of running the idle loop, the VM would determine that
> it has no more work to do when there is no runnable process, so when it
> finds that there is no runnable process it would return from interpret
> saying "my work's done here, there is no more code to run at this point, ask
> me again when an external event comes in".
>
> The changes would be fairly straight forward: First, nuke the idle loop and
> allow wakeHighestPriority to return nil when there's no runnable process.
> Second, have transferTo: do a longjmp to the registered vmExitBuf to leave
> interpret(). Third, have interpret register the vmExitBuf and wake up the
> highest priorty process like here:
>
> interpret
>    "install jmpbuf for main interpreter"
>    (self setjmp: vmExitBuf) == 0 ifTrue:[
>        self checkForInterrupts. "timers etc"
>        "transferTo: longjmps if arg is nil so no need to check"
>        self transferTo: self wakeHighestPriority.
>
>        "this is the current interpret() implementation"
>        self internalizeIPandSP.
>        self fetchNextBytecode.
>        [true] whileTrue: [self dispatchOn: currentBytecode in:
> BytecodeTable].
>
>    ].
>
> At this point we can write a client loop that effectively looks like:
>
>  /* run the interpreter */
>  while(!done) {
>    /* check for new events */
>    ioProcessEvents();
>    /* run processes resulting from the events */
>    interpret();
>  }
>
> Now, obviously this is inefficient, we'd want to replace the
> ioProcessEvents() call with something more elaborate that reacts to the
> incoming OS events, takes the next scheduled delay into account, checks for
> socket handles etc. But I'm sure you're getting the idea. Instead of wasting
> our time in the idleProcess, we just return when there's no more work to do
> and it's up to the caller to run interpret() as often or as rarely as
> desired.
>
> I also think that this scheme could be made backwards compatible by ensuring
> that we never call interpret() recursively. In this case an "old" image with
> the idle process would run the way it does today, and a "new" image without
> the idle process would live in the shiny new event driven world and return
> as needed.
>
> What do you think? Any reasons why this wouldn't work?
>
First, a big +1 :)

But I don't like using setjmp/longjmp; instead it would be good
to have a way to tell interpret() to return normally.

I did this in Hydra, by using:

internalQuickCheckForInterrupts
        "With new event-driven architecture in HydraVM, here we need to check
        if there any event in event queue, and if it there, return from
interpret() function"

        self inline: true.

        (self ioIsQueueEmpty: eventQueue cReference) ifFalse: [
                self externalizeIPandSP.
                self returnFromInterpret.
        ]

where #returnFromInterpret is a macro:

#define returnFromInterpret() return

Of course, it should be used only inside the interpret() function;
otherwise it will return from something else, with unpredictable
results :)

It could be generalized by adding a 'needToLeaveInterpret' flag, so
the code above would check this flag instead of the event queue (since
the Squeak VM doesn't have one). Then any primitive could ask to
leave interpret() when needed (#forceInterruptCheck comes to
mind), and since each primitive call is followed by
#internalQuickCheckForInterrupts, we can be sure that we
return from interpret() at a safe point and without needing
longjmp.
Take a look at senders of #internalQuickCheckForInterrupts in Hydra VMMaker.
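
Rendered as C, the effect is roughly the following sketch (illustrative names,
not the actual VMMaker output); the important point is that because the check
is inlined into interpret(), the bare return really does leave interpret() at
a safe point:

   #define returnFromInterpret() return   /* only meaningful inside interpret() */

   static int needToLeaveInterpret = 0;   /* set by primitives, forceInterruptCheck, ... */

   extern void dispatchCurrentBytecode(void);  /* stand-in for the bytecode dispatch */
   extern void externalizeIPandSP(void);       /* saves the interpreter registers */

   void interpret(void)
   {
     for (;;) {
       dispatchCurrentBytecode();
       /* internalQuickCheckForInterrupts, inlined after each primitive call: */
       if (needToLeaveInterpret) {
         needToLeaveInterpret = 0;
         externalizeIPandSP();
         returnFromInterpret();        /* leave interpret() at a safe point */
       }
     }
   }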

And indeed, an outer loop calling interpret() would look similar to
what you proposed:

interpretLoop
        [true] whileTrue: [
                self interpret.
                self unlockInterpreter. "give a chance other threads to do something with us"
                self lockInterpreter.
                self handleEvents.
        ]

Another thing: in Hydra, the VMMaker code never calls ioProcessEvents directly.
Instead I'm using a native thread which runs in the background and
periodically produces an event which, when handled, calls
ioProcessEvents(); this logic is completely outside the control of the core
interpreter and is platform specific.
The same loop also takes care of signaling the delay semaphore.

On platforms which do not support threading, I suggest just changing
#checkForInterrupts to instruct the VM to leave the interpret() function
when the counter reaches zero, by setting the 'needToLeaveInterpret' flag.
Then platform-specific code could call ioProcessEvents(), or whatever it
likes, in the loop enclosing the call to interpret().

Apart from this lies callback support. I prefer that a function
which expects a callback be called outside interpret().
Something like the following:

needToLeaveInterpret := 1.
interpreterProxy callThisFunctionWhenOutsideOfInterpret: #myFunc

and myFunc() calls something which expecting the callback:

myFunc() {
  result = callSomeExternalFunction(mycallback);
}

mycallback(x, y, z)
{
  /* ... convert/preserve passed-in arguments or whatever is needed ... */

  /* Here we are outside interpret() but inside some platform-specific
     interpreterLoop(); let's call a special interpreterLoopInsideCallback(),
     which returns when user code calls leaveCallback()... */
  interpreterLoopInsideCallback();

  /* ... convert the return value or whatever is needed ... */
  return someResult;
}

And voila, no recursive calls to interpret(), and no need for setjmp/longjmp :)

P.S. One of the major reasons to leave interpret() and do something
outside of it is that at that point we are much safer calling any
platform-specific code or external functions, without any risk that VM
state (like context/process/GC) will be damaged or handled improperly.
We would benefit greatly from such separation.




--
Best regards,
Igor Stasenko AKA sig.

Re: An event driven Squeak VM

johnmci
In reply to this post by Andreas.Raab
 
Ok, well
On 2009-11-10, at 12:03 PM, Andreas Raab wrote:

> John M McIntosh wrote:
>> Well I'm not sure what exactly you are trying to fix or optimize
>
> Sorry for not being clear. I'm trying to address several concrete  
> problems:
>
> 1) Dragging a Squeak main window by its label currently blocks the  
> VM. Sounds stops playing, animations stop animating etc.
>
> 2) Opening OS file dialogs or context menus block the VM. Same  
> issues. You've had this problem in Sophie, we've added a specific  
> solution for Teleplace and I'd like to generalize this into a  
> generic solution.

These are Windows issues?

OK, for OS X these problems don't go away. Why? Well, Squeak is run
as part of the main UI thread. The choice here is:

(a) Run the Squeak VM from time to time as a subtask of the UI thread.
In the past this gave poor performance, since the overhead is high and
the setup for interp() on PowerPC was expensive.
In this model, while the UI thread services a UI task like menu
interaction, Squeak doesn't run.

(b) Run the main UI event processing as a subtask of the Squeak
thread, the current mode. This stops the VM from running as it
services UI events: the VM is calling ioProcessEvents and that
is busy servicing the menu interaction. The cost is cheaper since we just
ask if there are any events pending every 16 ms or so. This is why,
on OS X, when the Morphic task is dead the VM only slowly
responds to window/menu interactions: there is a 1/2 second or so
ioProcessEvents callback in checkForInterrupts().

So why not run the Squeak VM as a separate thread from the UI thread?
Yes, we do that for the iPhone. However, any UI interaction from the VM
has to run on the UI thread; fine, you can work with that.
Now add in the desire to run Objective-C methods: it is easy to deadlock
between UI thread callbacks and the Squeak thread running a message send
on the UI thread. Besides all that hassle, the same applies to FFI
calls, where when building Sophie we discovered it was simply illegal to
call QuickTime routines from a background thread. It starts to get
complicated; let's add in the fact that you can't even create an
object in a background thread for use on the foreground thread, because
UI objects have semaphores which are thread aware and a cross-thread
lock isn't allowed.


>
> 3) Make it easy to embed a Squeak VM by making it possible to  
> deliver an event and run the resulting code. Allowing for example to  
> run the VM as a browser plugin by calling interpret() in response to  
> the browser-delivered events.

Been there, done that. I think the code is still lurking; something
about asking whether we should terminate the interp() loop for the browser
to run.

--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================





Re: An event driven Squeak VM

Andreas.Raab
 
John M McIntosh wrote:
> Ok, well

So does that mean you are:
[ ] in favor,
[ ] against such a change,
[ ] don't care,
[ ] think it can't work at all?
I'm not clear about your reply; you seem to be saying all of the above at
some point or other.

Cheers,
   - Andreas


Re: An event driven Squeak VM

johnmci
 

On 2009-11-10, at 4:35 PM, Andreas Raab wrote:

> John M McIntosh wrote:
>> Ok, well
>
> So does that mean you are:
> [ ] in favor,
> [ ] against such a change,
> [ ] don't care,
> [ ] think it can't work at all?
> I'm not clear about your reply you seem to be saying all of the  
> above at some point or other.

Er, all of the above; from early practice, on slow machines this gave
sluggish response with early forms of the browser plugin.

However, if you feel this will fix issues on the Windows platform, then
it's not a problem for the OS X and iPhone versions to
work with it. For the iPhone perhaps we flip the model to call
interpret() for some set time on each pass through the runloop.

Currently we have

     #define browserPluginReturnIfNeeded() if (plugInTimeToReturn()) {ReturnFromInterpret();}
     #define browserPluginInitialiseIfNeeded()

plugInTimeToReturn on OS X is defined to check flags that are set if
the VM is running as a sub-process of the browser and the browser has
terminated, in which case we want to terminate the VM.

So I'm wondering what the check will be, how many times per second
will it be executed, and will
interpret() run for a set number of milliseconds, a suggested amount,
or a number of bytecodes?
Or, if I read correctly, we exit from interpret() when there are no
processes to run, then go back in
when we think an external event happens, or a Delay terminates.

--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================





Re: An event driven Squeak VM

Eliot Miranda-2
 


On Tue, Nov 10, 2009 at 5:56 PM, John M McIntosh <[hidden email]> wrote:


> [...]
>
> So I'm wondering what the check will be, how many times per second will it be executed, and will
> interpret() run for a set number of milliseconds, a suggested amount, or a number of bytes codes?
> Or if I read correctly we exit from interpret() when there are no processes to run, then go back in
> when we think an external event happens, or a Delay terminates.


The latter.  The ideal thing for the VM to do when it discovers it has no processes to run is to call select, poll et al, with a timeout determined from the pending delay.  Thus the VM will wake up as soon as it has work to do, either because input is available or a delay has expired.  This is more difficult on platforms like Windows where there is no common wait-for-input mechanism other than increasingly obsolete WaitNextEvent style interfaces (the problem being using an event-driven non-blocking socket interface, yuck).
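
As a sketch of that idea on a poll()-capable platform (collectWaitableFds() and msToNextDelay() are hypothetical helpers, not existing VM functions):

   #include <poll.h>

   extern int collectWaitableFds(struct pollfd *fds, int max); /* sockets, display, pipes... */
   extern int msToNextDelay(void);                             /* -1 if no Delay is pending */

   void ioWaitForWorkOrTimeout(void)
   {
     struct pollfd fds[64];
     int nfds = collectWaitableFds(fds, 64);
     int timeoutMs = msToNextDelay();
     (void)poll(fds, (nfds_t)nfds, timeoutMs);  /* returns on input, timeout or signal */
     /* the caller then runs ioProcessEvents() and re-enters interpret() */
   }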

With the threaded Squeak VM I'm working on one can go one better and have a number of image-level processes that block in the FFI and a number of worker threads in the VM that block on OS semaphores waiting for the VM to give them something to do.

Andreas, note that my threaded work already arranges to exit the interpreter/JIT combination when a thread has nothing to do.  We should coordinate changes so that we maintain compatibility.

Re callbacks, an approach like Alien's is a good one.  I'm not quite at the stage with the threaded FFI work to be implementing callbacks, but it is on my list and the Alien approach is a general one.
 






Re: An event driven Squeak VM

johnmci
 

On 2009-11-10, at 6:17 PM, Eliot Miranda wrote:

> With the threaded Squeak VM I'm working on one can go one better and  
> have a number of image-level processes that block in the FFI and a  
> number of worker threads in the VM that block on OS semaphores  
> waiting for the VM to give them something to do.

Obviously now you have to give a bit more details on this. Is it like  
the hydra VM? Or entirely different?


--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================





Re: An event driven Squeak VM

Eliot Miranda-2
 


On Tue, Nov 10, 2009 at 6:45 PM, John M McIntosh <[hidden email]> wrote:

> On 2009-11-10, at 6:17 PM, Eliot Miranda wrote:
>
>> With the threaded Squeak VM I'm working on one can go one better and have a number of image-level processes that block in the FFI and a number of worker threads in the VM that block on OS semaphores waiting for the VM to give them something to do.
>
> Obviously now you have to give a bit more details on this. Is it like the hydra VM? Or entirely different?

Orthogonal, in that it might work well with Hydra.  The basic scheme is to have a natively multi-threaded VM that is not concurrent.  Multiple native threads share the VM such that there is only one thread running VM code at any one time.  Thus the VM can make non-blocking calls to the outside world, but neither the VM nor the image needs to be modified to handle true concurrency.  This is the same basic architecture as in the Strongtalk and V8 VMs and notably in David Simmons' various Smalltalk VMs.

The cool thing about the system is David's design.  He's been extremely generous in explaining to me his scheme, which is extremely efficient.  I've merely implemented this scheme in the context of the Cog VM.  The idea is to arrange that a threaded callout is so cheap that any and all callouts can be threaded.  This is done by arranging that a callout does not switch to another thread; instead the thread merely "disowns" the VM.  It is the job of a background heartbeat thread to detect that a callout is long-running and that the VM has effectively blocked.  The heartbeat then activates a new thread to run the VM, and the new thread attempts to take ownership and will run Smalltalk code if it succeeds.

On return from a callout a thread must attempt to take ownership of the VM, and if it fails, add itself to a queue of threads waiting to take back the VM and then wait on an OS semaphore until the thread owning the VM decides to give up ownership to it.

Every VM thread has a unique index.  The vmOwner variable holds the index of the owning thread or 0 if the VM is unowned.  To disown the VM all a thread has to do is zero vmOwner, while remembering the value of vmOwner in a temporary.  To take ownership a thread must use a low-level lock to gain exclusive access to vmOwner, and if vmOwner is zero, set it back to the thread's index, and release the lock.  If it finds vmOwner is non-zero it releases the lock and enters the wanting ownership queue.
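
A rough sketch of that disown/own dance (the lock, queue and helper names are placeholders; the real Cog code differs):

   static volatile int vmOwner = 0;          /* index of the owning thread; 0 = unowned */

   extern void lowLevelLock(void);           /* placeholder for a spin/CAS lock */
   extern void lowLevelUnlock(void);
   extern void enqueueAndWaitForVM(int idx); /* placeholder: wait on an OS semaphore */

   int disownVM(int myIndex)                 /* called just before a callout */
   {
     vmOwner = 0;                            /* simply drop ownership; no thread switch */
     return myIndex;                         /* remember who we were */
   }

   void ownVM(int myIndex)                   /* called on return from the callout */
   {
     for (;;) {
       lowLevelLock();
       if (vmOwner == 0) {
         vmOwner = myIndex;                  /* got the VM back; resume running Smalltalk */
         lowLevelUnlock();
         return;
       }
       lowLevelUnlock();
       enqueueAndWaitForVM(myIndex);         /* another thread owns the VM; wait our turn */
     }
   }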

In the Cog VM the heartbeat beats at 1KHz, so any call that takes less than 0.5ms is likely to complete without the heartbeat detecting that the VM is blocked.  So any and all callouts can be threaded.  Quite brilliant.  All the work of changing the active process when switching between threads is deferred from callout time to when a different thread takes ownership of the VM, saving the VM state for the process that surrendered the VM and installing its own.

The major wrinkle in this is that in David's VM he has a pinning garbage collector which arranges that any arguments passed out through the FFI are implicitly pinned.  We don't yet have a pinning garbage collector.  I do plan to do one.  But in the interim one quick hack, a neat idea of Andreas', is to fail calls that attempt to pass objects in new space, allowing only old objects to be passed, and to prevent the full garbage collector from running while any threaded calls are in progress.

Having cheap non-blocking calls allows e.g.
- the Hydra inter-VM channels to be implemented in Smalltalk code above the threaded FFI
- socket calls to be blocking calls in the image
- Smalltalk code to call select/poll/WaitForMultipleEvents

There are still plenty of sticky issues to do with e.g. identifying threads that can do specific functions, such as the UI thread, and issuing OpenGL calls from the right thread, etc, etc.  But these are all doable, if potentially tricky to get right.  If this kind of code does migrate from the VM innards up to the image I think that's a really good thing (tm) but one will really have to know what one is doing to get it right.

HTH
eliot









Re: An event driven Squeak VM

Igor Stasenko

2009/11/11 Eliot Miranda <[hidden email]>:

> [...]

I used a mutex in Hydra (each interpreter has its own mutex), so any
operation which requires synchronization should be performed
only after obtaining mutex ownership.
And sure, if crafted carefully, one could release the mutex before
doing an external call, and "try" to get it back again after the call
completes.
If you use mutexes provided by the OS, then you don't need a heartbeat
process, obviously, because you can simply wait on the mutex. So I
suppose you are introducing the heartbeat to minimize the overhead of
using the synchronization primitives provided by the OS, by using
low-level assembly code instead.

Just one minor thing - you mentioned the table of threads. What if
some routine creates a new thread which goes unnoticed by the VM, so it's
not registered in the VM's thread table, but then such a thread
attempts to obtain ownership of the interpreter somehow?

About inter-image communication in Hydra: the main problem is that you
need to pass a buffer between heads, so you need to get a lock on the
recipient while still keeping a lock on the sender interpreter. But this
could lead to deadlock if the recipient in its own turn attempts to do the
same.
So the solution, unfortunately, is to copy the buffer to the C heap (using
malloc().. yeah :( ), and pass an event with a pointer to that buffer,
which can then be handled by the recipient as soon as it is ready to do so,
in its event handling routine.

One more thing:
  socket calls to be blocking calls in the image

Assuming that the VM uses blocking sockets, such a call will block the thread
and some image-side process.
Then the heartbeat thread at some point sees that the VM has no owning thread
and so allows another thread, waiting in the queue, to take ownership
of the VM.
But what if there is no such thread? There is a choice: allocate a new
native thread and let it continue running the VM, or just ignore it and skip
over to the next heartbeat.
I'd like to hear what you choose. Because depending on the direction
taken, on a server image which simultaneously serves, say, 100
connections you may end up either with 100 + 1 native threads, or a smaller
(fixed) number of them but with the risk of being unable to run any VM code
until some of the blocking calls complete.

I'd like to note that either of the above alternatives has quite bad
scalability potential.
I'd prefer to have a pool of threads, each of them serving N
connections. The size of the thread pool should be 2x-3x the number of
processor cores on the host, because making more than that will not make
any real difference, since a single core can serve only a single native
thread while the others will just consume memory resources, like
address space etc.



--
Best regards,
Igor Stasenko AKA sig.

Re: An event driven Squeak VM

Eliot Miranda-2
 


On Tue, Nov 10, 2009 at 9:59 PM, Igor Stasenko <[hidden email]> wrote:


> I used a mutex in Hydra (each interpreter has own mutex), so any
> operation, which requires synchronization should be performed
> only after obtaining the mutex ownership.
> And sure, if crafted carefully, one could release the mutex before
> doing an external call, and "try" get it back again after call
> completed.
> If use mutexes, provided by OS, then you don't need a heartbeat
> process, obviously because you can simply put wait on mutex. So, i
> suppose you introducing the heardbeat to minimize the overhead of
> using synchronization primitives provided by OS, and instead using a
> low-level assembly code.
>
> Just one minor thing - you mentioned the table of threads. What if
> some routine creating a new thread, which get unnoticed by VM, so its
> not registered in the VM 'threads' table,  but then such thread
> attempts to obtain an ownership on interpreter somehow?

This can only happen on a callback or other well-defined entry-point.  At these well-defined entry-points the VM checks whether there is a tag in thread-local storage (the thread's VM index).  If it is not set the VM allocates the necessary per-thread storage, assigns an index and allows the thread to continue.  On return from the entry-point the VM deallocates the storage, clears the thread-local storage and returns.
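
A hedged sketch of that entry-point check using POSIX thread-local storage (allocateThreadSlot(), releaseThreadSlot() and runCallbackOwningVM() are hypothetical helpers; the real Cog code may differ):

   #include <pthread.h>
   #include <stddef.h>

   static pthread_key_t vmThreadKey;          /* created once at VM startup */

   extern int  *allocateThreadSlot(void);     /* hypothetical: assign an index + storage */
   extern void  releaseThreadSlot(int *slot); /* hypothetical */
   extern void  runCallbackOwningVM(void);    /* hypothetical: take ownership, run callback */

   void vmCallbackEntryPoint(void)
   {
     int *slot = pthread_getspecific(vmThreadKey);
     int firstTime = (slot == NULL);
     if (firstTime) {                         /* thread not yet known to the VM */
       slot = allocateThreadSlot();
       pthread_setspecific(vmThreadKey, slot);
     }
     runCallbackOwningVM();
     if (firstTime) {                         /* forget threads we only saw once */
       releaseThreadSlot(slot);
       pthread_setspecific(vmThreadKey, NULL);
     }
   }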

 
> About inter-image communication in Hydra. The main problem that you
> need to pass a buffer between heads, so you need to get a lock on a
> recepient, while still keeping a lock on sender interpreter. But this
> could lead to deadlock, if recepient in own turn attempts to do the
> same.
> So, the solution, unfortunately, is to copy buffer to C heap (using
> malloc().. yeah :( ), and pass an event with pointer to such buffer,
> which then could be handled by recepient as soon as it ready to do so,
> in event handling routine.

But you could connect the two with a pair of pipes, right?  Then all that locking and buffer allocation is in the VM.  Or rather, once you have a non-blocking FFI you can just use an OS's native stream-based inter-process communication facilities.
 
> One more thing:
>  socket calls to be blocking calls in the image
>
> Assuming that VM use blocking sockets, then call will block the thread
> & some of the image-side process.
> Then hearbeat thread at some point sees that VM has no owning thread
> and so, allows another thread, waiting in the queue to take ownership
> on VM.
> But what if there is no such thread? There is a choice: allocate new
> native thread and let it continue running VM, or just ignore &  skip
> over for the next heat beat.
> I'd like to hear what you choose. Because depending from direction
> taken, on server image, which simultaneously serves, say 100
> connections you may end up either with 100 + 1 native threads, or less
> (fixed) number of them but with risk to unable to run any VM code
> until some of the blocking calls completes.

There is a simple policy: a cap on the total number of threads the VM will allocate.  Below this a new thread is allocated.  At the limit the VM will block.  But note that the pool starts at 1 and only grows as necessary up to the cap.

> I'd like to note that either of above alternatives having a quite bad
> scalability potential.
> I'd prefer to have a pool of threads, each of them serving N
> connections. The size of threads pool should be 2x-3x number of
> processor cores on host, because making more than that will not make
> any real difference, since single core can serve only single native
> thread while others will just consume the memory resources, like
> address space etc.

That's very similar to my numbers too.  My current default is at least two threads and no more than 32, and 2 x num processors/cores in between.  But these numbers should be configurable.  This is just to get started.




Re: An event driven Squeak VM

Igor Stasenko

2009/11/11 Eliot Miranda <[hidden email]>:

>> [...]
>>
>> Just one minor thing - you mentioned the table of threads. What if
>> some routine creating a new thread, which get unnoticed by VM, so its
>> not registered in the VM 'threads' table,  but then such thread
>> attempts to obtain an ownership on interpreter somehow?
>
> This can only happen on a callback or other well-defined entry-point.  At these well-defined entry-points the VM checks whether there is a tag in thread-local storage (the thread's VM index).  If it is not set the VM allocates the necessary per-thread storage, assigns an index and allows the thread to continue.  On return from the entry-point the VM deallocates the storage, clears the thread-local storage and returns.
>

Yes. Just to make sure everything is ok with that :)

>>
>> [...]
>
> But you could connect the two with a pair of pipes, right?  Then al that locking and buffer allocation is in the VM.  Or rather, once you have a non-blocking FFI you can just use an OS's native stream-based inter-process communications facilities.
>

Of course I could, but the task is to minimize the overhead, possibly
even without the buffer copy overhead (that's where a pinning GC would be
really helpful). I don't think the OS facilities avoid copying the data
buffer to a secure location before passing it between the sockets.
Because once it releases the sender, while still waiting for the receiver
to be ready to retrieve the data, it can't guarantee that the given buffer
will not be used for something else; hence it inevitably has to either
copy the buffer contents to a secure location or block the sender.

>> [...]
>
>  There is a simple policy that is a cap on the total number of threads the VM will allocate.  below this a new thread is allocated.  At the limit the VM will block.  But note that the pool starts at 1 and only grows as necessary up to the cap.
>>
>> I'd like to note that either of above alternatives having a quite bad
>> scalability potential.
>> I'd prefer to have a pool of threads, each of them serving N
>> connections. The size of threads pool should be 2x-3x number of
>> processor cores on host, because making more than that will not make
>> any real difference, since single core can serve only single native
>> thread while others will just consume the memory resources, like
>> address space etc.
>
> That's very similar to my numbers too.  My current default is at least two threads and no more than 32, and 2 x num processors/cores in between.  But these numbers should be configurable.  This is just to get started.

Yes, but blocking sockets won't allow you to distribute load evenly
when the number of threads is less than the number of active sockets. All
active connections should be distributed evenly among worker threads; that
will guarantee that you consume computing resources optimally.

And what about scheduling? Have you considered my idea of moving
scheduling to the language side, leaving on the VM side only a very small
portion (in amount of code and logic) for switching the active processes?
I think that with the introduction of the JIT the overhead of
language-side scheduling will be quite small and quite acceptable, given
that it lets us change things whenever we want without touching the VM.


--
Best regards,
Igor Stasenko AKA sig.

Re: An event driven Squeak VM

Eliot Miranda-2
 


On Wed, Nov 11, 2009 at 10:20 AM, Igor Stasenko <[hidden email]> wrote:

[snip]

> Of course I could. But the task is to minimize the overhead, ideally
> avoiding the buffer copy altogether (that is where a pinning GC would be
> really helpful). I don't think the OS facilities avoid copying the data
> buffer to a safe location before passing it between the sockets: once
> the sender is released while the receiver is not yet ready to retrieve
> the data, the OS can't guarantee that the buffer won't be reused for
> something else, so it inevitably has to either copy the buffer contents
> to a safe location or block the sender.

OK, instead one can create a buffer from within Smalltalk (e.g. via Alien) and then create OS semaphores and use blocking calls to wait on those semaphores.  All I'm really trying to say is that once you have a threaded FFI you can move lots of stuff up out of the VM.  The disadvantage is that one loses platform independence, but I've long thought that Smalltalk class hierarchies are a much nicer way of creating cross-platform abstractions than great gobs of platform-specific C code.
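
As an illustration of that suggestion, here is a small C-level sketch of the handshake such image-side code would drive through a threaded FFI: a shared buffer guarded by two semaphores, one signalled when the slot holds data and one when it may be overwritten. It uses unnamed POSIX semaphores (fine for two heads sharing one address space); the names and structure are invented, not an existing API.

    #include <semaphore.h>
    #include <string.h>

    #define SLOT_SIZE 4096

    typedef struct {
        sem_t  spaceFree;    /* signalled when the slot may be overwritten */
        sem_t  dataReady;    /* signalled when the slot holds a message */
        size_t length;
        char   slot[SLOT_SIZE];
    } Channel;

    void channelInit(Channel *ch)
    {
        sem_init(&ch->spaceFree, 0, 1);   /* the slot starts out free */
        sem_init(&ch->dataReady, 0, 0);
    }

    void channelSend(Channel *ch, const char *data, size_t length)
    {
        sem_wait(&ch->spaceFree);         /* blocking call: harmless with a threaded FFI */
        ch->length = length < SLOT_SIZE ? length : SLOT_SIZE;
        memcpy(ch->slot, data, ch->length);
        sem_post(&ch->dataReady);
    }

    size_t channelReceive(Channel *ch, char *out)
    {
        size_t n;
        sem_wait(&ch->dataReady);
        n = ch->length;
        memcpy(out, ch->slot, n);
        sem_post(&ch->spaceFree);
        return n;
    }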

Andreas counters that implementing the abstractions in the VM keeps them well-defined and free from meddling.  But that runs counter to the philosophy of an open system and preventing inadvertent meddling is something Smalltalk has to do anyway  (e.g. "Process should not be redefined, proceed to store over it").  The nice things about shooting oneself in the foot by meddling with a Smalltalk system are that a) it doesn't really do any harm and b) the debugging of it can be a great learning experience.


[snip]

> Yes, but blocking sockets won't let you distribute load evenly when the
> number of threads is less than the number of active sockets. All active
> connections should be spread evenly among the worker threads; that is
> what guarantees the computing resources are used optimally.

So in that case one needs more threads, and to support them one needs more memory.  But it shouldn't be a surprise that one needs more resources to support a higher workload. Yer takes yer choice and yer pays yer price.  And of course one can provide settings to control the per-thread stack space etc.


> And what about scheduling? Have you considered my idea of moving
> scheduling to the language side, leaving on the VM side only a very
> small portion (in amount of code and logic) for switching the active
> processes?
> I think that with the introduction of the JIT the overhead of
> language-side scheduling will be quite small and quite acceptable, given
> that it lets us change things whenever we want without touching the VM.

No I haven't considered this because at least in my implementation, the scheduler and the thread manager are intimately connected.  Once a process is in a callout on a particular thread it is bound to that thread and if it calls back it'll run on that thread and only on that thread until the callout unwinds.  Further, one will be able to bind processes to specific threads to ensure that certain activities happen from a given native thread (e.g. OpenGL and the Windows debugging API both require that calls are issued from a single thread).  So the thread manager and scheduler cooperate to bind processes to threads and to arrange that the right thread switches occur when process switches require them.  Figuring out how to do that with a Smalltalk-level scheduler is more than I can manage right now :)




Re: An event driven Squeak VM

Andreas.Raab
 
Eliot Miranda wrote:
> Andreas counters that implementing the abstractions in the VM keeps them
> well-defined and free from meddling.  But that runs counter to the
> philosophy of an open system and preventing inadvertent meddling is
> something Smalltalk has to do anyway  (e.g. "Process should not be
> redefined, proceed to store over it").  The nice things about shooting
> oneself in the foot by meddling with a Smalltalk system are that a) it
> doesn't really do any harm and b) the debugging of it can be a great
> learning experience.

I disagree with both of these statements 100%. First, once you have
passed the first invalid pointer to some C function you will find out
the hard way that "not really doing any harm" means, oh well it'll crash
your image instead of raising a debugger. The consequence of this is
that for any serious development you will no longer want to do in-image
development; consider writing C code inside the running C app and then
have it core-dump every time you misspell something. Not exactly my
definition of "not really doing any harm". Secondly, wading through gobs
of platform specific code is only a great learning experience if you are
trying to learn platform specific stuff. Otherwise it's a useless
distraction that only gets into your way of seeing (and using) the
abstractions.

Cheers,
   - Andreas

Re: An event driven Squeak VM

Igor Stasenko
In reply to this post by Eliot Miranda-2

2009/11/11 Eliot Miranda <[hidden email]>:

[snip]
>> And what about scheduling? Have you considered my idea of moving
>> scheduling to the language side, leaving on the VM side only a very
>> small portion (in amount of code and logic) for switching the active
>> processes?
>> I think that with the introduction of the JIT the overhead of
>> language-side scheduling will be quite small and quite acceptable,
>> given that it lets us change things whenever we want without touching
>> the VM.
>
> No I haven't considered this because at least in my implementation, the scheduler and the thread manager are intimately connected.  Once a process is in a callout on a particular thread it is bound to that thread and if it calls back it'll run on that thread and only on that thread until the callout unwinds.  Further, one will be able to bind processes to specific threads to ensure that certain activities happen from a given native thread (e.g. OpenGL and the Windows debugging API both require that calls are issued from a single thread).  So the thread manager and scheduler cooperate to bind processes to threads and to arrange that the right thread switches occur when process switches require them.  Figuring out how to do that with a Smalltalk-level scheduler is more than I can manage right now :)

Hmm.. I'm not sure that binding a process to a thread will guarantee
that all callouts are made from the same thread.
Consider this code (presumably using some external API which should be
used only from the main thread), running within the callback:

[ gl doSomething ] fork

Suppose the active process is bound to a particular native thread, but
the forked process is not. The problem is that you must also bind the
forked process to run on the same thread as the process which created
it, otherwise you have virtually no guarantee that some call won't be
made from the wrong thread.
But you can't predict what the forked process does - it may make such
calls or it may not - so binding the forked process to the same thread
is a very pessimistic choice.

Instead, why not expose the new VM abilities to the language side, so one
could say that a specific callout should use a specific thread. Something
like:

threadHandle := Smalltalk currentThreadId.

externalFunction call: { arguments } inThread: threadHandle

while by default

externalFunction call: { arguments }

is free to run in any thread controlled by the VM.

Then you are freed from implementing complex and quite fragile (to my
mind) logic on the VM side which magically attempts to keep all the
horses well fed :)
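
One way the VM side of such a call:inThread: facility might look, sketched in C with pthreads: the designated thread (say, the one owning an OpenGL context) runs a small loop draining a queue of callout requests, and the requesting thread blocks on the request's completion semaphore (during which the VM would be disowned, as with any other callout). All names here are invented; this is an illustration of the proposal, not existing VM code.

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdlib.h>

    typedef struct CalloutRequest {
        void (*function)(void *);          /* the external function to call */
        void  *arguments;
        sem_t  done;                       /* signalled when the call has completed */
        struct CalloutRequest *next;
    } CalloutRequest;

    static CalloutRequest *queueHead = NULL;
    static pthread_mutex_t queueLock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  queueNonEmpty = PTHREAD_COND_INITIALIZER;

    /* called by any thread: hand the callout to the designated thread and wait */
    void callInDesignatedThread(void (*fn)(void *), void *args)
    {
        CalloutRequest *req = calloc(1, sizeof(CalloutRequest));
        req->function = fn;
        req->arguments = args;
        sem_init(&req->done, 0, 0);

        pthread_mutex_lock(&queueLock);
        req->next = queueHead;
        queueHead = req;
        pthread_cond_signal(&queueNonEmpty);
        pthread_mutex_unlock(&queueLock);

        sem_wait(&req->done);              /* the VM would be disowned around this wait */
        sem_destroy(&req->done);
        free(req);
    }

    /* the designated thread spends its life in this loop */
    void *designatedThreadLoop(void *unused)
    {
        for (;;) {
            CalloutRequest *req;
            pthread_mutex_lock(&queueLock);
            while (queueHead == NULL)
                pthread_cond_wait(&queueNonEmpty, &queueLock);
            req = queueHead;
            queueHead = req->next;
            pthread_mutex_unlock(&queueLock);

            req->function(req->arguments); /* the actual callout, on the right thread */
            sem_post(&req->done);
        }
        return NULL;
    }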




--
Best regards,
Igor Stasenko AKA sig.

Re: An event driven Squeak VM

Eliot Miranda-2
In reply to this post by Andreas.Raab
 


On Wed, Nov 11, 2009 at 10:58 AM, Andreas Raab <[hidden email]> wrote:

> Eliot Miranda wrote:
>> Andreas counters that implementing the abstractions in the VM keeps them well-defined and free from meddling.  But that runs counter to the philosophy of an open system and preventing inadvertent meddling is something Smalltalk has to do anyway  (e.g. "Process should not be redefined, proceed to store over it").  The nice things about shooting oneself in the foot by meddling with a Smalltalk system are that a) it doesn't really do any harm and b) the debugging of it can be a great learning experience.
>
> I disagree with both of these statements 100%. First, once you have passed the first invalid pointer to some C function you will find out the hard way that "not really doing any harm" means, oh well it'll crash your image instead of raising a debugger. The consequence of this is that for any serious development you will no longer want to do in-image development; consider writing C code inside the running C app and then have it core-dump every time you misspell something. Not exactly my definition of "not really doing any harm". Secondly, wading through gobs of platform specific code is only a great learning experience if you are trying to learn platform specific stuff. Otherwise it's a useless distraction that only gets into your way of seeing (and using) the abstractions.

I'm saying the opposite.  I'm saying one should implement these cross-platform abstractions up in the image.  I'm saying that shooting one's self in the foot in Smalltalk is doing something like
    Smalltalk := nil
and learning from the result.

The crash on misspellings is a straw man.  We've both been in the situation where we've had to debug VM crashes, for a number of reasons, both VM bugs and invalid FFI calls.  None of that has made us any the less keen on in-image development.

Attempting to implement cross-platform abstractions is a good learning experience if that's what you're into.  Implementing them in a high-level language with good facilities is IMO better than implementing them in C in a context where it's very hard to improve them because one can't easily experiment.

If one is in the situation of causing crashes in FFI calls one can always run the VM under a debugger, in which case one is in much the same situation as running in C.  It's not much worse.  And an FFI can use e.g. exception handling around a call to catch crashes and report them back to the image without exiting the image.  This isn't particularly helpful because there's no stack backtrace, simply an exception code, but it at least allows the system to report the bug rather than just crash.  To debug it one either has to scratch one's head or fire up that low-level debugger.  Giving up on image-based development is a rather extreme reaction.
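
To make the exception-handling point concrete, here is a sketch of what such a guard can look like on Windows using structured exception handling (MSVC-specific; a Unix VM would need a sigsetjmp-based signal handler instead). The wrapper and its names are invented for illustration:

    #include <windows.h>

    typedef void (*ExternalFunction)(void *);

    /* Call fn(args); if it faults (e.g. an access violation from a bad
       pointer), report the exception code to the caller instead of
       letting the whole VM process die. */
    int callExternalGuarded(ExternalFunction fn, void *args, DWORD *exceptionCode)
    {
        __try {
            fn(args);
            return 1;                          /* completed normally */
        }
        __except (EXCEPTION_EXECUTE_HANDLER) {
            *exceptionCode = GetExceptionCode();
            return 0;                          /* crashed: only a code, no backtrace */
        }
    }

As noted above, the primitive can then fail with that code and let the image report the problem rather than simply disappearing.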

best
Eliot




Re: An event driven Squeak VM

Eliot Miranda-2
In reply to this post by Igor Stasenko
 


On Wed, Nov 11, 2009 at 11:16 AM, Igor Stasenko <[hidden email]> wrote:

[snip]

> Instead, why not expose the new VM abilities to the language side, so
> one could say that a specific callout should use a specific thread.
> Something like:
>
> threadHandle := Smalltalk currentThreadId.
>
> externalFunction call: { arguments } inThread: threadHandle
>
> while by default
>
> externalFunction call: { arguments }
>
> is free to run in any thread controlled by the VM.

Because that'll deadlock if the thread is doing something else.  Dedicating a thread to a process can avoid that: e.g. spawn a process and bind it to a thread, wrap the process in an API object, and have the external API methods pass requests in via blocks to the hidden server process.  AFAICT Croquet uses this kind of approach to send messages between processes.


> Then you are freed from implementing complex and quite fragile (to my
> mind) logic on the VM side which magically attempts to keep all the
> horses well fed :)

The mechanisms I've implemented don't look that fragile to me.  But it's too early to report any real experience.  You could be right, but what I've done so far makes sense to me :)




Re: An event driven Squeak VM

Igor Stasenko
In reply to this post by Andreas.Raab

2009/11/11 Andreas Raab <[hidden email]>:

>
> Eliot Miranda wrote:
>>
>> Andreas counters that implementing the abstractions in the VM keeps them
>> well-defined and free from meddling.  But that runs counter to the
>> philosophy of an open system and preventing inadvertent meddling is
>> something Smalltalk has to do anyway  (e.g. "Process should not be
>> redefined, proceed to store over it").  The nice things about shooting
>> oneself in the foot by meddling with a Smalltalk system are that a) it
>> doesn't really do any harm and b) the debugging of it can be a great
>> learning experience.
>
> I disagree with both of these statements 100%. First, once you have passed
> the first invalid pointer to some C function you will find out the hard way
> that "not really doing any harm" means, oh well it'll crash your image
> instead of raising a debugger. The consequence of this is that for any
> serious development you will no longer want to do in-image development;
> consider writing C code inside the running C app and then have it core-dump
> every time you misspell something. Not exactly my definition of "not really
> doing any harm". Secondly, wading through gobs of platform specific code is
> only a great learning experience if you are trying to learn platform
> specific stuff. Otherwise it's a useless distraction that only gets into
> your way of seeing (and using) the abstractions.
>

Excuse me for the intrusion, but as you said, passing a wrong pointer to
some C function has the same consequence whether you do it from a C
program or from a Smalltalk one.
So, what's the difference? It is the price we pay, and will keep paying,
when developing applications that use unmanaged C code.
Someone has to go through the pain of debugging the code that uses C
libraries, and he will have to deal with core dumps. Be it a core dump of
the VM or of any other program, it is still a core dump - a signal that
something went wrong.

Also, trying to keep users away from platform-specific stuff is a good
intent.. but it means that someone, some day, dealt with this stuff
instead of you - someone who had to get through numerous core dumps and
crashes until he had stable code, ready for use by others. So, what makes
that developer any different from anyone else trying to do something
which requires the use of some platform-specific library?




--
Best regards,
Igor Stasenko AKA sig.

Re: An event driven Squeak VM

Igor Stasenko
In reply to this post by Eliot Miranda-2

2009/11/11 Eliot Miranda <[hidden email]>:

>
>
>
> On Wed, Nov 11, 2009 at 11:16 AM, Igor Stasenko <[hidden email]> wrote:
>>
>> 2009/11/11 Eliot Miranda <[hidden email]>:
>> >
>> >
>> >
>> > On Wed, Nov 11, 2009 at 10:20 AM, Igor Stasenko <[hidden email]> wrote:
>> >>
>> >> 2009/11/11 Eliot Miranda <[hidden email]>:
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Nov 10, 2009 at 9:59 PM, Igor Stasenko <[hidden email]> wrote:
>> >> >>
>> >> >> 2009/11/11 Eliot Miranda <[hidden email]>:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Nov 10, 2009 at 6:45 PM, John M McIntosh <[hidden email]> wrote:
>> >> >> >>
>> >> >> >> On 2009-11-10, at 6:17 PM, Eliot Miranda wrote:
>> >> >> >>
>> >> >> >>> With the threaded Squeak VM I'm working on one can go one better and have a number of image-level processes that block in the FFI and a number of worker threads in the VM that block on OS semaphores waiting for the VM to give them something to do.
>> >> >> >>
>> >> >> >> Obviously now you have to give a bit more details on this. Is it like the hydra VM? Or entirely different?
>> >> >> >
>> >> >> > Orthogonal, in that it might work well with Hydra.  The basic scheme is to have a natively multi-threaded VM that is not concurrent.  Multiple native threads share the Vm such that there is only one thread running Vm code at any one time.  This the VM can make non-blocking calls to the outside world but neither the VM nor the image need to be modified to handle true concurrency.  This is the same basic architecture as in the Strongtalk and V8 VMs and notably in David Simmons' various Smalltalk VMs.
>> >> >> > The cool thing about the system is David's design.  He's been extremely generous in explaining to me his scheme, which is extremely efficient.  I've merely implemented this scheme in the context of the Cog VM.  The idea is to arrange that a threaded callout is so cheap that any and all callouts can be threaded.  This is done by arranging that a callout does not switch to another thread, instead the thread merely "disowns" the VM.  It is the job of a background heartbeat thread to detect tat a callout is long-runnijng and that the VM has effectively blocked.  The heartbeat then activates a new thread to run the VM and the new thread attempts to take ownership and will run Smalltalk code if it succeeds.
>> >> >> > On return form a callout a thread must attempt to take ownership of the VM, and if it fails, add itself to a queue of threads waiting to take back the VM and then wait on an OS semaphore until the thread owning the VM decides to give up ownership to it.
>> >> >> > Every VM thread has a unique index.  The vmOwner variable holds the index of the owning thread or 0 if the VM is unowned.  To disown the VM all a thread has to do is zero vmOwner, while remembering the value of vmOwner in a temporary.  To take ownership a thread must use a low-level lock to gain exclusive access to vmOwner, and if vmOwner is zero, set it back to the thread's index, and release the lock.  If it finds vmOwner is non-zero it releases the lock and enters the wanting ownership queue.
>> >> >> > In the Cog VM the heartbeat beats at 1KHz, so any call that takes less than 0.5ms is likely to complete without the heartbeat detecting that the VM is blocked.  So any and all callouts can be threaded.  Quite brilliant.  All the work of changing the active process when switching between threads is deferred from callout time to when a different thread takes ownership of the VM, saving the VM state for the process that surrendered the VM and installing its own.
>> >> >> > The major wrinkle in this is that in David's VM he has a pinning garbage collector which arranges that any arguments passed out through the FFI are implicitly pinned.  We don't yet have a pinning garbage collector.  I do plan to do one.  But in the interim one quick hack, a neat idea of Andreas', is to fail calls that attempt to pass objects in new space, allowing only old objects to be passed, and to prevent the full garbage collector from running while any threaded calls are in progress.
>> >> >> > Having cheap non-blocking calls allows e.g.
>> >> >> > - the Hydra inter-VM channels to be implemented in Smalltalk code above the threaded FFI
>> >> >> > - socket calls to be blocking calls in the image
>> >> >> > - Smalltalk code to call select/poll/WaitForMultipleEvents
>> >> >> > There are still plenty of sticky issues to do with e.g. identifying threads that can do specific functions, such as the UI thread, and issuing OpenGL calls from the right thread, etc, etc.  But these are all doable, if potentially tricky to get right.  If this kind of code does migrate from the VM innards up to the image I think that's a really good thing (tm) but one will really have to know what one is doing to get it right.
>> >> >> > HTH
>> >> >> > eliot
>> >> >>
>> >> >> I used a mutex in Hydra (each interpreter has own mutex), so any
>> >> >> operation, which requires synchronization should be performed
>> >> >> only after obtaining the mutex ownership.
>> >> >> And sure, if crafted carefully, one could release the mutex before
>> >> >> doing an external call, and "try" get it back again after call
>> >> >> completed.
>> >> >> If use mutexes, provided by OS, then you don't need a heartbeat
>> >> >> process, obviously because you can simply put wait on mutex. So, i
>> >> >> suppose you introducing the heardbeat to minimize the overhead of
>> >> >> using synchronization primitives provided by OS, and instead using a
>> >> >> low-level assembly code.
>> >> >>
>> >> >> Just one minor thing - you mentioned the table of threads. What if
>> >> >> some routine creating a new thread, which get unnoticed by VM, so its
>> >> >> not registered in the VM 'threads' table,  but then such thread
>> >> >> attempts to obtain an ownership on interpreter somehow?
>> >> >
>> >> > This can only happen on a callback or other well-defined entry-point.  At these well-defined entry-points the VM checks whether there is a tag in thread-local storage (the thread's VM index).  If it is not set the VM allocates the necessary per-thread storage, assigns an index and allows the thread to continue.  On return from the entry-point the VM deallocates the storage, clears the thread-local storage and returns.
>> >> >
>> >>
>> >> Yes. Just to make sure everything is ok with that :)
>> >>
>> >> >>
>> >> >> About inter-image communication in Hydra. The main problem that you
>> >> >> need to pass a buffer between heads, so you need to get a lock on a
>> >> >> recepient, while still keeping a lock on sender interpreter. But this
>> >> >> could lead to deadlock, if recepient in own turn attempts to do the
>> >> >> same.
>> >> >> So, the solution, unfortunately, is to copy buffer to C heap (using
>> >> >> malloc().. yeah :( ), and pass an event with pointer to such buffer,
>> >> >> which then could be handled by recepient as soon as it ready to do so,
>> >> >> in event handling routine.
>> >> >
>> >> > But you could connect the two with a pair of pipes, right?  Then al that locking and buffer allocation is in the VM.  Or rather, once you have a non-blocking FFI you can just use an OS's native stream-based inter-process communications facilities.
>> >> >
>> >>
>> >> of course i could. but the task is to minimize the overhead, possibly
>> >> even without buffer copy overhead (that where pinning GC would be
>> >> really helpfull). i don't think that OS facilities not copying data
>> >> buffer to secure location before passing it between the sockets.
>> >> Because once it releases the sender, while still waiting receiver to
>> >> be ready to retrieve the data, it can't guarantee that given buffer
>> >> will not be used for something else, hence it inevitable should either
>> >> copy buffer contents to secure location or block the sender.
>> >
>> > OK, instead one can create a buffer from within Smalltalk (e.g. via Alien) and then create OS semaphores and use blocking calls to wait on those semaphores.  All I'm really trying to say is that once you have a threaded FFI you can move lots of stuff up out of the VM.  The disadvantage is that one loses platform independence, but I've long thought that Smalltalk class hierarchies are a much nicer way of creating cross-platform abstractions than great gobs iof platform-specific C code.
>> > Andreas counters that implementing the abstractions in the VM keeps them well-defined and free from meddling.  But that runs counter to the philosophy of an open system and preventing inadvertent meddling is something Smalltalk has to do anyway  (e.g. "Process should not be redefined, proceed to store over it").  The nice things about shooting oneself in the foot by meddling with a Smalltalk system are that a) it doesn't really do any harm and b) the debugging of it can be a great learning experience.
>> >
>> >> >>
>> >> >> One more thing:
>> >> >>  socket calls to be blocking calls in the image
>> >> >>
>> >> >> Assuming that the VM uses blocking sockets, a call will block the
>> >> >> thread and the image-side process that made it.
>> >> >> Then at some point the heartbeat thread sees that the VM has no
>> >> >> owning thread and so allows another thread waiting in the queue to
>> >> >> take ownership of the VM.
>> >> >> But what if there is no such thread? There is a choice: allocate a
>> >> >> new native thread and let it continue running the VM, or just skip
>> >> >> and wait for the next heartbeat.
>> >> >> I'd like to hear which you choose, because depending on the
>> >> >> direction taken, a server image which simultaneously serves, say,
>> >> >> 100 connections may end up either with 100 + 1 native threads, or
>> >> >> with a smaller (fixed) number of them but with the risk of being
>> >> >> unable to run any VM code until some of the blocking calls complete.
>> >> >
>> >> > There is a simple policy: a cap on the total number of threads the VM will allocate.  Below this a new thread is allocated; at the limit the VM will block.  But note that the pool starts at 1 and only grows as necessary up to the cap.
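
A minimal sketch of that growth policy (names invented, using pthreads;
spawning the worker thread itself is omitted) might look like:

  #include <pthread.h>

  typedef struct {
      int live;                 /* threads created so far */
      int busy;                 /* threads currently out on a callout */
      int cap;                  /* hard limit, e.g. 32 */
      pthread_mutex_t lock;
      pthread_cond_t  freed;    /* signalled when a callout finishes */
  } ThreadPool;

  /* called when a process needs a native thread for a blocking callout */
  static void acquireThread(ThreadPool *p)
  {
      pthread_mutex_lock(&p->lock);
      while (p->busy == p->live && p->live >= p->cap)
          pthread_cond_wait(&p->freed, &p->lock);  /* at the cap: block */
      if (p->busy == p->live)
          p->live++;        /* below the cap: grow (thread spawn omitted) */
      p->busy++;
      pthread_mutex_unlock(&p->lock);
  }

  static void releaseThread(ThreadPool *p)
  {
      pthread_mutex_lock(&p->lock);
      p->busy--;
      pthread_cond_signal(&p->freed);
      pthread_mutex_unlock(&p->lock);
  }
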
>> >> >>
>> >> >> I'd like to note that either of the above alternatives has quite
>> >> >> poor scalability potential.
>> >> >> I'd prefer to have a pool of threads, each of them serving N
>> >> >> connections. The size of the thread pool should be 2x-3x the number
>> >> >> of processor cores on the host, because adding more than that will
>> >> >> not make any real difference: a single core can serve only a single
>> >> >> native thread, while the others just consume memory resources, like
>> >> >> address space etc.
>> >> >
>> >> > That's very similar to my numbers too.  My current default is at least two threads and no more than 32, and 2 x num processors/cores in between.  But these numbers should be configurable.  This is just to get started.
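
For what it's worth, that sizing rule ("at least 2, at most 32, and
2 x the number of cores in between") is easy to state in code; this
helper and its name are just an illustration:

  #include <unistd.h>

  static int defaultPoolCap(void)
  {
      long cores = sysconf(_SC_NPROCESSORS_ONLN);   /* POSIX core count */
      long cap = 2 * (cores > 0 ? cores : 1);
      if (cap < 2)  cap = 2;
      if (cap > 32) cap = 32;
      return (int)cap;
  }
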
>> >>
>> >> Yes, but blocking sockets won't allow you to distribute the load
>> >> evenly when the number of threads is less than the number of active
>> >> sockets. All active connections should be distributed evenly among
>> >> the worker threads; that will guarantee that computing resources are
>> >> consumed optimally.
>> >
>> > So in that case one needs more threads, and to support them one needs more memory.  But it shouldn't be a surprise that one needs more resources to support a higher workload. Yer takes yer choice and yer pays yer price.  And of course one can provide settings to control the per-thread stack space etc.
>> >
>> >> And what about scheduling? Have you considered my idea to move
>> >> scheduling to the language side, leaving only a very small portion
>> >> (in terms of code & logic) on the VM side for switching the active
>> >> processes?
>> >> I think that with the introduction of a JIT the overhead of
>> >> language-side scheduling will be quite small and quite acceptable,
>> >> given that it allows us to change things whenever we want, without
>> >> touching the VM.
>> >
>> > No I haven't considered this because at least in my implementation, the scheduler and the thread manager are intimately connected.  Once a process is in a callout on a particular thread it is bound to that thread and if it calls back it'll run on that thread and only on that thread until the callout unwinds.  Further, one will be able to bind processes to specific threads to ensure that certain activities happen from a given native thread (e.g. OpenGL and the Windows debugging API both require that calls are issued from a single thread).  So the thread manager and scheduler cooperate to bind processes to threads and to arrange that the right thread switches occur when process switches require them.  Figuring out how to do that with a Smalltalk-level scheduler is more than I can manage right now :)
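
A purely illustrative sketch of the decision that cooperation implies
(these types and names are invented, not taken from the actual
implementation): when the next runnable process is bound to a native
thread, the scheduler either keeps running it on the current thread or
hands it over and lets a thread switch happen:

  typedef struct VMThread VMThread;

  typedef struct Process {
      VMThread *boundThread;     /* NULL if the process may run anywhere */
  } Process;

  typedef enum { RunHere, HandOverToBoundThread } SwitchAction;

  static SwitchAction actionFor(const Process *p, const VMThread *current)
  {
      if (p->boundThread == NULL || p->boundThread == current)
          return RunHere;                  /* ordinary process switch */
      return HandOverToBoundThread;        /* a thread switch is required */
  }
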
>>
>> Hmm.. I'm not sure that binding a process to a thread will guarantee
>> that all callouts will be made from the same thread.
>> Consider this code (presumably using some external API which should be
>> used only from the main thread), which runs within the callback:
>>
>> [ gl doSomething ] fork
>>
>> Suppose the active process is bound to a particular native thread, but
>> the forked process is not. The problem is that you must also bind the
>> forked process to run on the same thread as the process which created
>> it, otherwise you have virtually no guarantee that no call will be
>> made from the wrong thread.
>> But you can't predict what the forked process does - it may make wrong
>> calls or it may not - so binding the forked process to the same thread
>> is a very pessimistic choice.
>>
>> Instead, why not expose the new VM abilities to the language, so one
>> could say that a given callout should use a specified thread. Something
>> like:
>>
>> threadHandle := Smalltalk currentThreadId.
>>
>> externalFunction call: { arguments } inThread: threadHandle
>>
>> while by default
>>
>> externalFunction call: { arguments }
>>
>> will be free to run on any thread controlled by the VM.
>
> Because that'll deadlock if the thread is doing something else.  Dedicating a thread to a process can avoid that: e.g. spawn a process and bind it to a thread, wrap the process in an API object, and have the external API methods pass in requests via blocks to the hidden server process.  AFAICT Croquet uses this kind of approach to send messages between processes.
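
The pattern being described - one dedicated server that owns the
external API while everyone else hands it requests - looks, stripped
down to plain C with pthreads, roughly like the sketch below (all names
invented; in the image it would be a bound Process fed blocks through a
SharedQueue):

  #include <pthread.h>
  #include <stdlib.h>

  typedef struct Request {
      void (*run)(void *arg);    /* the work to do on the server thread */
      void *arg;
      struct Request *next;
  } Request;

  static Request *queueHead = NULL, *queueTail = NULL;
  static pthread_mutex_t queueLock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  queueNonEmpty = PTHREAD_COND_INITIALIZER;

  /* any thread: hand a request to the dedicated thread */
  void postRequest(void (*run)(void *), void *arg)
  {
      Request *r = malloc(sizeof *r);
      r->run = run; r->arg = arg; r->next = NULL;
      pthread_mutex_lock(&queueLock);
      if (queueTail) queueTail->next = r; else queueHead = r;
      queueTail = r;
      pthread_cond_signal(&queueNonEmpty);
      pthread_mutex_unlock(&queueLock);
  }

  /* the one thread that is allowed to touch the external API */
  void *serverLoop(void *unused)
  {
      for (;;) {
          pthread_mutex_lock(&queueLock);
          while (queueHead == NULL)
              pthread_cond_wait(&queueNonEmpty, &queueLock);
          Request *r = queueHead;
          queueHead = r->next;
          if (queueHead == NULL) queueTail = NULL;
          pthread_mutex_unlock(&queueLock);
          r->run(r->arg);        /* e.g. the actual GL call happens here */
          free(r);
      }
      return NULL;
  }
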

Well, I think it is easy to work out how to tell a callout to be
performed on a specific thread.
And I don't think that having such control at the language level will
make any difference, because you still have to deal with the issue
anyway, even with the model you are currently employing.

>>
>> Then you are freed from implementing complex and (to me) quite fragile
>> logic on the VM side which magically attempts to keep all the horses
>> well-fed :)
>
> The mechanisms I've implemented don't look that fragile to me.  But it's too early to report any real experience.  You could be right, but what I've done so far makes sense to me :)

I am right ;) because, from the interpreter's point of view, switching
to a different Process means just switching the active context, which is
a quite regular procedure for the VM - the interpreter switches the
active context all the time, at each message send and return!
So ask yourself: why would the VM want to know anything extra beyond the
context it should continue interpreting from?

Potentially the cost of switching the active process == the cost of a
message send. Of course, the logic which picks which context the
interpreter should switch to may introduce some overhead, but I prefer
to have such logic implemented on the language side instead of frozen
inside the VM.



--
Best regards,
Igor Stasenko AKA sig.