
"callback failed to own the VM"


ccrraaiigg

"callback failed to own the VM"

Hi--

What does it mean when I get "callback failed to own the VM" on
stderr from cogspur?

thanks again,

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

ccrraaiigg

re: "callback failed to own the VM"

After reading [1], I figured I should be using a multi-threaded Cog
VM (cogmtlinuxht). When I start it with the 4.6 final image, I get:

***

warning, processHasThreadId flag is unset; cannot function as a threaded
VM if so.
warning: VM parameter 48 indicates Process doesn't have threadId; VM
will not thread

***

What does this mean and what should I do?

thanks,

-C

[1] http://forum.world.st/callback-explanation-td4792105.html

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

ccrraaiigg

re: "callback failed to own the VM"

Hm, I still get the "failed to own the VM" message with the
multithreaded VM. I thought it would wait until it could run the
callback from a VM thread? What now?

thanks again,

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

Eliot Miranda-2

re: "callback failed to own the VM"

Hi Craig,

First, the message from the non-MT VM, "Warning; callback failed to own the VM", indicates that a callback is coming in on a thread other than the VM thread. Here's the non-MT implementation from sqVirtualMachine.c:

    sqInt ownVM(sqInt threadIdAndFlags)
    {
        extern sqInt amInVMThread(void);
        return amInVMThread() ? 0 : -1;
    }

So with the normal VM any callbacks must come in on the thread that made a callout.
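A minimal sketch in C of what that rule means for a caller. It is a toy model, not VM code: thread identities are plain ints, and the names thread_id, vm_thread, own_vm_non_mt and callback_accepted are made up for illustration.

```c
#include <assert.h>

/* Toy model: thread identities are plain ints, not real OS thread handles. */
typedef int thread_id;

/* Hypothetical id of the one thread that runs the VM. */
static thread_id vm_thread = 1;

/* Same shape as the non-MT ownVM: 0 on the VM thread, -1 anywhere else. */
int own_vm_non_mt(thread_id caller)
{
    return caller == vm_thread ? 0 : -1;
}

/* A callback is only let in when ownership did not fail (result >= 0). */
int callback_accepted(thread_id caller)
{
    return own_vm_non_mt(caller) >= 0;
}
```

In this model a callback arriving on thread 1 is accepted and one arriving on any other thread is rejected, which is the situation that produces the warning on stderr.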

On Tue, Oct 6, 2015 at 7:23 AM, Craig Latta <[hidden email]> wrote:

> Hm, I still get the "failed to own the VM" message with the
> multithreaded VM. I thought it would wait until it could run the
> callback from a VM thread? What now?

Second, the MT VM /is/ only a prototype. Third, what's the platform, etc.? Do you have a reproducible case? Are you willing to use gdb et al. to debug ownVM?

The issue here is when a callback comes in from some unknown thread, what do we do with it? By default it is rejected. But if there is a process in the foreignCallbackProcessSlot of the specialObjectsArray, it will be cloned to handle each "foreign" callback. Likely you don't have such a process installed, hence the callbacks getting rejected.

If you read CogInterpreterMT>>ownVM: you'll see

    CogInterpreterMT>>ownVM: threadIndexAndFlags
        <api>
        <inline: false>
        "This is the entry-point for plugins and primitives that wish to reacquire the VM after having
         released it via disownVM or callbacks that want to acquire it without knowing their ownership
         status. This call will block until the VM is owned by the current thread or an error occurs.
         The argument should be the value answered by disownVM, or 0 for callbacks that don't know
         if they have disowned or not. This is both an optimization to avoid having to query
         thread-local storage for the current thread's index (since it can easily keep it in some
         local variable), and a record of when an unbound process becomes affined to a thread for
         the dynamic extent of some operation.

         Answer 0 if the current thread is known to the VM.
         Answer 1 if the current thread is unknown to the VM and takes ownership.
         Answer -1 if the current thread is unknown to the VM and fails to take ownership."
        | threadIndex flags vmThread myProc activeProc sched |
        <var: #vmThread type: #'CogVMThread *'>
        threadIndexAndFlags = 0 ifTrue:
            [^self ownVMFromUnidentifiedThread].
        ...

and

    CogInterpreterMT>>ownVMFromUnidentifiedThread
        "Attempt to take ownership from a thread that as yet doesn't know its index.
         This supports callbacks where the callback could originate from any thread.

         Answer 0 if the owning thread is known to the VM.
         Answer 1 if the owning thread is unknown to the VM and now owns the VM.
         Answer -1 if the owning thread is unknown to the VM and fails to own the VM.
         Answer -2 if the owning thread is unknown to the VM and there is no foreign callback process installed."
        | count threadIndex vmThread |
        <var: #vmThread type: #'CogVMThread *'>
        <inline: false>
        (threadIndex := cogThreadManager ioGetThreadLocalThreadIndex) ~= 0 ifTrue:
            [ "this is a callback from a known thread"
             threadIndex = cogThreadManager getVMOwner ifTrue: "the VM has not been disowned"
                [self assert: (disowningVMThread isNil or: [disowningVMThread = self currentVMThread]).
                 disowningVMThread := nil.
                 self currentVMThread state: CTMAssignableOrInVM.
                 ^VMAlreadyOwnedHenceDoNotDisown].
             ^self ownVM: threadIndex].
        foreignCallbackPriority = 0 ifTrue:
            [^-2].
        ...

i.e. to accept a callback from an unknown thread, the system must first know the priority at which to run such callbacks (foreignCallbackPriority). That priority is taken from the process filling the foreignCallbackProcessSlot in the specialObjectsArray, and it is set in disownVM:.

    CogInterpreterMT>>disownVM: flags
        "Release the VM to other threads and answer the current thread's index.
         Currently valid flags:
            DisownVMLockOutFullGC - prevent fullGCs while this thread disowns the VM.
            OwnVMForeignThreadFlag - indicates lowest-level entry from a foreign thread
                - not to be used explicitly by clients
                - only set by ownVMFromUnidentifiedThread
            VMAlreadyOwnedHenceDoNotDisown
                - indicates an ownVM from a callback was made when
                  the vm was still owned.
                - not to be used explicitly by clients
                - only set by ownVMFromUnidentifiedThread

         This is the entry-point for plugins and primitives that wish to release the VM while
         performing some operation that may potentially block, and for callbacks returning
         back to some blocking operation. If this thread does not reclaim the VM beforehand
         then when the next heartbeat occurs the thread manager will schedule a
         thread to acquire the VM which may start running the VM in place of this thread.

         N.B. Most of the state needed to resume after preemption is set in preemptDisowningThread."
        <api>
        <inline: false>
        | vmThread result |
        <var: #vmThread type: #'CogVMThread *'>
        ...
        (flags anyMask: DisownVMForProcessorRelinquish) ifTrue:
            [| proc |
             (proc := objectMemory splObj: foreignCallbackProcessSlot) ~= objectMemory nilObject ifTrue:
                [foreignCallbackPriority := self quickFetchInteger: PriorityIndex ofObject: proc].
             relinquishing := true.
             self sqLowLevelMFence].
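The answer codes documented in the method comments above can be summarised in a small C model. This is a simplified sketch, not the VM's logic: OwnRequest and its fields are hypothetical, the known-thread path is collapsed to "answer 0", and the real code blocks or retries where this model simply reports a failed lock attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the decision depends on. */
typedef struct {
    int  thread_index;               /* 0 means the thread is unknown to the VM */
    int  foreign_callback_priority;  /* 0 means no foreign callback process was seen */
    bool lock_available;             /* would an attempt to lock the VM succeed? */
} OwnRequest;

enum { OWN_KNOWN = 0, OWN_TAKEN = 1, OWN_FAILED = -1, OWN_NO_PROCESS = -2 };

int own_vm_from_unidentified_thread(const OwnRequest *r)
{
    if (r->thread_index != 0)
        return OWN_KNOWN;            /* known thread: the ordinary ownVM path applies */
    if (r->foreign_callback_priority == 0)
        return OWN_NO_PROCESS;       /* nothing installed in the foreign callback slot */
    return r->lock_available ? OWN_TAKEN : OWN_FAILED;
}
```

Under these assumptions an unknown thread gets -2 whenever no priority has been recorded, no matter whether the VM is free, which matches the "callbacks getting rejected" case described earlier.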

So how to install a process in the foreignCallbackProcessSlot? See SmalltalkImage>>recreateSpecialObjectsArray.

    SmalltalkImage>>recreateSpecialObjectsArray
        "Smalltalk recreateSpecialObjectsArray"

        "To external package developers:
         **** DO NOT OVERRIDE THIS METHOD. *****
         If you are writing a plugin and need additional special object(s) for your own use,
         use addGCRoot() function and use own, separate special objects registry "

        "The Special Objects Array is an array of objects used by the Squeak virtual machine.
         Its contents are critical and accesses to it by the VM are unchecked, so don't even
         think of playing here unless you know what you are doing."
        | newArray |
        newArray := Array new: 60.
        "Nil false and true get used throughout the interpreter"
        newArray at: 1 put: nil.
        newArray at: 2 put: false.
        newArray at: 3 put: true.
        ...
        "Used to be WeakFinalizationList for WeakFinalizationList hasNewFinalization, obsoleted by ephemeron support."
        newArray at: 56 put: nil.
        "reserved for foreign callback process"
    >>> newArray at: 57 put: (self specialObjectsArray at: 57 ifAbsent: []).
        newArray at: 58 put: #unusedBytecode.
        "59 reserved for Sista counter tripped message"
        newArray at: 59 put: #conditionalBranchCounterTrippedOn:.
        "60 reserved for Sista class trap message"
        newArray at: 60 put: #classTrapFor:.
        "Now replace the interpreter's reference in one atomic operation"
        self specialObjectsArray becomeForward: newArray

So choose a priority at which to run foreign callbacks, and write a pair of accessors to set and get the foreign callback process in the specialObjectsArray. Set the slot to a new process ([] newProcess priority: N; yourself), save and restart the image (to allow the system to initialize correctly), and then see how you get on. I expect you'll see crashes soon enough because...the MT VM is a prototype. But you may just get lucky. I certainly hope so!

Ah yes, you also have to arrange that when that process runs it replaces itself in the foreignCallbackProcessSlot, so you'll have to work out a scheme to handle multiple foreign callbacks. I wrote a scheme for VW, but haven't got this far with the Cog MT VM.

_,,,^..^,,,_

best, Eliot

Eliot Miranda-2

Re: [Vm-dev] re: "callback failed to own the VM"

Backtrace plus register contents please

On Wed, Oct 7, 2015 at 8:16 AM, Craig Latta <[hidden email]> wrote:

Things seem to be going south on assertCStackWellAligned() near the
beginning of sendInvokeCallbackContext().

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

_,,,^..^,,,_

best, Eliot

Eliot Miranda-2

Re: [Vm-dev] re: "callback failed to own the VM"

In reply to this post by Eliot Miranda-2

Hi Craig,

so the code in tryLockVMOwner is

    mov $1, %eax
    mfence
    xchg %eax, vmOwnerLock
    sfence
    sub $1, %eax "Since we only ever set the lock to 1 or 0, subtracting 1 sets EAX to 0 if the lock was already locked and non-zero if it wasn't."
    ret

i.e. it attempts to set vmOwnerLock and answers whether the attempt succeeded; it succeeds only if vmOwnerLock was zero. So I guess the system is never arranging to unlock the VM, so there is never any opportunity for the foreign callback to get entry. Try adding an occasional "Processor relinquishProcessorForMicroseconds: 100" and see if that gives your callback a window. Ideally you'd only do such a thing when the system is ready to accept callbacks. I expect the foreign callback logic is wrong in only allowing callbacks when the VM is unlocked. Instead, the system should probably see if the foreign callback priority is greater than the current process's priority, and then somehow preempt the current process (e.g. by setting stackLimit appropriately). But I have no cycles to think about this right now.
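The same exchange-based try-lock can be sketched portably with C11 atomics. This is an illustration of the technique only, not the VM's code; vm_owner_lock, try_lock_vm_owner and unlock_vm_owner are names chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = unlocked, 1 = locked. */
static atomic_int vm_owner_lock = 0;

/* Atomically store 1 and look at the previous value, as xchg does.
   The attempt succeeded only if the lock was 0 before the exchange. */
bool try_lock_vm_owner(void)
{
    return atomic_exchange(&vm_owner_lock, 1) == 0;
}

void unlock_vm_owner(void)
{
    atomic_store(&vm_owner_lock, 0);
}
```

A second try_lock_vm_owner() call answers false until some thread calls unlock_vm_owner(). A waiter that loops on the try-lock therefore spins forever if the owner never unlocks, which is the symptom reported in this message.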

On Thu, Oct 8, 2015 at 6:10 AM, Craig Latta <[hidden email]> wrote:

Hi Eliot--

Well, I have a callback that gets as far as
ownVMFromUnidentifiedThread() now, but I just keep spinning on
tryLockVMOwner(). Help?

thanks again,

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

_,,,^..^,,,_

best, Eliot

Eliot Miranda-2

Re: [Vm-dev] re: "callback failed to own the VM"

On Thu, Oct 8, 2015 at 3:43 PM, Craig Latta <[hidden email]> wrote:

>> I guess the system is never arranging to unlock the Vm so there is
>> never any opportunity for the foreign callback to get entry. Try
>> adding an occasional "Processor relinquishProcessorForMicroseconds:
>> 100" and see if that gives your call-back a window.
>
> Well, my system is deadlocked in the foreign thread, forever trying
> and failing to own the VM when the library I'm using attempts to invoke
> my callback. I guess I have to do something more drastic, like kludge
> the unlock logic.
>
> How should the system be arranging to unlock the VM?

Hmmm, the code is better than I thought. So here's what it does:

    ownVMFromUnidentifiedThread
        ...
        count := 0.
        "If the current thread doesn't have an index it's new to the vm
         and we need to allocate a new threadInfo, failing if we can't.
         We also need a process in the foreignCallbackProcessSlot upon
         which to run the thread's eventual callback."
        [[cogThreadManager tryLockVMToIndex: -1] whileFalse:
            [self waitingPriorityIsAtLeast: foreignCallbackPriority.
             cogThreadManager ioTransferTimeslice].
         (objectMemory splObj: foreignCallbackProcessSlot) ~= objectMemory nilObject] whileFalse:
            [cogThreadManager releaseVM.
             (count := count + 1) > 1000 ifTrue:
                [^-2].
             cogThreadManager ioMilliSleep: 1].

and

    waitingPriorityIsAtLeast: minPriority
        "Set the maxWaitingPriority to at least minPriority on behalf
         of a thread wanting to acquire the VM. If maxWaitingPriority
         is increased, schedule a thread activation check asap."
        maxWaitingPriority < minPriority ifTrue:
            [maxWaitingPriority := minPriority.
             checkThreadActivation := true.
             self forceInterruptCheck]
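The two methods above can be modelled in single-threaded C to make the control flow easier to follow. Everything here is a stand-in: the variables replace VM state, try_lock_vm is a plain flag test rather than an atomic operation, the sleep and timeslice transfer are omitted, and max_spins is an addition so that the sketch terminates (the original inner loop spins until the lock is won).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the VM state named in the methods above. */
static int  max_waiting_priority    = 0;
static bool check_thread_activation = false;
static bool vm_locked               = true;   /* some other thread owns the VM */
static bool foreign_process_present = false;  /* is the special-objects slot filled? */

void waiting_priority_is_at_least(int min_priority)
{
    if (max_waiting_priority < min_priority) {
        max_waiting_priority    = min_priority;
        check_thread_activation = true;  /* the real code also forces an interrupt check */
    }
}

bool try_lock_vm(void)
{
    if (vm_locked)
        return false;
    vm_locked = true;
    return true;
}

/* Answers 1 when the VM is owned, -2 when no process ever appears in the slot,
   and -1 when the lock could not be won within max_spins attempts. */
int acquire_for_foreign_callback(int priority, int max_spins)
{
    for (int count = 0; count <= 1000; count++) {
        int spins = 0;
        while (!try_lock_vm()) {
            waiting_priority_is_at_least(priority);
            if (++spins >= max_spins)
                return -1;
        }
        if (foreign_process_present)
            return 1;
        vm_locked = false;               /* releaseVM, then try again */
    }
    return -2;
}
```

With vm_locked left at true the function only ever raises max_waiting_priority and sets check_thread_activation; whether anything then releases the lock depends entirely on the code that reacts to that flag.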

So forceInterruptCheck will force the VM to break out of machine code, if it is in machine code (this we need to check; for example it might be doing relinquishProcessorForMicroseconds:). Once in checkForEventsMayContextSwitch (the code that gets called to process events) there is an attempt to attend to checkThreadActivation:

    checkForEventsMayContextSwitch: mayContextSwitch
        "Check for possible interrupts and handle one if necessary.
         Answer if a context switch has occurred."
        ...
        switched := false.
        self assert: deferThreadSwitch not.
        deferThreadSwitch := true.
        ...
        "inIOProcessEvents prevents reentrancy into ioProcessEvents and allows disabling
         ioProcessEvents e.g. for native GUIs. We would like to manage that here but can't
         since the platform code may choose to call ioProcessEvents itself in various places."
        (now := self ioUTCMicroseconds) >= nextPollUsecs ifTrue:
            [statIOProcessEvents := statIOProcessEvents + 1.
             self ioProcessEvents. "sets interruptPending if interrupt key pressed; may callback"
             nextPollUsecs := now + 20000
             "msecs to wait before next call to ioProcessEvents. Note that strictly
              speaking we might need to update 'now' at this point since
              ioProcessEvents could take a very long time on some platforms"].
        interruptPending ifTrue:
            [interruptPending := false. "reset interrupt flag"
             sema := objectMemory splObj: TheInterruptSemaphore.
             (sema ~= objectMemory nilObject
              and: [self synchronousSignal: sema]) ifTrue:
                [switched := true]].
        nextWakeupUsecs ~= 0 ifTrue:
            [now >= nextWakeupUsecs ifTrue:
                [nextWakeupUsecs := 0. "set timer interrupt to 0 for 'no timer'"
                 sema := objectMemory splObj: TheTimerSemaphore.
                 (sema ~= objectMemory nilObject
                  and: [self synchronousSignal: sema]) ifTrue:
                    [switched := true]]].
        ...
        deferThreadSwitch := false.
        checkThreadActivation ifTrue:
            [checkThreadActivation := false.
             self cedeToHigherPriorityThreads]. "N.B. This may not return if we do switch."
        self threadSwitchIfNecessary: self activeProcess from: CSCheckEvents.
        ^switched

So cedeToHigherPriorityThreads should do what it says. What does it do?

Well it does lots of clever things, but AFAICT it *doesn't* unlock the VM if a higher-priority foreign callback is trying to own the VM. So that would be the bug.

If you want to make progress on this you'll need to

a) build a simulator image (instructions on the blog)

b) set-up a test case in the simulator to exercise the cedeToHigherPriorityThreads logic and replicate the bug

c) have a go at rewriting cedeToHigherPriorityThreads and/or threadSwitchIfNecessary:from: to do the right thing.

Looks like the issue is at the bottom of threadSwitchIfNecessary:from: :

    threadSwitchIfNecessary: newProc from: sourceCode
        "Invoked from transferTo:from: to switch threads if the new process is bound or affined to some other thread."
        ....
        (self quickFetchInteger: PriorityIndex ofObject: newProc) < maxWaitingPriority ifTrue:
            [checkThreadActivation := true.
             self forceInterruptCheck]

This will just spin causing the VM to do checkForEventsMayContextSwitch:.

Instead it probably needs to do something like

        (self quickFetchInteger: PriorityIndex ofObject: newProc) < maxWaitingPriority ifTrue:
            [checkThreadActivation := true.
             self returnToSchedulingLoopAndReleaseVMOrWakeThread: vmThread source: sourceCode]

But this really needs to be thought through carefully. It's a while since I thought about this and I'm in no position to do so now.
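The difference between the two variants can be shown with a toy model in C. It is only an illustration of the reasoning above, under the assumption that the waiting thread gets in as soon as the lock is released; ToyVM, toy_check_events and toy_checks_until_waiter_runs are hypothetical names and none of this is VM code.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool vm_locked;             /* held by the thread running Smalltalk */
    int  max_waiting_priority;  /* raised by the waiting foreign thread */
    int  active_priority;       /* priority of the running process */
} ToyVM;

/* One event check. With release_on_cede false the check only re-arms itself,
   as in the current code; with true it gives up the lock to the waiter. */
void toy_check_events(ToyVM *vm, bool release_on_cede)
{
    if (vm->active_priority < vm->max_waiting_priority && release_on_cede)
        vm->vm_locked = false;
}

/* Answers how many event checks pass before the lock is free, or -1 if never. */
int toy_checks_until_waiter_runs(ToyVM vm, bool release_on_cede, int max_checks)
{
    for (int i = 1; i <= max_checks; i++) {
        toy_check_events(&vm, release_on_cede);
        if (!vm.vm_locked)
            return i;
    }
    return -1;
}
```

In the model, re-arming the check never frees the lock however many checks run, while releasing on the first check lets a higher-priority waiter in immediately and leaves a lower-priority waiter blocked.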

Anyway, HTH

_,,,^..^,,,_

best, Eliot

Eliot Miranda-2

Re: [Vm-dev] re: "callback failed to own the VM"

On Thu, Oct 8, 2015 at 4:33 PM, Craig Latta <[hidden email]> wrote:

> Well, I threw in a (ProcessorScheduler
> relinquishProcessorForMicroseconds: 100) into one of the successful
> vm-thread callbacks, and now the foreign-thread callback hangs waiting
> for the ioWaitOnOSSemaphore() in ownVM(). So I'm not too confident that
> fixing the bug in threadSwitchIfNecessary:from: will get me moving
> again. But I'll bang on it anyway, I guess.
>
> I am now in hell...

Welcome to my world ;-)

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

_,,,^..^,,,_

best, Eliot

Eliot Miranda-2

Re: [Vm-dev] re: "callback failed to own the VM"

In reply to this post by Eliot Miranda-2

Hi Craig,

On Thu, Oct 8, 2015 at 4:59 PM, Craig Latta <[hidden email]> wrote:

>> If you want to make progress on this you'll need to
>>
>> a) build a simulator image (instructions on the blog)
>> b) set-up a test case in the simulator to exercise the
>> cedeToHigherPriorityThreads logic and replicate the bug
>> c) have a go at rewriting cedeToHigherPriorityThreads and/or
>> threadSwitchIfNecessary:from: to do the right thing.
>
> Hm, I have no idea how to set up such a test case meaningfully.
> This seems like one of those situations where you'd use a "plugin
> simulator" (like the one for BitBlt) that relies on the behavior you're
> debugging already working properly as simulation support in the
> simulation host.

All you'd do is simulate a callback occurring while the VM is doing something. So...

a) write a scratch primitive called e.g. primitiveFakeAForeignCallback which forks (at a higher priority?, or at least does a yield?), and calls the relevant machinery with faked-up callback data (so you can avoid all the stuff in thunkEntry, which is written in C)

b) use the reader image to run this primitive via a doit containing the primitiveFakeAForeignCallback followed by some time-consuming computation like 0 tinyBenchmarks.

So you start up the reader image, and type in the DoIt (followed by a !) into the dialog that stands in for reading from stdin. From your DoIt the foreign callback will start its processing in primitiveFakeAForeignCallback, and get blocked trying to take ownership of the vm. primitiveFakeAForeignCallback will return and the time-consuming computation will start and soon enough get interrupted, so the VM will enter checkForEventsMayContextSwitch: and you can step through the system to see the foreign callback attempt to get control of the VM.

BTW, this is /monstrously/ cool of you!

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

_,,,^..^,,,_

best, Eliot

ccrraaiigg

re: "callback failed to own the VM"

> a) write a scratch primitive called e.g. primitiveFakeAForeignCallback
> which forks (at a higher priority?, or at least does a yield?), and
> calls the relevant machinery with faked up callback data (so you can
> avoid all the stuff in thunkEntry which is written in C)

Which machinery is that? callbackEnter:? I have the simulator and
reader image going.

> BTW, this is /monstrously/ cool of you!

Sure thing!

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

Eliot Miranda-2

Re: [Vm-dev] re: "callback failed to own the VM"

In reply to this post by Eliot Miranda-2

On Thu, Oct 8, 2015 at 6:27 PM, Craig Latta <[hidden email]> wrote:

>> Well, I threw a (ProcessorScheduler
>> relinquishProcessorForMicroseconds: 100) into one of the successful
>> vm-thread callbacks, and now the foreign-thread callback hangs waiting
>> for the ioWaitOnOSSemaphore() in ownVM(). So I'm not too confident
>> that fixing the bug in cedeToHigherPriorityThreads and/or
>> threadSwitchIfNecessary:from: will get me moving again. But I'll bang
>> on it anyway, I guess.
>
> I also notice that, at the point where things just start spinning,
> I've got four threads, three of which (interpreter loop, thread
> scheduling loop, foreign callback thread) are waiting on VM thread
> semaphores. How is the other thread, the heartbeat thread, supposed to
> get things moving?

The only rôle the heartbeat thread has in multithreading is spotting when the VM is unowned and starting off a thread to take ownership. For example, some long-running callout occurs, which first disowns the VM. It is the heartbeat that spots that the VM is unowned and that it might be sensible to start another thread to try to run the VM; the callout is therefore not slowed down by having to start a thread on each callout.
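A toy model in C of that division of labour, assuming the heartbeat does nothing at all unless the VM is unowned. ToyThreadManager, heartbeat_tick and owner_after_ticks are hypothetical names and the struct is far simpler than the real thread manager.

```c
#include <assert.h>

typedef struct {
    int vm_owner;         /* 0 = unowned, otherwise the index of the owning thread */
    int idle_thread;      /* hypothetical index of a thread that could run the VM */
    int threads_started;  /* how many times the heartbeat handed the VM over */
} ToyThreadManager;

/* One heartbeat tick: act only when nobody owns the VM. */
void heartbeat_tick(ToyThreadManager *tm)
{
    if (tm->vm_owner == 0) {
        tm->vm_owner = tm->idle_thread;
        tm->threads_started++;
    }
}

/* Run a number of ticks from a given starting state and answer the final owner. */
int owner_after_ticks(int initial_owner, int idle_thread, int ticks)
{
    ToyThreadManager tm = { initial_owner, idle_thread, 0 };
    for (int i = 0; i < ticks; i++)
        heartbeat_tick(&tm);
    return tm.vm_owner;
}
```

In this model a VM that is still owned stays with its owner no matter how many ticks pass, so the heartbeat cannot rescue threads that are all blocked while ownership is held.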

What's supposed to stop the threads spinning is proper interaction between them. That this isn't happening is a bug. As I said, this is a prototype work in progress.

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

_,,,^..^,,,_

best, Eliot

Eliot Miranda-2

re: "callback failed to own the VM"

In reply to this post by ccrraaiigg

Hi Craig,

On Fri, Oct 9, 2015 at 3:36 AM, Craig Latta <[hidden email]> wrote:

>> a) write a scratch primitive called e.g. primitiveFakeAForeignCallback
>> which forks (at a higher priority?, or at least does a yield?), and
>> calls the relevant machinery with faked up callback data (so you can
>> avoid all the stuff in thunkEntry which is written in C)
>
> Which machinery is that? callbackEnter:? I have the simulator and
> reader image going.

Here are the edited-down guts of thunkEntry, which is what you want to simulate:

    long
    thunkEntry(void *thunkp, long *stackp)
    {
        ...
        if ((flags = interpreterProxy->ownVM(0)) < 0) {
            fprintf(stderr,"Warning; callback failed to own the VM\n");
            return -1;
        }
        if (!(returnType = setjmp(vmcc.trampoline))) {
            ...
            interpreterProxy->sendInvokeCallbackContext(&vmcc);
            fprintf(stderr,"Warning; callback failed to invoke\n");
            ...
            interpreterProxy->disownVM(flags);
            return -1;
        }
        ...
        interpreterProxy->disownVM(flags);
        switch (returnType) {
        case retword: return vmcc.rvs.valword;
        ...

So I expect you want to do something like

    [self ownVM: 0.
     self sendInvokeCallbackContext: fakeVMCallbackContextAddress] forkAt: Processor activePriority + 1

You could even create a CArray or some such to hold onto the data for the callback (fakeVMCallbackContextAddress above).
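For reference, the control flow of thunkEntry can be exercised outside the VM with a fake proxy. This is a self-contained C sketch, not the real plugin interface: FakeProxy, FakeCallbackContext and all the helper functions are hypothetical, and the context is kept in static storage so that its contents stay well defined after longjmp.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-ins for the callback context and the proxy. */
typedef struct {
    jmp_buf trampoline;
    long    result;
} FakeCallbackContext;

typedef struct {
    int  (*ownVM)(int threadIndexAndFlags);
    void (*disownVM)(int flags);
    void (*sendInvokeCallbackContext)(FakeCallbackContext *cc);
} FakeProxy;

static int disown_calls = 0;

int  own_ok(int flags)        { (void)flags; return 1; }
int  own_refused(int flags)   { (void)flags; return -1; }
void count_disown(int flags)  { (void)flags; disown_calls++; }

/* A callback that "returns" by jumping back to the trampoline with a value. */
void invoke_and_return_42(FakeCallbackContext *cc)
{
    cc->result = 42;
    longjmp(cc->trampoline, 1);
}

/* A send that fails and simply returns to its caller. */
void invoke_fails(FakeCallbackContext *cc) { (void)cc; }

long fake_thunk_entry(const FakeProxy *proxy)
{
    static FakeCallbackContext cc;  /* static: not clobbered across longjmp */
    int flags;

    if ((flags = proxy->ownVM(0)) < 0) {
        fprintf(stderr, "Warning; callback failed to own the VM\n");
        return -1;
    }
    if (setjmp(cc.trampoline) == 0) {
        proxy->sendInvokeCallbackContext(&cc);
        /* Only reached if the send came back instead of jumping. */
        fprintf(stderr, "Warning; callback failed to invoke\n");
        proxy->disownVM(flags);
        return -1;
    }
    proxy->disownVM(flags);
    return cc.result;
}
```

Swapping own_ok for own_refused in the proxy reproduces the "failed to own the VM" path without any threads involved; in that case the VM is never disowned, because it was never owned.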

> BTW, this is /monstrously/ cool of you!

Sure thing!

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

_,,,^..^,,,_

best, Eliot

Eliot Miranda-2

Re: [Vm-dev] re: "callback failed to own the VM"

On Sat, Oct 10, 2015 at 11:11 AM, Craig Latta <[hidden email]> wrote:

Any advice about how the trace log could be useful here? I notice
ownVM etc. write to it.

For example, you can use it to generate a log of important thread-switching events, to get an idea of what's going on.
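As an illustration of the kind of log meant here, a fixed-size ring buffer of thread-switching events might look like the C sketch below. It is not the VM's trace log: TraceEntry, TRACE_SIZE, trace_event and trace_recent are hypothetical names, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <string.h>

#define TRACE_SIZE 8   /* hypothetical capacity; must be a power of two */

typedef struct {
    const char *event;   /* e.g. "ownVM", "disownVM" */
    int         thread;  /* index of the thread that logged the event */
} TraceEntry;

static TraceEntry trace_log[TRACE_SIZE];
static unsigned   trace_next = 0;   /* total number of entries ever logged */

/* Record an event, overwriting the oldest entry once the buffer is full. */
void trace_event(const char *event, int thread)
{
    trace_log[trace_next & (TRACE_SIZE - 1)] = (TraceEntry){ event, thread };
    trace_next++;
}

/* Answers the entry logged 'age' events ago (0 = most recent), or NULL
   if that entry was never written or has already been overwritten. */
const TraceEntry *trace_recent(unsigned age)
{
    if (age >= trace_next || age >= TRACE_SIZE)
        return NULL;
    return &trace_log[(trace_next - 1 - age) & (TRACE_SIZE - 1)];
}
```

Reading the most recent entries back after a hang shows which thread last took or released the VM, which is the sort of question the thread-switching log is meant to answer.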

_,,,^..^,,,_

best, Eliot

Eliot Miranda-2

Re: [Vm-dev] re: "callback failed to own the VM"

In reply to this post by Eliot Miranda-2

Hi Craig,

no time soon. My priorities this year are to finish the x64 JIT and do contracted-for work. I hope the Threaded VM will become a priority some time next year, but it will have to wait until one of my sponsors requires it.

On Mon, Oct 12, 2015 at 4:07 PM, Craig Latta <[hidden email]> wrote:

> ...I have no cycles to think about this right now.

When do you think you might? I'm getting nowhere and should drop it.

thanks,

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

_,,,^..^,,,_

best, Eliot