VM Maker: VMMaker.oscog-eem.2520.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

VM Maker: VMMaker.oscog-eem.2520.mcz

commits-2
 
Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
http://source.squeak.org/VMMaker/VMMaker.oscog-eem.2520.mcz

==================== Summary ====================

Name: VMMaker.oscog-eem.2520
Author: eem
Time: 6 February 2019, 4:40:58.448524 pm
UUID: 6031f286-1f25-45df-ba77-ee65f80df015
Ancestors: VMMaker.oscog-eem.2518, VMMaker.oscog-KenD.2519

Merge with VMMaker.oscog-KenD.2518 & VMMaker.oscog-KenD.2519.

ARM64:
Non-register struct returns now work.
[Needs corresponding vm]

Struct return in registers works.
All FFI tests now pass (need to add more).

=============== Diff against VMMaker.oscog-eem.2518 ===============

Item was added:
+ ----- Method: ThreadedARM64FFIPlugin>>ffiCall:ArgArrayOrNil:NumArgs: (in category 'callout support') -----
+ ffiCall: externalFunction ArgArrayOrNil: argArrayOrNil NumArgs: nArgs
+ "Generic callout. Does the actual work.  If argArrayOrNil is nil it takes args from the stack
+ and the spec from the method.  If argArrayOrNil is not nil takes args from argArrayOrNil
+ and the spec from the receiver."
+ | flags argTypeArray address argType oop argSpec argClass err theCalloutState calloutState requiredStackSize stackSize allocation result primNumArgs |
+ <inline: true>
+ <var: #theCalloutState type: #'CalloutState'>
+ <var: #calloutState type: #'CalloutState *'>
+ <var: #allocation type: #'char *'>
+
+ primNumArgs := interpreterProxy methodArgumentCount.
+ (interpreterProxy is: externalFunction KindOfClass: interpreterProxy classExternalFunction) ifFalse:
+ [^self ffiFail: FFIErrorNotFunction].
+ "Load and check the values in the externalFunction before we call out"
+ flags := interpreterProxy fetchInteger: ExternalFunctionFlagsIndex ofObject: externalFunction.
+ interpreterProxy failed ifTrue:
+ [^self ffiFail: FFIErrorBadArgs].
+
+ "This must come early for compatibility with the old FFIPlugin.  Image-level code
+ may assume the function pointer is loaded eagerly.  Thanks to Nicolas Cellier."
+ address := self ffiLoadCalloutAddress: externalFunction.
+ interpreterProxy failed ifTrue:
+ [^0 "error code already set by ffiLoadCalloutAddress:"].
+
+ argTypeArray := interpreterProxy fetchPointer: ExternalFunctionArgTypesIndex ofObject: externalFunction.
+ "must be array of arg types"
+ ((interpreterProxy isArray: argTypeArray)
+ and: [(interpreterProxy slotSizeOf: argTypeArray) = (nArgs + 1)]) ifFalse:
+ [^self ffiFail: FFIErrorBadArgs].
+ "check if the calling convention is supported"
+ self cppIf: COGMTVM
+ ifTrue:
+ [(self ffiSupportsCallingConvention: (flags bitAnd: FFICallTypesMask)) ifFalse:
+ [^self ffiFail: FFIErrorCallType]]
+ ifFalse: "not masking causes threaded calls to fail, which is as they should if the plugin is not threaded."
+ [(self ffiSupportsCallingConvention: flags) ifFalse:
+ [^self ffiFail: FFIErrorCallType]].
+
+ requiredStackSize := self externalFunctionHasStackSizeSlot
+ ifTrue: [interpreterProxy
+ fetchInteger: ExternalFunctionStackSizeIndex
+ ofObject: externalFunction]
+ ifFalse: [-1].
+ interpreterProxy failed ifTrue:
+ [^interpreterProxy primitiveFailFor: (argArrayOrNil isNil
+ ifTrue: [PrimErrBadMethod]
+ ifFalse: [PrimErrBadReceiver])].
+ stackSize := requiredStackSize < 0 ifTrue: [DefaultMaxStackSize] ifFalse: [requiredStackSize].
+ self cCode: [] inSmalltalk: [theCalloutState := self class calloutStateClass new].
+ calloutState := self addressOf: theCalloutState.
+ self cCode: [self me: calloutState ms: 0 et: (self sizeof: #CalloutState)].
+ calloutState callFlags: flags.
+ "Fetch return type and args"
+ argType := interpreterProxy fetchPointer: 0 ofObject: argTypeArray.
+ argSpec := interpreterProxy fetchPointer: 0 ofObject: argType.
+ argClass := interpreterProxy fetchPointer: 1 ofObject: argType.
+ (err := self ffiCheckReturn: argSpec With: argClass in: calloutState) ~= 0 ifTrue:
+ [^self ffiFail: err]. "cannot return"
+ "alloca the outgoing stack frame, leaving room for marshalling args, and including space for the return struct, if any.
+ Additional space reserved for saving register args like mandated by Win64 X64 or PPC ABI, will be managed by the call itself"
+ allocation := self alloca: stackSize + calloutState structReturnSize + self cStackAlignment.
+ self mustAlignStack ifTrue:
+ [allocation := self cCoerce: (allocation asUnsignedIntegerPtr bitClear: self cStackAlignment - 1) to: #'char *'].
+ calloutState
+ argVector: allocation;
+ currentArg: allocation;
+ limit: allocation + stackSize.
+ 1 to: nArgs do:
+ [:i|
+ argType := interpreterProxy fetchPointer: i ofObject: argTypeArray.
+ argSpec := interpreterProxy fetchPointer: 0 ofObject: argType.
+ argClass := interpreterProxy fetchPointer: 1 ofObject: argType.
+ oop := argArrayOrNil isNil
+ ifTrue: [interpreterProxy stackValue: nArgs - i]
+ ifFalse: [interpreterProxy fetchPointer: i - 1 ofObject: argArrayOrNil].
+ err := self ffiArgument: oop Spec: argSpec Class: argClass in: calloutState.
+ err ~= 0 ifTrue:
+ [self cleanupCalloutState: calloutState.
+ self cppIf: COGMTVM ifTrue:
+ [err = PrimErrObjectMayMove negated ifTrue:
+ [^PrimErrObjectMayMove]]. "N.B. Do not fail if object may move because caller will GC and retry."
+ ^self ffiFail: err]]. "coercion failed or out of stack space"
+ "Failures must be reported back from ffiArgument:Spec:Class:in:.
+ Should not fail from here on in."
+ self assert: interpreterProxy failed not.
+ self ffiLogCallout: externalFunction.
+ (requiredStackSize < 0
+ and: [self externalFunctionHasStackSizeSlot]) ifTrue:
+ [stackSize := calloutState currentArg - calloutState argVector.
+ interpreterProxy storeInteger: ExternalFunctionStackSizeIndex ofObject: externalFunction withValue: stackSize].
+ "Go out and call this guy"
+ result := self ffiCalloutTo: address SpecOnStack: argArrayOrNil notNil in: calloutState.
+ self cleanupCalloutState: calloutState.
+ "Can not safely use argumentCount (via e.g. methodReturnValue:) since it may have been changed by a callback."
+ interpreterProxy pop: primNumArgs + 1 thenPush: result.
+ ^result!

Item was changed:
  ----- Method: ThreadedARM64FFIPlugin>>ffiCalloutTo:SpecOnStack:in: (in category 'callout support') -----
  ffiCalloutTo: procAddr SpecOnStack: specOnStack in: calloutState
  <var: #procAddr type: #'void *'>
  <var: #calloutState type: #'CalloutState *'>
  <var: #loadFloatRegs declareC: 'extern void loadFloatRegs(double, double, double, double, double, double, double, double)'>
  "Go out, call this guy and create the return value.  This *must* be inlined because of
  the alloca of the outgoing stack frame in ffiCall:WithFlags:NumArgs:Args:AndTypes:"
+ | myThreadIndex atomicType floatRet intRet x1Ret |
- | myThreadIndex atomicType floatRet intRet |
  <var: #floatRet type: #double>
  <var: #intRet type: #usqLong>
+ <var: #x1Ret type: #usqLong>
  <inline: true>
  myThreadIndex := interpreterProxy disownVM: (self disownFlagsFor: calloutState).
+
-
  calloutState floatRegisterIndex > 0 ifTrue:
  [self loadFloatRegs:
    (calloutState floatRegisters at: 0)
  _: (calloutState floatRegisters at: 1)
  _: (calloutState floatRegisters at: 2)
  _: (calloutState floatRegisters at: 3)
  _: (calloutState floatRegisters at: 4)
  _: (calloutState floatRegisters at: 5)
  _: (calloutState floatRegisters at: 6)
  _: (calloutState floatRegisters at: 7)].
 
  (self allocaLiesSoSetSpBeforeCall or: [self mustAlignStack]) ifTrue:
  [self setsp: calloutState argVector].
 
  atomicType := self atomicTypeOf: calloutState ffiRetHeader.
  (atomicType >> 1) = (FFITypeSingleFloat >> 1) ifTrue:
  [atomicType = FFITypeSingleFloat
  ifTrue:
  [floatRet := self
  dispatchFunctionPointer: (self cCoerceSimple: procAddr to: 'float (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)')
  with: (calloutState integerRegisters at: 0)
  with: (calloutState integerRegisters at: 1)
  with: (calloutState integerRegisters at: 2)
  with: (calloutState integerRegisters at: 3)
  with: (calloutState integerRegisters at: 4)
  with: (calloutState integerRegisters at: 5)
  with: (calloutState integerRegisters at: 6)
  with: (calloutState integerRegisters at: 7)]
  ifFalse: "atomicType = FFITypeDoubleFloat"
  [floatRet := self
  dispatchFunctionPointer: (self cCoerceSimple: procAddr to: 'double (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)')
  with: (calloutState integerRegisters at: 0)
  with: (calloutState integerRegisters at: 1)
  with: (calloutState integerRegisters at: 2)
  with: (calloutState integerRegisters at: 3)
  with: (calloutState integerRegisters at: 4)
  with: (calloutState integerRegisters at: 5)
  with: (calloutState integerRegisters at: 6)
  with: (calloutState integerRegisters at: 7)].
 
  "undo any callee argument pops because it may confuse stack management with the alloca."
  (self isCalleePopsConvention: calloutState callFlags) ifTrue:
  [self setsp: calloutState argVector].
  interpreterProxy ownVM: myThreadIndex.
 
  ^interpreterProxy floatObjectOf: floatRet].
 
+ "If struct address used for return value, call is special"
+ (self mustReturnStructOnStack: calloutState structReturnSize)
+ ifTrue: [
+ intRet := 0.
+ self setReturnRegister: (self cCoerceSimple: calloutState limit to: 'sqLong') "stack alloca'd struct"
+ andCall: (self cCoerceSimple: procAddr to: 'sqLong')
+ withArgsArray: (self cCoerceSimple: (self addressOf: calloutState integerRegisters) to: 'sqLong').
+ ] ifFalse: [
+ intRet := self
- intRet := self
  dispatchFunctionPointer: (self cCoerceSimple: procAddr to: 'usqIntptr_t (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)')
  with: (calloutState integerRegisters at: 0)
  with: (calloutState integerRegisters at: 1)
  with: (calloutState integerRegisters at: 2)
  with: (calloutState integerRegisters at: 3)
  with: (calloutState integerRegisters at: 4)
  with: (calloutState integerRegisters at: 5)
  with: (calloutState integerRegisters at: 6)
  with: (calloutState integerRegisters at: 7).
+
+ x1Ret := self getX1register. "Capture x1 immediately. No problem if unused"
+ ].
+ "If struct returned in registers,
+ place register values into calloutState integerRegisters"
+ (calloutState structReturnSize > 0
+ and: [self returnStructInRegisters: calloutState structReturnSize]) ifTrue:
+ ["Only 2 regs used in ARMv8/Aarch64 current"
+ calloutState integerRegisters at: 0 put: intRet. "X0"
+ calloutState integerRegisters at: 1 put: x1Ret]. "X1"
 
  "undo any callee argument pops because it may confuse stack management with the alloca."
  (self isCalleePopsConvention: calloutState callFlags) ifTrue:
  [self setsp: calloutState argVector].
  interpreterProxy ownVM: myThreadIndex.
 
  (calloutState ffiRetHeader anyMask: FFIFlagPointer+FFIFlagStructure) ifTrue:
  ["Note: Order is important here since FFIFlagPointer + FFIFlagStructure is used to represent
  'typedef void* VoidPointer' and VoidPointer must be returned as pointer *not* as struct."
  (calloutState ffiRetHeader anyMask: FFIFlagPointer) ifTrue:
  [^self ffiReturnPointer: intRet ofType: (self ffiReturnType: specOnStack) in: calloutState].
  ^self ffiReturnStruct: intRet ofType: (self ffiReturnType: specOnStack) in: calloutState].
 
  ^self ffiCreateIntegralResultOop: intRet ofAtomicType: atomicType in: calloutState!

Item was added:
+ ----- Method: ThreadedARM64FFIPlugin>>ffiReturnStruct:ofType:in: (in category 'callout support') -----
+ ffiReturnStruct: longLongRet ofType: ffiRetType in: calloutState
+ <var: #longLongRet type: #usqLong>
+ <var: #calloutState type: #'CalloutState *'>
+ "Create a structure return value from an external function call.  The value has been stored in
+ alloca'ed space pointed to by the calloutState or in the return value."
+ | retOop retClass oop |
+ <inline: true>
+ retClass := interpreterProxy fetchPointer: 1 ofObject: ffiRetType.
+ retOop := interpreterProxy instantiateClass: retClass indexableSize: 0.
+ self remapOop: retOop
+ in: [oop := interpreterProxy
+ instantiateClass: interpreterProxy classByteArray
+ indexableSize: calloutState structReturnSize].
+ self memcpy: (interpreterProxy firstIndexableField: oop)
+ _: ((self returnStructInRegisters: calloutState structReturnSize)
+ ifTrue: [self addressOf: calloutState integerRegisters]
+ ifFalse: [calloutState limit])
+ _: calloutState structReturnSize.
+ interpreterProxy storePointer: 0 ofObject: retOop withValue: oop.
+ ^retOop!

Item was added:
+ ----- Method: ThreadedARM64FFIPlugin>>getX1register (in category 'callout support') -----
+ getX1register
+
+ <inline: true>
+ <var: #returnX1value declareC: 'extern sqLong returnX1value()'>
+
+ ^self returnX1value!

Item was added:
+ ----- Method: ThreadedARM64FFIPlugin>>mustReturnStructOnStack: (in category 'marshalling') -----
+ mustReturnStructOnStack: returnStructSize
+ "Answer if a struct result of a given size is unable to be returned in registers."
+ ^returnStructSize > (2 * self wordSize)!

Item was changed:
  ----- Method: ThreadedARM64FFIPlugin>>returnStructInRegisters: (in category 'marshalling') -----
  returnStructInRegisters: returnStructSize
+ "Answer if a struct result of a given size is able to be returned in registers.
+ NB: this is a predicate!! #returnStructInRegisters: does NOT return a struct in anything!!"
+ ^returnStructSize <= (2 * self wordSize)!
- "Answer if a struct result of a given size is returned in memory or not."
- ^returnStructSize <= (2 * self wordSize) "??numIntRegs--> 8 * wordsize??"!

Item was added:
+ ----- Method: ThreadedARM64FFIPlugin>>setReturnRegister:andCall:withArgsArray: (in category 'callout support') -----
+ setReturnRegister: structAddr andCall: procAddr withArgsArray: arrayAddr
+
+ <inline: true>
+ <var: #structAddr type: #'sqLong'>
+ <var: #procAddr  type: #'sqLong'>
+ <var: #arrayAddr type: #'sqLong'>
+ <var: #callAndReturnWithStructAddr declareC: 'extern void callAndReturnWithStructAddr(sqLong structAddr,sqLong procAddr,sqLong arrayAddr)'>
+
+ self callAndReturnWithStructAddr: structAddr _: procAddr _: arrayAddr!

Item was changed:
  ThreadedFFICalloutState subclass: #ThreadedFFICalloutStateForARM64
  instanceVariableNames: 'integerRegisterIndex floatRegisterIndex integerRegisters floatRegisters'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'VMMaker-Plugins-FFI'!
+
+ !ThreadedFFICalloutStateForARM64 commentStamp: 'KenD 2/4/2019 09:57' prior: 0!
+ A ThreadedFFICalloutStateForARM64 is used to pass FFI parameters for foreign calls.
+
+ Instance Variables
+ floatRegisterIndex: Integer
+ floatRegisters: Array of SqInt [64 bit]
+ integerRegisterIndex: Integer
+ integerRegisters: Array of Double
+
+ !

Item was changed:
  ----- Method: ThreadedFFICalloutStateForARM64>>initialize (in category 'initialize-release') -----
  initialize
  super initialize.
  integerRegisterIndex := 0.
+ floatRegisterIndex     := 0.
+ resultAddress            := 0. "Only used for struct returns which do not fit in registers"
- floatRegisterIndex := 0.
  integerRegisters := CArrayAccessor on: (Array new: ThreadedARM64FFIPlugin numIntRegArgs).
+ floatRegisters     := CArrayAccessor on: (Array new: ThreadedARM64FFIPlugin numFloatRegArgs).
+ !
- floatRegisters     := CArrayAccessor on: (Array new: ThreadedARM64FFIPlugin numFloatRegArgs)!

Item was changed:
  InterpreterPlugin subclass: #ThreadedFFIPlugin
  instanceVariableNames: 'ffiLogEnabled externalFunctionInstSize ffiLastError allocationMap'
  classVariableNames: 'DefaultMaxStackSize ExternalFunctionAddressIndex ExternalFunctionArgTypesIndex ExternalFunctionFlagsIndex ExternalFunctionStackSizeIndex MaxNumArgs'
  poolDictionaries: 'FFIConstants'
  category: 'VMMaker-Plugins-FFI'!
 
+ !ThreadedFFIPlugin commentStamp: 'eem 2/6/2019 16:40' prior: 0!
- !ThreadedFFIPlugin commentStamp: 'eem 1/6/2019 17:35' prior: 0!
  This plugin provides access to foreign function interfaces on those platforms that provide such. For example Windows DLLs and unix .so's.  This version is designed to support reentrancy and threading, and so uses alloca to stack allocate all memory needed for a given callout.  Specific platforms are implemented by concrete subclasses.  Threaded calls can only be provided within the context of the threaded VM; othewise calls must be blocking.  So code specific to threading is guarded with a
  self cppIf: COGMTVM
  ifTrue: [...]
  form to arrange that it is only compiled in the threaded VM context.
 
  The callout primitives consume a type spec that defines the signature of the function to be called and a vector of arguments.  The type spec may be extracted from the method containing an FFI pragma or as an explicit parameter.  The arguments to be passed may either be arguments to the the method containing an FFI pragma or as an explicit Array of parameters.
 
+ Space is allocated to house the marshalled parameters and the type spec and arguments are then parsed to marshall the actual parameters into that space.  The space is some combination of alloca'ed (stack allocated) memory for parameters passed on the stack and a "callout state" struct (an instance of ThreadedFFICalloutState) to hold any parameters to be passed in registers.
- Space is allocated to house the marshalled parameters and the type spec and arguments are then parsed to marshall the actual parameters into that space.  The space is some combination of alloc'ed memory for parameters passed on the stack and a "callout state" struct (an instance of ThreadedFFICalloutState) to hold any parameters to be passed in registers.
 
+ By using C's facilities appropriately we can arrange that the C compiler generates code for passing all parameters, avoiding having to descend to the assembler or machine-code level (*). The basic scheme is to use alloca to stack allocate space for passing stacked parameters, since the memory allocated by alloca is at top-of-stack and in exactly the right place for parameter passing (*), and to invoke the function to be called with as many arguments as there are integer register parameters in the calling convention.  For example, on x86/IA32 there are no register parameters and all arguments are passed on the stack, while on ARMv4/5/6 there are four integer register parameters.  Since float results are typically answered through a floating-point register and integer/pointer results answered through one (or, on 32-bits for 64-bit results, two) register(s).  So on x86 the function to be called is invoked using
- By using C's facilities appropriately we can arrange that the C compiler generates code for passing all parameters, avoiding having to descend to the assembler or machine-code level (*). The basic scheme is to use alloca to stack allocate space for passing stacked parameters, since the memory allocated by alloca s at top-of-stack and in exactly the right place for parameter passing (*), and to invoke the function to be called with as many arguments as there are integer register parameters in the calling convention.  For example, on x86/IA32 there are no register parameters and all arguments are passed on the stack, while on ARMv4/5/6 there are four integer register parameters.  Since float results are typically answered through a floating-point register and integer/pointer results answered through one (or, on 32-bits for 64-0bit results, two) register.  So on x86 the function to be called is invoked using
 
  floatRet := self dispatchFunctionPointer: (self cCoerceSimple: procAddr to: 'double (*)()')
  or
  intRet := self dispatchFunctionPointer: (self cCoerceSimple: procAddr to: 'usqLong (*)()')
 
  On ARMv4/5/6 it is invoked with
 
  floatRet := self
  dispatchFunctionPointer: (self cCoerceSimple: procAddr to: 'float (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)')
  with: (calloutState integerRegisters at: 0)
  with: (calloutState integerRegisters at: 1)
  with: (calloutState integerRegisters at: 2)
  with: (calloutState integerRegisters at: 3)
  or
  floatRet := self
  dispatchFunctionPointer: (self cCoerceSimple: procAddr to: 'double (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)')
  with: (calloutState integerRegisters at: 0)
  with: (calloutState integerRegisters at: 1)
  with: (calloutState integerRegisters at: 2)
  with: (calloutState integerRegisters at: 3)
  or
  intRet := self
  dispatchFunctionPointer: (self cCoerceSimple: procAddr to: 'usqIntptr_t (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)')
  with: (calloutState integerRegisters at: 0)
  with: (calloutState integerRegisters at: 1)
  with: (calloutState integerRegisters at: 2)
  with: (calloutState integerRegisters at: 3)
 
+ Hence the C compiler generates the code to pass register parameters appropriately.  All we have to do is provide sufficient register parameters and give the function (pointer) to be called a suitable type.
- Hence the C compiler generates the code to pass regoster parameters appropriately.  All we have to do is provide sufficient register parameters and give the function (pointer) to be called a suitable type.
 
+ Likewise we also persuade the C compiler to generate code to load any floating-point register arguments by preceding any call of a function that takes floating-point arguments passed in registers with a call to loadFloatRegs with as many floating-point parameters as there are floating-point parameter registers.  loadFloatRegs is implemented in platforms//Cross/plugins/SqueakFFIPrims/sqFFIPlugin.c as an empty function.  So floating-point registers are passed via calls such as:
- Likewise we also persuade the C compiler to generate code to load any floating-point register arguments by preceding any call of a function that takes floating-point arguments passed in registers with a call to loadFloatRegs with as many floating-point parameters as trhere are floating-point parameter rtegisters.  loadFloatRegs is implementyed in platforms//Cross/plugins/SqueakFFIPrims/sqFFIPlugin.c as an empty function.  So floating-point registers are passed via calls such as:
 
  calloutState floatRegisterIndex > 0 ifTrue:
  [self loadFloatRegs:
    ((self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: 0)) to: #'double *') at: 0)
  _: ((self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: 2)) to: #'double *') at: 0)
  _: ((self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: 4)) to: #'double *') at: 0)
  _: ((self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: 6)) to: #'double *') at: 0)
  _: ((self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: 8)) to: #'double *') at: 0)
  _: ((self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: 10)) to: #'double *') at: 0)
  _: ((self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: 12)) to: #'double *') at: 0)
  _: ((self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: 14)) to: #'double *') at: 0)
 
  (*) some implementations of alloca do not answer the actual top-of-stack, but may answer an address a word or two away.  On these implementations we merely derive the top-of-stack instead of using the result of alloca, or perhaps set the stack pointer to the result of alloca, as befits the platform.
 
+ Callbacks are not handled by this plugin.  Instead they are handled by image-level code to create Alien "thunks", small sequences of machine code, that invoke a function thunkEntry, implemented as required by each platform in one of the files in platforms/Cross/plugins/IA32ABI (a regrettable name; it should be something like platforms/Cross/plugins/AlienCallbacks).  Each platform's thunkEntry has a signature that takes all integer register parameters (if any), all floating-point register parameters (if any), a pointer to the thunk, and the stack pointer at the point the thunk was invoiked.  To pass a callback to foreign code, the marshaller in this plugin passes the address of the thunk.  When external code calls that address the think is invoked, and it invokes thunkEntry as required.  thunkEntry then saves its parameters locally in an instance of VMCallbackContext, which includes a jmpbuf.  thunkEntry then does a setjmp and enters the VM via sendInvokeCallbackContext[:], which cr
 eates an activation of invokeCallbackContext: to execute the callback.  This method then extracts the address of the thunk from the VMCallbackContext and invokes machinery in Callback to match the address with a Smalltalk block.  The block is then evaluated with the parameters extracted from the VMCallbackContext, marshalled by methods in Callback subclasses signatures protocol, which know whether a parameter would be passed in a register or on the stack.  The result of the block is then passed back to thunkEntry by storing it in a field in the VMCallbackContext and invoking primSignal:andReturnAs:fromContext:, which uses longjmp to jump back to thunkEntry which then extracts the result from its VMCallbackContext and returns.
- Callbacks are not handled by this plugin.  Insteasd they are handled by image-level code to create Alien "thunks", small sequences of machine code, that invoke a function thunkEntry, implemented as required by each platform in one of the files in platforms/Cross/plugins/IA32ABI (a regrettable name; it should be something like platforms/Cross/plugins/AlienCallbacks).  Each platform's thunkEntry has a signature that takes all integer register parameters (if any), all floating-point register parameters (if any), a pointer to the thunk, and the stack pointer at the point the thunk was invoiked.  To pass a callback to foreign code, the marshaller in this plugin passes the address of the tunk.  When external code calls that address the think is invoked, and it invokes thunkEntry as required.  thunkEntry then saves its parameters locally in an instance of VMCallbackContext, which includes a jmpbuf.  thunkEntry then does a setjmp and enters the VM via sendInvokeCallbackContext[:], which cr
 eates an activation of invokeCallbackContext: to execute the callback.  This method then extracts the address of the thunk from the VMCallbackContext and invokes machinery in Callback to match the address with a Smalltalk block.  The block is then evaluated with the parameters extracted from the VMCallbackContext, marshalled by methods in Callback subclasses signatures ptotocol, which know whether a parameter would be passed in a register or on the stack.  The result of the block is then passed back to thunkEntry by storing it in a field in the VMCallbackContext and invoking primSignal:andReturnAs:fromContext:, which uses longjmp to jump back to thunkEntry which then extracts the result from its VMCallbackContext and returns.
 
  For  example, the signature of thunkEntry on x86/IA32 (platforms/Cross/plugins/IA32ABI/ia32abicc.c) is
  long
  thunkEntry(void *thunkp, sqIntptr_t *stackp)
  whereas on ARMv4/5/6 (platforms/Cross/plugins/IA32ABI/arm32abicc.c) it is
  long long
  thunkEntry(long r0, long r1, long r2, long r3,
             double d0, double d1, double d2, double d3,
             double d4, double d5, double d6, double d7,
             void *thunkpPlus16, sqIntptr_t *stackp)!