VM Maker: VMMaker.oscog-eem.2593.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

VM Maker: VMMaker.oscog-eem.2593.mcz

commits-2
 
Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
http://source.squeak.org/VMMaker/VMMaker.oscog-eem.2593.mcz

==================== Summary ====================

Name: VMMaker.oscog-eem.2593
Author: eem
Time: 25 November 2019, 4:25:37.475274 pm
UUID: 54657ad2-3651-4c19-9cdf-3699c8ae5faf
Ancestors: VMMaker.oscog-eem.2592

ARM Cogits:
Get initialization for the ARMv8 ISA to select CogARMv8Compiler/GdbARMv8Alien.
Implement CogARMv8Compiler>>stopsFrom:to:.
Use memset:_:_: instead of me:ms:et: et al.
Reduce the size of an abstract instruction on ARMv5 with out-of-line literals.  We need only two words, not the five for the in-line literals case.

=============== Diff against VMMaker.oscog-eem.2592 ===============

Item was changed:
  ----- Method: CoInterpreter>>initStackPagesAndInterpret (in category 'initialization') -----
  initStackPagesAndInterpret
  "Initialize the stack pages and enter interpret. Use alloca'ed memory so that when
  we have a JIT its stack pointer will be on the native stack since alloca allocates
  memory on the stack. Certain thread systems use the native stack pointer as the
  frame ID so putting the stack anywhere else can confuse the thread system."
 
  "Override to establish the setjmp/longjmp handler for reentering the interpreter
  from machine code, and disable executablity on the heap and stack pages."
 
  "This should be in its own initStackPages method but Slang can't inline
  C code strings."
  | stackPageBytes stackPagesBytes theStackMemory |
  <var: #theStackMemory type: #'char *'>
  stackPageBytes := self stackPageByteSize.
  stackPagesBytes := self computeStackZoneSize.
  theStackMemory := self
  cCode: [self alloca: stackPagesBytes]
  inSmalltalk: [stackPages initializeWithByteSize: stackPagesBytes for: self].
+ self cCode: [self memset: theStackMemory _: 0 _: stackPagesBytes].
- self cCode: [self me: theStackMemory ms: 0 et: stackPagesBytes].
  self sqMakeMemoryNotExecutableFrom: objectMemory startOfMemory asUnsignedInteger
  To: objectMemory memoryLimit asUnsignedInteger.
  self sqMakeMemoryNotExecutableFrom: theStackMemory asUnsignedInteger
  To: theStackMemory asUnsignedInteger + stackPagesBytes.
  stackPages
  initializeStack: theStackMemory
  numSlots: stackPagesBytes / objectMemory wordSize
  pageSize: stackPageBytes / objectMemory wordSize.
  self assert: self minimumUnusedHeadroom = stackPageBytes.
 
  "Once the stack pages are initialized we can continue to bootstrap the system."
  self loadInitialContext.
  "We're ready for the heartbeat (poll interrupt)"
  self ioInitHeartbeat.
  self initialEnterSmalltalkExecutive.
  ^nil!

Item was changed:
  ----- Method: CogARMCompiler>>machineCodeBytes (in category 'generate machine code') -----
  machineCodeBytes
+ "Answer the maximum number of bytes of machine code generated for any abstract instruction."
+ ^self subclassResponsibility!
- "Answer the maximum number of bytes of machine code generated for any abstract instruction.
- e.g. CmpCwR =>
- mov R3, #<addressByte1>, 12
- orr R3, R3, #<addressByte2>, 8
- orr R3, R3, #<addressByte3>, 4
- orr R3, R3, #<addressByte4>, 0
- cmp R?, R3"
- ^20!

Item was changed:
  ----- Method: CogARMCompiler>>machineCodeWords (in category 'generate machine code') -----
  machineCodeWords
+ ^self machineCodeBytes / 4!
- "Answer the maximum number of words of machine code generated for any abstract instruction.
- e.g. CmpCwR =>
- mov R3, #<addressByte1>, 12
- orr R3, R3, #<addressByte2>, 8
- orr R3, R3, #<addressByte3>, 4
- orr R3, R3, #<addressByte4>, 0
- cmp R?, R3"
- ^5!

Item was changed:
  ----- Method: CogARMv8Compiler class>>ISA (in category 'translation') -----
  ISA
  "Answer the name of the ISA the receiver implements."
  ^#ARMv8!

Item was added:
+ ----- Method: CogARMv8Compiler>>stop (in category 'encoding') -----
+ stop
+ "generate a HLT; C6.2.92 Arm ARM"
+ <inline: true>
+ ^2r11010100010000000000000000000000 "16rD4400000"!

Item was added:
+ ----- Method: CogARMv8Compiler>>stopsFrom:to: (in category 'generate machine code') -----
+ stopsFrom: startAddr to: endAddr
+ self assert: endAddr - startAddr + 1 \\ 4 = 0.
+ startAddr to: endAddr by: 4 do:
+ [:addr | objectMemory long32At: addr put: self stop]!

Item was changed:
  ----- Method: CogIA32Compiler>>stopsFrom:to: (in category 'generate machine code') -----
  stopsFrom: startAddr to: endAddr
  self
+ cCode: [self memset: startAddr _: self stop _: endAddr - startAddr + 1]
- cCode: [self me: startAddr ms: self stop et: endAddr - startAddr + 1]
  inSmalltalk:
  [| alignedEnd alignedStart stops |
  stops := self stop << 8 + self stop.
  stops := stops << 16 + stops.
  alignedStart := startAddr + 3 // 4 * 4.
  alignedEnd := endAddr - 1 // 4 * 4.
  alignedEnd <= startAddr
  ifTrue:
  [startAddr to: endAddr do:
  [:addr | objectMemory byteAt: addr put: self stop]]
  ifFalse:
  [startAddr to: alignedStart - 1 do:
  [:addr | objectMemory byteAt: addr put: self stop].
  alignedStart to: alignedEnd by: 4 do:
  [:addr | objectMemory long32At: addr put: stops].
  alignedEnd + 4 to: endAddr do:
  [:addr | objectMemory byteAt: addr put: self stop]]]!

Item was added:
+ ----- Method: CogInLineLiteralsARMCompiler>>machineCodeBytes (in category 'generate machine code') -----
+ machineCodeBytes
+ "Answer the maximum number of bytes of machine code generated for any abstract instruction.
+ e.g. CmpCwR =>
+ mov R3, #<addressByte1>, 12
+ orr R3, R3, #<addressByte2>, 8
+ orr R3, R3, #<addressByte3>, 4
+ orr R3, R3, #<addressByte4>, 0
+ cmp R?, R3"
+ ^20!

Item was changed:
  ----- Method: CogMIPSELCompiler class>>printFormatForOpcodeName: (in category 'debug printing') -----
  printFormatForOpcodeName: opcodeName
  "Answer a sequence of $r, $f or nil for the operands in the opcode, used for printing, where
  r => integer register, f => floating point register, and nil => numeric or address operand.
  Subclasses can override to provide a format string for their own private opcodes."
+ ^(opcodeName beginsWith: 'Br') ifTrue: [' rr'] ifFalse: [#()]!
- ^(opcodeName startsWith: 'Br') ifTrue: [' rr'] ifFalse: [#()]!

Item was added:
+ ----- Method: CogOutOfLineLiteralsARMCompiler>>machineCodeBytes (in category 'generate machine code') -----
+ machineCodeBytes
+ "Answer the maximum number of bytes of machine code generated for any abstract instruction."
+ ^8!

Item was changed:
  ----- Method: CogX64Compiler>>stopsFrom:to: (in category 'generate machine code') -----
  stopsFrom: startAddr to: endAddr
  self
+ cCode: [self memset: startAddr _: self stop _: endAddr - startAddr + 1]
- cCode: [self me: startAddr ms: self stop et: endAddr - startAddr + 1]
  inSmalltalk:
  [| alignedEnd alignedStart stops |
  stops := self stop << 8 + self stop.
  stops := stops << 16 + stops.
  stops := stops << 32 + stops.
  alignedStart := startAddr + 7 // 8 * 8.
  alignedEnd := endAddr - 1 // 8 * 8.
  alignedEnd <= startAddr
  ifTrue:
  [startAddr to: endAddr do:
  [:addr | objectMemory byteAt: addr put: self stop]]
  ifFalse:
  [startAddr to: alignedStart - 1 do:
  [:addr | objectMemory byteAt: addr put: self stop].
  alignedStart to: alignedEnd by: 8 do:
  [:addr | objectMemory long64At: addr put: stops].
  alignedEnd + 8 to: endAddr do:
  [:addr | objectMemory byteAt: addr put: self stop]]]!

Item was changed:
  ----- Method: Cogit class>>initializeMiscConstants (in category 'class initialization') -----
  initializeMiscConstants
  super initializeMiscConstants.
  Debug := InitializationOptions at: #Debug ifAbsent: [false].
  (InitializationOptions includesKey: #EagerInstructionDecoration)
  ifTrue:
  [EagerInstructionDecoration := InitializationOptions at: #EagerInstructionDecoration]
  ifFalse:
  [EagerInstructionDecoration ifNil:
  [EagerInstructionDecoration := false]]. "speeds up single stepping but could lose fidelity"
 
  ProcessorClass := (InitializationOptions at: #ISA ifAbsentPut: [self objectMemoryClass defaultISA]) caseOf: {
+ [#X64] -> [BochsX64Alien].
- [#X64] -> [BochsX64Alien].
  [#IA32] -> [BochsIA32Alien].
  [#ARMv5] -> [GdbARMAlien].
+ [#ARMv8] -> [GdbARMv8Alien].
  [#MIPSEL] -> [MIPSELSimulator] }.
  CogCompilerClass := self activeCompilerClass.
  (CogCompilerClass withAllSuperclasses copyUpTo: CogAbstractInstruction) reverseDo:
  [:compilerClass| compilerClass initialize; initializeAbstractRegisters].
  self objectMemoryClass objectRepresentationClass initializeMiscConstants.
  "Our criterion for which methods to JIT is literal count.  The default value is 60 literals or less."
  MaxLiteralCountForCompile := InitializationOptions at: #MaxLiteralCountForCompile ifAbsent: [60].
  "we special-case 0, 1 & 2 argument sends, N is numArgs >= 3"
  NumSendTrampolines := 4.
  "Currently not even the ceImplicitReceiverTrampoline contains object references."
  NumObjRefsInRuntime := 0.
  "6 is a fine number for the max number of PCI entries.  8 is too large."
  MaxCPICCases := 6.
 
  "One variable defines whether in a block and whether in a vanilla or full block."
  InVanillaBlock := 1.
  InFullBlock := 2.
 
  NSCSelectorIndex := (NSSendCache instVarNames indexOf: #selector) - 1.
  NSCNumArgsIndex := (NSSendCache instVarNames indexOf: #numArgs) - 1.
  NSCClassTagIndex := (NSSendCache instVarNames indexOf: #classTag) - 1.
  NSCEnclosingObjectIndex := (NSSendCache instVarNames indexOf: #enclosingObject) - 1.
  NSCTargetIndex := (NSSendCache instVarNames indexOf: #target) - 1.
  NumOopsPerNSC := NSSendCache instVarNames size.
 
  "Max size to alloca when compiling.
  Mac OS X 10.6.8 segfaults approaching 8Mb.
  Linux 2.6.9 segfaults above 11Mb.
  WIndows XP segfaults approaching 2Mb."
  MaxStackAllocSize := 1024 * 1024 * 3 / 2 !

Item was changed:
  ----- Method: SpurGenerationScavenger>>computeRefCountToShrinkRT (in category 'remembered set') -----
  computeRefCountToShrinkRT
  "Some time in every scavenger's life there may come a time when someone writes code that stresses
  the remembered table.  One might conclude that if the remembered table is full, then the right thing
  to do is simply to tenure everything, emptying the remembered table.  Bt in some circumstances this
  can be counter-productive, and result in the same situation arising soon after tenuring everything.
  Instead, we can try and selectively prune the remembered table, tenuring only those objects that
  are referenced by many objects in the remembered table.  That's what this algorithm does.  It
  reference counts young objects referenced from the remembered set, and then sets a threshold
  used to tenure objects oft referenced from the remembered set, thereby allowing  the remembered
  set to shrink, while not tenuring everything.
 
  Once in a network monitoring application in a galaxy not dissimilar from the one this code inhabits,
  a tree of nodes referring to large integers was in precisely this situation.  The nodes were old, and
  the integers were in new space.  Some of the nodes referred to shared numbers, some their own
  unique numbers.  The numbers were updated frequently. Were new space simply tenured when the
  remembered table was full, the remembered table would soon fill up as new numbers were computed.
  Only by selectively pruning the remembered table of nodes that shared data, was a balance achieved
  whereby the remembered table population was kept small, and tenuring rates were low."
  <inline: #never>
  | population |
  <var: 'population' declareC: 'long population[MaxRTRefCount + 1]'>
+ self cCode: [self memset: population _: 0 _: (self sizeof: #long) * (MaxRTRefCount + 1)]
- self cCode: [self me: population ms: 0 et: (self sizeof: #long) * (MaxRTRefCount + 1)]
  inSmalltalk: [population := CArrayAccessor on: (Array new: MaxRTRefCount + 1 withAll: 0)].
  self assert: self allNewSpaceObjectsHaveZeroRTRefCount.
  self referenceCountRememberedReferents: population.
  self setRefCountToShrinkRT: population
 
  "For debugging:
  (manager allNewSpaceObjectsDo: [:o| manager rtRefCountOf: o put: 0])"!

Item was changed:
  ----- Method: SpurSegmentManager>>allocateOrExtendSegmentInfos (in category 'private') -----
  allocateOrExtendSegmentInfos
  "Increase the number of allocated segInfos by 16."
  | newNumSegs |
  numSegInfos = 0 ifTrue:
  [numSegInfos := 16.
  segments := self
+ cCode: [self calloc: numSegInfos _: (self sizeof: SpurSegmentInfo)]
- cCode: [self c: numSegInfos alloc: (self sizeof: SpurSegmentInfo)]
  inSmalltalk: [CArrayAccessor on: ((1 to: numSegInfos) collect: [:i| SpurSegmentInfo new])].
  ^self].
  newNumSegs := numSegInfos + 16.
  segments := self
+ cCode: [self realloc: segments _: newNumSegs * (self sizeof: SpurSegmentInfo)]
- cCode: [self re: segments alloc: newNumSegs * (self sizeof: SpurSegmentInfo)]
  inSmalltalk: [CArrayAccessor on: segments object,
  ((numSegInfos to: newNumSegs) collect: [:i| SpurSegmentInfo new])].
  self cCode:
  [segments = 0 ifTrue:
  [self error: 'out of memory; cannot allocate more segments'].
  self
+ memset: segments + numSegInfos
+ _: 0
+ _: newNumSegs - numSegInfos * (self sizeof: SpurSegmentInfo)].
- me: segments + numSegInfos
- ms: 0
- et: newNumSegs - numSegInfos * (self sizeof: SpurSegmentInfo)].
  numSegInfos := newNumSegs!

Item was changed:
  ----- Method: StackInterpreter>>initStackPages (in category 'initialization') -----
  initStackPages
  "Initialize the stackPages.  This version is only for simulation
  because Slang refuses to inline it, which makes the alloca invalid."
  | stackPageBytes stackPagesBytes theStackMemory |
  stackPageBytes := self stackPageByteSize.
  stackPagesBytes := self computeStackZoneSize.
  theStackMemory := self
  cCode: [self alloca: stackPagesBytes]
  inSmalltalk: [stackPages initializeWithByteSize: stackPagesBytes for: self].
+ self cCode: [self memset: theStackMemory _: 0 _: stackPagesBytes].
- self cCode: [self me: theStackMemory ms: 0 et: stackPagesBytes].
  stackPages
  initializeStack: theStackMemory
  numSlots: stackPagesBytes / objectMemory wordSize
  pageSize: stackPageBytes / objectMemory wordSize!

Item was changed:
  ----- Method: StackInterpreter>>initStackPagesAndInterpret (in category 'initialization') -----
  initStackPagesAndInterpret
  "Initialize the stack pages and enter interpret. Use alloca'ed memory so that when
  we have a JIT its stack pointer will be on the native stack since alloca allocates
  memory on the stack. Certain thread systems use the native stack pointer as the
  frame ID so putting the stack anywhere else can confuse the thread system."
 
  "This should be in its own initStackPages method but Slang can't inline
  C code strings."
  | stackPageBytes stackPagesBytes theStackMemory |
  <var: #theStackMemory type: #'void *'>
  stackPageBytes := self stackPageByteSize.
  stackPagesBytes := self computeStackZoneSize.
  theStackMemory := self
  cCode: [self alloca: stackPagesBytes]
  inSmalltalk: [stackPages initializeWithByteSize: stackPagesBytes for: self].
+ self cCode: [self memset: theStackMemory _: 0 _: stackPagesBytes].
- self cCode: [self me: theStackMemory ms: 0 et: stackPagesBytes].
  stackPages
  initializeStack: theStackMemory
  numSlots: stackPagesBytes / objectMemory wordSize
  pageSize: stackPageBytes / objectMemory wordSize.
 
  "Once the stack pages are initialized we can continue to bootstrap the system."
  self loadInitialContext.
  "We're ready for the heartbeat (poll interrupt)"
  self ioInitHeartbeat.
  self initialEnterSmalltalkExecutive.
  ^nil!

Item was changed:
  ----- Method: ThreadedARM64FFIPlugin>>ffiCall:ArgArrayOrNil:NumArgs: (in category 'callout support') -----
  ffiCall: externalFunction ArgArrayOrNil: argArrayOrNil NumArgs: nArgs
  "Generic callout. Does the actual work.  If argArrayOrNil is nil it takes args from the stack
  and the spec from the method.  If argArrayOrNil is not nil takes args from argArrayOrNil
  and the spec from the receiver."
  | flags argTypeArray address argType oop argSpec argClass err theCalloutState calloutState requiredStackSize stackSize allocation result primNumArgs |
  <inline: true>
  <var: #theCalloutState type: #'CalloutState'>
  <var: #calloutState type: #'CalloutState *'>
  <var: #allocation type: #'char *'>
 
  primNumArgs := interpreterProxy methodArgumentCount.
  (interpreterProxy is: externalFunction KindOfClass: interpreterProxy classExternalFunction) ifFalse:
  [^self ffiFail: FFIErrorNotFunction].
  "Load and check the values in the externalFunction before we call out"
  flags := interpreterProxy fetchInteger: ExternalFunctionFlagsIndex ofObject: externalFunction.
  interpreterProxy failed ifTrue:
  [^self ffiFail: FFIErrorBadArgs].
 
  "This must come early for compatibility with the old FFIPlugin.  Image-level code
  may assume the function pointer is loaded eagerly.  Thanks to Nicolas Cellier."
  address := self ffiLoadCalloutAddress: externalFunction.
  interpreterProxy failed ifTrue:
  [^0 "error code already set by ffiLoadCalloutAddress:"].
 
  argTypeArray := interpreterProxy fetchPointer: ExternalFunctionArgTypesIndex ofObject: externalFunction.
  "must be array of arg types"
  ((interpreterProxy isArray: argTypeArray)
  and: [(interpreterProxy slotSizeOf: argTypeArray) = (nArgs + 1)]) ifFalse:
  [^self ffiFail: FFIErrorBadArgs].
  "check if the calling convention is supported"
  self cppIf: COGMTVM
  ifTrue:
  [(self ffiSupportsCallingConvention: (flags bitAnd: FFICallTypesMask)) ifFalse:
  [^self ffiFail: FFIErrorCallType]]
  ifFalse: "not masking causes threaded calls to fail, which is as they should if the plugin is not threaded."
  [(self ffiSupportsCallingConvention: flags) ifFalse:
  [^self ffiFail: FFIErrorCallType]].
 
  requiredStackSize := self externalFunctionHasStackSizeSlot
  ifTrue: [interpreterProxy
  fetchInteger: ExternalFunctionStackSizeIndex
  ofObject: externalFunction]
  ifFalse: [-1].
  interpreterProxy failed ifTrue:
  [^interpreterProxy primitiveFailFor: (argArrayOrNil isNil
  ifTrue: [PrimErrBadMethod]
  ifFalse: [PrimErrBadReceiver])].
  stackSize := requiredStackSize < 0 ifTrue: [DefaultMaxStackSize] ifFalse: [requiredStackSize].
  self cCode: [] inSmalltalk: [theCalloutState := self class calloutStateClass new].
  calloutState := self addressOf: theCalloutState.
+ self cCode: [self memset: calloutState _: 0 _: (self sizeof: #CalloutState)].
- self cCode: [self me: calloutState ms: 0 et: (self sizeof: #CalloutState)].
  calloutState callFlags: flags.
  "Fetch return type and args"
  argType := interpreterProxy fetchPointer: 0 ofObject: argTypeArray.
  argSpec := interpreterProxy fetchPointer: 0 ofObject: argType.
  argClass := interpreterProxy fetchPointer: 1 ofObject: argType.
  (err := self ffiCheckReturn: argSpec With: argClass in: calloutState) ~= 0 ifTrue:
  [^self ffiFail: err]. "cannot return"
  "alloca the outgoing stack frame, leaving room for marshalling args, and including space for the return struct, if any.
  Additional space reserved for saving register args like mandated by Win64 X64 or PPC ABI, will be managed by the call itself"
  allocation := self alloca: stackSize + calloutState structReturnSize + self cStackAlignment.
  self mustAlignStack ifTrue:
  [allocation := self cCoerce: (allocation asUnsignedIntegerPtr bitClear: self cStackAlignment - 1) to: #'char *'].
  calloutState
  argVector: allocation;
  currentArg: allocation;
  limit: allocation + stackSize.
  1 to: nArgs do:
  [:i|
  argType := interpreterProxy fetchPointer: i ofObject: argTypeArray.
  argSpec := interpreterProxy fetchPointer: 0 ofObject: argType.
  argClass := interpreterProxy fetchPointer: 1 ofObject: argType.
  oop := argArrayOrNil isNil
  ifTrue: [interpreterProxy stackValue: nArgs - i]
  ifFalse: [interpreterProxy fetchPointer: i - 1 ofObject: argArrayOrNil].
  err := self ffiArgument: oop Spec: argSpec Class: argClass in: calloutState.
  err ~= 0 ifTrue:
  [self cleanupCalloutState: calloutState.
  self cppIf: COGMTVM ifTrue:
  [err = PrimErrObjectMayMove negated ifTrue:
  [^PrimErrObjectMayMove]]. "N.B. Do not fail if object may move because caller will GC and retry."
  ^self ffiFail: err]]. "coercion failed or out of stack space"
  "Failures must be reported back from ffiArgument:Spec:Class:in:.
  Should not fail from here on in."
  self assert: interpreterProxy failed not.
  self ffiLogCallout: externalFunction.
  (requiredStackSize < 0
  and: [self externalFunctionHasStackSizeSlot]) ifTrue:
  [stackSize := calloutState currentArg - calloutState argVector.
  interpreterProxy storeInteger: ExternalFunctionStackSizeIndex ofObject: externalFunction withValue: stackSize].
  "Go out and call this guy"
  result := self ffiCalloutTo: address SpecOnStack: argArrayOrNil notNil in: calloutState.
  self cleanupCalloutState: calloutState.
  "Can not safely use argumentCount (via e.g. methodReturnValue:) since it may have been changed by a callback."
  interpreterProxy pop: primNumArgs + 1 thenPush: result.
  ^result!

Item was changed:
  ----- Method: ThreadedFFIPlugin>>ffiCall:ArgArrayOrNil:NumArgs: (in category 'callout support') -----
  ffiCall: externalFunction ArgArrayOrNil: argArrayOrNil NumArgs: nArgs
  "Generic callout. Does the actual work.  If argArrayOrNil is nil it takes args from the stack
  and the spec from the method.  If argArrayOrNil is not nil takes args from argArrayOrNil
  and the spec from the receiver."
  | flags argTypeArray address argType oop argSpec argClass err theCalloutState calloutState requiredStackSize stackSize allocation result primNumArgs |
  <inline: true>
  <var: #theCalloutState type: #'CalloutState'>
  <var: #calloutState type: #'CalloutState *'>
  <var: #allocation type: #'char *'>
 
  primNumArgs := interpreterProxy methodArgumentCount.
  (interpreterProxy is: externalFunction KindOfClass: interpreterProxy classExternalFunction) ifFalse:
  [^self ffiFail: FFIErrorNotFunction].
  "Load and check the values in the externalFunction before we call out"
  flags := interpreterProxy fetchInteger: ExternalFunctionFlagsIndex ofObject: externalFunction.
  interpreterProxy failed ifTrue:
  [^self ffiFail: FFIErrorBadArgs].
 
  "This must come early for compatibility with the old FFIPlugin.  Image-level code
  may assume the function pointer is loaded eagerly.  Thanks to Nicolas Cellier."
  address := self ffiLoadCalloutAddress: externalFunction.
  interpreterProxy failed ifTrue:
  [^0 "error code already set by ffiLoadCalloutAddress:"].
 
  argTypeArray := interpreterProxy fetchPointer: ExternalFunctionArgTypesIndex ofObject: externalFunction.
  "must be array of arg types"
  ((interpreterProxy isArray: argTypeArray)
  and: [(interpreterProxy slotSizeOf: argTypeArray) = (nArgs + 1)]) ifFalse:
  [^self ffiFail: FFIErrorBadArgs].
  "check if the calling convention is supported"
  self cppIf: COGMTVM
  ifTrue:
  [(self ffiSupportsCallingConvention: (flags bitAnd: FFICallTypesMask)) ifFalse:
  [^self ffiFail: FFIErrorCallType]]
  ifFalse: "not masking causes threaded calls to fail, which is as they should if the plugin is not threaded."
  [(self ffiSupportsCallingConvention: flags) ifFalse:
  [^self ffiFail: FFIErrorCallType]].
 
  requiredStackSize := self externalFunctionHasStackSizeSlot
  ifTrue: [interpreterProxy
  fetchInteger: ExternalFunctionStackSizeIndex
  ofObject: externalFunction]
  ifFalse: [-1].
  interpreterProxy failed ifTrue:
  [^interpreterProxy primitiveFailFor: (argArrayOrNil isNil
  ifTrue: [PrimErrBadMethod]
  ifFalse: [PrimErrBadReceiver])].
  stackSize := requiredStackSize < 0 ifTrue: [DefaultMaxStackSize] ifFalse: [requiredStackSize].
  self cCode: [] inSmalltalk: [theCalloutState := self class calloutStateClass new].
  calloutState := self addressOf: theCalloutState.
+ self cCode: [self memset: calloutState _: 0 _: (self sizeof: #CalloutState)].
- self cCode: [self me: calloutState ms: 0 et: (self sizeof: #CalloutState)].
  calloutState callFlags: flags.
  "Fetch return type and args"
  argType := interpreterProxy fetchPointer: 0 ofObject: argTypeArray.
  argSpec := interpreterProxy fetchPointer: 0 ofObject: argType.
  argClass := interpreterProxy fetchPointer: 1 ofObject: argType.
  (err := self ffiCheckReturn: argSpec With: argClass in: calloutState) ~= 0 ifTrue:
  [^self ffiFail: err]. "cannot return"
  "alloca the outgoing stack frame, leaving room for marshalling args, and including space for the return struct, if any.
  Additional space reserved for saving register args like mandated by Win64 X64 or PPC ABI, will be managed by the call itself"
  allocation := self alloca: stackSize + calloutState structReturnSize + self cStackAlignment.
  self mustAlignStack ifTrue:
  [allocation := self cCoerce: (allocation asUnsignedIntegerPtr bitClear: self cStackAlignment - 1) to: #'char *'].
  calloutState
  argVector: allocation;
  currentArg: allocation;
  limit: allocation + stackSize.
  (calloutState structReturnSize > 0
  and: [self nonRegisterStructReturnIsViaImplicitFirstArgument
  and: [(self returnStructInRegisters: calloutState structReturnSize) not]]) ifTrue:
  [err := self ffiPushPointer: calloutState limit in: calloutState.
  err ~= 0 ifTrue:
  [self cleanupCalloutState: calloutState.
  self cppIf: COGMTVM ifTrue:
  [err = PrimErrObjectMayMove negated ifTrue:
  [^PrimErrObjectMayMove]]. "N.B. Do not fail if object may move because caller will GC and retry."
  ^self ffiFail: err]].
  1 to: nArgs do:
  [:i|
  argType := interpreterProxy fetchPointer: i ofObject: argTypeArray.
  argSpec := interpreterProxy fetchPointer: 0 ofObject: argType.
  argClass := interpreterProxy fetchPointer: 1 ofObject: argType.
  oop := argArrayOrNil isNil
  ifTrue: [interpreterProxy stackValue: nArgs - i]
  ifFalse: [interpreterProxy fetchPointer: i - 1 ofObject: argArrayOrNil].
  err := self ffiArgument: oop Spec: argSpec Class: argClass in: calloutState.
  err ~= 0 ifTrue:
  [self cleanupCalloutState: calloutState.
  self cppIf: COGMTVM ifTrue:
  [err = PrimErrObjectMayMove negated ifTrue:
  [^PrimErrObjectMayMove]]. "N.B. Do not fail if object may move because caller will GC and retry."
  ^self ffiFail: err]]. "coercion failed or out of stack space"
  "Failures must be reported back from ffiArgument:Spec:Class:in:.
  Should not fail from here on in."
  self assert: interpreterProxy failed not.
  self ffiLogCallout: externalFunction.
  (requiredStackSize < 0
  and: [self externalFunctionHasStackSizeSlot]) ifTrue:
  [stackSize := calloutState currentArg - calloutState argVector.
  interpreterProxy storeInteger: ExternalFunctionStackSizeIndex ofObject: externalFunction withValue: stackSize].
  "Go out and call this guy"
  result := self ffiCalloutTo: address SpecOnStack: argArrayOrNil notNil in: calloutState.
  self cleanupCalloutState: calloutState.
  "Can not safely use argumentCount (via e.g. methodReturnValue:) since it may have been changed by a callback."
  interpreterProxy pop: primNumArgs + 1 thenPush: result.
  ^result!