VM Maker: VMMaker.oscog-eem.746.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

VM Maker: VMMaker.oscog-eem.746.mcz

commits-2
 
Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
http://source.squeak.org/VMMaker/VMMaker.oscog-eem.746.mcz

==================== Summary ====================

Name: VMMaker.oscog-eem.746
Author: eem
Time: 1 June 2014, 6:05:30.694 pm
UUID: cc4961d3-e629-4e28-b308-88eab314a8c9
Ancestors: VMMaker.oscog-eem.745

Implement a peephole in the Spur Cogit for an indirection
vector initialized with a single value  Avoid initializing the
slot in the array to nil and instead initialize it with the value.

Refactor setting byte1, byte2 & byte3 into
loadSubsequentBytesForDescriptor:at: for the peephole
tryCollapseTempVectorInitializationOfSize:.

No loner inline CoInterpreter>>pre/postGCAction: for VM profiling.

Increase the number of trampoline table slots.

Simulator:
Fix CurrentImageCoInterpreterFacade for the new Spur
inline instantiation code.

=============== Diff against VMMaker.oscog-eem.745 ===============

Item was changed:
  ----- Method: CoInterpreter>>postGCAction: (in category 'object memory support') -----
  postGCAction: gcModeArg
  "Attempt to shrink free memory, signal the gc semaphore and let the Cogit do its post GC thang"
+ <inline: false>
  self assert: gcModeArg = gcMode.
  super postGCAction: gcModeArg.
  cogit cogitPostGCAction: gcModeArg.
  lastCoggableInterpretedBlockMethod := lastUncoggableInterpretedBlockMethod := nil.
  gcMode := 0!

Item was changed:
  ----- Method: CoInterpreter>>preGCAction: (in category 'object memory support') -----
  preGCAction: gcModeArg
+ <inline: false>
- <inline: true>
  "Need to write back the frame pointers unless all pages are free (as in snapshot).
  Need to set gcMode var (to avoid passing the flag through a lot of the updating code)"
  super preGCAction: gcModeArg.
 
  gcMode := gcModeArg.
 
  cogit recordEventTrace ifTrue:
  [| traceType |
  traceType := gcModeArg == GCModeFull ifTrue: [TraceFullGC] ifFalse: [TraceIncrementalGC].
  self recordTrace: traceType thing: traceType source: 0].
 
  cogit recordPrimTrace ifTrue:
  [| traceType |
  traceType := gcModeArg == GCModeFull ifTrue: [TraceFullGC] ifFalse: [TraceIncrementalGC].
  self fastLogPrim: traceType]!

Item was added:
+ ----- Method: CogObjectRepresentation>>createsArraysInline (in category 'bytecode generator support') -----
+ createsArraysInline
+ "Answer if the object representation allocates arrays inline.  By
+ default answer false. Better code can be generated when creating
+ arrays inline if values are /not/ flushed to the stack."
+ ^false!

Item was removed:
- ----- Method: CogObjectRepresentationFor32BitSpur>>createsClosuresInline (in category 'bytecode generator support') -----
- createsClosuresInline
- "Answer if the object representation allocates closures inline.  By
- default answer false. Better code can be generated when creating
- closures inline if copied values are /not/ flushed to the stack."
- ^true!

Item was added:
+ ----- Method: CogObjectRepresentationForSpur>>createsArraysInline (in category 'bytecode generator support') -----
+ createsArraysInline
+ "Answer if the object representation allocates arrays inline.  By
+ default answer false. Better code can be generated when creating
+ arrays inline if values are /not/ flushed to the stack."
+ ^true!

Item was added:
+ ----- Method: CogObjectRepresentationForSpur>>createsClosuresInline (in category 'bytecode generator support') -----
+ createsClosuresInline
+ "Answer if the object representation allocates closures inline.  By
+ default answer false. Better code can be generated when creating
+ closures inline if copied values are /not/ flushed to the stack."
+ ^true!

Item was changed:
  ----- Method: Cogit>>compileAbstractInstructionsFrom:through: (in category 'compile abstract instructions') -----
  compileAbstractInstructionsFrom: start through: end
  "Loop over bytecodes, dispatching to the generator for each bytecode, handling fixups in due course."
  | nextOpcodeIndex descriptor fixup result nExts |
  <var: #descriptor type: #'BytecodeDescriptor *'>
  <var: #fixup type: #'BytecodeFixup *'>
  bytecodePC := start.
  nExts := 0.
  [byte0 := (objectMemory fetchByte: bytecodePC ofObject: methodObj)  + bytecodeSetOffset.
  descriptor := self generatorAt: byte0.
+ self loadSubsequentBytesForDescriptor: descriptor at: bytecodePC.
- descriptor numBytes > 1 ifTrue:
- [byte1 := objectMemory fetchByte: bytecodePC + 1 ofObject: methodObj.
- descriptor numBytes > 2 ifTrue:
- [byte2 := objectMemory fetchByte: bytecodePC + 2 ofObject: methodObj.
- descriptor numBytes > 3 ifTrue:
- [byte3 := objectMemory fetchByte: bytecodePC + 3 ofObject: methodObj.
- descriptor numBytes > 4 ifTrue:
- [self notYetImplemented]]]].
  nextOpcodeIndex := opcodeIndex.
  result := self perform: descriptor generator.
  descriptor isExtension ifFalse: "extended bytecodes must consume their extensions"
  [self assert: (extA = 0 and: [extB = 0])].
  fixup := self fixupAt: bytecodePC - initialPC.
  fixup targetInstruction ~= 0 ifTrue:
  ["There is a fixup for this bytecode.  It must point to the first generated
    instruction for this bytecode.  If there isn't one we need to add a label."
  opcodeIndex = nextOpcodeIndex ifTrue:
  [self Label].
  fixup targetInstruction: (self abstractInstructionAt: nextOpcodeIndex)].
  bytecodePC := self nextBytecodePCFor: descriptor at: bytecodePC exts: nExts in: methodObj.
  result = 0 and: [bytecodePC <= end]]
  whileTrue:
  [nExts := descriptor isExtension ifTrue: [nExts + 1] ifFalse: [0]].
  self checkEnoughOpcodes.
  ^result!

Item was added:
+ ----- Method: Cogit>>loadSubsequentBytesForDescriptor:at: (in category 'compile abstract instructions') -----
+ loadSubsequentBytesForDescriptor: descriptor at: pc
+ <var: #descriptor type: #'BytecodeDescriptor *'>
+ descriptor numBytes > 1 ifTrue:
+ [byte1 := objectMemory fetchByte: pc + 1 ofObject: methodObj.
+ descriptor numBytes > 2 ifTrue:
+ [byte2 := objectMemory fetchByte: pc + 2 ofObject: methodObj.
+ descriptor numBytes > 3 ifTrue:
+ [byte3 := objectMemory fetchByte: pc + 3 ofObject: methodObj.
+ descriptor numBytes > 4 ifTrue:
+ [self notYetImplemented]]]]!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacade class>>objectMemoryClass (in category 'accessing') -----
+ objectMemoryClass
+ ^self subclassResponsibility!

Item was changed:
  ----- Method: CurrentImageCoInterpreterFacade>>cogit: (in category 'initialize-release') -----
  cogit: aCogit
  cogit := aCogit.
  coInterpreter cogit: aCogit.
+ (objectMemory respondsTo: #cogit:) ifTrue:
+ [objectMemory cogit: aCogit]!
- objectMemory cogit: aCogit!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacade>>indexablePointersFormat (in category 'accessing') -----
+ indexablePointersFormat
+ ^objectMemory indexablePointersFormat!

Item was changed:
  ----- Method: CurrentImageCoInterpreterFacade>>initialize (in category 'initialize-release') -----
  initialize
  memory := ByteArray new: 262144.
+ objectMemory := self class objectMemoryClass new.
- objectMemory := NewCoObjectMemory new.
  coInterpreter := CoInterpreter new.
  coInterpreter
  instVarNamed: 'objectMemory'
  put: objectMemory;
  instVarNamed: 'primitiveTable'
  put: (CArrayAccessor on: CoInterpreter primitiveTable copy).
  variables := Dictionary new.
  #('stackLimit') do:
  [:l| self addressForLabel: l].
  self initializeObjectMap!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacade>>methodNeedsLargeContext: (in category 'accessing') -----
+ methodNeedsLargeContext: aMethodOop
+ ^(self objectForOop: aMethodOop) frameSize > CompiledMethod smallFrameSize!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation class>>objectMemoryClass (in category 'accessing') -----
+ objectMemoryClass
+ ^Spur32BitCoMemoryManager!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>arrayFormat (in category 'accessing') -----
+ arrayFormat
+ ^objectMemory arrayFormat!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>getScavengeThreshold (in category 'accessing') -----
+ getScavengeThreshold
+ ^objectMemory getScavengeThreshold ifNil: [16r24680]!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>headerForSlots:format:classIndex: (in category 'accessing') -----
+ headerForSlots: numSlots format: formatField classIndex: classIndex
+ ^objectMemory headerForSlots: numSlots format: formatField classIndex: classIndex!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>numSlotsMask (in category 'accessing') -----
+ numSlotsMask
+ ^objectMemory numSlotsMask!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>rememberedBitShift (in category 'accessing') -----
+ rememberedBitShift
+ ^objectMemory rememberedBitShift!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>smallObjectBytesForSlots: (in category 'accessing') -----
+ smallObjectBytesForSlots: numSlots
+ ^objectMemory smallObjectBytesForSlots: numSlots!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>storeCheckBoundary (in category 'accessing') -----
+ storeCheckBoundary
+ ^objectMemory storeCheckBoundary ifNil: [16r12345678]!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSqueakV3ObjectRepresentation class>>objectMemoryClass (in category 'accessing') -----
+ objectMemoryClass
+ ^NewObjectMemory!

Item was changed:
  ----- Method: SimpleStackBasedCogit class>>initializeMiscConstants (in category 'class initialization') -----
  initializeMiscConstants
  super initializeMiscConstants.
  MaxLiteralCountForCompile := initializationOptions at: #MaxLiteralCountForCompile ifAbsent: [60].
  NumTrampolines := NewspeakVM
+ ifTrue: [50]
+ ifFalse: [42]!
- ifTrue: [46]
- ifFalse: [38]!

Item was changed:
  ----- Method: StackToRegisterMappingCogit class>>initializeMiscConstants (in category 'class initialization') -----
  initializeMiscConstants
  super initializeMiscConstants.
  NumTrampolines := NewspeakVM
+ ifTrue: [60]
+ ifFalse: [52]!
- ifTrue: [58]
- ifFalse: [50]!

Item was changed:
  ----- Method: StackToRegisterMappingCogit>>compileAbstractInstructionsFrom:through: (in category 'compile abstract instructions') -----
  compileAbstractInstructionsFrom: start through: end
  "Loop over bytecodes, dispatching to the generator for each bytecode, handling fixups in due course."
  | nextOpcodeIndex descriptor nExts fixup result |
  <var: #descriptor type: #'BytecodeDescriptor *'>
  <var: #fixup type: #'BytecodeFixup *'>
  self traceSimStack.
  bytecodePC := start.
  nExts := 0.
  descriptor := nil.
  deadCode := false.
  [self cCode: '' inSmalltalk:
  [(debugBytecodePointers includes: bytecodePC) ifTrue: [self halt]].
  fixup := self fixupAt: bytecodePC - initialPC.
  fixup targetInstruction asUnsignedInteger > 0
  ifTrue:
  [deadCode := false.
  fixup targetInstruction asUnsignedInteger >= 2 ifTrue:
  [self merge: fixup
  afterContinuation: (descriptor notNil
  and: [descriptor isUnconditionalBranch
  or: [descriptor isReturn]]) not]]
  ifFalse: "If there's no fixup following a return there's no jump to that code and it is dead."
  [(descriptor notNil and: [descriptor isReturn]) ifTrue:
  [deadCode := true]].
  self cCode: '' inSmalltalk:
  [deadCode ifFalse:
  [self assert: simStackPtr + (needsFrame ifTrue: [0] ifFalse: [1])
  = (self debugStackPointerFor: bytecodePC)]].
  byte0 := (objectMemory fetchByte: bytecodePC ofObject: methodObj) + bytecodeSetOffset.
  descriptor := self generatorAt: byte0.
+ self loadSubsequentBytesForDescriptor: descriptor at: bytecodePC.
- descriptor numBytes > 1 ifTrue:
- [byte1 := objectMemory fetchByte: bytecodePC + 1 ofObject: methodObj.
- descriptor numBytes > 2 ifTrue:
- [byte2 := objectMemory fetchByte: bytecodePC + 2 ofObject: methodObj.
- descriptor numBytes > 3 ifTrue:
- [byte3 := objectMemory fetchByte: bytecodePC + 3 ofObject: methodObj.
- descriptor numBytes > 4 ifTrue:
- [self notYetImplemented]]]].
  nextOpcodeIndex := opcodeIndex.
  result := deadCode
  ifTrue: "insert nops for dead code that is mapped so that bc to mc mapping is not many to one"
  [(descriptor isMapped
   or: [inBlock and: [descriptor isMappedInBlock]]) ifTrue:
  [self annotateBytecode: self Nop].
  0]
  ifFalse:
  [self perform: descriptor generator].
  descriptor isExtension ifFalse: "extended bytecodes must consume their extensions"
  [self assert: (extA = 0 and: [extB = 0])].
  self traceDescriptor: descriptor; traceSimStack.
  (fixup targetInstruction asUnsignedInteger between: 1 and: 2) ifTrue:
  ["There is a fixup for this bytecode.  It must point to the first generated
    instruction for this bytecode.  If there isn't one we need to add a label."
  opcodeIndex = nextOpcodeIndex ifTrue:
  [self Label].
  fixup targetInstruction: (self abstractInstructionAt: nextOpcodeIndex)].
  bytecodePC := self nextBytecodePCFor: descriptor at: bytecodePC exts: nExts in: methodObj.
  result = 0 and: [bytecodePC <= end]] whileTrue:
  [nExts := descriptor isExtension ifTrue: [nExts + 1] ifFalse: [0]].
  self checkEnoughOpcodes.
  ^result!

Item was added:
+ ----- Method: StackToRegisterMappingCogit>>evaluate:at: (in category 'peephole optimizations') -----
+ evaluate: descriptor at: pc
+ <var: #descriptor type: #'BytecodeDescriptor *'>
+ byte0 := objectMemory fetchByte: pc ofObject: methodObj.
+ self assert: descriptor = (self generatorAt: bytecodeSetOffset + byte0).
+ self loadSubsequentBytesForDescriptor: descriptor at: pc.
+ self perform: descriptor generator!

Item was changed:
  ----- Method: StackToRegisterMappingCogit>>genPushNewArrayBytecode (in category 'bytecode generators') -----
  genPushNewArrayBytecode
  | size popValues |
  self assert: needsFrame.
  optStatus isReceiverResultRegLive: false.
  (popValues := byte1 > 127)
  ifTrue: [self ssFlushTo: simStackPtr]
  ifFalse: [self ssAllocateCallReg: SendNumArgsReg and: ReceiverResultReg].
  size := byte1 bitAnd: 127.
+ popValues ifFalse:
+ [(self tryCollapseTempVectorInitializationOfSize: size) ifTrue:
+ [^0]].
  objectRepresentation genNewArrayOfSize: size initialized: popValues not.
  popValues ifTrue:
  [size - 1 to: 0 by: -1 do:
  [:i|
  self PopR: TempReg.
  objectRepresentation
  genStoreSourceReg: TempReg
  slotIndex: i
  intoNewObjectInDestReg: ReceiverResultReg].
  self ssPop: size].
  ^self ssPushRegister: ReceiverResultReg!

Item was added:
+ ----- Method: StackToRegisterMappingCogit>>tryCollapseTempVectorInitializationOfSize: (in category 'peephole optimizations') -----
+ tryCollapseTempVectorInitializationOfSize: slots
+ "Try and collapse
+ push: (Array new: 1)
+ popIntoTemp: tempIndex
+ pushConstant: const or pushTemp: n
+ popIntoTemp: 0 inVectorAt: tempIndex
+ into
+ tempAt: tempIndex put: {const}.
+ One might think that we should look for a sequence of more than
+ one pushes and pops but this is extremely rare."
+ | pushArrayDesc storeArrayDesc pushValueDesc storeValueDesc reg |
+ <var: #pushArrayDesc type: #'BytecodeDescriptor *'>
+ <var: #pushValueDesc type: #'BytecodeDescriptor *'>
+ <var: #storeArrayDesc type: #'BytecodeDescriptor *'>
+ <var: #storeValueDesc type: #'BytecodeDescriptor *'>
+ slots ~= 1 ifTrue:
+ [^false].
+ pushArrayDesc := self generatorAt: bytecodeSetOffset
+ + (objectMemory
+ fetchByte: bytecodePC
+ ofObject: methodObj).
+ self assert: pushArrayDesc generator == #genPushNewArrayBytecode.
+ storeArrayDesc := self generatorAt: bytecodeSetOffset
+ + (objectMemory
+ fetchByte: bytecodePC
+ + pushArrayDesc numBytes
+ ofObject: methodObj).
+ storeArrayDesc generator ~~ #genStoreAndPopTemporaryVariableBytecode ifTrue:
+ [^false].
+ pushValueDesc := self generatorAt: bytecodeSetOffset
+ + (objectMemory
+ fetchByte: bytecodePC
+ + pushArrayDesc numBytes
+ + storeArrayDesc numBytes
+ ofObject: methodObj).
+ (pushValueDesc generator ~~ #genPushLiteralConstantBytecode
+ and: [pushValueDesc generator ~~ #genPushQuickIntegerConstantBytecode
+ and: [pushValueDesc generator ~~ #genPushTemporaryVariableBytecode]]) ifTrue:
+ [^false].
+ storeValueDesc := self generatorAt: bytecodeSetOffset
+ + (objectMemory
+ fetchByte: bytecodePC
+ + pushArrayDesc numBytes
+ + storeArrayDesc numBytes
+ + pushValueDesc numBytes
+ ofObject: methodObj).
+ storeValueDesc generator ~~ #genStoreAndPopRemoteTempLongBytecode ifTrue:
+ [^false].
+
+ objectRepresentation genNewArrayOfSize: 1 initialized: false.
+ self evaluate: pushValueDesc at: bytecodePC + pushArrayDesc numBytes + storeArrayDesc numBytes.
+ reg := self ssStorePop: true toPreferredReg: TempReg.
+ objectRepresentation
+ genStoreSourceReg: reg
+ slotIndex: 0
+ intoNewObjectInDestReg: ReceiverResultReg.
+ self ssPushRegister: ReceiverResultReg.
+ self evaluate: storeArrayDesc at: bytecodePC + pushArrayDesc numBytes.
+ bytecodePC := bytecodePC
+ "+ pushArrayDesc numBytes this gets added by nextBytecodePCFor:at:exts:in:"
+ + storeArrayDesc numBytes
+ + pushValueDesc numBytes
+ + storeValueDesc numBytes.
+ ^true!

Reply | Threaded
Open this post in threaded view
|

Re: VM Maker: VMMaker.oscog-eem.746.mcz

Clément Béra
 
Eliot,

Recent commits are very exciting. Context and closure creations are now inlined in machine code :-)

Have you already done at:put: and stringAt:put: or is it your next step ?

Please tell us about the new bench results with these features.

Clément


2014-06-02 16:14 GMT+02:00 <[hidden email]>:

Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
http://source.squeak.org/VMMaker/VMMaker.oscog-eem.746.mcz

==================== Summary ====================

Name: VMMaker.oscog-eem.746
Author: eem
Time: 1 June 2014, 6:05:30.694 pm
UUID: cc4961d3-e629-4e28-b308-88eab314a8c9
Ancestors: VMMaker.oscog-eem.745

Implement a peephole in the Spur Cogit for an indirection
vector initialized with a single value  Avoid initializing the
slot in the array to nil and instead initialize it with the value.

Refactor setting byte1, byte2 & byte3 into
loadSubsequentBytesForDescriptor:at: for the peephole
tryCollapseTempVectorInitializationOfSize:.

No loner inline CoInterpreter>>pre/postGCAction: for VM profiling.

Increase the number of trampoline table slots.

Simulator:
Fix CurrentImageCoInterpreterFacade for the new Spur
inline instantiation code.

=============== Diff against VMMaker.oscog-eem.745 ===============

Item was changed:
  ----- Method: CoInterpreter>>postGCAction: (in category 'object memory support') -----
  postGCAction: gcModeArg
        "Attempt to shrink free memory, signal the gc semaphore and let the Cogit do its post GC thang"
+       <inline: false>
        self assert: gcModeArg = gcMode.
        super postGCAction: gcModeArg.
        cogit cogitPostGCAction: gcModeArg.
        lastCoggableInterpretedBlockMethod := lastUncoggableInterpretedBlockMethod := nil.
        gcMode := 0!

Item was changed:
  ----- Method: CoInterpreter>>preGCAction: (in category 'object memory support') -----
  preGCAction: gcModeArg
+       <inline: false>
-       <inline: true>
        "Need to write back the frame pointers unless all pages are free (as in snapshot).
         Need to set gcMode var (to avoid passing the flag through a lot of the updating code)"
        super preGCAction: gcModeArg.

        gcMode := gcModeArg.

        cogit recordEventTrace ifTrue:
                [| traceType |
                traceType := gcModeArg == GCModeFull ifTrue: [TraceFullGC] ifFalse: [TraceIncrementalGC].
                self recordTrace: traceType thing: traceType source: 0].

        cogit recordPrimTrace ifTrue:
                [| traceType |
                traceType := gcModeArg == GCModeFull ifTrue: [TraceFullGC] ifFalse: [TraceIncrementalGC].
                self fastLogPrim: traceType]!

Item was added:
+ ----- Method: CogObjectRepresentation>>createsArraysInline (in category 'bytecode generator support') -----
+ createsArraysInline
+       "Answer if the object representation allocates arrays inline.  By
+        default answer false. Better code can be generated when creating
+        arrays inline if values are /not/ flushed to the stack."
+       ^false!

Item was removed:
- ----- Method: CogObjectRepresentationFor32BitSpur>>createsClosuresInline (in category 'bytecode generator support') -----
- createsClosuresInline
-       "Answer if the object representation allocates closures inline.  By
-        default answer false. Better code can be generated when creating
-        closures inline if copied values are /not/ flushed to the stack."
-       ^true!

Item was added:
+ ----- Method: CogObjectRepresentationForSpur>>createsArraysInline (in category 'bytecode generator support') -----
+ createsArraysInline
+       "Answer if the object representation allocates arrays inline.  By
+        default answer false. Better code can be generated when creating
+        arrays inline if values are /not/ flushed to the stack."
+       ^true!

Item was added:
+ ----- Method: CogObjectRepresentationForSpur>>createsClosuresInline (in category 'bytecode generator support') -----
+ createsClosuresInline
+       "Answer if the object representation allocates closures inline.  By
+        default answer false. Better code can be generated when creating
+        closures inline if copied values are /not/ flushed to the stack."
+       ^true!

Item was changed:
  ----- Method: Cogit>>compileAbstractInstructionsFrom:through: (in category 'compile abstract instructions') -----
  compileAbstractInstructionsFrom: start through: end
        "Loop over bytecodes, dispatching to the generator for each bytecode, handling fixups in due course."
        | nextOpcodeIndex descriptor fixup result nExts |
        <var: #descriptor type: #'BytecodeDescriptor *'>
        <var: #fixup type: #'BytecodeFixup *'>
        bytecodePC := start.
        nExts := 0.
        [byte0 := (objectMemory fetchByte: bytecodePC ofObject: methodObj)  + bytecodeSetOffset.
         descriptor := self generatorAt: byte0.
+        self loadSubsequentBytesForDescriptor: descriptor at: bytecodePC.
-        descriptor numBytes > 1 ifTrue:
-               [byte1 := objectMemory fetchByte: bytecodePC + 1 ofObject: methodObj.
-                descriptor numBytes > 2 ifTrue:
-                       [byte2 := objectMemory fetchByte: bytecodePC + 2 ofObject: methodObj.
-                        descriptor numBytes > 3 ifTrue:
-                               [byte3 := objectMemory fetchByte: bytecodePC + 3 ofObject: methodObj.
-                                descriptor numBytes > 4 ifTrue:
-                                       [self notYetImplemented]]]].
         nextOpcodeIndex := opcodeIndex.
         result := self perform: descriptor generator.
         descriptor isExtension ifFalse: "extended bytecodes must consume their extensions"
                [self assert: (extA = 0 and: [extB = 0])].
         fixup := self fixupAt: bytecodePC - initialPC.
         fixup targetInstruction ~= 0 ifTrue:
                ["There is a fixup for this bytecode.  It must point to the first generated
                   instruction for this bytecode.  If there isn't one we need to add a label."
                 opcodeIndex = nextOpcodeIndex ifTrue:
                        [self Label].
                 fixup targetInstruction: (self abstractInstructionAt: nextOpcodeIndex)].
         bytecodePC := self nextBytecodePCFor: descriptor at: bytecodePC exts: nExts in: methodObj.
         result = 0 and: [bytecodePC <= end]]
                whileTrue:
                        [nExts := descriptor isExtension ifTrue: [nExts + 1] ifFalse: [0]].
        self checkEnoughOpcodes.
        ^result!

Item was added:
+ ----- Method: Cogit>>loadSubsequentBytesForDescriptor:at: (in category 'compile abstract instructions') -----
+ loadSubsequentBytesForDescriptor: descriptor at: pc
+       <var: #descriptor type: #'BytecodeDescriptor *'>
+       descriptor numBytes > 1 ifTrue:
+               [byte1 := objectMemory fetchByte: pc + 1 ofObject: methodObj.
+                descriptor numBytes > 2 ifTrue:
+                       [byte2 := objectMemory fetchByte: pc + 2 ofObject: methodObj.
+                        descriptor numBytes > 3 ifTrue:
+                               [byte3 := objectMemory fetchByte: pc + 3 ofObject: methodObj.
+                                descriptor numBytes > 4 ifTrue:
+                                       [self notYetImplemented]]]]!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacade class>>objectMemoryClass (in category 'accessing') -----
+ objectMemoryClass
+       ^self subclassResponsibility!

Item was changed:
  ----- Method: CurrentImageCoInterpreterFacade>>cogit: (in category 'initialize-release') -----
  cogit: aCogit
        cogit := aCogit.
        coInterpreter cogit: aCogit.
+       (objectMemory respondsTo: #cogit:) ifTrue:
+               [objectMemory cogit: aCogit]!
-       objectMemory cogit: aCogit!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacade>>indexablePointersFormat (in category 'accessing') -----
+ indexablePointersFormat
+       ^objectMemory indexablePointersFormat!

Item was changed:
  ----- Method: CurrentImageCoInterpreterFacade>>initialize (in category 'initialize-release') -----
  initialize
        memory := ByteArray new: 262144.
+       objectMemory := self class objectMemoryClass new.
-       objectMemory := NewCoObjectMemory new.
        coInterpreter := CoInterpreter new.
        coInterpreter
                instVarNamed: 'objectMemory'
                        put: objectMemory;
                instVarNamed: 'primitiveTable'
                        put: (CArrayAccessor on: CoInterpreter primitiveTable copy).
        variables := Dictionary new.
        #('stackLimit') do:
                [:l| self addressForLabel: l].
        self initializeObjectMap!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacade>>methodNeedsLargeContext: (in category 'accessing') -----
+ methodNeedsLargeContext: aMethodOop
+       ^(self objectForOop: aMethodOop) frameSize > CompiledMethod smallFrameSize!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation class>>objectMemoryClass (in category 'accessing') -----
+ objectMemoryClass
+       ^Spur32BitCoMemoryManager!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>arrayFormat (in category 'accessing') -----
+ arrayFormat
+       ^objectMemory arrayFormat!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>getScavengeThreshold (in category 'accessing') -----
+ getScavengeThreshold
+       ^objectMemory getScavengeThreshold ifNil: [16r24680]!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>headerForSlots:format:classIndex: (in category 'accessing') -----
+ headerForSlots: numSlots format: formatField classIndex: classIndex
+       ^objectMemory headerForSlots: numSlots format: formatField classIndex: classIndex!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>numSlotsMask (in category 'accessing') -----
+ numSlotsMask
+       ^objectMemory numSlotsMask!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>rememberedBitShift (in category 'accessing') -----
+ rememberedBitShift
+       ^objectMemory rememberedBitShift!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>smallObjectBytesForSlots: (in category 'accessing') -----
+ smallObjectBytesForSlots: numSlots
+       ^objectMemory smallObjectBytesForSlots: numSlots!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>storeCheckBoundary (in category 'accessing') -----
+ storeCheckBoundary
+       ^objectMemory storeCheckBoundary ifNil: [16r12345678]!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSqueakV3ObjectRepresentation class>>objectMemoryClass (in category 'accessing') -----
+ objectMemoryClass
+       ^NewObjectMemory!

Item was changed:
  ----- Method: SimpleStackBasedCogit class>>initializeMiscConstants (in category 'class initialization') -----
  initializeMiscConstants
        super initializeMiscConstants.
        MaxLiteralCountForCompile := initializationOptions at: #MaxLiteralCountForCompile ifAbsent: [60].
        NumTrampolines := NewspeakVM
+                                                       ifTrue: [50]
+                                                       ifFalse: [42]!
-                                                       ifTrue: [46]
-                                                       ifFalse: [38]!

Item was changed:
  ----- Method: StackToRegisterMappingCogit class>>initializeMiscConstants (in category 'class initialization') -----
  initializeMiscConstants
        super initializeMiscConstants.
        NumTrampolines := NewspeakVM
+                                                       ifTrue: [60]
+                                                       ifFalse: [52]!
-                                                       ifTrue: [58]
-                                                       ifFalse: [50]!

Item was changed:
  ----- Method: StackToRegisterMappingCogit>>compileAbstractInstructionsFrom:through: (in category 'compile abstract instructions') -----
  compileAbstractInstructionsFrom: start through: end
        "Loop over bytecodes, dispatching to the generator for each bytecode, handling fixups in due course."
        | nextOpcodeIndex descriptor nExts fixup result |
        <var: #descriptor type: #'BytecodeDescriptor *'>
        <var: #fixup type: #'BytecodeFixup *'>
        self traceSimStack.
        bytecodePC := start.
        nExts := 0.
        descriptor := nil.
        deadCode := false.
        [self cCode: '' inSmalltalk:
                [(debugBytecodePointers includes: bytecodePC) ifTrue: [self halt]].
        fixup := self fixupAt: bytecodePC - initialPC.
        fixup targetInstruction asUnsignedInteger > 0
                ifTrue:
                        [deadCode := false.
                         fixup targetInstruction asUnsignedInteger >= 2 ifTrue:
                                [self merge: fixup
                                        afterContinuation: (descriptor notNil
                                                                                and: [descriptor isUnconditionalBranch
                                                                                        or: [descriptor isReturn]]) not]]
                ifFalse: "If there's no fixup following a return there's no jump to that code and it is dead."
                        [(descriptor notNil and: [descriptor isReturn]) ifTrue:
                                [deadCode := true]].
         self cCode: '' inSmalltalk:
                [deadCode ifFalse:
                        [self assert: simStackPtr + (needsFrame ifTrue: [0] ifFalse: [1])
                                                = (self debugStackPointerFor: bytecodePC)]].
         byte0 := (objectMemory fetchByte: bytecodePC ofObject: methodObj) + bytecodeSetOffset.
         descriptor := self generatorAt: byte0.
+        self loadSubsequentBytesForDescriptor: descriptor at: bytecodePC.
-        descriptor numBytes > 1 ifTrue:
-               [byte1 := objectMemory fetchByte: bytecodePC + 1 ofObject: methodObj.
-                descriptor numBytes > 2 ifTrue:
-                       [byte2 := objectMemory fetchByte: bytecodePC + 2 ofObject: methodObj.
-                        descriptor numBytes > 3 ifTrue:
-                               [byte3 := objectMemory fetchByte: bytecodePC + 3 ofObject: methodObj.
-                                descriptor numBytes > 4 ifTrue:
-                                       [self notYetImplemented]]]].
         nextOpcodeIndex := opcodeIndex.
         result := deadCode
                                ifTrue: "insert nops for dead code that is mapped so that bc to mc mapping is not many to one"
                                        [(descriptor isMapped
                                          or: [inBlock and: [descriptor isMappedInBlock]]) ifTrue:
                                                [self annotateBytecode: self Nop].
                                                0]
                                ifFalse:
                                        [self perform: descriptor generator].
         descriptor isExtension ifFalse: "extended bytecodes must consume their extensions"
                [self assert: (extA = 0 and: [extB = 0])].
         self traceDescriptor: descriptor; traceSimStack.
         (fixup targetInstruction asUnsignedInteger between: 1 and: 2) ifTrue:
                ["There is a fixup for this bytecode.  It must point to the first generated
                   instruction for this bytecode.  If there isn't one we need to add a label."
                 opcodeIndex = nextOpcodeIndex ifTrue:
                        [self Label].
                 fixup targetInstruction: (self abstractInstructionAt: nextOpcodeIndex)].
         bytecodePC := self nextBytecodePCFor: descriptor at: bytecodePC exts: nExts in: methodObj.
         result = 0 and: [bytecodePC <= end]] whileTrue:
                [nExts := descriptor isExtension ifTrue: [nExts + 1] ifFalse: [0]].
        self checkEnoughOpcodes.
        ^result!

Item was added:
+ ----- Method: StackToRegisterMappingCogit>>evaluate:at: (in category 'peephole optimizations') -----
+ evaluate: descriptor at: pc
+       <var: #descriptor type: #'BytecodeDescriptor *'>
+       byte0 := objectMemory fetchByte: pc ofObject: methodObj.
+       self assert: descriptor = (self generatorAt: bytecodeSetOffset + byte0).
+       self loadSubsequentBytesForDescriptor: descriptor at: pc.
+       self perform: descriptor generator!

Item was changed:
  ----- Method: StackToRegisterMappingCogit>>genPushNewArrayBytecode (in category 'bytecode generators') -----
  genPushNewArrayBytecode
        | size popValues |
        self assert: needsFrame.
        optStatus isReceiverResultRegLive: false.
        (popValues := byte1 > 127)
                ifTrue: [self ssFlushTo: simStackPtr]
                ifFalse: [self ssAllocateCallReg: SendNumArgsReg and: ReceiverResultReg].
        size := byte1 bitAnd: 127.
+       popValues ifFalse:
+               [(self tryCollapseTempVectorInitializationOfSize: size) ifTrue:
+                       [^0]].
        objectRepresentation genNewArrayOfSize: size initialized: popValues not.
        popValues ifTrue:
                [size - 1 to: 0 by: -1 do:
                        [:i|
                        self PopR: TempReg.
                        objectRepresentation
                                genStoreSourceReg: TempReg
                                slotIndex: i
                                intoNewObjectInDestReg: ReceiverResultReg].
                 self ssPop: size].
        ^self ssPushRegister: ReceiverResultReg!

Item was added:
+ ----- Method: StackToRegisterMappingCogit>>tryCollapseTempVectorInitializationOfSize: (in category 'peephole optimizations') -----
+ tryCollapseTempVectorInitializationOfSize: slots
+       "Try and collapse
+               push: (Array new: 1)
+               popIntoTemp: tempIndex
+               pushConstant: const or pushTemp: n
+               popIntoTemp: 0 inVectorAt: tempIndex
+        into
+               tempAt: tempIndex put: {const}.
+        One might think that we should look for a sequence of more than
+        one pushes and pops but this is extremely rare."
+       | pushArrayDesc storeArrayDesc pushValueDesc storeValueDesc reg |
+       <var: #pushArrayDesc type: #'BytecodeDescriptor *'>
+       <var: #pushValueDesc type: #'BytecodeDescriptor *'>
+       <var: #storeArrayDesc type: #'BytecodeDescriptor *'>
+       <var: #storeValueDesc type: #'BytecodeDescriptor *'>
+       slots ~= 1 ifTrue:
+               [^false].
+       pushArrayDesc := self generatorAt: bytecodeSetOffset
+                                                                               + (objectMemory
+                                                                                               fetchByte: bytecodePC
+                                                                                               ofObject: methodObj).
+       self assert: pushArrayDesc generator == #genPushNewArrayBytecode.
+       storeArrayDesc := self generatorAt: bytecodeSetOffset
+                                                                               + (objectMemory
+                                                                                               fetchByte: bytecodePC
+                                                                                                               + pushArrayDesc numBytes
+                                                                                               ofObject: methodObj).
+       storeArrayDesc generator ~~ #genStoreAndPopTemporaryVariableBytecode ifTrue:
+               [^false].
+       pushValueDesc := self generatorAt: bytecodeSetOffset
+                                                                               + (objectMemory
+                                                                                               fetchByte: bytecodePC
+                                                                                                               + pushArrayDesc numBytes
+                                                                                                               + storeArrayDesc numBytes
+                                                                                               ofObject: methodObj).
+       (pushValueDesc generator ~~ #genPushLiteralConstantBytecode
+        and: [pushValueDesc generator ~~ #genPushQuickIntegerConstantBytecode
+        and: [pushValueDesc generator ~~ #genPushTemporaryVariableBytecode]]) ifTrue:
+               [^false].
+       storeValueDesc := self generatorAt: bytecodeSetOffset
+                                                                               + (objectMemory
+                                                                                               fetchByte: bytecodePC
+                                                                                                               + pushArrayDesc numBytes
+                                                                                                               + storeArrayDesc numBytes
+                                                                                                               + pushValueDesc numBytes
+                                                                                               ofObject: methodObj).
+       storeValueDesc generator ~~ #genStoreAndPopRemoteTempLongBytecode ifTrue:
+               [^false].
+
+       objectRepresentation genNewArrayOfSize: 1 initialized: false.
+       self evaluate: pushValueDesc at: bytecodePC + pushArrayDesc numBytes + storeArrayDesc numBytes.
+       reg := self ssStorePop: true toPreferredReg: TempReg.
+       objectRepresentation
+               genStoreSourceReg: reg
+               slotIndex: 0
+               intoNewObjectInDestReg: ReceiverResultReg.
+       self ssPushRegister: ReceiverResultReg.
+       self evaluate: storeArrayDesc at: bytecodePC + pushArrayDesc numBytes.
+       bytecodePC := bytecodePC
+                                       "+ pushArrayDesc numBytes this gets added by nextBytecodePCFor:at:exts:in:"
+                                       + storeArrayDesc numBytes
+                                       + pushValueDesc numBytes
+                                       + storeValueDesc numBytes.
+       ^true!


Reply | Threaded
Open this post in threaded view
|

Re: VM Maker: VMMaker.oscog-eem.746.mcz

Eliot Miranda-2
 
Hi Clément,

On Mon, Jun 2, 2014 at 7:40 AM, Clément Bera <[hidden email]> wrote:
 
Eliot,

Recent commits are very exciting. Context and closure creations are now inlined in machine code :-)

I'm glad you think so :-).  What was nice is that once I had inline context, closure and temp vector creation it was motivating to try and do a little peephole optimization, and that turned out to be really easy.  If you look at e.g. Collection>>inject:into: it starts with

17 <8A 01> push: (Array new: 1)
19 <6A> popIntoTemp: 2
20 <10> pushTemp: 0
21 <8E 00 02> popIntoTemp: 0 inVectorAt: 2
24 <70> self
25 <11> pushTemp: 1
26 <12> pushTemp: 2
27 <8F 21 00 0A> closureNumCopied: 2 numArgs: 1 bytes 31 to 40

With the current memory manager this generates the following machine code on x86

17: 1057: movl $0x00000001, %ebx : BB 01 00 00 00 
105c: call .+0xfffff927 (0x00000988=ceCreateNewArrayTrampoline) : E8 27 F9 FF FF 
20: 1061: movl 12(%ebp), %eax : 8B 45 0C 
21: 1064: movl %eax, %ds:0x4(%edx) : 89 42 04 
19: 1067: movl %edx, -16(%ebp) : 89 55 F0 
24: 106a: movl -12(%ebp), %eax : 8B 45 F4 
106d: pushl %eax : 50 
25: 106e: movl 8(%ebp), %eax : 8B 45 08 
1071: pushl %eax : 50 
26: 1072: movl -16(%ebp), %eax : 8B 45 F0 
1075: pushl %eax : 50 
27: 1076: movl $0x0001f081, %ebx : BB 81 F0 01 00 
107b: call .+0xfffff9a0 (0x00000a20=ceClosureCopyTrampoline) : E8 A0 F9 FF FF 
1080: addl $0x00000008, %esp : 83 C4 08 

The reordering of bytecodes 20 & 21 before 19 is due to the Cogit's stack-to-register-mapping code generating strategy which defers certain actions until operands are used.

Both ceCreateNewArrayTrampoline and ceClosureCopyTrampoline switch stacks to the C stack and call routines in the cointerpreter to create an array and create a closure.  Hence, the array created by ceCreateNewArrayTrampoline is initialized with nils before being returned, and the closure's copied values are pushed to the stack so that the cointerpreter can copy these values into the closure.

With Spur the machine code generated is quite different and even though its longer it is faster because the calls into the cointerpreter are expensive, and because the current memory manager's allocation routines are slow:

17: 1463: movl %ds:0x2001c=#freeStart, %edx : 8B 15 1C 00 02 00 inline temp vector allocation
1469: movl $0x02000033, %eax : B8 33 00 00 02  high 32-bits of header
146e: movl %eax, %ds:(%edx) : 89 02 
1470: movl $0x01000000, %eax : B8 00 00 00 01   low 32-bits of header
1475: movl %eax, %ds:0x4(%edx) : 89 42 04 
1478: movl $0x00100000=nil, %eax : B8 00 00 10 00   initialize slot with nil
147d: movl %eax, %ds:0x8(%edx) : 89 42 08 
1480: movl %edx, %eax : 89 D0 
1482: addl $0x00000010, %eax : 83 C0 10 
1485: movl %eax, %ds:0x2001c=#freeStart : A3 1C 00 02 00  write back allocation pointer
148a: cmpl $0x00024680, %eax : 3D 80 46 02 00 
148f: jb .+0x00000005 (0x00001496) : 72 05  check if a scavenge should be scheduled
1491: call .+0xfffff632 (0x00000ac8=ceSheduleScavengeTrampoline) : E8 32 F6 FF FF 
19: 1496: movl %edx, -16(%ebp) : 89 55 F0 
20: 1499: movl 12(%ebp), %ecx : 8B 4D 0C 
21: 149c: movl -16(%ebp), %edx : 8B 55 F0  alas an inline store check is generated
149f: movl %ecx, %ds:0x8(%edx) : 89 4A 08 
14a2: movl %ecx, %eax : 89 C8 
14a4: andl $0x00000003, %eax : 83 E0 03 
14a7: jnz .+0x0000001c (0x000014c5) : 75 1C 
14a9: movl $0x12345678, %eax : B8 78 56 34 12 
14ae: cmpl %eax, %edx : 39 C2 
14b0: jb .+0x00000013 (0x000014c5) : 72 13 
14b2: cmpl %eax, %ecx : 39 C1 
14b4: jnb .+0x0000000f (0x000014c5) : 73 0F 
14b6: movb %ds:0x3(%edx), %al : 8A 42 03 
14b9: andl $0x00000020, %eax : 83 E0 20 
14bc: jnz .+0x00000007 (0x000014c5) : 75 07 
14be: pushl %ecx : 51 
14bf: call .+0xfffff5d4 (0x00000a98=ceStoreCheckTrampoline) : E8 D4 F5 FF FF 
14c4: popl %ecx : 59 
27: 14c5: movl $0x00000002, %ebx : BB 02 00 00 00  set numArgs for ceSmallMethodContext
14ca: call .+0xfffff639 (0x00000b08=ceSmallMethodContext) : E8 39 F6 FF FF call to machine code context creation
14cf: movl %edx, %ecx : 89 D1 
14d1: movl %ds:0x2001c=#freeStart, %edx : 8B 15 1C 00 02 00  inline closure allocation
14d7: movl $0x03000025, %eax : B8 25 00 00 03 
14dc: movl %eax, %ds:(%edx) : 89 02 
14de: movl $0x05000000, %eax : B8 00 00 00 05 
14e3: movl %eax, %ds:0x4(%edx) : 89 42 04 
14e6: movl %edx, %eax : 89 D0 
14e8: addl $0x00000020, %eax : 83 C0 20 
14eb: movl %eax, %ds:0x2001c=#freeStart : A3 1C 00 02 00 
14f0: cmpl $0x00024680, %eax : 3D 80 46 02 00 
14f5: jb .+0x00000005 (0x000014fc) : 72 05 
14f7: call .+0xfffff5cc (0x00000ac8=ceSheduleScavengeTrampoline) : E8 CC F5 FF FF 
14fc: movl %ecx, %ds:0x8(%edx) : 89 4A 08 
14ff: movl $0x0000003f, %eax : B8 3F 00 00 00     set closure's startpc
1504: movl %eax, %ds:0xc(%edx) : 89 42 0C 
1507: movl $0x00000003, %eax : B8 03 00 00 00      set closure's numArgs (1 as a SmallInteger)
150c: movl %eax, %ds:0x10(%edx) : 89 42 10 
150f: movl -16(%ebp), %eax : 8B 45 F0         set closure's outerContext
1512: movl %eax, %ds:0x18(%edx) : 89 42 18 
1515: movl 8(%ebp), %eax : 8B 45 08          set closure's copied value
1518: movl %eax, %ds:0x14(%edx) : 89 42 14 

What's cool is that the closure creation and initialization is all inlined, pushing nothing to the stack at all.  However, the indirect temp vector initialization is clumsy.  The store check is unnecessary and long.  And we're still initializing the slot with nil before assigning the temp from "pushTemp: 0; popIntoTemp: 0 inVectorAt: 2".

So I thought I would try to implement a peephole that would look for
push: (Array new: 1)
popIntoTemp: tempIndex
pushConstant: const or pushTemp: n
popIntoTemp: 0 inVectorAt: tempIndex
and collapse this into
tempAt: tempIndex put: {const or temp}

The peephole's checking for the sequence is a bit long-winded but it boils down to

objectRepresentation genNewArrayOfSize: 1 initialized: false.
self evaluate: pushValueDesc at: bytecodePC + pushArrayDesc numBytes + storeArrayDesc numBytes.
reg := self ssStorePop: true toPreferredReg: TempReg.
objectRepresentation
genStoreSourceReg: reg
slotIndex: 0
intoNewObjectInDestReg: ReceiverResultReg.
self ssPushRegister: ReceiverResultReg.
self evaluate: storeArrayDesc at: bytecodePC + pushArrayDesc numBytes.

and now the code generated is quite a bit shorter:

17: 1463: movl %ds:0x2001c=#freeStart, %edx : 8B 15 1C 00 02 00 
1469: movl $0x02000033, %eax : B8 33 00 00 02 
146e: movl %eax, %ds:(%edx) : 89 02 
1470: movl $0x01000000, %eax : B8 00 00 00 01 
1475: movl %eax, %ds:0x4(%edx) : 89 42 04 
1478: movl %edx, %eax : 89 D0 
147a: addl $0x00000010, %eax : 83 C0 10 
147d: movl %eax, %ds:0x2001c=#freeStart : A3 1C 00 02 00 
1482: cmpl $0x00024680, %eax : 3D 80 46 02 00 
1487: jb .+0x00000005 (0x0000148e) : 72 05 
1489: call .+0xfffff63a (0x00000ac8=ceSheduleScavengeTrampoline) : E8 3A F6 FF FF 
20: 148e: movl 12(%ebp), %eax : 8B 45 0C 
21: 1491: movl %eax, %ds:0x8(%edx) : 89 42 08 
19: 1494: movl %edx, -16(%ebp) : 89 55 F0 
27: 1497: movl $0x00000002, %ebx : BB 02 00 00 00 
149c: call .+0xfffff667 (0x00000b08=ceSmallMethodContext) : E8 67 F6 FF FF 
14a1: movl %edx, %ecx : 89 D1 
14a3: movl %ds:0x2001c=#freeStart, %edx : 8B 15 1C 00 02 00 
14a9: movl $0x03000025, %eax : B8 25 00 00 03 
14ae: movl %eax, %ds:(%edx) : 89 02 
14b0: movl $0x05000000, %eax : B8 00 00 00 05 
14b5: movl %eax, %ds:0x4(%edx) : 89 42 04 
14b8: movl %edx, %eax : 89 D0 
14ba: addl $0x00000020, %eax : 83 C0 20 
14bd: movl %eax, %ds:0x2001c=#freeStart : A3 1C 00 02 00 
14c2: cmpl $0x00024680, %eax : 3D 80 46 02 00 
14c7: jb .+0x00000005 (0x000014ce) : 72 05 
14c9: call .+0xfffff5fa (0x00000ac8=ceSheduleScavengeTrampoline) : E8 FA F5 FF FF 
14ce: movl %ecx, %ds:0x8(%edx) : 89 4A 08 
14d1: movl $0x0000003f, %eax : B8 3F 00 00 00 
14d6: movl %eax, %ds:0xc(%edx) : 89 42 0C 
14d9: movl $0x00000003, %eax : B8 03 00 00 00 
14de: movl %eax, %ds:0x10(%edx) : 89 42 10 
14e1: movl -16(%ebp), %eax : 8B 45 F0 
14e4: movl %eax, %ds:0x18(%edx) : 89 42 18 
14e7: movl 8(%ebp), %eax : 8B 45 08 
14ea: movl %eax, %ds:0x14(%edx) : 89 42 14 

Very nice.  I could also move the allocations into trampolines.  There's an advantage to this because the test for ceSheduleScavengeTrampoline ends up only jumping when a scavenge is required, returning if not.  And it will save even more space.  This at the cost of a leaf call to get to the trampoline plus setting a register with either the number of slots in the array or the number of copied values.

But what's most satisfying is the use of the entire VMMaker framework with the simulator and Bochs plugin allowing me to test the code generation and the generated code immediately with no long edit-compile cycles.  I was able to implement the entire refactoring of temp vector and closure creation (moving it from the Cogit to the object representations) and implement machine code allocation over a weekend in which I still took my son fishing, went shopping, played gran turismo and watched a couple of movies.

This is the first time I've ever implemented context creation in machine code in a Smalltalk VM.  Context creation is complex because the context has to be allocated and married with its stack frame.  In the VW VM and in Cog up until now that has been done in C code, requiring an expensive switch from the Smalltalk machine code stack to the C stack, and using a generic instantiation routine that marries any kind of context, large or small x method or block.  Now I have four machine code routines, large or small x method or block, that are extremely tight and take a single register parameter which is the argument count, which is known at compile time.  That's possible both because Spur makes allocation simple /and/ because the whole Cog VMMaker framework makes developing and debugging machine code so much easier than in a C VM.

Have you already done at:put: and stringAt:put: or is it your next step ?

That's been there for quite a while.
 
Please tell us about the new bench results with these features.

So in the current Cog the following, which is mostly Interval>>#do: and SmallInteger>>#+, but involves lots of closure creation where all but the outer closure has copied values:

[(1 to: 10) do: [:i| (1 to: 10) do: [:j| (1 to: 10) do: [:k| (1 to: 10) do: [:l| (1 to: 10) do: [:m| (1 to: 10) do: [:n| (1 to: 10) do: [:o| (1 to: 10) do: [:p| i + j + k + l + m + n + o + p]]]]]]]]] timeToRun

evaluates to 9002, 9 seconds.

In the new Spur VM it just evaluated to 4829, 4.8 seconds, for a -46% speedup.

Currently the Newspeak bootstrap, which loads a number of Monticello packages, including the Newspeak compiler, and then compiles a number of Newspeak packages, about 80k lines of Newspeak, now takes about -51% of the time using Spur.

Alas I now need two computers.  I tend to keep lots of tabs open in my web browser and unless I quit Chrome my benchmark figures slow down by about a factor of two :-( [ ;-) ].

Of course, this:

[1 to: 10 do: [:i| 1 to: 10 do: [:j| 1 to: 10 do: [:k| 1 to: 10 do: [:l| 1 to: 10 do: [:m| 1 to: 10 do: [:n| 1 to: 10 do: [:o| 1 to: 10 do: [:p| i + j + k + l + m + n + o + p]]]]]]]]] timeToRun

is even faster, 2320, for a -52% speedup w.r.t. Spur and a -74% speedup w.r.t. the current VM.  And that's what you're working on Clément.  (Clément is working on adaptive optimization which will optimize the Interval>>#do: code into the inlined to:do: code on the fly).  And thats going to be really exciting!


Clément


2014-06-02 16:14 GMT+02:00 <[hidden email]>:

Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
http://source.squeak.org/VMMaker/VMMaker.oscog-eem.746.mcz

==================== Summary ====================

Name: VMMaker.oscog-eem.746
Author: eem
Time: 1 June 2014, 6:05:30.694 pm
UUID: cc4961d3-e629-4e28-b308-88eab314a8c9
Ancestors: VMMaker.oscog-eem.745

Implement a peephole in the Spur Cogit for an indirection
vector initialized with a single value  Avoid initializing the
slot in the array to nil and instead initialize it with the value.

Refactor setting byte1, byte2 & byte3 into
loadSubsequentBytesForDescriptor:at: for the peephole
tryCollapseTempVectorInitializationOfSize:.

No loner inline CoInterpreter>>pre/postGCAction: for VM profiling.

Increase the number of trampoline table slots.

Simulator:
Fix CurrentImageCoInterpreterFacade for the new Spur
inline instantiation code.

=============== Diff against VMMaker.oscog-eem.745 ===============

Item was changed:
  ----- Method: CoInterpreter>>postGCAction: (in category 'object memory support') -----
  postGCAction: gcModeArg
        "Attempt to shrink free memory, signal the gc semaphore and let the Cogit do its post GC thang"
+       <inline: false>
        self assert: gcModeArg = gcMode.
        super postGCAction: gcModeArg.
        cogit cogitPostGCAction: gcModeArg.
        lastCoggableInterpretedBlockMethod := lastUncoggableInterpretedBlockMethod := nil.
        gcMode := 0!

Item was changed:
  ----- Method: CoInterpreter>>preGCAction: (in category 'object memory support') -----
  preGCAction: gcModeArg
+       <inline: false>
-       <inline: true>
        "Need to write back the frame pointers unless all pages are free (as in snapshot).
         Need to set gcMode var (to avoid passing the flag through a lot of the updating code)"
        super preGCAction: gcModeArg.

        gcMode := gcModeArg.

        cogit recordEventTrace ifTrue:
                [| traceType |
                traceType := gcModeArg == GCModeFull ifTrue: [TraceFullGC] ifFalse: [TraceIncrementalGC].
                self recordTrace: traceType thing: traceType source: 0].

        cogit recordPrimTrace ifTrue:
                [| traceType |
                traceType := gcModeArg == GCModeFull ifTrue: [TraceFullGC] ifFalse: [TraceIncrementalGC].
                self fastLogPrim: traceType]!

Item was added:
+ ----- Method: CogObjectRepresentation>>createsArraysInline (in category 'bytecode generator support') -----
+ createsArraysInline
+       "Answer if the object representation allocates arrays inline.  By
+        default answer false. Better code can be generated when creating
+        arrays inline if values are /not/ flushed to the stack."
+       ^false!

Item was removed:
- ----- Method: CogObjectRepresentationFor32BitSpur>>createsClosuresInline (in category 'bytecode generator support') -----
- createsClosuresInline
-       "Answer if the object representation allocates closures inline.  By
-        default answer false. Better code can be generated when creating
-        closures inline if copied values are /not/ flushed to the stack."
-       ^true!

Item was added:
+ ----- Method: CogObjectRepresentationForSpur>>createsArraysInline (in category 'bytecode generator support') -----
+ createsArraysInline
+       "Answer if the object representation allocates arrays inline.  By
+        default answer false. Better code can be generated when creating
+        arrays inline if values are /not/ flushed to the stack."
+       ^true!

Item was added:
+ ----- Method: CogObjectRepresentationForSpur>>createsClosuresInline (in category 'bytecode generator support') -----
+ createsClosuresInline
+       "Answer if the object representation allocates closures inline.  By
+        default answer false. Better code can be generated when creating
+        closures inline if copied values are /not/ flushed to the stack."
+       ^true!

Item was changed:
  ----- Method: Cogit>>compileAbstractInstructionsFrom:through: (in category 'compile abstract instructions') -----
  compileAbstractInstructionsFrom: start through: end
        "Loop over bytecodes, dispatching to the generator for each bytecode, handling fixups in due course."
        | nextOpcodeIndex descriptor fixup result nExts |
        <var: #descriptor type: #'BytecodeDescriptor *'>
        <var: #fixup type: #'BytecodeFixup *'>
        bytecodePC := start.
        nExts := 0.
        [byte0 := (objectMemory fetchByte: bytecodePC ofObject: methodObj)  + bytecodeSetOffset.
         descriptor := self generatorAt: byte0.
+        self loadSubsequentBytesForDescriptor: descriptor at: bytecodePC.
-        descriptor numBytes > 1 ifTrue:
-               [byte1 := objectMemory fetchByte: bytecodePC + 1 ofObject: methodObj.
-                descriptor numBytes > 2 ifTrue:
-                       [byte2 := objectMemory fetchByte: bytecodePC + 2 ofObject: methodObj.
-                        descriptor numBytes > 3 ifTrue:
-                               [byte3 := objectMemory fetchByte: bytecodePC + 3 ofObject: methodObj.
-                                descriptor numBytes > 4 ifTrue:
-                                       [self notYetImplemented]]]].
         nextOpcodeIndex := opcodeIndex.
         result := self perform: descriptor generator.
         descriptor isExtension ifFalse: "extended bytecodes must consume their extensions"
                [self assert: (extA = 0 and: [extB = 0])].
         fixup := self fixupAt: bytecodePC - initialPC.
         fixup targetInstruction ~= 0 ifTrue:
                ["There is a fixup for this bytecode.  It must point to the first generated
                   instruction for this bytecode.  If there isn't one we need to add a label."
                 opcodeIndex = nextOpcodeIndex ifTrue:
                        [self Label].
                 fixup targetInstruction: (self abstractInstructionAt: nextOpcodeIndex)].
         bytecodePC := self nextBytecodePCFor: descriptor at: bytecodePC exts: nExts in: methodObj.
         result = 0 and: [bytecodePC <= end]]
                whileTrue:
                        [nExts := descriptor isExtension ifTrue: [nExts + 1] ifFalse: [0]].
        self checkEnoughOpcodes.
        ^result!

Item was added:
+ ----- Method: Cogit>>loadSubsequentBytesForDescriptor:at: (in category 'compile abstract instructions') -----
+ loadSubsequentBytesForDescriptor: descriptor at: pc
+       <var: #descriptor type: #'BytecodeDescriptor *'>
+       descriptor numBytes > 1 ifTrue:
+               [byte1 := objectMemory fetchByte: pc + 1 ofObject: methodObj.
+                descriptor numBytes > 2 ifTrue:
+                       [byte2 := objectMemory fetchByte: pc + 2 ofObject: methodObj.
+                        descriptor numBytes > 3 ifTrue:
+                               [byte3 := objectMemory fetchByte: pc + 3 ofObject: methodObj.
+                                descriptor numBytes > 4 ifTrue:
+                                       [self notYetImplemented]]]]!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacade class>>objectMemoryClass (in category 'accessing') -----
+ objectMemoryClass
+       ^self subclassResponsibility!

Item was changed:
  ----- Method: CurrentImageCoInterpreterFacade>>cogit: (in category 'initialize-release') -----
  cogit: aCogit
        cogit := aCogit.
        coInterpreter cogit: aCogit.
+       (objectMemory respondsTo: #cogit:) ifTrue:
+               [objectMemory cogit: aCogit]!
-       objectMemory cogit: aCogit!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacade>>indexablePointersFormat (in category 'accessing') -----
+ indexablePointersFormat
+       ^objectMemory indexablePointersFormat!

Item was changed:
  ----- Method: CurrentImageCoInterpreterFacade>>initialize (in category 'initialize-release') -----
  initialize
        memory := ByteArray new: 262144.
+       objectMemory := self class objectMemoryClass new.
-       objectMemory := NewCoObjectMemory new.
        coInterpreter := CoInterpreter new.
        coInterpreter
                instVarNamed: 'objectMemory'
                        put: objectMemory;
                instVarNamed: 'primitiveTable'
                        put: (CArrayAccessor on: CoInterpreter primitiveTable copy).
        variables := Dictionary new.
        #('stackLimit') do:
                [:l| self addressForLabel: l].
        self initializeObjectMap!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacade>>methodNeedsLargeContext: (in category 'accessing') -----
+ methodNeedsLargeContext: aMethodOop
+       ^(self objectForOop: aMethodOop) frameSize > CompiledMethod smallFrameSize!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation class>>objectMemoryClass (in category 'accessing') -----
+ objectMemoryClass
+       ^Spur32BitCoMemoryManager!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>arrayFormat (in category 'accessing') -----
+ arrayFormat
+       ^objectMemory arrayFormat!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>getScavengeThreshold (in category 'accessing') -----
+ getScavengeThreshold
+       ^objectMemory getScavengeThreshold ifNil: [16r24680]!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>headerForSlots:format:classIndex: (in category 'accessing') -----
+ headerForSlots: numSlots format: formatField classIndex: classIndex
+       ^objectMemory headerForSlots: numSlots format: formatField classIndex: classIndex!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>numSlotsMask (in category 'accessing') -----
+ numSlotsMask
+       ^objectMemory numSlotsMask!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>rememberedBitShift (in category 'accessing') -----
+ rememberedBitShift
+       ^objectMemory rememberedBitShift!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>smallObjectBytesForSlots: (in category 'accessing') -----
+ smallObjectBytesForSlots: numSlots
+       ^objectMemory smallObjectBytesForSlots: numSlots!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>storeCheckBoundary (in category 'accessing') -----
+ storeCheckBoundary
+       ^objectMemory storeCheckBoundary ifNil: [16r12345678]!

Item was added:
+ ----- Method: CurrentImageCoInterpreterFacadeForSqueakV3ObjectRepresentation class>>objectMemoryClass (in category 'accessing') -----
+ objectMemoryClass
+       ^NewObjectMemory!

Item was changed:
  ----- Method: SimpleStackBasedCogit class>>initializeMiscConstants (in category 'class initialization') -----
  initializeMiscConstants
        super initializeMiscConstants.
        MaxLiteralCountForCompile := initializationOptions at: #MaxLiteralCountForCompile ifAbsent: [60].
        NumTrampolines := NewspeakVM
+                                                       ifTrue: [50]
+                                                       ifFalse: [42]!
-                                                       ifTrue: [46]
-                                                       ifFalse: [38]!

Item was changed:
  ----- Method: StackToRegisterMappingCogit class>>initializeMiscConstants (in category 'class initialization') -----
  initializeMiscConstants
        super initializeMiscConstants.
        NumTrampolines := NewspeakVM
+                                                       ifTrue: [60]
+                                                       ifFalse: [52]!
-                                                       ifTrue: [58]
-                                                       ifFalse: [50]!

Item was changed:
  ----- Method: StackToRegisterMappingCogit>>compileAbstractInstructionsFrom:through: (in category 'compile abstract instructions') -----
  compileAbstractInstructionsFrom: start through: end
        "Loop over bytecodes, dispatching to the generator for each bytecode, handling fixups in due course."
        | nextOpcodeIndex descriptor nExts fixup result |
        <var: #descriptor type: #'BytecodeDescriptor *'>
        <var: #fixup type: #'BytecodeFixup *'>
        self traceSimStack.
        bytecodePC := start.
        nExts := 0.
        descriptor := nil.
        deadCode := false.
        [self cCode: '' inSmalltalk:
                [(debugBytecodePointers includes: bytecodePC) ifTrue: [self halt]].
        fixup := self fixupAt: bytecodePC - initialPC.
        fixup targetInstruction asUnsignedInteger > 0
                ifTrue:
                        [deadCode := false.
                         fixup targetInstruction asUnsignedInteger >= 2 ifTrue:
                                [self merge: fixup
                                        afterContinuation: (descriptor notNil
                                                                                and: [descriptor isUnconditionalBranch
                                                                                        or: [descriptor isReturn]]) not]]
                ifFalse: "If there's no fixup following a return there's no jump to that code and it is dead."
                        [(descriptor notNil and: [descriptor isReturn]) ifTrue:
                                [deadCode := true]].
         self cCode: '' inSmalltalk:
                [deadCode ifFalse:
                        [self assert: simStackPtr + (needsFrame ifTrue: [0] ifFalse: [1])
                                                = (self debugStackPointerFor: bytecodePC)]].
         byte0 := (objectMemory fetchByte: bytecodePC ofObject: methodObj) + bytecodeSetOffset.
         descriptor := self generatorAt: byte0.
+        self loadSubsequentBytesForDescriptor: descriptor at: bytecodePC.
-        descriptor numBytes > 1 ifTrue:
-               [byte1 := objectMemory fetchByte: bytecodePC + 1 ofObject: methodObj.
-                descriptor numBytes > 2 ifTrue:
-                       [byte2 := objectMemory fetchByte: bytecodePC + 2 ofObject: methodObj.
-                        descriptor numBytes > 3 ifTrue:
-                               [byte3 := objectMemory fetchByte: bytecodePC + 3 ofObject: methodObj.
-                                descriptor numBytes > 4 ifTrue:
-                                       [self notYetImplemented]]]].
         nextOpcodeIndex := opcodeIndex.
         result := deadCode
                                ifTrue: "insert nops for dead code that is mapped so that bc to mc mapping is not many to one"
                                        [(descriptor isMapped
                                          or: [inBlock and: [descriptor isMappedInBlock]]) ifTrue:
                                                [self annotateBytecode: self Nop].
                                                0]
                                ifFalse:
                                        [self perform: descriptor generator].
         descriptor isExtension ifFalse: "extended bytecodes must consume their extensions"
                [self assert: (extA = 0 and: [extB = 0])].
         self traceDescriptor: descriptor; traceSimStack.
         (fixup targetInstruction asUnsignedInteger between: 1 and: 2) ifTrue:
                ["There is a fixup for this bytecode.  It must point to the first generated
                   instruction for this bytecode.  If there isn't one we need to add a label."
                 opcodeIndex = nextOpcodeIndex ifTrue:
                        [self Label].
                 fixup targetInstruction: (self abstractInstructionAt: nextOpcodeIndex)].
         bytecodePC := self nextBytecodePCFor: descriptor at: bytecodePC exts: nExts in: methodObj.
         result = 0 and: [bytecodePC <= end]] whileTrue:
                [nExts := descriptor isExtension ifTrue: [nExts + 1] ifFalse: [0]].
        self checkEnoughOpcodes.
        ^result!

Item was added:
+ ----- Method: StackToRegisterMappingCogit>>evaluate:at: (in category 'peephole optimizations') -----
+ evaluate: descriptor at: pc
+       <var: #descriptor type: #'BytecodeDescriptor *'>
+       byte0 := objectMemory fetchByte: pc ofObject: methodObj.
+       self assert: descriptor = (self generatorAt: bytecodeSetOffset + byte0).
+       self loadSubsequentBytesForDescriptor: descriptor at: pc.
+       self perform: descriptor generator!

Item was changed:
  ----- Method: StackToRegisterMappingCogit>>genPushNewArrayBytecode (in category 'bytecode generators') -----
  genPushNewArrayBytecode
        | size popValues |
        self assert: needsFrame.
        optStatus isReceiverResultRegLive: false.
        (popValues := byte1 > 127)
                ifTrue: [self ssFlushTo: simStackPtr]
                ifFalse: [self ssAllocateCallReg: SendNumArgsReg and: ReceiverResultReg].
        size := byte1 bitAnd: 127.
+       popValues ifFalse:
+               [(self tryCollapseTempVectorInitializationOfSize: size) ifTrue:
+                       [^0]].
        objectRepresentation genNewArrayOfSize: size initialized: popValues not.
        popValues ifTrue:
                [size - 1 to: 0 by: -1 do:
                        [:i|
                        self PopR: TempReg.
                        objectRepresentation
                                genStoreSourceReg: TempReg
                                slotIndex: i
                                intoNewObjectInDestReg: ReceiverResultReg].
                 self ssPop: size].
        ^self ssPushRegister: ReceiverResultReg!

Item was added:
+ ----- Method: StackToRegisterMappingCogit>>tryCollapseTempVectorInitializationOfSize: (in category 'peephole optimizations') -----
+ tryCollapseTempVectorInitializationOfSize: slots
+       "Try and collapse
+               push: (Array new: 1)
+               popIntoTemp: tempIndex
+               pushConstant: const or pushTemp: n
+               popIntoTemp: 0 inVectorAt: tempIndex
+        into
+               tempAt: tempIndex put: {const}.
+        One might think that we should look for a sequence of more than
+        one pushes and pops but this is extremely rare."
+       | pushArrayDesc storeArrayDesc pushValueDesc storeValueDesc reg |
+       <var: #pushArrayDesc type: #'BytecodeDescriptor *'>
+       <var: #pushValueDesc type: #'BytecodeDescriptor *'>
+       <var: #storeArrayDesc type: #'BytecodeDescriptor *'>
+       <var: #storeValueDesc type: #'BytecodeDescriptor *'>
+       slots ~= 1 ifTrue:
+               [^false].
+       pushArrayDesc := self generatorAt: bytecodeSetOffset
+                                                                               + (objectMemory
+                                                                                               fetchByte: bytecodePC
+                                                                                               ofObject: methodObj).
+       self assert: pushArrayDesc generator == #genPushNewArrayBytecode.
+       storeArrayDesc := self generatorAt: bytecodeSetOffset
+                                                                               + (objectMemory
+                                                                                               fetchByte: bytecodePC
+                                                                                                               + pushArrayDesc numBytes
+                                                                                               ofObject: methodObj).
+       storeArrayDesc generator ~~ #genStoreAndPopTemporaryVariableBytecode ifTrue:
+               [^false].
+       pushValueDesc := self generatorAt: bytecodeSetOffset
+                                                                               + (objectMemory
+                                                                                               fetchByte: bytecodePC
+                                                                                                               + pushArrayDesc numBytes
+                                                                                                               + storeArrayDesc numBytes
+                                                                                               ofObject: methodObj).
+       (pushValueDesc generator ~~ #genPushLiteralConstantBytecode
+        and: [pushValueDesc generator ~~ #genPushQuickIntegerConstantBytecode
+        and: [pushValueDesc generator ~~ #genPushTemporaryVariableBytecode]]) ifTrue:
+               [^false].
+       storeValueDesc := self generatorAt: bytecodeSetOffset
+                                                                               + (objectMemory
+                                                                                               fetchByte: bytecodePC
+                                                                                                               + pushArrayDesc numBytes
+                                                                                                               + storeArrayDesc numBytes
+                                                                                                               + pushValueDesc numBytes
+                                                                                               ofObject: methodObj).
+       storeValueDesc generator ~~ #genStoreAndPopRemoteTempLongBytecode ifTrue:
+               [^false].
+
+       objectRepresentation genNewArrayOfSize: 1 initialized: false.
+       self evaluate: pushValueDesc at: bytecodePC + pushArrayDesc numBytes + storeArrayDesc numBytes.
+       reg := self ssStorePop: true toPreferredReg: TempReg.
+       objectRepresentation
+               genStoreSourceReg: reg
+               slotIndex: 0
+               intoNewObjectInDestReg: ReceiverResultReg.
+       self ssPushRegister: ReceiverResultReg.
+       self evaluate: storeArrayDesc at: bytecodePC + pushArrayDesc numBytes.
+       bytecodePC := bytecodePC
+                                       "+ pushArrayDesc numBytes this gets added by nextBytecodePCFor:at:exts:in:"
+                                       + storeArrayDesc numBytes
+                                       + pushValueDesc numBytes
+                                       + storeValueDesc numBytes.
+       ^true!






--
best,
Eliot