VM Maker: VMMaker.oscog-tfrel.1719.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

VM Maker: VMMaker.oscog-tfrel.1719.mcz

commits-2
 
Tim Felgentreff uploaded a new version of VMMaker to project VM Maker:
http://source.squeak.org/VMMaker/VMMaker.oscog-tfrel.1719.mcz

==================== Summary ====================

Name: VMMaker.oscog-tfrel.1719
Author: tfrel
Time: 9 March 2016, 10:35:56.474573 pm
UUID: b09bd0a2-7d2a-41d2-9794-b83f2881c59e
Ancestors: VMMaker.oscog-eem.1718

Improve the performance of BitBltSimulator by using += rather than + and new assignment for the src and dest index pointers if they are CObjectAccessors. This avoids creating many copies of CObjectAccessors in the inner BitBlt loops, and thus improves simulation performance dramatically. For the generator, the code is the same as previously, but now in an inlined method, so nothing changes in C-land.

=============== Diff against VMMaker.oscog-eem.1718 ===============

Item was changed:
  ----- Method: BitBltSimulation>>copyLoop (in category 'inner loop') -----
  copyLoop
  | prevWord thisWord skewWord halftoneWord mergeWord hInc y unskew skewMask notSkewMask mergeFnwith destWord |
  "This version of the inner loop assumes noSource = false."
  <inline: false>
  <var: #mergeFnwith declareC: 'sqInt (*mergeFnwith)(sqInt, sqInt)'>
  mergeFnwith := self cCoerce: (opTable at: combinationRule+1) to: 'sqInt (*)(sqInt, sqInt)'.
  mergeFnwith.  "null ref for compiler"
 
  hInc := hDir*4.  "Byte delta"
  "degenerate skew fixed for Sparc. 10/20/96 ikp"
  skew == -32
  ifTrue: [skew := unskew := skewMask := 0]
  ifFalse: [skew < 0
  ifTrue:
  [unskew := skew+32.
  skewMask := AllOnes << (0-skew)]
  ifFalse:
  [skew = 0
  ifTrue:
  [unskew := 0.
  skewMask := AllOnes]
  ifFalse:
  [unskew := skew-32.
  skewMask := AllOnes >> skew]]].
  notSkewMask := skewMask bitInvert32.
  noHalftone
  ifTrue: [halftoneWord := AllOnes.  halftoneHeight := 0]
  ifFalse: [halftoneWord := self halftoneAt: 0].
 
  y := dy.
  1 to: bbH do: "here is the vertical loop"
  [ :i |
  halftoneHeight > 1 ifTrue:  "Otherwise, its always the same"
  [halftoneWord := self halftoneAt: y.
  y := y + vDir].
  preload ifTrue:
  ["load the 64-bit shifter"
  prevWord := self srcLongAt: sourceIndex.
+ self incSrcIndex: hInc]
- sourceIndex := sourceIndex + hInc]
  ifFalse:
  [prevWord := 0].
 
  "Note: the horizontal loop has been expanded into three parts for speed:"
 
  "This first section requires masking of the destination store..."
  destMask := mask1.
  thisWord := self srcLongAt: sourceIndex.  "pick up next word"
+ self incSrcIndex: hInc.
- sourceIndex := sourceIndex + hInc.
  skewWord := ((prevWord bitAnd: notSkewMask) bitShift: unskew)
  bitOr:  "32-bit rotate"
  ((thisWord bitAnd: skewMask) bitShift: skew).
  prevWord := thisWord.
  destWord := self dstLongAt: destIndex.
  mergeWord := self mergeFn: (skewWord bitAnd: halftoneWord) with: destWord.
  destWord := (destMask bitAnd: mergeWord) bitOr:
  (destWord bitAnd: destMask bitInvert32).
  self dstLongAt: destIndex put: destWord.
+ self incDestIndex: hInc.
- destIndex := destIndex + hInc.
 
  "This central horizontal loop requires no store masking"
  destMask := AllOnes.
  combinationRule = 3
  ifTrue: [(skew = 0) & (halftoneWord = AllOnes)
  ifTrue:
  ["Very special inner loop for STORE mode with no skew -- just move words"
  hDir = -1
  ifTrue: ["Woeful patch: revert to older code for hDir = -1"
  2 to: nWords-1 do:
  [ :word |
  thisWord := self srcLongAt: sourceIndex.
+ self incSrcIndex: hInc.
- sourceIndex := sourceIndex + hInc.
  self dstLongAt: destIndex put: thisWord.
+ self incDestIndex: hInc]]
- destIndex := destIndex + hInc]]
  ifFalse: [2 to: nWords-1 do:
  [ :word |  "Note loop starts with prevWord loaded (due to preload)"
  self dstLongAt: destIndex put: prevWord.
+ self incDestIndex: hInc.
- destIndex := destIndex + hInc.
  prevWord := self srcLongAt: sourceIndex.
+ self incSrcIndex: hInc]]]
- sourceIndex := sourceIndex + hInc]]]
  ifFalse:
  ["Special inner loop for STORE mode -- no need to call merge"
  2 to: nWords-1 do:
  [ :word |
  thisWord := self srcLongAt: sourceIndex.
+ self incSrcIndex: hInc.
- sourceIndex := sourceIndex + hInc.
  skewWord := ((prevWord bitAnd: notSkewMask) bitShift: unskew)
  bitOr:  "32-bit rotate"
  ((thisWord bitAnd: skewMask) bitShift: skew).
  prevWord := thisWord.
  self dstLongAt: destIndex put: (skewWord bitAnd: halftoneWord).
+ self incDestIndex: hInc]]
- destIndex := destIndex + hInc]]
  ] ifFalse: [2 to: nWords-1 do: "Normal inner loop does merge:"
  [ :word |
  thisWord := self srcLongAt: sourceIndex.  "pick up next word"
+ self incSrcIndex: hInc.
- sourceIndex := sourceIndex + hInc.
  skewWord := ((prevWord bitAnd: notSkewMask) bitShift: unskew)
  bitOr:  "32-bit rotate"
  ((thisWord bitAnd: skewMask) bitShift: skew).
  prevWord := thisWord.
  mergeWord := self mergeFn: (skewWord bitAnd: halftoneWord)
  with: (self dstLongAt: destIndex).
  self dstLongAt: destIndex put: mergeWord.
+ self incDestIndex: hInc]
- destIndex := destIndex + hInc]
  ].
 
  "This last section, if used, requires masking of the destination store..."
  nWords > 1 ifTrue:
  [destMask := mask2.
  thisWord := self srcLongAt: sourceIndex.  "pick up next word"
+ self incSrcIndex: hInc.
- sourceIndex := sourceIndex + hInc.
  skewWord := ((prevWord bitAnd: notSkewMask) bitShift: unskew)
  bitOr:  "32-bit rotate"
  ((thisWord bitAnd: skewMask) bitShift: skew).
  destWord := self dstLongAt: destIndex.
  mergeWord := self mergeFn: (skewWord bitAnd: halftoneWord) with: destWord.
  destWord := (destMask bitAnd: mergeWord) bitOr:
  (destWord bitAnd: destMask bitInvert32).
  self dstLongAt: destIndex put: destWord.
+ self incDestIndex: hInc].
- destIndex := destIndex + hInc].
 
+ self incSrcIndex: sourceDelta.
+ self incDestIndex: destDelta]!
- sourceIndex := sourceIndex + sourceDelta.
- destIndex := destIndex + destDelta]!

Item was changed:
  ----- Method: BitBltSimulation>>copyLoopNoSource (in category 'inner loop') -----
  copyLoopNoSource
  "Faster copyLoop when source not used.  hDir and vDir are both
  positive, and perload and skew are unused"
  | halftoneWord mergeWord mergeFnwith destWord |
  <inline: false>
  <var: #mergeFnwith declareC: 'sqInt (*mergeFnwith)(sqInt, sqInt)'>
  mergeFnwith := self cCoerce: (opTable at: combinationRule+1) to: 'sqInt (*)(sqInt, sqInt)'.
  mergeFnwith.  "null ref for compiler"
 
  1 to: bbH do: "here is the vertical loop"
  [ :i |
  noHalftone
  ifTrue: [halftoneWord := AllOnes]
  ifFalse: [halftoneWord := self halftoneAt: dy+i-1].
 
  "Note: the horizontal loop has been expanded into three parts for speed:"
 
  "This first section requires masking of the destination store..."
  destMask := mask1.
  destWord := self dstLongAt: destIndex.
  mergeWord := self mergeFn: halftoneWord
  with: destWord.
  destWord := (destMask bitAnd: mergeWord) bitOr:
  (destWord bitAnd: destMask bitInvert32).
  self dstLongAt: destIndex put: destWord.
+ self incDestIndex: 4.
- destIndex := destIndex + 4.
 
  "This central horizontal loop requires no store masking"
  destMask := AllOnes.
  combinationRule = 3 ifTrue: ["Special inner loop for STORE"
  destWord := halftoneWord.
  2 to: nWords-1 do:[ :word |
  self dstLongAt: destIndex put: destWord.
+ self incDestIndex: 4].
- destIndex := destIndex + 4].
  ] ifFalse:[ "Normal inner loop does merge"
  2 to: nWords-1 do:[ :word | "Normal inner loop does merge"
  destWord := self dstLongAt: destIndex.
  mergeWord := self mergeFn: halftoneWord with: destWord.
  self dstLongAt: destIndex put: mergeWord.
+ self incDestIndex: 4].
- destIndex := destIndex + 4].
  ].
 
  "This last section, if used, requires masking of the destination store..."
  nWords > 1 ifTrue:
  [destMask := mask2.
  destWord := self dstLongAt: destIndex.
  mergeWord := self mergeFn: halftoneWord with: destWord.
  destWord := (destMask bitAnd: mergeWord) bitOr:
  (destWord bitAnd: destMask bitInvert32).
  self dstLongAt: destIndex put: destWord.
+ self incDestIndex: 4].
- destIndex := destIndex + 4].
 
+ self incDestIndex: destDelta]!
- destIndex := destIndex + destDelta]!

Item was changed:
  ----- Method: BitBltSimulation>>copyLoopPixMap (in category 'inner loop') -----
  copyLoopPixMap
  "This version of the inner loop maps source pixels
  to a destination form with different depth.  Because it is already
  unweildy, the loop is not unrolled as in the other versions.
  Preload, skew and skewMask are all overlooked, since pickSourcePixels
  delivers its destination word already properly aligned.
  Note that pickSourcePixels could be copied in-line at the top of
  the horizontal loop, and some of its inits moved out of the loop."
  "ar 12/7/1999:
  The loop has been rewritten to use only one pickSourcePixels call.
  The idea is that the call itself could be inlined. If we decide not
  to inline pickSourcePixels we could optimize the loop instead."
  | skewWord halftoneWord mergeWord scrStartBits nSourceIncs startBits endBits sourcePixMask destPixMask mergeFnwith nPix srcShift dstShift destWord words srcShiftInc dstShiftInc dstShiftLeft mapperFlags |
  <inline: false>
  <var: #mergeFnwith declareC: 'sqInt (*mergeFnwith)(sqInt, sqInt)'>
  mergeFnwith := self cCoerce: (opTable at: combinationRule+1) to: 'sqInt (*)(sqInt, sqInt)'.
  mergeFnwith.  "null ref for compiler"
 
  "Additional inits peculiar to unequal source and dest pix size..."
  sourcePPW := 32//sourceDepth.
  sourcePixMask := maskTable at: sourceDepth.
  destPixMask := maskTable at: destDepth.
  mapperFlags := cmFlags bitAnd: ColorMapNewStyle bitInvert32.
  sourceIndex := sourceBits +
  (sy * sourcePitch) + ((sx // sourcePPW) *4).
  scrStartBits := sourcePPW - (sx bitAnd: sourcePPW-1).
  bbW < scrStartBits
  ifTrue: [nSourceIncs := 0]
  ifFalse: [nSourceIncs := (bbW - scrStartBits)//sourcePPW + 1].
  sourceDelta := sourcePitch - (nSourceIncs * 4).
 
  "Note following two items were already calculated in destmask setup!!"
  startBits := destPPW - (dx bitAnd: destPPW-1).
  endBits := ((dx + bbW - 1) bitAnd: destPPW-1) + 1.
 
  bbW < startBits ifTrue:[startBits := bbW].
 
  "Precomputed shifts for pickSourcePixels"
  srcShift := ((sx bitAnd: sourcePPW - 1) * sourceDepth).
  dstShift := ((dx bitAnd: destPPW - 1) * destDepth).
  srcShiftInc := sourceDepth.
  dstShiftInc := destDepth.
  dstShiftLeft := 0.
  sourceMSB ifTrue:[
  srcShift := 32 - sourceDepth - srcShift.
  srcShiftInc := 0 - srcShiftInc].
  destMSB ifTrue:[
  dstShift := 32 - destDepth - dstShift.
  dstShiftInc := 0 - dstShiftInc.
  dstShiftLeft := 32 - destDepth].
 
  1 to: bbH do: "here is the vertical loop"
  [ :i |
  "*** is it possible at all that noHalftone == false? ***"
  noHalftone
  ifTrue:[halftoneWord := AllOnes]
  ifFalse: [halftoneWord := self halftoneAt: dy+i-1].
  "setup first load"
  srcBitShift := srcShift.
  dstBitShift := dstShift.
  destMask := mask1.
  nPix := startBits.
  "Here is the horizontal loop..."
  words := nWords.
  ["pick up the word"
  skewWord := self pickSourcePixels: nPix flags: mapperFlags
  srcMask: sourcePixMask destMask: destPixMask
  srcShiftInc: srcShiftInc dstShiftInc: dstShiftInc.
  "align next word to leftmost pixel"
  dstBitShift := dstShiftLeft.
 
  destMask = AllOnes ifTrue:["avoid read-modify-write"
  mergeWord := self mergeFn: (skewWord bitAnd: halftoneWord)
  with: (self dstLongAt: destIndex).
  self dstLongAt: destIndex put: (destMask bitAnd: mergeWord).
  ] ifFalse:[ "General version using dest masking"
  destWord := self dstLongAt: destIndex.
  mergeWord := self mergeFn: (skewWord bitAnd: halftoneWord)
  with: (destWord bitAnd: destMask).
  destWord := (destMask bitAnd: mergeWord) bitOr:
  (destWord bitAnd: destMask bitInvert32).
  self dstLongAt: destIndex put: destWord.
  ].
+ self incDestIndex: 4.
- destIndex := destIndex + 4.
  words = 2 "e.g., is the next word the last word?"
  ifTrue:["set mask for last word in this row"
  destMask := mask2.
  nPix := endBits]
  ifFalse:["use fullword mask for inner loop"
  destMask := AllOnes.
  nPix := destPPW].
  (words := words - 1) = 0] whileFalse.
  "--- end of inner loop ---"
+ self incSrcIndex: sourceDelta.
+ self incDestIndex: destDelta]
- sourceIndex := sourceIndex + sourceDelta.
- destIndex := destIndex + destDelta]
  !

Item was added:
+ ----- Method: BitBltSimulation>>incDestIndex: (in category 'memory access') -----
+ incDestIndex: offset
+ <inline: true>
+ ^ destIndex := destIndex + offset
+ !

Item was added:
+ ----- Method: BitBltSimulation>>incSrcIndex: (in category 'memory access') -----
+ incSrcIndex: offset
+ <inline: true>
+ ^ sourceIndex := sourceIndex + offset
+ !

Item was changed:
  ----- Method: BitBltSimulation>>pickSourcePixels:flags:srcMask:destMask:srcShiftInc:dstShiftInc: (in category 'combination rules') -----
  pickSourcePixels: nPixels flags: mapperFlags srcMask: srcMask destMask: dstMask srcShiftInc: srcShiftInc dstShiftInc: dstShiftInc
  "Pick nPix pixels starting at srcBitIndex from the source, map by the
  color map, and justify them according to dstBitIndex in the resulting destWord."
  | sourceWord destWord sourcePix destPix srcShift dstShift nPix |
  <inline: true> "oh please"
  sourceWord := self srcLongAt: sourceIndex.
  destWord := 0.
  srcShift := srcBitShift. "Hint: Keep in register"
  dstShift := dstBitShift. "Hint: Keep in register"
  nPix := nPixels. "always > 0 so we can use do { } while(--nPix);"
  (mapperFlags = (ColorMapPresent bitOr: ColorMapIndexedPart)) ifTrue:[
  "a little optimization for (pretty crucial) blits using indexed lookups only"
  [ "grab, colormap and mix in pixel"
  sourcePix := sourceWord >> srcShift bitAnd: srcMask.
  destPix := cmLookupTable at: (sourcePix bitAnd: cmMask).
  destWord := destWord bitOr: (destPix bitAnd: dstMask) << dstShift.
  "adjust dest pix index"
  dstShift := dstShift + dstShiftInc.
  "adjust source pix index"
  ((srcShift := srcShift + srcShiftInc) bitAnd: 16rFFFFFFE0) = 0 ifFalse:[
  sourceMSB ifTrue:[srcShift := srcShift + 32] ifFalse:[srcShift := srcShift - 32].
+ sourceWord := self srcLongAt: (self incSrcIndex: 4)].
- sourceWord := self srcLongAt: (sourceIndex := sourceIndex + 4)].
  (nPix := nPix - 1) = 0] whileFalse.
  ] ifFalse:[
  [ "grab, colormap and mix in pixel"
  sourcePix := sourceWord >> srcShift bitAnd: srcMask.
  destPix := self mapPixel: sourcePix flags: mapperFlags.
  destWord := destWord bitOr: (destPix bitAnd: dstMask) << dstShift.
  "adjust dest pix index"
  dstShift := dstShift + dstShiftInc.
  "adjust source pix index"
  ((srcShift := srcShift + srcShiftInc) bitAnd: 16rFFFFFFE0) = 0 ifFalse:[
  sourceMSB ifTrue:[srcShift := srcShift + 32] ifFalse:[srcShift := srcShift - 32].
+ sourceWord := self srcLongAt: (self incSrcIndex: 4)].
- sourceWord := self srcLongAt: (sourceIndex := sourceIndex + 4)].
  (nPix := nPix - 1) = 0] whileFalse.
  ].
  srcBitShift := srcShift. "Store back"
  ^destWord
  !

Item was changed:
  ----- Method: BitBltSimulation>>warpLoop (in category 'inner loop') -----
  warpLoop
  "This version of the inner loop traverses an arbirary quadrilateral
  source, thus producing a general affine transformation."
  | skewWord halftoneWord mergeWord startBits
   deltaP12x deltaP12y deltaP43x deltaP43y pAx pAy pBx pBy
   xDelta yDelta smoothingCount sourceMapOop
   nSteps nPix words destWord endBits mergeFnwith dstShiftInc dstShiftLeft mapperFlags |
  <inline: false>
  <var: #mergeFnwith declareC: 'sqInt (*mergeFnwith)(sqInt, sqInt)'>
  mergeFnwith := self cCoerce: (opTable at: combinationRule+1) to: 'sqInt (*)(sqInt, sqInt)'.
  mergeFnwith.  "null ref for compiler"
 
  (interpreterProxy slotSizeOf: bitBltOop) >= (BBWarpBase+12)
  ifFalse: [^ interpreterProxy primitiveFail].
  nSteps := height-1.  nSteps <= 0 ifTrue: [nSteps := 1].
 
  pAx := self fetchIntOrFloat: BBWarpBase ofObject: bitBltOop.
  words := self fetchIntOrFloat: BBWarpBase+3 ofObject: bitBltOop.
  deltaP12x := self deltaFrom: pAx to: words nSteps: nSteps.
  deltaP12x < 0 ifTrue: [pAx := words - (nSteps*deltaP12x)].
 
  pAy := self fetchIntOrFloat: BBWarpBase+1 ofObject: bitBltOop.
  words := self fetchIntOrFloat: BBWarpBase+4 ofObject: bitBltOop.
  deltaP12y := self deltaFrom: pAy to: words nSteps: nSteps.
  deltaP12y < 0 ifTrue: [pAy := words - (nSteps*deltaP12y)].
 
  pBx := self fetchIntOrFloat: BBWarpBase+9 ofObject: bitBltOop.
  words := self fetchIntOrFloat: BBWarpBase+6 ofObject: bitBltOop.
  deltaP43x := self deltaFrom: pBx to: words nSteps: nSteps.
  deltaP43x < 0 ifTrue: [pBx := words - (nSteps*deltaP43x)].
 
  pBy := self fetchIntOrFloat: BBWarpBase+10 ofObject: bitBltOop.
  words := self fetchIntOrFloat: BBWarpBase+7 ofObject: bitBltOop.
  deltaP43y := self deltaFrom: pBy to: words nSteps: nSteps.
  deltaP43y < 0 ifTrue: [pBy := words - (nSteps*deltaP43y)].
 
  interpreterProxy failed ifTrue: [^ false].  "ie if non-integers above"
  interpreterProxy methodArgumentCount = 2
  ifTrue: [smoothingCount := interpreterProxy stackIntegerValue: 1.
  sourceMapOop := interpreterProxy stackValue: 0.
  sourceMapOop = interpreterProxy nilObject
  ifTrue: [sourceDepth < 16 ifTrue:
  ["color map is required to smooth non-RGB dest"
  ^ interpreterProxy primitiveFail]]
  ifFalse: [(interpreterProxy slotSizeOf: sourceMapOop)
  < (1 << sourceDepth) ifTrue:
  ["sourceMap must be long enough for sourceDepth"
  ^ interpreterProxy primitiveFail].
  sourceMapOop := self oopForPointer: (interpreterProxy firstIndexableField: sourceMapOop)]]
  ifFalse: [smoothingCount := 1.
  sourceMapOop := interpreterProxy nilObject].
  nSteps := width-1.  nSteps <= 0 ifTrue: [nSteps := 1].
  startBits := destPPW - (dx bitAnd: destPPW-1).
  endBits := ((dx + bbW - 1) bitAnd: destPPW-1) + 1.
    bbW < startBits ifTrue:[startBits := bbW].
 
  destY < clipY ifTrue:[
  "Advance increments if there was clipping in y"
  pAx := pAx + (clipY - destY * deltaP12x).
  pAy := pAy + (clipY - destY * deltaP12y).
  pBx := pBx + (clipY - destY * deltaP43x).
  pBy := pBy + (clipY - destY * deltaP43y)].
 
  "Setup values for faster pixel fetching."
  self warpLoopSetup.
  "Setup color mapping if not provided"
  (smoothingCount > 1 and:[(cmFlags bitAnd: ColorMapNewStyle) = 0]) ifTrue:[
  cmLookupTable == nil ifTrue:[
  destDepth = 16 ifTrue:[self setupColorMasksFrom: 8 to: 5].
  ] ifFalse:[
  self setupColorMasksFrom: 8 to: cmBitsPerColor.
  ].
  ].
  mapperFlags := cmFlags bitAnd: ColorMapNewStyle bitInvert32.
 
  destMSB
  ifTrue:[ dstShiftInc := 0 - destDepth.
  dstShiftLeft := 32 - destDepth]
  ifFalse:[ dstShiftInc := destDepth.
  dstShiftLeft := 0].
  1 to: bbH do:
  [ :i | "here is the vertical loop..."
  xDelta := self deltaFrom: pAx to: pBx nSteps: nSteps.
    xDelta >= 0 ifTrue: [sx := pAx] ifFalse: [sx := pBx - (nSteps*xDelta)].
  yDelta := self deltaFrom: pAy to: pBy nSteps: nSteps.
    yDelta >= 0 ifTrue: [sy := pAy] ifFalse: [sy := pBy - (nSteps*yDelta)].
 
  destMSB
  ifTrue:[dstBitShift := 32 - ((dx bitAnd: destPPW - 1) + 1 * destDepth)]
  ifFalse:[dstBitShift := (dx bitAnd: destPPW - 1) * destDepth].
 
  (destX < clipX) ifTrue:[
  "Advance increments if there was clipping in x"
  sx := sx + (clipX - destX * xDelta).
  sy := sy + (clipX - destX * yDelta).
  ].
 
  noHalftone
  ifTrue: [halftoneWord := AllOnes]
  ifFalse: [halftoneWord := self halftoneAt: dy+i-1].
  destMask := mask1.
  nPix := startBits.
  "Here is the inner loop..."
  words := nWords.
  ["pick up word"
  smoothingCount = 1 ifTrue:["Faster if not smoothing"
  skewWord := self warpPickSourcePixels: nPix
  xDeltah: xDelta yDeltah: yDelta
  xDeltav: deltaP12x yDeltav: deltaP12y
  dstShiftInc: dstShiftInc flags: mapperFlags.
  ] ifFalse:["more difficult with smoothing"
  skewWord := self warpPickSmoothPixels: nPix
  xDeltah: xDelta yDeltah: yDelta
  xDeltav: deltaP12x yDeltav: deltaP12y
  sourceMap: sourceMapOop
  smoothing: smoothingCount
  dstShiftInc: dstShiftInc.
  ].
  "align next word access to left most pixel"
  dstBitShift := dstShiftLeft.
  destMask = AllOnes ifTrue:["avoid read-modify-write"
  mergeWord := self mergeFn: (skewWord bitAnd: halftoneWord)
  with: (self dstLongAt: destIndex).
  self dstLongAt: destIndex put: (destMask bitAnd: mergeWord).
  ] ifFalse:[ "General version using dest masking"
  destWord := self dstLongAt: destIndex.
  mergeWord := self mergeFn: (skewWord bitAnd: halftoneWord)
  with: (destWord bitAnd: destMask).
  destWord := (destMask bitAnd: mergeWord) bitOr:
  (destWord bitAnd: destMask bitInvert32).
  self dstLongAt: destIndex put: destWord.
  ].
+ self incDestIndex: 4.
- destIndex := destIndex + 4.
  words = 2 "e.g., is the next word the last word?"
  ifTrue:["set mask for last word in this row"
  destMask := mask2.
  nPix := endBits]
  ifFalse:["use fullword mask for inner loop"
  destMask := AllOnes.
  nPix := destPPW].
  (words := words - 1) = 0] whileFalse.
  "--- end of inner loop ---"
  pAx := pAx + deltaP12x.
  pAy := pAy + deltaP12y.
  pBx := pBx + deltaP43x.
  pBy := pBy + deltaP43y.
+ self incDestIndex: destDelta]!
- destIndex := destIndex + destDelta]!

Item was added:
+ ----- Method: BitBltSimulator>>incDestIndex: (in category 'memory access') -----
+ incDestIndex: offset
+ "if offset is a CObject, this avoids a copy, making the simulation massively faster"
+ ^ offset isCObjectAccessor
+ ifTrue: [destIndex += offset]
+ ifFalse: [super incDestIndex: offset]
+ !

Item was added:
+ ----- Method: BitBltSimulator>>incSrcIndex: (in category 'memory access') -----
+ incSrcIndex: offset
+ "if offset is a CObject, this avoids a copy, making the simulation massively faster"
+ ^ offset isCObjectAccessor
+ ifTrue: [sourceIndex += offset]
+ ifFalse: [super incSrcIndex: offset]
+ !