The Trunk: Kernel-nice.1222.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Kernel-nice.1222.mcz

commits-2
Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1222.mcz

==================== Summary ====================

Name: Kernel-nice.1222
Author: nice
Time: 28 April 2019, 3:55:49.124486 pm
UUID: e697844c-d6bd-4781-a572-553f79c98644
Ancestors: Kernel-nice.1221

Make fastMultiply: the default for multiplying integers

The scheme is:
- first try 64bit imul (primitive 29) if receiver is Large, or (primitive 9) if receiver is Small.
- then check for operand type in super *, use fastMultiply: if Integer
- then check receiver length and invoke O(N^2) schoolbook multiply if too small ( invoke LargeIntegers primitive thru digitMultiply:neg: )
- then check operand byte length
- then dispatch to Karatsuba or Toom3 (or future enhancements)

For medium integers (>64 bits) that's only 1 indirection more than previous implementation, so a very small (negligible) penalty.

For larger integers (a few thousand bits) that's a win.

Note that the length threshold heuristics may vary from one platform to another, one VM to another, and wether images are 32 or 64 bits...
We could make them class var, and offer some initialize method so as to automatically tune them.

=============== Diff against Kernel-nice.1221 ===============

Item was changed:
  ----- Method: Integer>>* (in category 'arithmetic') -----
  * aNumber
  "Refer to the comment in Number * "
  aNumber isInteger ifTrue:
+ [^ self fastMultiply: aNumber].
- [^ self digitMultiply: aNumber
- neg: self negative ~~ aNumber negative].
  ^ aNumber adaptToInteger: self andSend: #*!

Item was changed:
+ ----- Method: Integer>>fastMultiply: (in category 'private') -----
- ----- Method: Integer>>fastMultiply: (in category 'arithmetic') -----
  fastMultiply: anInteger
+ ^self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative!
- ^self * anInteger!

Item was changed:
+ ----- Method: Integer>>karatsubaTimes: (in category 'private') -----
- ----- Method: Integer>>karatsubaTimes: (in category 'arithmetic') -----
  karatsubaTimes: anInteger
  "Eventually use Karatsuba algorithm to perform the multiplication.
  This is efficient only for large integers (see subclass).
  By default, use regular multiplication."
 
+ ^ self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative!
- ^self * anInteger!

Item was changed:
+ ----- Method: Integer>>toom3Times: (in category 'private') -----
- ----- Method: Integer>>toom3Times: (in category 'arithmetic') -----
  toom3Times: anInteger
+ ^ self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative!
- ^self * anInteger!

Item was changed:
  ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
  digitDiv32: anInteger
  "This is part of the recursive division algorithm from Burnikel - Ziegler
  Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
  Each limb is made of p bytes (8*p bits).
  This step transforms the division problem into multiplication
  It must use the fastMultiply: to be worth the overhead costs."
 
  | a2 b1 d p q qr r |
  p := anInteger digitLength + 1 bitShift: -1.
  (a2 := self butLowestNDigits: 2*p)
  < (b1 := anInteger butLowestNDigits: p)
  ifTrue:
  [qr := (self butLowestNDigits: p) digitDiv21: b1.
  q := qr first.
  r := qr last]
  ifFalse:
  [q := (1 bitShift: 8*p) - 1.
  r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
+ d := q * (anInteger lowestNDigits: p).
- d := q fastMultiply: (anInteger lowestNDigits: p).
  r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
  [r < 0]
  whileTrue:
  [q := q - 1.
  r := r + anInteger].
  ^Array with: q with: r
  !

Item was changed:
+ ----- Method: LargePositiveInteger>>fastMultiply: (in category 'private') -----
- ----- Method: LargePositiveInteger>>fastMultiply: (in category 'arithmetic') -----
  fastMultiply: anInteger
+ "Return the result of multiplying the receiver by the Integer argument.
+ This method dispatch to the fastest algorithm based on operands length."
- "Eventually use Karatsuba or 3-way Toom-Cook algorithm to perform the multiplication"
 
+ | length |
+ "Revert to schoolbook multiplication if short"
+ (length := self digitLength) < 600
+ ifTrue: [^ self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative].
+
+ "Arrange to have the receiver be the shorter and retry"
+ length > anInteger digitLength
- | xLen |
- "arrange to have the receiver be the shorter"
- (xLen := self digitLength) > anInteger digitLength
  ifTrue: [^ anInteger fastMultiply: self ].
 
+ "Choose the fastest algorithm based on another heuristic"
+ length < 6000 ifTrue: [^self karatsubaTimes: anInteger].
- "If too short to be worth, fallback to naive algorithm"
- (xLen >= 600) ifFalse: [^self * anInteger].
- (xLen >= 6000) ifFalse: [^self karatsubaTimes: anInteger].
  ^self toom3Times: anInteger!

Item was changed:
+ ----- Method: LargePositiveInteger>>karatsubaTimes: (in category 'private') -----
- ----- Method: LargePositiveInteger>>karatsubaTimes: (in category 'arithmetic') -----
  karatsubaTimes: anInteger
  "Eventually use Karatsuba algorithm to perform the multiplication
  The Karatsuba algorithm divide number in two smaller parts.
  Then it uses tricks to perform only 3 multiplications.
  See https://en.wikipedia.org/wiki/Karatsuba_algorithm"
 
  | half xHigh xLow yHigh yLow low high mid xLen yLen |
+ "Revert to schoolbook multiplication if short"
+ (xLen := self digitLength) < 600
+ ifTrue: [^ self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative].
+
+ "Arrange to have the receiver be the shorter and retry"
+ xLen > (yLen := anInteger digitLength)
- "arrange to have the receiver be the shorter"
- (xLen := self digitLength) > (yLen := anInteger digitLength)
  ifTrue: [^ anInteger karatsubaTimes: self ].
-
- "If too short to be worth, fallback to naive algorithm"
- (xLen >= 600) ifFalse: [^self * anInteger].
     
  "Seek for integers of about the same length, else split the long one"
  yLen >= (7 * xLen bitShift: -2)
  ifTrue:
  [^(0 to: yLen - 1 by: xLen + 1)  digitShiftSum: [:yShift |
  self karatsubaTimes:
  (anInteger copyDigitsFrom: yShift + 1 to: yShift + 1 + xLen)]].
 
  "At this point, lengths are similar, divide each integer in two halves"
  half := (yLen + 1 bitShift: -1)  bitClear: 2r11.
  xLow := self lowestNDigits: half.
  xHigh := self butLowestNDigits: half.
  yLow := anInteger lowestNDigits: half.
  yHigh := anInteger butLowestNDigits: half.
 
  "Karatsuba trick: perform with 3 multiplications instead of 4"
  low := xLow karatsubaTimes: yLow.
  high := xHigh karatsubaTimes: yHigh.
  mid := high + low + (xHigh - xLow karatsubaTimes: yLow - yHigh).
 
  "Sum the parts of decomposition"
  ^low + (mid bitShiftMagnitude: 8*half) + (high bitShiftMagnitude: 16*half)
 
  "
  | a b |
  a := 650 factorial-1.
  b := 700 factorial-1.
  {a digitLength. b digitLength}.
  self assert: (a karatsubaTimes: b) - (a * b) = 0.
  [Smalltalk garbageCollect.
  [1000 timesRepeat: [a karatsubaTimes: b]] timeToRun] value /
  [Smalltalk garbageCollect.
  [1000 timesRepeat: [a * b]] timeToRun] value asFloat
  "!

Item was changed:
  ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
  squaredToom4
  "Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
  See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
  https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
 
  | p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
  "divide in 4 parts"
  p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
  a3 := self butLowestNDigits: p * 3.
  a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
  a1 := self copyDigitsFrom: p + 1 to: p * 2.
  a0 := self lowestNDigits: p.
 
  "Toom-4 trick: 7 multiplications instead of 16"
  a02 := a0 - a2.
  a13 := a1 - a3.
  s0 := a0 squared.
+ s1 := (a0 * a1) bitShift: 1.
+ s2 := (a02 + a13) * (a02 - a13).
- s1 := (a0 fastMultiply: a1) bitShift: 1.
- s2 := (a02 + a13) fastMultiply: (a02 - a13).
  s3 := ((a0 + a1) + (a2 + a3)) squared.
+ s4 := (a02 * a13) bitShift: 1.
+ s5 := (a3 * a2) bitShift: 1.
- s4 := (a02 fastMultiply: a13) bitShift: 1.
- s5 := (a3 fastMultiply: a2) bitShift: 1.
  s6 := a3 squared.
 
  "Interpolation"
  t2 := s1 + s5.
  t3 := (s2 + s3 + s4 bitShift: -1) - t2.
  s3 := t2 - s4.
  s4 := t3 - s0.
  s2 := t3 - s2 - s6.
 
  "Sum the parts of decomposition"
  ^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
  +(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
 
  "
  | a |
  a := 770 factorial-1.
  a digitLength.
  [a * a - a squaredToom4 = 0] assert.
  [Smalltalk garbageCollect.
  [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
  [Smalltalk garbageCollect.
  [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
  "!

Item was changed:
+ ----- Method: LargePositiveInteger>>toom3Times: (in category 'private') -----
- ----- Method: LargePositiveInteger>>toom3Times: (in category 'arithmetic') -----
  toom3Times: anInteger
  "Eventually use a variant of Toom-Cooke for performing multiplication.
  Toom-Cooke is a generalization of Karatsuba divide and conquer algorithm.
  It divides operands in n parts instead of 2.
  See https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication
  Here we divide each operand in 3 parts, thus the name Toom-3.
  And use a Bodrato-Zanoni variant for the choice of interpolation points and matrix inversion
  See What about Toom-Cook matrices optimality? - Marco Bodrato, Alberto Zanoni - Oct. 2006
  http://www.bodrato.it/papers/WhatAboutToomCookMatricesOptimality.pdf"
 
  | third x2 x1 x0 y2 y1 y0 xLen yLen y20 z4 z3 z2 z1 z0 x20 |
+ "Revert to schoolbook multiplication if short"
+ (xLen := self digitLength) < 6000
+ ifTrue: [^ self karatsubaTimes: anInteger ].
- "arrange to have the receiver be the shorter"
- (xLen := self digitLength) > (yLen := anInteger digitLength)
- ifTrue: [^anInteger toom3Times: self ].
-
- "If too short to be worth, fallback to Karatsuba algorithm"
- (xLen > 6000) ifFalse: [^self karatsubaTimes: anInteger].
 
+ "Arrange to have the receiver be the shorter and retry"
+ xLen > (yLen := anInteger digitLength)
+ ifTrue: [^ anInteger toom3Times: self ].
+
  "Seek for well balanced integers"
  yLen > (5 * xLen bitShift: -2)
  ifTrue: [^(0 to: yLen - 1 by: xLen + 1)  digitShiftSum: [:yShift |
  self toom3Times: (anInteger copyDigitsFrom: yShift + 1 to: yShift +1 + xLen)]].
 
  "At this point, lengths are well balanced, divide in 3 parts"
  third := yLen + 2 // 3 bitClear: 2r11.
  x2 := self butLowestNDigits: third * 2.
  x1 := self copyDigitsFrom: third + 1 to: third * 2.
  x0 := self lowestNDigits: third.
  y2 := anInteger butLowestNDigits: third * 2.
  y1 := anInteger copyDigitsFrom: third + 1 to: third * 2.
  y0 := anInteger lowestNDigits: third.
 
  "Toom-3 trick: 5 multiplications instead of 9"
  z0 := x0 toom3Times: y0.
  z4 := x2 toom3Times: y2.
  x20 := x2 + x0.
  y20 := y2 + y0.
  z1 := x20 + x1 toom3Times: y20 + y1.
  x20 := x20 - x1.
  y20 := y20 - y1.
  z2 := x20 toom3Times: y20.
  z3 := (x20 + x2 bitShift: 1) - x0 toom3Times: (y20 + y2 bitShift: 1) - y0.
 
  "Sum the parts of decomposition"
  z3 := z3 - z1 quo: 3.
  z1 := z1 - z2 bitShift: -1.
  z2 := z2 - z0.
 
  z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
  z2 := z2 + z1 - z4.
  z1 := z1 - z3.
  ^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)
 
  "
  | a b |
  a :=5000 factorial - 1.
  b := 6000 factorial - 1.
  a digitLength min: b digitLength.
  a digitLength max: b digitLength.
  (a toom3Times: b) = (a * b) ifFalse: [self error].
  [Smalltalk garbageCollect.
  [300 timesRepeat: [a toom3Times: b]] timeToRun] value /
  [Smalltalk garbageCollect.
  [300 timesRepeat: [a karatsubaTimes: b]] timeToRun] value asFloat
  "!