The Inbox: Kernel-nice.1217.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

The Inbox: Kernel-nice.1217.mcz

commits-2
Nicolas Cellier uploaded a new version of Kernel to project The Inbox:
http://source.squeak.org/inbox/Kernel-nice.1217.mcz

==================== Summary ====================

Name: Kernel-nice.1217
Author: nice
Time: 26 April 2019, 6:01:39.036232 pm
UUID: 39478bc1-0f55-4747-8332-45a6d225b281
Ancestors: Kernel-eem.1215

Publish Karatsuba and 3-way Toom-Cooke algorithms for fast large integer multiplication.
These are efficient divide and conquer algorithms that recursively split a problem into simpler problems.

Currently this fast multiplication is used only in LargePositiveInteger>>squared.
Slightly modify raisedToInteger: to use that squared.
This result on a small penalty for small number exponentation, but large speed up for big integers.

Also publish another Karatsuba-like square root algorithm that should accelerate sqrt of huge integers.


Examples on a 64 bits image/VM

[10 raisedToInteger: 30] bench.
 '3,770,000 per second. 265 nanoseconds per run.' - before
 '3,500,000 per second. 286 nanoseconds per run.' - after
[10.0 raisedToInteger: 30] bench.
 '10,900,000 per second. 91.6 nanoseconds per run.' - before
 '9,540,000 per second. 105 nanoseconds per run.' - after

| x |
x := 100 factorial.
[x raisedToInteger: 30] bench.
 '15,900 per second. 63 microseconds per run.' - before
 '19,500 per second. 51.3 microseconds per run.' - after

| x |
x := 1000 factorial.
[x raisedToInteger: 30] bench.
 '58.8 per second. 17 milliseconds per run.' - before
 '179 per second. 5.6 milliseconds per run.' - after

| x |
x := 500 factorial - 100 factorial.
{
[x sqrtFloor] bench.
[x sqrtFloorNewtonRaphson] bench.
}
 #(
        '42,300 per second. 23.6 microseconds per run.'
        '8,240 per second. 121 microseconds per run.')

=============== Diff against Kernel-eem.1215 ===============

Item was added:
+ ----- Method: Integer>>fastMultiply: (in category 'arithmetic') -----
+ fastMultiply: anInteger
+ ^self * anInteger!

Item was added:
+ ----- Method: Integer>>karatsubaTimes: (in category 'arithmetic') -----
+ karatsubaTimes: anInteger
+ "Eventually use Karatsuba algorithm to perform the multiplication.
+ This is efficient only for large integers (see subclass).
+ By default, use regular multiplication."
+
+ ^self * anInteger!

Item was changed:
  ----- Method: Integer>>sqrtFloor (in category 'mathematical functions') -----
  sqrtFloor
+ "Return the integer part of the square root of self
+ Assume self >= 0
+ The following invariants apply:
+ 1) self sqrtFloor squared <= self
+ 2) (self sqrtFloor + 1) squared > self"
- "Return the integer part of the square root of self"
 
+ ^self sqrtFloorNewtonRaphson!
- | guess delta |
- guess := 1 bitShift: self highBit + 1 // 2.
- [
- delta := guess squared - self // (guess bitShift: 1).
- delta = 0 ] whileFalse: [
- guess := guess - delta ].
- ^guess - 1!

Item was added:
+ ----- Method: Integer>>sqrtFloorNewtonRaphson (in category 'mathematical functions') -----
+ sqrtFloorNewtonRaphson
+ "Return the integer part of the square root of self.
+ Use a Newton-Raphson iteration since it converges quadratically
+ (twice more bits of precision at each iteration)"
+
+ | guess delta |
+ guess := 1 bitShift: self highBit + 1 // 2.
+ [
+ delta := guess squared - self // (guess bitShift: 1).
+ delta = 0 ] whileFalse: [
+ guess := guess - delta ].
+ ^guess - 1!

Item was added:
+ ----- Method: Integer>>sqrtRem (in category 'mathematical functions') -----
+ sqrtRem
+ "Return an array with floor sqrt and sqrt remainder.
+ Assume self >= 0.
+ The following invariants apply:
+ 1) self  sqrtRem first squared <= self
+ 2) (self sqrtRem first + 1) squared > self
+ 3) self sqrtRem first squared + self sqrtRem last = self"
+
+ | s |
+ s := self sqrtFloorNewtonRaphson.
+ ^{ s. self - s squared } !

Item was added:
+ ----- Method: Integer>>toom3Times: (in category 'arithmetic') -----
+ toom3Times: anInteger
+ ^self * anInteger!

Item was added:
+ ----- Method: LargePositiveInteger>>butLowestNDigits: (in category 'private') -----
+ butLowestNDigits: N
+ "make a new integer removing N least significant digits of self."
+
+ ^self bitShiftMagnitude: -8 * N!

Item was added:
+ ----- Method: LargePositiveInteger>>copyDigitsFrom:to: (in category 'private') -----
+ copyDigitsFrom: start to: stop
+ "Make a new integer keeping only some digits of self."
+
+ | len slice |
+ start > 0 ifFalse: [^self error: 'start index should be at least 1'].
+ len := self digitLength.
+ (start > len or: [start > stop]) ifTrue: [^0].
+ stop >= len
+ ifTrue: [start = 1 ifTrue: [^self].
+ len := len - start + 1]
+ ifFalse: [len := stop - start + 1].
+ slice := self class new: len.
+ slice replaceFrom: 1 to: len with: self startingAt: start.
+ ^slice normalize!

Item was added:
+ ----- Method: LargePositiveInteger>>fastMultiply: (in category 'arithmetic') -----
+ fastMultiply: anInteger
+ "Eventually use Karatsuba or 3-way Toom-Cook algorithm to perform the multiplication"
+
+ | xLen yLen |
+ "arrange to have the receiver be the shorter"
+ (xLen := self digitLength) > (yLen := anInteger digitLength)
+ ifTrue: [^ anInteger fastMultiply: self ].
+
+ "If too short to be worth, fallback to naive algorithm"
+ (xLen >= 600) ifFalse: [^self * anInteger].
+ (xLen >= 6000) ifFalse: [^self karatsubaTimes: anInteger].
+ ^self toom3Times: anInteger!

Item was added:
+ ----- Method: LargePositiveInteger>>karatsubaTimes: (in category 'arithmetic') -----
+ karatsubaTimes: anInteger
+ "Eventually use Karatsuba algorithm to perform the multiplication
+ The Karatsuba algorithm divide number in two smaller parts.
+ Then it uses tricks to perform only 3 multiplications.
+ See https://en.wikipedia.org/wiki/Karatsuba_algorithm"
+
+ | half xHigh xLow yHigh yLow low high mid xLen yLen |
+ "arrange to have the receiver be the shorter"
+ (xLen := self digitLength) > (yLen := anInteger digitLength)
+ ifTrue: [^ anInteger karatsubaTimes: self ].
+
+ "If too short to be worth, fallback to naive algorithm"
+ (xLen >= 600) ifFalse: [^self * anInteger].
+    
+ "Seek for integers of about the same length, else split the long one"
+ yLen >= (7 * xLen bitShift: -2)
+ ifTrue:
+ [^(0 to: yLen - 1 by: xLen + 1)  digitShiftSum: [:yShift |
+ self karatsubaTimes:
+ (anInteger copyDigitsFrom: yShift + 1 to: yShift + 1 + xLen)]].
+
+ "At this point, lengths are similar, divide each integer in two halves"
+ half := (yLen + 1 bitShift: -1)  bitClear: 2r11.
+ xLow := self lowestNDigits: half.
+ xHigh := self butLowestNDigits: half.
+ yLow := anInteger lowestNDigits: half.
+ yHigh := anInteger butLowestNDigits: half.
+
+ "Karatsuba trick: perform with 3 multiplications instead of 4"
+ low := xLow karatsubaTimes: yLow.
+ high := xHigh karatsubaTimes: yHigh.
+ mid := high + low + (xHigh - xLow karatsubaTimes: yLow - yHigh).
+
+ "Sum the parts of decomposition"
+ ^low + (mid bitShiftMagnitude: 8*half) + (high bitShiftMagnitude: 16*half)
+
+ "
+ | a b |
+ a := 650 factorial-1.
+ b := 700 factorial-1.
+ {a digitLength. b digitLength}.
+ self assert: (a karatsubaTimes: b) - (a * b) = 0.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a karatsubaTimes: b]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a * b]] timeToRun] value asFloat
+ "!

Item was added:
+ ----- Method: LargePositiveInteger>>lowestNDigits: (in category 'private') -----
+ lowestNDigits: N
+ "Make a new integer keeping only N least significant digits of self"
+
+ | low |
+ N >= self digitLength ifTrue: [^self].
+ low := self class new: N.
+ low replaceFrom: 1 to: N with: self startingAt: 1.
+ ^low normalize!

Item was added:
+ ----- Method: LargePositiveInteger>>sqrtFloor (in category 'mathematical functions') -----
+ sqrtFloor
+ "Like super, but use a faster divide and conquer strategy if it's worth"
+
+ self digitLength >= 64 ifFalse: [^self sqrtFloorNewtonRaphson].
+ ^self sqrtRem first!

Item was added:
+ ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
+ sqrtRem
+ "Like super, but use a divide and conquer method to perform this operation.
+ See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
+ https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
+
+ | n  qr s r sr high mid low |
+ n := self digitLength bitShift: -2.
+ n >= 16 ifFalse: [^super sqrtRem].
+ high := self butLowestNDigits: n * 2.
+ mid := self copyDigitsFrom: n + 1 to: n * 2.
+ low := self lowestNDigits: n.
+
+ sr := high sqrtRem.
+ qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false.
+ s := (sr first bitShift: 8 * n) + qr first.
+ r := (qr last bitShift: 8 * n) + low - qr first squared.
+ r negative
+ ifTrue:
+ [r := (s bitShift: 1) + r - 1.
+ s := s - 1].
+ sr at: 1 put: s; at: 2 put: r.
+ ^sr
+ !

Item was added:
+ ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') -----
+ squared
+ "Eventually use a divide and conquer algorithm to perform the multiplication"
+
+ (self digitLength >= 400) ifFalse: [^self * self].
+ (self digitLength >= 1600) ifFalse: [^self squaredKaratsuba].
+ ^self squaredToom3!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredKaratsuba (in category 'mathematical functions') -----
+ squaredKaratsuba
+ "Use a divide and conquer algorithm to perform the multiplication.
+ Split in two parts like Karatsuba, but economize 2 additions by using asymetrical product."
+
+ | half xHigh xLow low high mid |
+
+ "Divide digits in two halves"
+ half := self digitLength + 1 // 2 bitClear: 2r11.
+ xLow := self lowestNDigits: half.
+ xHigh := self butLowestNDigits: half.
+
+ "eventually use karatsuba"
+ low := xLow squared.
+ high := xHigh squared.
+ mid := xLow karatsubaTimes: xHigh.
+
+ "Sum the parts of decomposition"
+ ^low + (mid bitShift: 8*half+1) + (high bitShift: 16*half)
+
+ "
+ | a |
+ a := 440 factorial-1.
+ a digitLength.
+ self assert: a * a - a squaredKaratsuba = 0.
+ [Smalltalk garbageCollect.
+ [2000 timesRepeat: [a squaredKaratsuba]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [2000 timesRepeat: [a * a]] timeToRun] value asFloat
+ "!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredToom3 (in category 'mathematical functions') -----
+ squaredToom3
+ "Use a 3-way Toom-Cook divide and conquer algorithm to perform the multiplication"
+
+ | third x0 x1 x2 x20 z0 z1 z2 z3 z4 |
+ "divide in 3 parts"
+ third := self digitLength + 2 // 3 bitClear: 2r11.
+ x2 := self butLowestNDigits: third * 2.
+ x1 := self copyDigitsFrom: third + 1 to: third * 2.
+ x0 := self lowestNDigits: third.
+
+ "Toom-3 trick: 5 multiplications instead of 9"
+ z0 := x0 squared.
+ z4 := x2 squared.
+ x20 := x2 + x0.
+ z1 := (x20 + x1) squared.
+ x20 := x20 - x1.
+ z2 := x20 squared.
+ z3 := ((x20 + x2 bitShift: 1) - x0) squared.
+
+ "Sum the parts of decomposition"
+ z3 := z3 - z1 quo: 3.
+ z1 := z1 - z2 bitShift: -1.
+ z2 := z2 - z0.
+
+ z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
+ z2 := z2 + z1 - z4.
+ z1 := z1 - z3.
+ ^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)
+
+ "
+ | a |
+ a := 1400 factorial-1.
+ a digitLength.
+ self assert: a * a - a squaredToom3 = 0.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredToom3]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
+ "!

Item was added:
+ ----- Method: LargePositiveInteger>>toom3Times: (in category 'arithmetic') -----
+ toom3Times: anInteger
+ "Eventually use a variant of Toom-Cooke for performing multiplication.
+ Toom-Cooke is a generalization of Karatsuba divide and conquer algorithm.
+ It divides operands in n parts instead of 2.
+ See https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication
+ Here we divide each operand in 3 parts, thus the name Toom-3."
+
+ | third x2 x1 x0 y2 y1 y0 xLen yLen y20 z4 z3 z2 z1 z0 x20 |
+ "arrange to have the receiver be the shorter"
+ (xLen := self digitLength) > (yLen := anInteger digitLength)
+ ifTrue: [^anInteger toom3Times: self ].
+
+ "If too short to be worth, fallback to Karatsuba algorithm"
+ (xLen > 6000) ifFalse: [^self karatsubaTimes: anInteger].
+
+ "Seek for well balanced integers"
+ yLen > (5 * xLen bitShift: -2)
+ ifTrue: [^(0 to: yLen - 1 by: xLen + 1)  digitShiftSum: [:yShift |
+ self toom3Times: (anInteger copyDigitsFrom: yShift + 1 to: yShift +1 + xLen)]].
+
+ "At this point, lengths are well balanced, divide in 3 parts"
+ third := yLen + 2 // 3 bitClear: 2r11.
+ x2 := self butLowestNDigits: third * 2.
+ x1 := self copyDigitsFrom: third + 1 to: third * 2.
+ x0 := self lowestNDigits: third.
+ y2 := anInteger butLowestNDigits: third * 2.
+ y1 := anInteger copyDigitsFrom: third + 1 to: third * 2.
+ y0 := anInteger lowestNDigits: third.
+
+ "Toom-3 trick: 5 multiplications instead of 9"
+ z0 := x0 toom3Times: y0.
+ z4 := x2 toom3Times: y2.
+ x20 := x2 + x0.
+ y20 := y2 + y0.
+ z1 := x20 + x1 toom3Times: y20 + y1.
+ x20 := x20 - x1.
+ y20 := y20 - y1.
+ z2 := x20 toom3Times: y20.
+ z3 := (x20 + x2 bitShift: 1) - x0 toom3Times: (y20 + y2 bitShift: 1) - y0.
+
+ "Sum the parts of decomposition"
+ z3 := z3 - z1 quo: 3.
+ z1 := z1 - z2 bitShift: -1.
+ z2 := z2 - z0.
+
+ z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
+ z2 := z2 + z1 - z4.
+ z1 := z1 - z3.
+ ^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)
+
+ "
+ | a b |
+ a :=5000 factorial - 1.
+ b := 6000 factorial - 1.
+ a digitLength min: b digitLength.
+ a digitLength max: b digitLength.
+ (a toom3Times: b) = (a * b) ifFalse: [self error].
+ [Smalltalk garbageCollect.
+ [300 timesRepeat: [a toom3Times: b]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [300 timesRepeat: [a karatsubaTimes: b]] timeToRun] value asFloat
+ "!

Item was changed:
  ----- Method: Number>>raisedToInteger: (in category 'mathematical functions') -----
  raisedToInteger: anInteger
  "The 0 raisedToInteger: 0 is an special case. In some contexts must be 1 and in others must
  be handled as an indeterminate form.
  I take the first context because that's the way that was previously handled.
  Maybe further discussion is required on this topic."
 
  | bitProbe result |
  anInteger negative ifTrue: [^(self raisedToInteger: anInteger negated) reciprocal].
  bitProbe := 1 bitShift: anInteger highBit - 1.
  result := self class one.
  [
  (anInteger bitAnd: bitProbe) > 0 ifTrue: [ result := result * self ].
  (bitProbe := bitProbe bitShift: -1) > 0 ]
+ whileTrue: [ result := result squared ].
- whileTrue: [ result := result * result ].
  ^result!