Nicolas Cellier uploaded a new version of Kernel to project The Inbox:
http://source.squeak.org/inbox/Kernel-nice.1217.mcz ==================== Summary ==================== Name: Kernel-nice.1217 Author: nice Time: 26 April 2019, 6:01:39.036232 pm UUID: 39478bc1-0f55-4747-8332-45a6d225b281 Ancestors: Kernel-eem.1215 Publish Karatsuba and 3-way Toom-Cooke algorithms for fast large integer multiplication. These are efficient divide and conquer algorithms that recursively split a problem into simpler problems. Currently this fast multiplication is used only in LargePositiveInteger>>squared. Slightly modify raisedToInteger: to use that squared. This result on a small penalty for small number exponentation, but large speed up for big integers. Also publish another Karatsuba-like square root algorithm that should accelerate sqrt of huge integers. Examples on a 64 bits image/VM [10 raisedToInteger: 30] bench. '3,770,000 per second. 265 nanoseconds per run.' - before '3,500,000 per second. 286 nanoseconds per run.' - after [10.0 raisedToInteger: 30] bench. '10,900,000 per second. 91.6 nanoseconds per run.' - before '9,540,000 per second. 105 nanoseconds per run.' - after | x | x := 100 factorial. [x raisedToInteger: 30] bench. '15,900 per second. 63 microseconds per run.' - before '19,500 per second. 51.3 microseconds per run.' - after | x | x := 1000 factorial. [x raisedToInteger: 30] bench. '58.8 per second. 17 milliseconds per run.' - before '179 per second. 5.6 milliseconds per run.' - after | x | x := 500 factorial - 100 factorial. { [x sqrtFloor] bench. [x sqrtFloorNewtonRaphson] bench. } #( '42,300 per second. 23.6 microseconds per run.' '8,240 per second. 121 microseconds per run.') =============== Diff against Kernel-eem.1215 =============== Item was added: + ----- Method: Integer>>fastMultiply: (in category 'arithmetic') ----- + fastMultiply: anInteger + ^self * anInteger! Item was added: + ----- Method: Integer>>karatsubaTimes: (in category 'arithmetic') ----- + karatsubaTimes: anInteger + "Eventually use Karatsuba algorithm to perform the multiplication. + This is efficient only for large integers (see subclass). + By default, use regular multiplication." + + ^self * anInteger! Item was changed: ----- Method: Integer>>sqrtFloor (in category 'mathematical functions') ----- sqrtFloor + "Return the integer part of the square root of self + Assume self >= 0 + The following invariants apply: + 1) self sqrtFloor squared <= self + 2) (self sqrtFloor + 1) squared > self" - "Return the integer part of the square root of self" + ^self sqrtFloorNewtonRaphson! - | guess delta | - guess := 1 bitShift: self highBit + 1 // 2. - [ - delta := guess squared - self // (guess bitShift: 1). - delta = 0 ] whileFalse: [ - guess := guess - delta ]. - ^guess - 1! Item was added: + ----- Method: Integer>>sqrtFloorNewtonRaphson (in category 'mathematical functions') ----- + sqrtFloorNewtonRaphson + "Return the integer part of the square root of self. + Use a Newton-Raphson iteration since it converges quadratically + (twice more bits of precision at each iteration)" + + | guess delta | + guess := 1 bitShift: self highBit + 1 // 2. + [ + delta := guess squared - self // (guess bitShift: 1). + delta = 0 ] whileFalse: [ + guess := guess - delta ]. + ^guess - 1! Item was added: + ----- Method: Integer>>sqrtRem (in category 'mathematical functions') ----- + sqrtRem + "Return an array with floor sqrt and sqrt remainder. + Assume self >= 0. + The following invariants apply: + 1) self sqrtRem first squared <= self + 2) (self sqrtRem first + 1) squared > self + 3) self sqrtRem first squared + self sqrtRem last = self" + + | s | + s := self sqrtFloorNewtonRaphson. + ^{ s. self - s squared } ! Item was added: + ----- Method: Integer>>toom3Times: (in category 'arithmetic') ----- + toom3Times: anInteger + ^self * anInteger! Item was added: + ----- Method: LargePositiveInteger>>butLowestNDigits: (in category 'private') ----- + butLowestNDigits: N + "make a new integer removing N least significant digits of self." + + ^self bitShiftMagnitude: -8 * N! Item was added: + ----- Method: LargePositiveInteger>>copyDigitsFrom:to: (in category 'private') ----- + copyDigitsFrom: start to: stop + "Make a new integer keeping only some digits of self." + + | len slice | + start > 0 ifFalse: [^self error: 'start index should be at least 1']. + len := self digitLength. + (start > len or: [start > stop]) ifTrue: [^0]. + stop >= len + ifTrue: [start = 1 ifTrue: [^self]. + len := len - start + 1] + ifFalse: [len := stop - start + 1]. + slice := self class new: len. + slice replaceFrom: 1 to: len with: self startingAt: start. + ^slice normalize! Item was added: + ----- Method: LargePositiveInteger>>fastMultiply: (in category 'arithmetic') ----- + fastMultiply: anInteger + "Eventually use Karatsuba or 3-way Toom-Cook algorithm to perform the multiplication" + + | xLen yLen | + "arrange to have the receiver be the shorter" + (xLen := self digitLength) > (yLen := anInteger digitLength) + ifTrue: [^ anInteger fastMultiply: self ]. + + "If too short to be worth, fallback to naive algorithm" + (xLen >= 600) ifFalse: [^self * anInteger]. + (xLen >= 6000) ifFalse: [^self karatsubaTimes: anInteger]. + ^self toom3Times: anInteger! Item was added: + ----- Method: LargePositiveInteger>>karatsubaTimes: (in category 'arithmetic') ----- + karatsubaTimes: anInteger + "Eventually use Karatsuba algorithm to perform the multiplication + The Karatsuba algorithm divide number in two smaller parts. + Then it uses tricks to perform only 3 multiplications. + See https://en.wikipedia.org/wiki/Karatsuba_algorithm" + + | half xHigh xLow yHigh yLow low high mid xLen yLen | + "arrange to have the receiver be the shorter" + (xLen := self digitLength) > (yLen := anInteger digitLength) + ifTrue: [^ anInteger karatsubaTimes: self ]. + + "If too short to be worth, fallback to naive algorithm" + (xLen >= 600) ifFalse: [^self * anInteger]. + + "Seek for integers of about the same length, else split the long one" + yLen >= (7 * xLen bitShift: -2) + ifTrue: + [^(0 to: yLen - 1 by: xLen + 1) digitShiftSum: [:yShift | + self karatsubaTimes: + (anInteger copyDigitsFrom: yShift + 1 to: yShift + 1 + xLen)]]. + + "At this point, lengths are similar, divide each integer in two halves" + half := (yLen + 1 bitShift: -1) bitClear: 2r11. + xLow := self lowestNDigits: half. + xHigh := self butLowestNDigits: half. + yLow := anInteger lowestNDigits: half. + yHigh := anInteger butLowestNDigits: half. + + "Karatsuba trick: perform with 3 multiplications instead of 4" + low := xLow karatsubaTimes: yLow. + high := xHigh karatsubaTimes: yHigh. + mid := high + low + (xHigh - xLow karatsubaTimes: yLow - yHigh). + + "Sum the parts of decomposition" + ^low + (mid bitShiftMagnitude: 8*half) + (high bitShiftMagnitude: 16*half) + + " + | a b | + a := 650 factorial-1. + b := 700 factorial-1. + {a digitLength. b digitLength}. + self assert: (a karatsubaTimes: b) - (a * b) = 0. + [Smalltalk garbageCollect. + [1000 timesRepeat: [a karatsubaTimes: b]] timeToRun] value / + [Smalltalk garbageCollect. + [1000 timesRepeat: [a * b]] timeToRun] value asFloat + "! Item was added: + ----- Method: LargePositiveInteger>>lowestNDigits: (in category 'private') ----- + lowestNDigits: N + "Make a new integer keeping only N least significant digits of self" + + | low | + N >= self digitLength ifTrue: [^self]. + low := self class new: N. + low replaceFrom: 1 to: N with: self startingAt: 1. + ^low normalize! Item was added: + ----- Method: LargePositiveInteger>>sqrtFloor (in category 'mathematical functions') ----- + sqrtFloor + "Like super, but use a faster divide and conquer strategy if it's worth" + + self digitLength >= 64 ifFalse: [^self sqrtFloorNewtonRaphson]. + ^self sqrtRem first! Item was added: + ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') ----- + sqrtRem + "Like super, but use a divide and conquer method to perform this operation. + See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854> + https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf" + + | n qr s r sr high mid low | + n := self digitLength bitShift: -2. + n >= 16 ifFalse: [^super sqrtRem]. + high := self butLowestNDigits: n * 2. + mid := self copyDigitsFrom: n + 1 to: n * 2. + low := self lowestNDigits: n. + + sr := high sqrtRem. + qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false. + s := (sr first bitShift: 8 * n) + qr first. + r := (qr last bitShift: 8 * n) + low - qr first squared. + r negative + ifTrue: + [r := (s bitShift: 1) + r - 1. + s := s - 1]. + sr at: 1 put: s; at: 2 put: r. + ^sr + ! Item was added: + ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') ----- + squared + "Eventually use a divide and conquer algorithm to perform the multiplication" + + (self digitLength >= 400) ifFalse: [^self * self]. + (self digitLength >= 1600) ifFalse: [^self squaredKaratsuba]. + ^self squaredToom3! Item was added: + ----- Method: LargePositiveInteger>>squaredKaratsuba (in category 'mathematical functions') ----- + squaredKaratsuba + "Use a divide and conquer algorithm to perform the multiplication. + Split in two parts like Karatsuba, but economize 2 additions by using asymetrical product." + + | half xHigh xLow low high mid | + + "Divide digits in two halves" + half := self digitLength + 1 // 2 bitClear: 2r11. + xLow := self lowestNDigits: half. + xHigh := self butLowestNDigits: half. + + "eventually use karatsuba" + low := xLow squared. + high := xHigh squared. + mid := xLow karatsubaTimes: xHigh. + + "Sum the parts of decomposition" + ^low + (mid bitShift: 8*half+1) + (high bitShift: 16*half) + + " + | a | + a := 440 factorial-1. + a digitLength. + self assert: a * a - a squaredKaratsuba = 0. + [Smalltalk garbageCollect. + [2000 timesRepeat: [a squaredKaratsuba]] timeToRun] value / + [Smalltalk garbageCollect. + [2000 timesRepeat: [a * a]] timeToRun] value asFloat + "! Item was added: + ----- Method: LargePositiveInteger>>squaredToom3 (in category 'mathematical functions') ----- + squaredToom3 + "Use a 3-way Toom-Cook divide and conquer algorithm to perform the multiplication" + + | third x0 x1 x2 x20 z0 z1 z2 z3 z4 | + "divide in 3 parts" + third := self digitLength + 2 // 3 bitClear: 2r11. + x2 := self butLowestNDigits: third * 2. + x1 := self copyDigitsFrom: third + 1 to: third * 2. + x0 := self lowestNDigits: third. + + "Toom-3 trick: 5 multiplications instead of 9" + z0 := x0 squared. + z4 := x2 squared. + x20 := x2 + x0. + z1 := (x20 + x1) squared. + x20 := x20 - x1. + z2 := x20 squared. + z3 := ((x20 + x2 bitShift: 1) - x0) squared. + + "Sum the parts of decomposition" + z3 := z3 - z1 quo: 3. + z1 := z1 - z2 bitShift: -1. + z2 := z2 - z0. + + z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1). + z2 := z2 + z1 - z4. + z1 := z1 - z3. + ^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third) + + " + | a | + a := 1400 factorial-1. + a digitLength. + self assert: a * a - a squaredToom3 = 0. + [Smalltalk garbageCollect. + [1000 timesRepeat: [a squaredToom3]] timeToRun] value / + [Smalltalk garbageCollect. + [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat + "! Item was added: + ----- Method: LargePositiveInteger>>toom3Times: (in category 'arithmetic') ----- + toom3Times: anInteger + "Eventually use a variant of Toom-Cooke for performing multiplication. + Toom-Cooke is a generalization of Karatsuba divide and conquer algorithm. + It divides operands in n parts instead of 2. + See https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication + Here we divide each operand in 3 parts, thus the name Toom-3." + + | third x2 x1 x0 y2 y1 y0 xLen yLen y20 z4 z3 z2 z1 z0 x20 | + "arrange to have the receiver be the shorter" + (xLen := self digitLength) > (yLen := anInteger digitLength) + ifTrue: [^anInteger toom3Times: self ]. + + "If too short to be worth, fallback to Karatsuba algorithm" + (xLen > 6000) ifFalse: [^self karatsubaTimes: anInteger]. + + "Seek for well balanced integers" + yLen > (5 * xLen bitShift: -2) + ifTrue: [^(0 to: yLen - 1 by: xLen + 1) digitShiftSum: [:yShift | + self toom3Times: (anInteger copyDigitsFrom: yShift + 1 to: yShift +1 + xLen)]]. + + "At this point, lengths are well balanced, divide in 3 parts" + third := yLen + 2 // 3 bitClear: 2r11. + x2 := self butLowestNDigits: third * 2. + x1 := self copyDigitsFrom: third + 1 to: third * 2. + x0 := self lowestNDigits: third. + y2 := anInteger butLowestNDigits: third * 2. + y1 := anInteger copyDigitsFrom: third + 1 to: third * 2. + y0 := anInteger lowestNDigits: third. + + "Toom-3 trick: 5 multiplications instead of 9" + z0 := x0 toom3Times: y0. + z4 := x2 toom3Times: y2. + x20 := x2 + x0. + y20 := y2 + y0. + z1 := x20 + x1 toom3Times: y20 + y1. + x20 := x20 - x1. + y20 := y20 - y1. + z2 := x20 toom3Times: y20. + z3 := (x20 + x2 bitShift: 1) - x0 toom3Times: (y20 + y2 bitShift: 1) - y0. + + "Sum the parts of decomposition" + z3 := z3 - z1 quo: 3. + z1 := z1 - z2 bitShift: -1. + z2 := z2 - z0. + + z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1). + z2 := z2 + z1 - z4. + z1 := z1 - z3. + ^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third) + + " + | a b | + a :=5000 factorial - 1. + b := 6000 factorial - 1. + a digitLength min: b digitLength. + a digitLength max: b digitLength. + (a toom3Times: b) = (a * b) ifFalse: [self error]. + [Smalltalk garbageCollect. + [300 timesRepeat: [a toom3Times: b]] timeToRun] value / + [Smalltalk garbageCollect. + [300 timesRepeat: [a karatsubaTimes: b]] timeToRun] value asFloat + "! Item was changed: ----- Method: Number>>raisedToInteger: (in category 'mathematical functions') ----- raisedToInteger: anInteger "The 0 raisedToInteger: 0 is an special case. In some contexts must be 1 and in others must be handled as an indeterminate form. I take the first context because that's the way that was previously handled. Maybe further discussion is required on this topic." | bitProbe result | anInteger negative ifTrue: [^(self raisedToInteger: anInteger negated) reciprocal]. bitProbe := 1 bitShift: anInteger highBit - 1. result := self class one. [ (anInteger bitAnd: bitProbe) > 0 ifTrue: [ result := result * self ]. (bitProbe := bitProbe bitShift: -1) > 0 ] + whileTrue: [ result := result squared ]. - whileTrue: [ result := result * result ]. ^result! |
Free forum by Nabble | Edit this page |