The Trunk: Kernel-nice.1220.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Kernel-nice.1220.mcz

commits-2
Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1220.mcz

==================== Summary ====================

Name: Kernel-nice.1220
Author: nice
Time: 27 April 2019, 2:36:07.52994 pm
UUID: b2eb6e2e-1dfa-4d23-9421-ec0e5e8d4f86
Ancestors: Kernel-fn.1216

Accelerate some huge Integer arithmetic
(This is a squash of inbox commits)

These algorithms use divide and conquer strategy that recursively split a problem into simpler problems.

1) Multiplication and Squaring

 - Implement Karatsuba and 3-way Toom-Cook algorithms for fast large integer multiplication (a Bodrato Zanoni variant)

 - Implement the 2-way asymetrical Karatsuba, 3-way symetrical, and 4-way asymetrical Toom-Cook squaring variant of Chung-Hasan. Note that asymetrical squareToom4 over-performs the symetrical squaredToom3 even for medium size (800 bytes).

- provide a fastMultiply: that dispatch on appropriate variant based on heuristics. Currently this fast multiplication is used only in LargePositiveInteger>>squared, and other algorithme below.

- slightly modify raisedToInteger: to use that squared.
This result in a small penalty for small number exponentation, but large speed up for big integers.

2) Square-root

- implement the Karatsuba-like square root algorithm from Paul Zimmerman.

3) Division

- Implement the recursive fast division of Burnikel-Ziegler for large integers and connect it to digitDiv:neg: when operands are large enough.

- Use this recursive algorithm to accelerate digitDiv:neg:

This is not the fastest known division which is a composition of Barrett and Newton-Raphson inversion - but is easy to implement and should have similar performances for at least a few thousand bytes long integers - see for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-Hasselstrom2003.pdf 

4) Printing

- Use digitDiv:neg: in large integer printString so as to obtain the quotient (head) and remainder (tail) in a single operation. Together with divide and conquer division, this results in a factor of about 4x for 50000 factorial printString.


Examples on a 64 bits image/VM (MBP 2015 i5 2.7GHz)

[10 raisedToInteger: 30] bench.
 '3,770,000 per second. 265 nanoseconds per run.' - before
 '3,500,000 per second. 286 nanoseconds per run.' - after
[10.0 raisedToInteger: 30] bench.
 '10,900,000 per second. 91.6 nanoseconds per run.' - before
 '9,540,000 per second. 105 nanoseconds per run.' - after

| x |
x := 100 factorial.
[x raisedToInteger: 30] bench.
 '15,900 per second. 63 microseconds per run.' - before
 '19,500 per second. 51.3 microseconds per run.' - after

| x |
x := 1000 factorial.
[x raisedToInteger: 30] bench.
 '58.8 per second. 17 milliseconds per run.' - before
 '179 per second. 5.6 milliseconds per run.' - after

| x |
x := 500 factorial - 100 factorial.
{
[x sqrtFloor] bench.
[x sqrtFloorNewtonRaphson] bench.
}
 #(
        '42,300 per second. 23.6 microseconds per run.'
        '8,240 per second. 121 microseconds per run.')
       
| x |
x := 50000 factorial - 10000 factorial + 5000 factorial - 1000 factorial + 500 factorial - 100 factorial + 50 factorial - 10 factorial + 5 factorial - 1.
{
        [x printString] bench.
}
 #('1.29 per second. 778 milliseconds per run.') - before
 #('5.12 per second. 195 milliseconds per run.') - after

=============== Diff against Kernel-fn.1216 ===============

Item was added:
+ ----- Method: Integer>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+
+ ^(self digitDiv: anInteger neg: false) collect: #normalize!

Item was added:
+ ----- Method: Integer>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+
+ ^(self digitDiv: anInteger neg: false) collect: #normalize!

Item was changed:
  ----- Method: Integer>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean
+ ^self primDigitDiv: anInteger neg: aBoolean!
- digitDiv: arg neg: ng
- "Answer with an array of (quotient, remainder)."
- | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
- <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
- arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
- "TFEI added this line"
- l := self digitLength - arg digitLength + 1.
- l <= 0 ifTrue: [^ Array with: 0 with: self].
- "shortcut against #highBit"
- d := 8 - arg lastDigit highBitOfByte.
- div := arg digitLshift: d.
- divDigitLength := div digitLength + 1.
- div := div growto: divDigitLength.
- "shifts so high order word is >=128"
- rem := self digitLshift: d.
- rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
- remDigitLength := rem digitLength.
- "makes a copy and shifts"
- quo := Integer new: l neg: ng.
- dl := divDigitLength - 1.
- "Last actual byte of data"
- ql := l.
- dh := div digitAt: dl.
- dnh := dl = 1
- ifTrue: [0]
- ifFalse: [div digitAt: dl - 1].
- 1 to: ql do:
- [:k |
- "maintain quo*arg+rem=self"
- "Estimate rem/div by dividing the leading to bytes of rem by dh."
- "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
- j := remDigitLength + 1 - k.
- "r1 := rem digitAt: j."
- (rem digitAt: j)
- = dh
- ifTrue: [qhi := qlo := 15
- "i.e. q=255"]
- ifFalse:
- ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.  
- Note that r1,r2 are bytes, not nibbles.  
- Be careful not to generate intermediate results exceeding 13  
- bits."
- "r2 := (rem digitAt: j - 1)."
- t := ((rem digitAt: j)
- bitShift: 4)
- + ((rem digitAt: j - 1)
- bitShift: -4).
- qhi := t // dh.
- t := (t \\ dh bitShift: 4)
- + ((rem digitAt: j - 1)
- bitAnd: 15).
- qlo := t // dh.
- t := t \\ dh.
- "Next compute (hi,lo) := q*dnh"
- hi := qhi * dnh.
- lo := qlo * dnh + ((hi bitAnd: 15)
- bitShift: 4).
- hi := (hi bitShift: -4)
- + (lo bitShift: -8).
- lo := lo bitAnd: 255.
- "Correct overestimate of q.  
- Max of 2 iterations through loop -- see Knuth vol. 2"
- r3 := j < 3
- ifTrue: [0]
- ifFalse: [rem digitAt: j - 2].
- [(t < hi
- or: [t = hi and: [r3 < lo]])
- and:
- ["i.e. (t,r3) < (hi,lo)"
- qlo := qlo - 1.
- lo := lo - dnh.
- lo < 0
- ifTrue:
- [hi := hi - 1.
- lo := lo + 256].
- hi >= dh]]
- whileTrue: [hi := hi - dh].
- qlo < 0
- ifTrue:
- [qhi := qhi - 1.
- qlo := qlo + 16]].
- "Subtract q*div from rem"
- l := j - dl.
- a := 0.
- 1 to: divDigitLength do:
- [:i |
- hi := (div digitAt: i)
- * qhi.
- lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
- bitShift: 4) - ((div digitAt: i)
- * qlo).
- rem digitAt: l put: lo - (lo // 256 * 256).
- "sign-tolerant form of (lo bitAnd: 255)"
- a := lo // 256 - (hi bitShift: -4).
- l := l + 1].
- a < 0
- ifTrue:
- ["Add div back into rem, decrease q by 1"
- qlo := qlo - 1.
- l := j - dl.
- a := 0.
- 1 to: divDigitLength do:
- [:i |
- a := (a bitShift: -8)
- + (rem digitAt: l) + (div digitAt: i).
- rem digitAt: l put: (a bitAnd: 255).
- l := l + 1]].
- quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
- + qlo].
- rem := rem
- digitRshift: d
- bytes: 0
- lookfirst: dl.
- ^ Array with: quo with: rem!

Item was added:
+ ----- Method: Integer>>fastMultiply: (in category 'arithmetic') -----
+ fastMultiply: anInteger
+ ^self * anInteger!

Item was added:
+ ----- Method: Integer>>karatsubaTimes: (in category 'arithmetic') -----
+ karatsubaTimes: anInteger
+ "Eventually use Karatsuba algorithm to perform the multiplication.
+ This is efficient only for large integers (see subclass).
+ By default, use regular multiplication."
+
+ ^self * anInteger!

Item was added:
+ ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
+ primDigitDiv: arg neg: ng
+ "Answer with an array of (quotient, remainder)."
+ | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
+ <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
+ arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
+ "TFEI added this line"
+ l := self digitLength - arg digitLength + 1.
+ l <= 0 ifTrue: [^ Array with: 0 with: self].
+ "shortcut against #highBit"
+ d := 8 - arg lastDigit highBitOfByte.
+ div := arg digitLshift: d.
+ divDigitLength := div digitLength + 1.
+ div := div growto: divDigitLength.
+ "shifts so high order word is >=128"
+ rem := self digitLshift: d.
+ rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
+ remDigitLength := rem digitLength.
+ "makes a copy and shifts"
+ quo := Integer new: l neg: ng.
+ dl := divDigitLength - 1.
+ "Last actual byte of data"
+ ql := l.
+ dh := div digitAt: dl.
+ dnh := dl = 1
+ ifTrue: [0]
+ ifFalse: [div digitAt: dl - 1].
+ 1 to: ql do:
+ [:k |
+ "maintain quo*arg+rem=self"
+ "Estimate rem/div by dividing the leading to bytes of rem by dh."
+ "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
+ j := remDigitLength + 1 - k.
+ "r1 := rem digitAt: j."
+ (rem digitAt: j)
+ = dh
+ ifTrue: [qhi := qlo := 15
+ "i.e. q=255"]
+ ifFalse:
+ ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.  
+ Note that r1,r2 are bytes, not nibbles.  
+ Be careful not to generate intermediate results exceeding 13  
+ bits."
+ "r2 := (rem digitAt: j - 1)."
+ t := ((rem digitAt: j)
+ bitShift: 4)
+ + ((rem digitAt: j - 1)
+ bitShift: -4).
+ qhi := t // dh.
+ t := (t \\ dh bitShift: 4)
+ + ((rem digitAt: j - 1)
+ bitAnd: 15).
+ qlo := t // dh.
+ t := t \\ dh.
+ "Next compute (hi,lo) := q*dnh"
+ hi := qhi * dnh.
+ lo := qlo * dnh + ((hi bitAnd: 15)
+ bitShift: 4).
+ hi := (hi bitShift: -4)
+ + (lo bitShift: -8).
+ lo := lo bitAnd: 255.
+ "Correct overestimate of q.  
+ Max of 2 iterations through loop -- see Knuth vol. 2"
+ r3 := j < 3
+ ifTrue: [0]
+ ifFalse: [rem digitAt: j - 2].
+ [(t < hi
+ or: [t = hi and: [r3 < lo]])
+ and:
+ ["i.e. (t,r3) < (hi,lo)"
+ qlo := qlo - 1.
+ lo := lo - dnh.
+ lo < 0
+ ifTrue:
+ [hi := hi - 1.
+ lo := lo + 256].
+ hi >= dh]]
+ whileTrue: [hi := hi - dh].
+ qlo < 0
+ ifTrue:
+ [qhi := qhi - 1.
+ qlo := qlo + 16]].
+ "Subtract q*div from rem"
+ l := j - dl.
+ a := 0.
+ 1 to: divDigitLength do:
+ [:i |
+ hi := (div digitAt: i)
+ * qhi.
+ lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
+ bitShift: 4) - ((div digitAt: i)
+ * qlo).
+ rem digitAt: l put: lo - (lo // 256 * 256).
+ "sign-tolerant form of (lo bitAnd: 255)"
+ a := lo // 256 - (hi bitShift: -4).
+ l := l + 1].
+ a < 0
+ ifTrue:
+ ["Add div back into rem, decrease q by 1"
+ qlo := qlo - 1.
+ l := j - dl.
+ a := 0.
+ 1 to: divDigitLength do:
+ [:i |
+ a := (a bitShift: -8)
+ + (rem digitAt: l) + (div digitAt: i).
+ rem digitAt: l put: (a bitAnd: 255).
+ l := l + 1]].
+ quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+ + qlo].
+ rem := rem
+ digitRshift: d
+ bytes: 0
+ lookfirst: dl.
+ ^ Array with: quo with: rem!

Item was changed:
  ----- Method: Integer>>sqrtFloor (in category 'mathematical functions') -----
  sqrtFloor
+ "Return the integer part of the square root of self
+ Assume self >= 0
+ The following invariants apply:
+ 1) self sqrtFloor squared <= self
+ 2) (self sqrtFloor + 1) squared > self"
- "Return the integer part of the square root of self"
 
+ ^self sqrtFloorNewtonRaphson!
- | guess delta |
- guess := 1 bitShift: self highBit + 1 // 2.
- [
- delta := guess squared - self // (guess bitShift: 1).
- delta = 0 ] whileFalse: [
- guess := guess - delta ].
- ^guess - 1!

Item was added:
+ ----- Method: Integer>>sqrtFloorNewtonRaphson (in category 'mathematical functions') -----
+ sqrtFloorNewtonRaphson
+ "Return the integer part of the square root of self.
+ Use a Newton-Raphson iteration since it converges quadratically
+ (twice more bits of precision at each iteration)"
+
+ | guess delta |
+ guess := 1 bitShift: self highBit + 1 // 2.
+ [
+ delta := guess squared - self // (guess bitShift: 1).
+ delta = 0 ] whileFalse: [
+ guess := guess - delta ].
+ ^guess - 1!

Item was added:
+ ----- Method: Integer>>sqrtRem (in category 'mathematical functions') -----
+ sqrtRem
+ "Return an array with floor sqrt and sqrt remainder.
+ Assume self >= 0.
+ The following invariants apply:
+ 1) self  sqrtRem first squared <= self
+ 2) (self sqrtRem first + 1) squared > self
+ 3) self sqrtRem first squared + self sqrtRem last = self"
+
+ | s |
+ s := self sqrtFloorNewtonRaphson.
+ ^{ s. self - s squared } !

Item was added:
+ ----- Method: Integer>>toom3Times: (in category 'arithmetic') -----
+ toom3Times: anInteger
+ ^self * anInteger!

Item was added:
+ ----- Method: LargePositiveInteger>>butLowestNDigits: (in category 'private') -----
+ butLowestNDigits: N
+ "make a new integer removing N least significant digits of self."
+
+ ^self bitShiftMagnitude: -8 * N!

Item was added:
+ ----- Method: LargePositiveInteger>>copyDigitsFrom:to: (in category 'private') -----
+ copyDigitsFrom: start to: stop
+ "Make a new integer keeping only some digits of self."
+
+ | len slice |
+ start > 0 ifFalse: [^self error: 'start index should be at least 1'].
+ len := self digitLength.
+ (start > len or: [start > stop]) ifTrue: [^0].
+ stop >= len
+ ifTrue: [start = 1 ifTrue: [^self].
+ len := len - start + 1]
+ ifFalse: [len := stop - start + 1].
+ slice := self class new: len.
+ slice replaceFrom: 1 to: len with: self startingAt: start.
+ ^slice normalize!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+ "This is part of the recursive division algorithm from Burnikel - Ziegler
+ Divide a two limbs receiver by 1 limb dividend
+ Each limb is decomposed in two halves of p bytes (8*p bits)
+ so as to continue the recursion"
+
+ | p qr1 qr2 |
+ p := anInteger digitLength + 1 bitShift: -1.
+ p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false) collect: #normalize].
+ qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
+ qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger.
+ qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
+ ^qr2!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+ "This is part of the recursive division algorithm from Burnikel - Ziegler
+ Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
+ Each limb is made of p bytes (8*p bits).
+ This step transforms the division problem into multiplication
+ It must use the fastMultiply: to be worth the overhead costs."
+
+ | a2 b1 d p q qr r |
+ p := anInteger digitLength + 1 bitShift: -1.
+ (a2 := self butLowestNDigits: 2*p)
+ < (b1 := anInteger butLowestNDigits: p)
+ ifTrue:
+ [qr := (self butLowestNDigits: p) digitDiv21: b1.
+ q := qr first.
+ r := qr last]
+ ifFalse:
+ [q := (1 bitShift: 8*p) - 1.
+ r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
+ d := q fastMultiply: (anInteger lowestNDigits: p).
+ r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
+ [r < 0]
+ whileTrue:
+ [q := q - 1.
+ r := r + anInteger].
+ ^Array with: q with: r
+ !

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean
+ "If length is worth, engage a recursive divide and conquer strategy"
+ | qr |
+ (anInteger digitLength <= 256
+ or: [self digitLength <= anInteger digitLength])
+ ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
+ qr := self abs recursiveDigitDiv: anInteger abs.
+ self negative ifTrue: [qr at: 2 put: qr second negated].
+ aBoolean ifTrue: [qr at: 1 put: qr first negated].
+ ^qr!

Item was added:
+ ----- Method: LargePositiveInteger>>fastMultiply: (in category 'arithmetic') -----
+ fastMultiply: anInteger
+ "Eventually use Karatsuba or 3-way Toom-Cook algorithm to perform the multiplication"
+
+ | xLen |
+ "arrange to have the receiver be the shorter"
+ (xLen := self digitLength) > anInteger digitLength
+ ifTrue: [^ anInteger fastMultiply: self ].
+
+ "If too short to be worth, fallback to naive algorithm"
+ (xLen >= 600) ifFalse: [^self * anInteger].
+ (xLen >= 6000) ifFalse: [^self karatsubaTimes: anInteger].
+ ^self toom3Times: anInteger!

Item was added:
+ ----- Method: LargePositiveInteger>>karatsubaTimes: (in category 'arithmetic') -----
+ karatsubaTimes: anInteger
+ "Eventually use Karatsuba algorithm to perform the multiplication
+ The Karatsuba algorithm divide number in two smaller parts.
+ Then it uses tricks to perform only 3 multiplications.
+ See https://en.wikipedia.org/wiki/Karatsuba_algorithm"
+
+ | half xHigh xLow yHigh yLow low high mid xLen yLen |
+ "arrange to have the receiver be the shorter"
+ (xLen := self digitLength) > (yLen := anInteger digitLength)
+ ifTrue: [^ anInteger karatsubaTimes: self ].
+
+ "If too short to be worth, fallback to naive algorithm"
+ (xLen >= 600) ifFalse: [^self * anInteger].
+    
+ "Seek for integers of about the same length, else split the long one"
+ yLen >= (7 * xLen bitShift: -2)
+ ifTrue:
+ [^(0 to: yLen - 1 by: xLen + 1)  digitShiftSum: [:yShift |
+ self karatsubaTimes:
+ (anInteger copyDigitsFrom: yShift + 1 to: yShift + 1 + xLen)]].
+
+ "At this point, lengths are similar, divide each integer in two halves"
+ half := (yLen + 1 bitShift: -1)  bitClear: 2r11.
+ xLow := self lowestNDigits: half.
+ xHigh := self butLowestNDigits: half.
+ yLow := anInteger lowestNDigits: half.
+ yHigh := anInteger butLowestNDigits: half.
+
+ "Karatsuba trick: perform with 3 multiplications instead of 4"
+ low := xLow karatsubaTimes: yLow.
+ high := xHigh karatsubaTimes: yHigh.
+ mid := high + low + (xHigh - xLow karatsubaTimes: yLow - yHigh).
+
+ "Sum the parts of decomposition"
+ ^low + (mid bitShiftMagnitude: 8*half) + (high bitShiftMagnitude: 16*half)
+
+ "
+ | a b |
+ a := 650 factorial-1.
+ b := 700 factorial-1.
+ {a digitLength. b digitLength}.
+ self assert: (a karatsubaTimes: b) - (a * b) = 0.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a karatsubaTimes: b]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a * b]] timeToRun] value asFloat
+ "!

Item was added:
+ ----- Method: LargePositiveInteger>>lowestNDigits: (in category 'private') -----
+ lowestNDigits: N
+ "Make a new integer keeping only N least significant digits of self"
+
+ | low |
+ N >= self digitLength ifTrue: [^self].
+ low := self class new: N.
+ low replaceFrom: 1 to: N with: self startingAt: 1.
+ ^low normalize!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') -----
  printOn: aStream base: b
  "Append a representation of this number in base b on aStream.
  In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits."
 
+ | halfDigits halfPower head tail nDigitsUnderestimate qr |
- | halfDigits halfPower head tail nDigitsUnderestimate |
  "Don't engage any arithmetic if not normalized"
  (self digitLength = 0 or: [(self digitAt: self digitLength) = 0]) ifTrue: [^self normalize printOn: aStream base: b].
 
  nDigitsUnderestimate := b = 10
  ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is because (2 log)/(10 log)*4096 is slightly greater than 1233"]
  ifFalse: [self highBit quo: b highBit].
 
  "splitting digits with a whole power of two is more efficient"
  halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2.
 
  halfDigits <= 1
  ifTrue: ["Hmmm, this could happen only in case of a huge base b... Let lower level fail"
  ^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)].
 
  "Separate in two halves, head and tail"
  halfPower := b raisedToInteger: halfDigits.
+ qr := self digitDiv: halfPower neg: self negative.
+ head := qr first normalize.
+ tail := qr last normalize.
- head := self quo: halfPower.
- tail := self - (head * halfPower).
 
  "print head"
  head printOn: aStream base: b.
 
  "print tail without the overhead to count the digits"
  tail printOn: aStream base: b nDigits: halfDigits!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') -----
  printOn: aStream base: b nDigits: n
  "Append a representation of this number in base b on aStream using n digits.
  In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two
  Should be invoked with: 0 <= self < (b raisedToInteger: n)"
 
+ | halfPower half head tail qr |
- | halfPower half head tail |
  n <= 1 ifTrue: [
  n <= 0 ifTrue: [self error: 'Number of digits n should be > 0'].
 
  "Note: this is to stop an infinite loop if one ever attempts to print with a huge base
  This can happen because choice was to not hardcode any limit for base b
  We let Character>>#digitValue: fail"
  ^aStream nextPut: (Character digitValue: self) ].
  halfPower := n bitShift: -1.
  half := b raisedToInteger: halfPower.
+ qr := self digitDiv: half neg: self negative.
+ head := qr first normalize.
+ tail := qr last normalize.
- head := self quo: half.
- tail := self - (head * half).
  head printOn: aStream base: b nDigits: n - halfPower.
  tail printOn: aStream base: b nDigits: halfPower!

Item was added:
+ ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
+ recursiveDigitDiv: anInteger
+ "This is the recursive division algorithm from Burnikel - Ziegler
+ See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
+ Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
+ https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
+
+ | s m t a b z qr q i |
+ "round digits up to next power of 2"
+ s := anInteger digitLength.
+ m := 1 bitShift: (s - 1) highBit.
+ "shift so that leading bit of leading byte be 1, and digitLength power of two"
+ s := m * 8 - anInteger highBit.
+ a := self bitShift: s.
+ b := anInteger bitShift: s.
+
+ "Decompose a into t limbs - each limb have m bytes
+ choose t such that leading bit of leading limb of a be 0"
+ t := (a highBit + 1 / (m * 8)) ceiling.
+ z := a butLowestNDigits: t - 2 * m.
+ i := t - 2.
+ q := 0.
+ "and do a division of two limb by 1 limb b for each pair of limb of a"
+ [qr := z digitDiv21: b.
+ q := (q bitShift: 8*m) + qr first. "Note: this naive recomposition of q cost O(t^2) - it is possible in O(t log(t))"
+ (i := i - 1) >= 0] whileTrue:
+ [z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)].
+ ^Array with: q with: (qr last bitShift: s negated)!

Item was added:
+ ----- Method: LargePositiveInteger>>sqrtFloor (in category 'mathematical functions') -----
+ sqrtFloor
+ "Like super, but use a faster divide and conquer strategy if it's worth"
+
+ self digitLength >= 64 ifFalse: [^self sqrtFloorNewtonRaphson].
+ ^self sqrtRem first!

Item was added:
+ ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
+ sqrtRem
+ "Like super, but use a divide and conquer method to perform this operation.
+ See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
+ https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
+
+ | n  qr q s r sr high mid low |
+ n := self digitLength bitShift: -2.
+ n >= 16 ifFalse: [^super sqrtRem].
+ high := self butLowestNDigits: n * 2.
+ mid := self copyDigitsFrom: n + 1 to: n * 2.
+ low := self lowestNDigits: n.
+
+ sr := high sqrtRem.
+ qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false.
+ q := qr first normalize.
+ s := (sr first bitShift: 8 * n) + q.
+ r := (qr last normalize bitShift: 8 * n) + low - q squared.
+ r negative
+ ifTrue:
+ [r := (s bitShift: 1) + r - 1.
+ s := s - 1].
+ sr at: 1 put: s; at: 2 put: r.
+ ^sr
+ !

Item was added:
+ ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') -----
+ squared
+ "Eventually use a divide and conquer algorithm to perform the multiplication"
+
+ (self digitLength >= 400) ifFalse: [^self * self].
+ (self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
+ ^self squaredToom4!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredKaratsuba (in category 'mathematical functions') -----
+ squaredKaratsuba
+ "Use a divide and conquer algorithm to perform the multiplication.
+ Split in two parts like Karatsuba, but economize 2 additions by using asymetrical product."
+
+ | half xHigh xLow low high mid |
+
+ "Divide digits in two halves"
+ half := self digitLength + 1 // 2 bitClear: 2r11.
+ xLow := self lowestNDigits: half.
+ xHigh := self butLowestNDigits: half.
+
+ "eventually use karatsuba"
+ low := xLow squared.
+ high := xHigh squared.
+ mid := xLow karatsubaTimes: xHigh.
+
+ "Sum the parts of decomposition"
+ ^low + (mid bitShift: 8*half+1) + (high bitShift: 16*half)
+
+ "
+ | a |
+ a := 440 factorial-1.
+ a digitLength.
+ self assert: a * a - a squaredKaratsuba = 0.
+ [Smalltalk garbageCollect.
+ [2000 timesRepeat: [a squaredKaratsuba]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [2000 timesRepeat: [a * a]] timeToRun] value asFloat
+ "!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredToom3 (in category 'mathematical functions') -----
+ squaredToom3
+ "Use a 3-way Toom-Cook divide and conquer algorithm to perform the multiplication"
+
+ | third x0 x1 x2 x20 z0 z1 z2 z3 z4 |
+ "divide in 3 parts"
+ third := self digitLength + 2 // 3 bitClear: 2r11.
+ x2 := self butLowestNDigits: third * 2.
+ x1 := self copyDigitsFrom: third + 1 to: third * 2.
+ x0 := self lowestNDigits: third.
+
+ "Toom-3 trick: 5 multiplications instead of 9"
+ z0 := x0 squared.
+ z4 := x2 squared.
+ x20 := x2 + x0.
+ z1 := (x20 + x1) squared.
+ x20 := x20 - x1.
+ z2 := x20 squared.
+ z3 := ((x20 + x2 bitShift: 1) - x0) squared.
+
+ "Sum the parts of decomposition"
+ z3 := z3 - z1 quo: 3.
+ z1 := z1 - z2 bitShift: -1.
+ z2 := z2 - z0.
+
+ z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
+ z2 := z2 + z1 - z4.
+ z1 := z1 - z3.
+ ^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)
+
+ "
+ | a |
+ a := 1400 factorial-1.
+ a digitLength.
+ self assert: a * a - a squaredToom3 = 0.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredToom3]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
+ "!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
+ squaredToom4
+ "Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
+ See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
+ https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
+
+ | p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
+ "divide in 4 parts"
+ p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
+ a3 := self butLowestNDigits: p * 3.
+ a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
+ a1 := self copyDigitsFrom: p + 1 to: p * 2.
+ a0 := self lowestNDigits: p.
+
+ "Toom-4 trick: 7 multiplications instead of 16"
+ a02 := a0 - a2.
+ a13 := a1 - a3.
+ s0 := a0 squared.
+ s1 := (a0 fastMultiply: a1) bitShift: 1.
+ s2 := (a02 + a13) fastMultiply: (a02 - a13).
+ s3 := ((a0 + a1) + (a2 + a3)) squared.
+ s4 := (a02 fastMultiply: a13) bitShift: 1.
+ s5 := (a3 fastMultiply: a2) bitShift: 1.
+ s6 := a3 squared.
+
+ "Interpolation"
+ t2 := s1 + s5.
+ t3 := (s2 + s3 + s4 bitShift: -1) - t2.
+ s3 := t2 - s4.
+ s4 := t3 - s0.
+ s2 := t3 - s2 - s6.
+
+ "Sum the parts of decomposition"
+ ^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
+ +(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
+
+ "
+ | a |
+ a := 770 factorial-1.
+ a digitLength.
+ [a * a - a squaredToom4 = 0] assert.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
+ "!

Item was added:
+ ----- Method: LargePositiveInteger>>toom3Times: (in category 'arithmetic') -----
+ toom3Times: anInteger
+ "Eventually use a variant of Toom-Cooke for performing multiplication.
+ Toom-Cooke is a generalization of Karatsuba divide and conquer algorithm.
+ It divides operands in n parts instead of 2.
+ See https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication
+ Here we divide each operand in 3 parts, thus the name Toom-3.
+ And use a Bodrato-Zanoni variant for the choice of interpolation points and matrix inversion
+ See What about Toom-Cook matrices optimality? - Marco Bodrato, Alberto Zanoni - Oct. 2006
+ http://www.bodrato.it/papers/WhatAboutToomCookMatricesOptimality.pdf"
+
+ | third x2 x1 x0 y2 y1 y0 xLen yLen y20 z4 z3 z2 z1 z0 x20 |
+ "arrange to have the receiver be the shorter"
+ (xLen := self digitLength) > (yLen := anInteger digitLength)
+ ifTrue: [^anInteger toom3Times: self ].
+
+ "If too short to be worth, fallback to Karatsuba algorithm"
+ (xLen > 6000) ifFalse: [^self karatsubaTimes: anInteger].
+
+ "Seek for well balanced integers"
+ yLen > (5 * xLen bitShift: -2)
+ ifTrue: [^(0 to: yLen - 1 by: xLen + 1)  digitShiftSum: [:yShift |
+ self toom3Times: (anInteger copyDigitsFrom: yShift + 1 to: yShift +1 + xLen)]].
+
+ "At this point, lengths are well balanced, divide in 3 parts"
+ third := yLen + 2 // 3 bitClear: 2r11.
+ x2 := self butLowestNDigits: third * 2.
+ x1 := self copyDigitsFrom: third + 1 to: third * 2.
+ x0 := self lowestNDigits: third.
+ y2 := anInteger butLowestNDigits: third * 2.
+ y1 := anInteger copyDigitsFrom: third + 1 to: third * 2.
+ y0 := anInteger lowestNDigits: third.
+
+ "Toom-3 trick: 5 multiplications instead of 9"
+ z0 := x0 toom3Times: y0.
+ z4 := x2 toom3Times: y2.
+ x20 := x2 + x0.
+ y20 := y2 + y0.
+ z1 := x20 + x1 toom3Times: y20 + y1.
+ x20 := x20 - x1.
+ y20 := y20 - y1.
+ z2 := x20 toom3Times: y20.
+ z3 := (x20 + x2 bitShift: 1) - x0 toom3Times: (y20 + y2 bitShift: 1) - y0.
+
+ "Sum the parts of decomposition"
+ z3 := z3 - z1 quo: 3.
+ z1 := z1 - z2 bitShift: -1.
+ z2 := z2 - z0.
+
+ z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
+ z2 := z2 + z1 - z4.
+ z1 := z1 - z3.
+ ^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)
+
+ "
+ | a b |
+ a :=5000 factorial - 1.
+ b := 6000 factorial - 1.
+ a digitLength min: b digitLength.
+ a digitLength max: b digitLength.
+ (a toom3Times: b) = (a * b) ifFalse: [self error].
+ [Smalltalk garbageCollect.
+ [300 timesRepeat: [a toom3Times: b]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [300 timesRepeat: [a karatsubaTimes: b]] timeToRun] value asFloat
+ "!

Item was changed:
  ----- Method: Number>>raisedToInteger: (in category 'mathematical functions') -----
  raisedToInteger: anInteger
  "The 0 raisedToInteger: 0 is an special case. In some contexts must be 1 and in others must
  be handled as an indeterminate form.
  I take the first context because that's the way that was previously handled.
  Maybe further discussion is required on this topic."
 
  | bitProbe result |
  anInteger negative ifTrue: [^(self raisedToInteger: anInteger negated) reciprocal].
  bitProbe := 1 bitShift: anInteger highBit - 1.
  result := self class one.
  [
  (anInteger bitAnd: bitProbe) > 0 ifTrue: [ result := result * self ].
  (bitProbe := bitProbe bitShift: -1) > 0 ]
+ whileTrue: [ result := result squared ].
- whileTrue: [ result := result * result ].
  ^result!


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Kernel-nice.1220.mcz

David T. Lewis
On Sat, Apr 27, 2019 at 12:36:17PM +0000, [hidden email] wrote:
> Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
> http://source.squeak.org/trunk/Kernel-nice.1220.mcz
>
> ==================== Summary ====================
>
> Name: Kernel-nice.1220
> Author: nice
> Time: 27 April 2019, 2:36:07.52994 pm

>
> Item was added:
> + ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
> + recursiveDigitDiv: anInteger
> + "This is the recursive division algorithm from Burnikel - Ziegler
> + See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
> + Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
> + https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
> +

It is really good to see method comments like this when browsing. It
explains what the method is doing, and now I have opened the link to
read the original publication.

Thank you Nicolas.

Dave


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Kernel-nice.1220.mcz

timrowledge


> On 2019-04-27, at 10:13 AM, David T. Lewis <[hidden email]> wrote:
>
> On Sat, Apr 27, 2019 at 12:36:17PM +0000, [hidden email] wrote:
>> Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
>> http://source.squeak.org/trunk/Kernel-nice.1220.mcz
>>
>> ==================== Summary ====================
>>
>> Name: Kernel-nice.1220
>> Author: nice
>> Time: 27 April 2019, 2:36:07.52994 pm
>
>>
>> Item was added:
>> + ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
>> + recursiveDigitDiv: anInteger
>> + "This is the recursive division algorithm from Burnikel - Ziegler
>> + See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
>> + Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
>> + https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
>> +
>
> It is really good to see method comments like this when browsing. It
> explains what the method is doing, and now I have opened the link to
> read the original publication.
>
> Thank you Nicolas.

Plus several million on that.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Cap'n!  The spellchecker kinna take this abuse!