The Inbox: Kernel-nice.1218.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

The Inbox: Kernel-nice.1218.mcz

commits-2
Nicolas Cellier uploaded a new version of Kernel to project The Inbox:
http://source.squeak.org/inbox/Kernel-nice.1218.mcz

==================== Summary ====================

Name: Kernel-nice.1218
Author: nice
Time: 27 April 2019, 10:41:24.794539 am
UUID: 21f74fbe-a0cd-4b6f-86e9-be13465d57fe
Ancestors: Kernel-nice.1217

Implement the recursive fast division of Burnikel-Ziegler for large integers and connect it to digitDiv:neg: when operands are large enough.

This is not the fastest known division which is a composition of Barrett and Newton-Raphson inversion - but is easy to implement and should have similar performances for at least a few thousand bytes long integers - see for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-Hasselstrom2003.pdf 

Use digitDiv:neg: in large integer printString so as to obtain the quotient (head) and remainder (tail) in a single operation. Together with divide and conquer division, this results in a factor of about 3x for 50000 factorial printString.

Implement the 4-way Toom-Cook squaring variant of Chung-Hasan. This over-performs the symetrical squaredToom3 even for medium size (800 bytes).

=============== Diff against Kernel-nice.1217 ===============

Item was added:
+ ----- Method: Integer>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+
+ ^(self digitDiv: anInteger neg: false) collect: #normalize!

Item was added:
+ ----- Method: Integer>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+
+ ^(self digitDiv: anInteger neg: false) collect: #normalize!

Item was changed:
  ----- Method: Integer>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean
+ ^self primDigitDiv: anInteger neg: aBoolean!
- digitDiv: arg neg: ng
- "Answer with an array of (quotient, remainder)."
- | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
- <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
- arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
- "TFEI added this line"
- l := self digitLength - arg digitLength + 1.
- l <= 0 ifTrue: [^ Array with: 0 with: self].
- "shortcut against #highBit"
- d := 8 - arg lastDigit highBitOfByte.
- div := arg digitLshift: d.
- divDigitLength := div digitLength + 1.
- div := div growto: divDigitLength.
- "shifts so high order word is >=128"
- rem := self digitLshift: d.
- rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
- remDigitLength := rem digitLength.
- "makes a copy and shifts"
- quo := Integer new: l neg: ng.
- dl := divDigitLength - 1.
- "Last actual byte of data"
- ql := l.
- dh := div digitAt: dl.
- dnh := dl = 1
- ifTrue: [0]
- ifFalse: [div digitAt: dl - 1].
- 1 to: ql do:
- [:k |
- "maintain quo*arg+rem=self"
- "Estimate rem/div by dividing the leading to bytes of rem by dh."
- "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
- j := remDigitLength + 1 - k.
- "r1 := rem digitAt: j."
- (rem digitAt: j)
- = dh
- ifTrue: [qhi := qlo := 15
- "i.e. q=255"]
- ifFalse:
- ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.  
- Note that r1,r2 are bytes, not nibbles.  
- Be careful not to generate intermediate results exceeding 13  
- bits."
- "r2 := (rem digitAt: j - 1)."
- t := ((rem digitAt: j)
- bitShift: 4)
- + ((rem digitAt: j - 1)
- bitShift: -4).
- qhi := t // dh.
- t := (t \\ dh bitShift: 4)
- + ((rem digitAt: j - 1)
- bitAnd: 15).
- qlo := t // dh.
- t := t \\ dh.
- "Next compute (hi,lo) := q*dnh"
- hi := qhi * dnh.
- lo := qlo * dnh + ((hi bitAnd: 15)
- bitShift: 4).
- hi := (hi bitShift: -4)
- + (lo bitShift: -8).
- lo := lo bitAnd: 255.
- "Correct overestimate of q.  
- Max of 2 iterations through loop -- see Knuth vol. 2"
- r3 := j < 3
- ifTrue: [0]
- ifFalse: [rem digitAt: j - 2].
- [(t < hi
- or: [t = hi and: [r3 < lo]])
- and:
- ["i.e. (t,r3) < (hi,lo)"
- qlo := qlo - 1.
- lo := lo - dnh.
- lo < 0
- ifTrue:
- [hi := hi - 1.
- lo := lo + 256].
- hi >= dh]]
- whileTrue: [hi := hi - dh].
- qlo < 0
- ifTrue:
- [qhi := qhi - 1.
- qlo := qlo + 16]].
- "Subtract q*div from rem"
- l := j - dl.
- a := 0.
- 1 to: divDigitLength do:
- [:i |
- hi := (div digitAt: i)
- * qhi.
- lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
- bitShift: 4) - ((div digitAt: i)
- * qlo).
- rem digitAt: l put: lo - (lo // 256 * 256).
- "sign-tolerant form of (lo bitAnd: 255)"
- a := lo // 256 - (hi bitShift: -4).
- l := l + 1].
- a < 0
- ifTrue:
- ["Add div back into rem, decrease q by 1"
- qlo := qlo - 1.
- l := j - dl.
- a := 0.
- 1 to: divDigitLength do:
- [:i |
- a := (a bitShift: -8)
- + (rem digitAt: l) + (div digitAt: i).
- rem digitAt: l put: (a bitAnd: 255).
- l := l + 1]].
- quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
- + qlo].
- rem := rem
- digitRshift: d
- bytes: 0
- lookfirst: dl.
- ^ Array with: quo with: rem!

Item was added:
+ ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
+ primDigitDiv: arg neg: ng
+ "Answer with an array of (quotient, remainder)."
+ | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
+ <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
+ arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
+ "TFEI added this line"
+ l := self digitLength - arg digitLength + 1.
+ l <= 0 ifTrue: [^ Array with: 0 with: self].
+ "shortcut against #highBit"
+ d := 8 - arg lastDigit highBitOfByte.
+ div := arg digitLshift: d.
+ divDigitLength := div digitLength + 1.
+ div := div growto: divDigitLength.
+ "shifts so high order word is >=128"
+ rem := self digitLshift: d.
+ rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
+ remDigitLength := rem digitLength.
+ "makes a copy and shifts"
+ quo := Integer new: l neg: ng.
+ dl := divDigitLength - 1.
+ "Last actual byte of data"
+ ql := l.
+ dh := div digitAt: dl.
+ dnh := dl = 1
+ ifTrue: [0]
+ ifFalse: [div digitAt: dl - 1].
+ 1 to: ql do:
+ [:k |
+ "maintain quo*arg+rem=self"
+ "Estimate rem/div by dividing the leading to bytes of rem by dh."
+ "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
+ j := remDigitLength + 1 - k.
+ "r1 := rem digitAt: j."
+ (rem digitAt: j)
+ = dh
+ ifTrue: [qhi := qlo := 15
+ "i.e. q=255"]
+ ifFalse:
+ ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.  
+ Note that r1,r2 are bytes, not nibbles.  
+ Be careful not to generate intermediate results exceeding 13  
+ bits."
+ "r2 := (rem digitAt: j - 1)."
+ t := ((rem digitAt: j)
+ bitShift: 4)
+ + ((rem digitAt: j - 1)
+ bitShift: -4).
+ qhi := t // dh.
+ t := (t \\ dh bitShift: 4)
+ + ((rem digitAt: j - 1)
+ bitAnd: 15).
+ qlo := t // dh.
+ t := t \\ dh.
+ "Next compute (hi,lo) := q*dnh"
+ hi := qhi * dnh.
+ lo := qlo * dnh + ((hi bitAnd: 15)
+ bitShift: 4).
+ hi := (hi bitShift: -4)
+ + (lo bitShift: -8).
+ lo := lo bitAnd: 255.
+ "Correct overestimate of q.  
+ Max of 2 iterations through loop -- see Knuth vol. 2"
+ r3 := j < 3
+ ifTrue: [0]
+ ifFalse: [rem digitAt: j - 2].
+ [(t < hi
+ or: [t = hi and: [r3 < lo]])
+ and:
+ ["i.e. (t,r3) < (hi,lo)"
+ qlo := qlo - 1.
+ lo := lo - dnh.
+ lo < 0
+ ifTrue:
+ [hi := hi - 1.
+ lo := lo + 256].
+ hi >= dh]]
+ whileTrue: [hi := hi - dh].
+ qlo < 0
+ ifTrue:
+ [qhi := qhi - 1.
+ qlo := qlo + 16]].
+ "Subtract q*div from rem"
+ l := j - dl.
+ a := 0.
+ 1 to: divDigitLength do:
+ [:i |
+ hi := (div digitAt: i)
+ * qhi.
+ lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
+ bitShift: 4) - ((div digitAt: i)
+ * qlo).
+ rem digitAt: l put: lo - (lo // 256 * 256).
+ "sign-tolerant form of (lo bitAnd: 255)"
+ a := lo // 256 - (hi bitShift: -4).
+ l := l + 1].
+ a < 0
+ ifTrue:
+ ["Add div back into rem, decrease q by 1"
+ qlo := qlo - 1.
+ l := j - dl.
+ a := 0.
+ 1 to: divDigitLength do:
+ [:i |
+ a := (a bitShift: -8)
+ + (rem digitAt: l) + (div digitAt: i).
+ rem digitAt: l put: (a bitAnd: 255).
+ l := l + 1]].
+ quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+ + qlo].
+ rem := rem
+ digitRshift: d
+ bytes: 0
+ lookfirst: dl.
+ ^ Array with: quo with: rem!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+ "This is part of the recursive division algorithm from Burnikel - Ziegler
+ Divide a two limbs receiver by 1 limb dividend
+ Each limb is decomposed in two halves of p bytes (8*p bits)
+ so as to continue the recursion"
+
+ | p qr1 qr2 |
+ p := anInteger digitLength + 1 bitShift: -1.
+ p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false) collect: #normalize].
+ qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
+ qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger.
+ qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
+ ^qr2!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+ "This is part of the recursive division algorithm from Burnikel - Ziegler
+ Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
+ Each limb is made of p bytes (8*p bits).
+ This step transforms the division problem into multiplication
+ It must use the fastMultiply: to be worth the overhead costs."
+
+ | a2 b1 d p q qr r |
+ p := anInteger digitLength + 1 bitShift: -1.
+ (a2 := self butLowestNDigits: 2*p)
+ < (b1 := anInteger butLowestNDigits: p)
+ ifTrue:
+ [qr := (self butLowestNDigits: p) digitDiv21: b1.
+ q := qr first.
+ r := qr last]
+ ifFalse:
+ [q := (1 bitShift: 8*p) - 1.
+ r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
+ d := q fastMultiply: (anInteger lowestNDigits: p).
+ r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
+ [r < 0]
+ whileTrue:
+ [q := q - 1.
+ r := r + anInteger].
+ ^Array with: q with: r
+ !

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean
+ "If length is worth, engage a recursive divide and conquer strategy"
+ | qr |
+ (anInteger digitLength <= 256
+ or: [self digitLength <= anInteger digitLength])
+ ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
+ qr := self abs recursiveDigitDiv: anInteger abs.
+ ^ aBoolean
+ ifTrue: [qr collect: #negated]
+ ifFalse: [qr]!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') -----
  printOn: aStream base: b
  "Append a representation of this number in base b on aStream.
  In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits."
 
+ | halfDigits halfPower head tail nDigitsUnderestimate qr |
- | halfDigits halfPower head tail nDigitsUnderestimate |
  "Don't engage any arithmetic if not normalized"
  (self digitLength = 0 or: [(self digitAt: self digitLength) = 0]) ifTrue: [^self normalize printOn: aStream base: b].
 
  nDigitsUnderestimate := b = 10
  ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is because (2 log)/(10 log)*4096 is slightly greater than 1233"]
  ifFalse: [self highBit quo: b highBit].
 
  "splitting digits with a whole power of two is more efficient"
  halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2.
 
  halfDigits <= 1
  ifTrue: ["Hmmm, this could happen only in case of a huge base b... Let lower level fail"
  ^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)].
 
  "Separate in two halves, head and tail"
  halfPower := b raisedToInteger: halfDigits.
+ qr := self digitDiv: halfPower neg: self negative.
+ head := qr first normalize.
+ tail := qr last normalize.
- head := self quo: halfPower.
- tail := self - (head * halfPower).
 
  "print head"
  head printOn: aStream base: b.
 
  "print tail without the overhead to count the digits"
  tail printOn: aStream base: b nDigits: halfDigits!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') -----
  printOn: aStream base: b nDigits: n
  "Append a representation of this number in base b on aStream using n digits.
  In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two
  Should be invoked with: 0 <= self < (b raisedToInteger: n)"
 
+ | halfPower half head tail qr |
- | halfPower half head tail |
  n <= 1 ifTrue: [
  n <= 0 ifTrue: [self error: 'Number of digits n should be > 0'].
 
  "Note: this is to stop an infinite loop if one ever attempts to print with a huge base
  This can happen because choice was to not hardcode any limit for base b
  We let Character>>#digitValue: fail"
  ^aStream nextPut: (Character digitValue: self) ].
  halfPower := n bitShift: -1.
  half := b raisedToInteger: halfPower.
+ qr := self digitDiv: half neg: self negative.
+ head := qr first normalize.
+ tail := qr last normalize.
- head := self quo: half.
- tail := self - (head * half).
  head printOn: aStream base: b nDigits: n - halfPower.
  tail printOn: aStream base: b nDigits: halfPower!

Item was added:
+ ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
+ recursiveDigitDiv: anInteger
+ "This is the recursive division algorithm from Burnikel - Ziegler
+ See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
+ Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
+ https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
+
+ | s m t a b z qr q i |
+ "round digits up to next power of 2"
+ s := anInteger digitLength.
+ m := 1 bitShift: (s - 1) highBit.
+ "shift so that leading bit of leading byte be 1, and digitLength power of two"
+ s := m * 8 - anInteger highBit.
+ a := self bitShift: s.
+ b := anInteger bitShift: s.
+
+ "Decompose a into t limbs - each limb have m bytes
+ choose t such that leading bit of leading limb of a be 0"
+ t := (a highBit + 1 / (m * 8)) ceiling.
+ z := a butLowestNDigits: t - 2 * m.
+ i := t - 2.
+ q := 0.
+ "and do a division of two limb by 1 limb b for each pair of limb of a"
+ [qr := z digitDiv21: b.
+ q := (q bitShift: 8*m) + qr first. "Note: this naive recomposition of q cost O(t^2) - it is possible in O(t log(t))"
+ (i := i - 1) >= 0] whileTrue:
+ [z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)].
+ ^Array with: q with: (qr last bitShift: s negated)!

Item was changed:
  ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
  sqrtRem
  "Like super, but use a divide and conquer method to perform this operation.
  See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
  https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
 
+ | n  qr q s r sr high mid low |
- | n  qr s r sr high mid low |
  n := self digitLength bitShift: -2.
  n >= 16 ifFalse: [^super sqrtRem].
  high := self butLowestNDigits: n * 2.
  mid := self copyDigitsFrom: n + 1 to: n * 2.
  low := self lowestNDigits: n.
 
  sr := high sqrtRem.
  qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false.
+ q := qr first normalize.
+ s := (sr first bitShift: 8 * n) + q.
+ r := (qr last normalize bitShift: 8 * n) + low - q squared.
- s := (sr first bitShift: 8 * n) + qr first.
- r := (qr last bitShift: 8 * n) + low - qr first squared.
  r negative
  ifTrue:
  [r := (s bitShift: 1) + r - 1.
  s := s - 1].
  sr at: 1 put: s; at: 2 put: r.
  ^sr
  !

Item was changed:
  ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') -----
  squared
  "Eventually use a divide and conquer algorithm to perform the multiplication"
 
  (self digitLength >= 400) ifFalse: [^self * self].
+ (self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
+ ^self squaredToom4!
- (self digitLength >= 1600) ifFalse: [^self squaredKaratsuba].
- ^self squaredToom3!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
+ squaredToom4
+ "Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
+ See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
+ https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
+
+ | p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
+ "divide in 4 parts"
+ p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
+ a3 := self butLowestNDigits: p * 3.
+ a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
+ a1 := self copyDigitsFrom: p + 1 to: p * 2.
+ a0 := self lowestNDigits: p.
+
+ "Toom-4 trick: 7 multiplications instead of 16"
+ a02 := a0 - a2.
+ a13 := a1 - a3.
+ s0 := a0 squared.
+ s1 := (a0 fastMultiply: a1) bitShift: 1.
+ s2 := (a02 + a13) fastMultiply: (a02 - a13).
+ s3 := ((a0 + a1) + (a2 + a3)) squared.
+ s4 := (a02 fastMultiply: a13) bitShift: 1.
+ s5 := (a3 fastMultiply: a2) bitShift: 1.
+ s6 := a3 squared.
+
+ "Interpolation"
+ t2 := s1 + s5.
+ t3 := (s2 + s3 + s4 bitShift: -1) - t2.
+ s3 := t2 - s4.
+ s4 := t3 - s0.
+ s2 := t3 - s2 - s6.
+
+ "Sum the parts of decomposition"
+ ^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
+ +(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
+
+ "
+ | a |
+ a := 770 factorial-1.
+ a digitLength.
+ [a * a - a squaredToom4 = 0] assert.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
+ "!


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Kernel-nice.1218.mcz

Nicolas Cellier
Err, I messed up with the quo/rem signs...
Tests pass, but there are not enough tests!

Le sam. 27 avr. 2019 à 10:41, <[hidden email]> a écrit :
Nicolas Cellier uploaded a new version of Kernel to project The Inbox:
http://source.squeak.org/inbox/Kernel-nice.1218.mcz

==================== Summary ====================

Name: Kernel-nice.1218
Author: nice
Time: 27 April 2019, 10:41:24.794539 am
UUID: 21f74fbe-a0cd-4b6f-86e9-be13465d57fe
Ancestors: Kernel-nice.1217

Implement the recursive fast division of Burnikel-Ziegler for large integers and connect it to digitDiv:neg: when operands are large enough.

This is not the fastest known division which is a composition of Barrett and Newton-Raphson inversion - but is easy to implement and should have similar performances for at least a few thousand bytes long integers - see for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-Hasselstrom2003.pdf

Use digitDiv:neg: in large integer printString so as to obtain the quotient (head) and remainder (tail) in a single operation. Together with divide and conquer division, this results in a factor of about 3x for 50000 factorial printString.

Implement the 4-way Toom-Cook squaring variant of Chung-Hasan. This over-performs the symetrical squaredToom3 even for medium size (800 bytes).

=============== Diff against Kernel-nice.1217 ===============

Item was added:
+ ----- Method: Integer>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+       
+       ^(self digitDiv: anInteger neg: false) collect: #normalize!

Item was added:
+ ----- Method: Integer>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+       
+       ^(self digitDiv: anInteger neg: false) collect: #normalize!

Item was changed:
  ----- Method: Integer>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean
+       ^self primDigitDiv: anInteger neg: aBoolean!
- digitDiv: arg neg: ng
-       "Answer with an array of (quotient, remainder)."
-       | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
-       <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
-       arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
-       "TFEI added this line"
-       l := self digitLength - arg digitLength + 1.
-       l <= 0 ifTrue: [^ Array with: 0 with: self].
-       "shortcut against #highBit"
-       d := 8 - arg lastDigit highBitOfByte.
-       div := arg digitLshift: d.
-       divDigitLength := div digitLength + 1.
-       div := div growto: divDigitLength.
-       "shifts so high order word is >=128"
-       rem := self digitLshift: d.
-       rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
-       remDigitLength := rem digitLength.
-       "makes a copy and shifts"
-       quo := Integer new: l neg: ng.
-       dl := divDigitLength - 1.
-       "Last actual byte of data"
-       ql := l.
-       dh := div digitAt: dl.
-       dnh := dl = 1
-                               ifTrue: [0]
-                               ifFalse: [div digitAt: dl - 1].
-       1 to: ql do:
-               [:k |
-               "maintain quo*arg+rem=self"
-               "Estimate rem/div by dividing the leading to bytes of rem by dh."
-               "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
-               j := remDigitLength + 1 - k.
-               "r1 := rem digitAt: j."
-               (rem digitAt: j)
-                       = dh
-                       ifTrue: [qhi := qlo := 15
-                               "i.e. q=255"]
-                       ifFalse:
-                               ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh. 
-                               Note that r1,r2 are bytes, not nibbles. 
-                               Be careful not to generate intermediate results exceeding 13 
-                               bits."
-                               "r2 := (rem digitAt: j - 1)."
-                               t := ((rem digitAt: j)
-                                                       bitShift: 4)
-                                                       + ((rem digitAt: j - 1)
-                                                                       bitShift: -4).
-                               qhi := t // dh.
-                               t := (t \\ dh bitShift: 4)
-                                                       + ((rem digitAt: j - 1)
-                                                                       bitAnd: 15).
-                               qlo := t // dh.
-                               t := t \\ dh.
-                               "Next compute (hi,lo) := q*dnh"
-                               hi := qhi * dnh.
-                               lo := qlo * dnh + ((hi bitAnd: 15)
-                                                               bitShift: 4).
-                               hi := (hi bitShift: -4)
-                                                       + (lo bitShift: -8).
-                               lo := lo bitAnd: 255.
-                               "Correct overestimate of q. 
-                               Max of 2 iterations through loop -- see Knuth vol. 2"
-                               r3 := j < 3
-                                                       ifTrue: [0]
-                                                       ifFalse: [rem digitAt: j - 2].
-                               [(t < hi
-                                       or: [t = hi and: [r3 < lo]])
-                                       and:
-                                               ["i.e. (t,r3) < (hi,lo)"
-                                               qlo := qlo - 1.
-                                               lo := lo - dnh.
-                                               lo < 0
-                                                       ifTrue:
-                                                               [hi := hi - 1.
-                                                               lo := lo + 256].
-                                               hi >= dh]]
-                                       whileTrue: [hi := hi - dh].
-                               qlo < 0
-                                       ifTrue:
-                                               [qhi := qhi - 1.
-                                               qlo := qlo + 16]].
-               "Subtract q*div from rem"
-               l := j - dl.
-               a := 0.
-               1 to: divDigitLength do:
-                       [:i |
-                       hi := (div digitAt: i)
-                                               * qhi.
-                       lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
-                                                       bitShift: 4) - ((div digitAt: i)
-                                                       * qlo).
-                       rem digitAt: l put: lo - (lo // 256 * 256).
-                       "sign-tolerant form of (lo bitAnd: 255)"
-                       a := lo // 256 - (hi bitShift: -4).
-                       l := l + 1].
-               a < 0
-                       ifTrue:
-                               ["Add div back into rem, decrease q by 1"
-                               qlo := qlo - 1.
-                               l := j - dl.
-                               a := 0.
-                               1 to: divDigitLength do:
-                                       [:i |
-                                       a := (a bitShift: -8)
-                                                               + (rem digitAt: l) + (div digitAt: i).
-                                       rem digitAt: l put: (a bitAnd: 255).
-                                       l := l + 1]].
-               quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
-                               + qlo].
-       rem := rem
-                               digitRshift: d
-                               bytes: 0
-                               lookfirst: dl.
-       ^ Array with: quo with: rem!

Item was added:
+ ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
+ primDigitDiv: arg neg: ng
+       "Answer with an array of (quotient, remainder)."
+       | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
+       <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
+       arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
+       "TFEI added this line"
+       l := self digitLength - arg digitLength + 1.
+       l <= 0 ifTrue: [^ Array with: 0 with: self].
+       "shortcut against #highBit"
+       d := 8 - arg lastDigit highBitOfByte.
+       div := arg digitLshift: d.
+       divDigitLength := div digitLength + 1.
+       div := div growto: divDigitLength.
+       "shifts so high order word is >=128"
+       rem := self digitLshift: d.
+       rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
+       remDigitLength := rem digitLength.
+       "makes a copy and shifts"
+       quo := Integer new: l neg: ng.
+       dl := divDigitLength - 1.
+       "Last actual byte of data"
+       ql := l.
+       dh := div digitAt: dl.
+       dnh := dl = 1
+                               ifTrue: [0]
+                               ifFalse: [div digitAt: dl - 1].
+       1 to: ql do:
+               [:k |
+               "maintain quo*arg+rem=self"
+               "Estimate rem/div by dividing the leading to bytes of rem by dh."
+               "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
+               j := remDigitLength + 1 - k.
+               "r1 := rem digitAt: j."
+               (rem digitAt: j)
+                       = dh
+                       ifTrue: [qhi := qlo := 15
+                               "i.e. q=255"]
+                       ifFalse:
+                               ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh. 
+                               Note that r1,r2 are bytes, not nibbles. 
+                               Be careful not to generate intermediate results exceeding 13 
+                               bits."
+                               "r2 := (rem digitAt: j - 1)."
+                               t := ((rem digitAt: j)
+                                                       bitShift: 4)
+                                                       + ((rem digitAt: j - 1)
+                                                                       bitShift: -4).
+                               qhi := t // dh.
+                               t := (t \\ dh bitShift: 4)
+                                                       + ((rem digitAt: j - 1)
+                                                                       bitAnd: 15).
+                               qlo := t // dh.
+                               t := t \\ dh.
+                               "Next compute (hi,lo) := q*dnh"
+                               hi := qhi * dnh.
+                               lo := qlo * dnh + ((hi bitAnd: 15)
+                                                               bitShift: 4).
+                               hi := (hi bitShift: -4)
+                                                       + (lo bitShift: -8).
+                               lo := lo bitAnd: 255.
+                               "Correct overestimate of q. 
+                               Max of 2 iterations through loop -- see Knuth vol. 2"
+                               r3 := j < 3
+                                                       ifTrue: [0]
+                                                       ifFalse: [rem digitAt: j - 2].
+                               [(t < hi
+                                       or: [t = hi and: [r3 < lo]])
+                                       and:
+                                               ["i.e. (t,r3) < (hi,lo)"
+                                               qlo := qlo - 1.
+                                               lo := lo - dnh.
+                                               lo < 0
+                                                       ifTrue:
+                                                               [hi := hi - 1.
+                                                               lo := lo + 256].
+                                               hi >= dh]]
+                                       whileTrue: [hi := hi - dh].
+                               qlo < 0
+                                       ifTrue:
+                                               [qhi := qhi - 1.
+                                               qlo := qlo + 16]].
+               "Subtract q*div from rem"
+               l := j - dl.
+               a := 0.
+               1 to: divDigitLength do:
+                       [:i |
+                       hi := (div digitAt: i)
+                                               * qhi.
+                       lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
+                                                       bitShift: 4) - ((div digitAt: i)
+                                                       * qlo).
+                       rem digitAt: l put: lo - (lo // 256 * 256).
+                       "sign-tolerant form of (lo bitAnd: 255)"
+                       a := lo // 256 - (hi bitShift: -4).
+                       l := l + 1].
+               a < 0
+                       ifTrue:
+                               ["Add div back into rem, decrease q by 1"
+                               qlo := qlo - 1.
+                               l := j - dl.
+                               a := 0.
+                               1 to: divDigitLength do:
+                                       [:i |
+                                       a := (a bitShift: -8)
+                                                               + (rem digitAt: l) + (div digitAt: i).
+                                       rem digitAt: l put: (a bitAnd: 255).
+                                       l := l + 1]].
+               quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+                               + qlo].
+       rem := rem
+                               digitRshift: d
+                               bytes: 0
+                               lookfirst: dl.
+       ^ Array with: quo with: rem!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+       "This is part of the recursive division algorithm from Burnikel - Ziegler
+       Divide a two limbs receiver by 1 limb dividend
+       Each limb is decomposed in two halves of p bytes (8*p bits)
+       so as to continue the recursion"
+       
+       | p qr1 qr2 |
+       p := anInteger digitLength + 1 bitShift: -1.
+       p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false) collect: #normalize].
+       qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
+       qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger.
+       qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
+       ^qr2!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+       "This is part of the recursive division algorithm from Burnikel - Ziegler
+       Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
+       Each limb is made of p bytes (8*p bits).
+       This step transforms the division problem into multiplication
+       It must use the fastMultiply: to be worth the overhead costs."
+       
+       | a2 b1 d p q qr r |
+       p := anInteger digitLength + 1 bitShift: -1.
+       (a2 := self butLowestNDigits: 2*p)
+               < (b1 := anInteger butLowestNDigits: p)
+               ifTrue:
+                       [qr := (self butLowestNDigits: p) digitDiv21: b1.
+                       q := qr first.
+                       r := qr last]
+               ifFalse:
+                       [q := (1 bitShift: 8*p) - 1.
+                       r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
+       d := q fastMultiply: (anInteger lowestNDigits: p).
+       r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
+       [r < 0]
+               whileTrue:
+                       [q := q - 1.
+                       r := r + anInteger].
+       ^Array with: q with: r
+       !

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean
+       "If length is worth, engage a recursive divide and conquer strategy"
+       | qr |
+       (anInteger digitLength <= 256
+                       or: [self digitLength <= anInteger digitLength])
+               ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
+       qr := self abs recursiveDigitDiv: anInteger abs.
+       ^ aBoolean
+               ifTrue: [qr collect: #negated]
+               ifFalse: [qr]!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') -----
  printOn: aStream base: b
        "Append a representation of this number in base b on aStream.
        In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits."

+       | halfDigits halfPower head tail nDigitsUnderestimate qr |
-       | halfDigits halfPower head tail nDigitsUnderestimate |
        "Don't engage any arithmetic if not normalized"
        (self digitLength = 0 or: [(self digitAt: self digitLength) = 0]) ifTrue: [^self normalize printOn: aStream base: b].

        nDigitsUnderestimate := b = 10
                ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is because (2 log)/(10 log)*4096 is slightly greater than 1233"]
                ifFalse: [self highBit quo: b highBit].

        "splitting digits with a whole power of two is more efficient"
        halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2.

        halfDigits <= 1
                ifTrue: ["Hmmm, this could happen only in case of a huge base b... Let lower level fail"
                        ^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)].

        "Separate in two halves, head and tail"
        halfPower := b raisedToInteger: halfDigits.
+       qr := self digitDiv: halfPower neg: self negative.
+       head := qr first normalize.
+       tail := qr last normalize.
-       head := self quo: halfPower.
-       tail := self - (head * halfPower).

        "print head"
        head printOn: aStream base: b.

        "print tail without the overhead to count the digits"
        tail printOn: aStream base: b nDigits: halfDigits!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') -----
  printOn: aStream base: b nDigits: n
        "Append a representation of this number in base b on aStream using n digits.
        In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two
        Should be invoked with: 0 <= self < (b raisedToInteger: n)"

+       | halfPower half head tail qr |
-       | halfPower half head tail |
        n <= 1 ifTrue: [
                n <= 0 ifTrue: [self error: 'Number of digits n should be > 0'].

                "Note: this is to stop an infinite loop if one ever attempts to print with a huge base
                This can happen because choice was to not hardcode any limit for base b
                We let Character>>#digitValue: fail"
                ^aStream nextPut: (Character digitValue: self) ].
        halfPower := n bitShift: -1.
        half := b raisedToInteger: halfPower.
+       qr := self digitDiv: half neg: self negative.
+       head := qr first normalize.
+       tail := qr last normalize.
-       head := self quo: half.
-       tail := self - (head * half).
        head printOn: aStream base: b nDigits: n - halfPower.
        tail printOn: aStream base: b nDigits: halfPower!

Item was added:
+ ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
+ recursiveDigitDiv: anInteger
+       "This is the recursive division algorithm from Burnikel - Ziegler
+       See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
+       Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
+       https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
+       
+       | s m t a b z qr q i |
+       "round digits up to next power of 2"
+       s := anInteger digitLength.
+       m := 1 bitShift: (s - 1) highBit.
+       "shift so that leading bit of leading byte be 1, and digitLength power of two"
+       s := m * 8 - anInteger highBit.
+       a := self bitShift: s.
+       b := anInteger bitShift: s.
+       
+       "Decompose a into t limbs - each limb have m bytes
+       choose t such that leading bit of leading limb of a be 0"
+       t := (a highBit + 1 / (m * 8)) ceiling.
+       z := a butLowestNDigits: t - 2 * m.
+       i := t - 2.
+       q := 0.
+       "and do a division of two limb by 1 limb b for each pair of limb of a"
+       [qr := z digitDiv21: b.
+       q := (q bitShift: 8*m) + qr first.      "Note: this naive recomposition of q cost O(t^2) - it is possible in O(t log(t))"
+       (i := i - 1) >= 0] whileTrue:
+               [z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)].
+       ^Array with: q with: (qr last bitShift: s negated)!

Item was changed:
  ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
  sqrtRem
        "Like super, but use a divide and conquer method to perform this operation.
        See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
        https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"

+       | n  qr q s r sr high mid low |
-       | n  qr s r sr high mid low |
        n := self digitLength bitShift: -2.
        n >= 16 ifFalse: [^super sqrtRem].
        high := self butLowestNDigits: n * 2.
        mid := self copyDigitsFrom: n + 1 to: n * 2.
        low := self lowestNDigits: n.

        sr := high sqrtRem.
        qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false.
+       q := qr first normalize.
+       s := (sr first bitShift: 8 * n) + q.
+       r := (qr last normalize bitShift: 8 * n) + low - q squared.
-       s := (sr first bitShift: 8 * n) + qr first.
-       r := (qr last bitShift: 8 * n) + low - qr first squared.
        r negative
                ifTrue:
                        [r := (s bitShift: 1) + r - 1.
                        s := s - 1].
        sr at: 1 put: s; at: 2 put: r.
        ^sr
        !

Item was changed:
  ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') -----
  squared
        "Eventually use a divide and conquer algorithm to perform the multiplication"

        (self digitLength >= 400) ifFalse: [^self * self].
+       (self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
+       ^self squaredToom4!
-       (self digitLength >= 1600) ifFalse: [^self squaredKaratsuba].
-       ^self squaredToom3!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
+ squaredToom4
+       "Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
+       See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
+       https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
+       
+       | p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
+       "divide in 4 parts"
+       p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
+       a3 := self butLowestNDigits: p * 3.
+       a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
+       a1 := self copyDigitsFrom: p + 1 to: p * 2.
+       a0 := self lowestNDigits: p.
+       
+       "Toom-4 trick: 7 multiplications instead of 16"
+       a02 := a0 - a2.
+       a13 := a1 - a3.
+       s0 := a0 squared.
+       s1 := (a0 fastMultiply: a1) bitShift: 1.
+       s2 := (a02 + a13) fastMultiply: (a02 - a13).
+       s3 := ((a0 + a1) + (a2 + a3)) squared.
+       s4 := (a02 fastMultiply: a13) bitShift: 1.
+       s5 := (a3 fastMultiply: a2) bitShift: 1.
+       s6 := a3 squared.
+       
+       "Interpolation"
+       t2 := s1 + s5.
+       t3 := (s2 + s3 + s4 bitShift: -1) - t2.
+       s3 := t2 - s4.
+       s4 := t3 - s0.
+       s2 := t3 - s2 - s6.
+       
+       "Sum the parts of decomposition"
+       ^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
+       +(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
+       
+ "
+ | a |
+ a := 770 factorial-1.
+ a digitLength.
+ [a * a - a squaredToom4 = 0] assert.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
+ "!




Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Kernel-nice.1218.mcz

Chris Muller-3
Cool, it should be interesting to see if this will speed up
MaHashIndex test suite.


On Sat, Apr 27, 2019 at 5:05 AM Nicolas Cellier
<[hidden email]> wrote:

>
> Err, I messed up with the quo/rem signs...
> Tests pass, but there are not enough tests!
>
> Le sam. 27 avr. 2019 à 10:41, <[hidden email]> a écrit :
>>
>> Nicolas Cellier uploaded a new version of Kernel to project The Inbox:
>> http://source.squeak.org/inbox/Kernel-nice.1218.mcz
>>
>> ==================== Summary ====================
>>
>> Name: Kernel-nice.1218
>> Author: nice
>> Time: 27 April 2019, 10:41:24.794539 am
>> UUID: 21f74fbe-a0cd-4b6f-86e9-be13465d57fe
>> Ancestors: Kernel-nice.1217
>>
>> Implement the recursive fast division of Burnikel-Ziegler for large integers and connect it to digitDiv:neg: when operands are large enough.
>>
>> This is not the fastest known division which is a composition of Barrett and Newton-Raphson inversion - but is easy to implement and should have similar performances for at least a few thousand bytes long integers - see for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-Hasselstrom2003.pdf
>>
>> Use digitDiv:neg: in large integer printString so as to obtain the quotient (head) and remainder (tail) in a single operation. Together with divide and conquer division, this results in a factor of about 3x for 50000 factorial printString.
>>
>> Implement the 4-way Toom-Cook squaring variant of Chung-Hasan. This over-performs the symetrical squaredToom3 even for medium size (800 bytes).
>>
>> =============== Diff against Kernel-nice.1217 ===============
>>
>> Item was added:
>> + ----- Method: Integer>>digitDiv21: (in category 'private') -----
>> + digitDiv21: anInteger
>> +
>> +       ^(self digitDiv: anInteger neg: false) collect: #normalize!
>>
>> Item was added:
>> + ----- Method: Integer>>digitDiv32: (in category 'private') -----
>> + digitDiv32: anInteger
>> +
>> +       ^(self digitDiv: anInteger neg: false) collect: #normalize!
>>
>> Item was changed:
>>   ----- Method: Integer>>digitDiv:neg: (in category 'private') -----
>> + digitDiv: anInteger neg: aBoolean
>> +       ^self primDigitDiv: anInteger neg: aBoolean!
>> - digitDiv: arg neg: ng
>> -       "Answer with an array of (quotient, remainder)."
>> -       | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
>> -       <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
>> -       arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
>> -       "TFEI added this line"
>> -       l := self digitLength - arg digitLength + 1.
>> -       l <= 0 ifTrue: [^ Array with: 0 with: self].
>> -       "shortcut against #highBit"
>> -       d := 8 - arg lastDigit highBitOfByte.
>> -       div := arg digitLshift: d.
>> -       divDigitLength := div digitLength + 1.
>> -       div := div growto: divDigitLength.
>> -       "shifts so high order word is >=128"
>> -       rem := self digitLshift: d.
>> -       rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
>> -       remDigitLength := rem digitLength.
>> -       "makes a copy and shifts"
>> -       quo := Integer new: l neg: ng.
>> -       dl := divDigitLength - 1.
>> -       "Last actual byte of data"
>> -       ql := l.
>> -       dh := div digitAt: dl.
>> -       dnh := dl = 1
>> -                               ifTrue: [0]
>> -                               ifFalse: [div digitAt: dl - 1].
>> -       1 to: ql do:
>> -               [:k |
>> -               "maintain quo*arg+rem=self"
>> -               "Estimate rem/div by dividing the leading to bytes of rem by dh."
>> -               "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
>> -               j := remDigitLength + 1 - k.
>> -               "r1 := rem digitAt: j."
>> -               (rem digitAt: j)
>> -                       = dh
>> -                       ifTrue: [qhi := qlo := 15
>> -                               "i.e. q=255"]
>> -                       ifFalse:
>> -                               ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.
>> -                               Note that r1,r2 are bytes, not nibbles.
>> -                               Be careful not to generate intermediate results exceeding 13
>> -                               bits."
>> -                               "r2 := (rem digitAt: j - 1)."
>> -                               t := ((rem digitAt: j)
>> -                                                       bitShift: 4)
>> -                                                       + ((rem digitAt: j - 1)
>> -                                                                       bitShift: -4).
>> -                               qhi := t // dh.
>> -                               t := (t \\ dh bitShift: 4)
>> -                                                       + ((rem digitAt: j - 1)
>> -                                                                       bitAnd: 15).
>> -                               qlo := t // dh.
>> -                               t := t \\ dh.
>> -                               "Next compute (hi,lo) := q*dnh"
>> -                               hi := qhi * dnh.
>> -                               lo := qlo * dnh + ((hi bitAnd: 15)
>> -                                                               bitShift: 4).
>> -                               hi := (hi bitShift: -4)
>> -                                                       + (lo bitShift: -8).
>> -                               lo := lo bitAnd: 255.
>> -                               "Correct overestimate of q.
>> -                               Max of 2 iterations through loop -- see Knuth vol. 2"
>> -                               r3 := j < 3
>> -                                                       ifTrue: [0]
>> -                                                       ifFalse: [rem digitAt: j - 2].
>> -                               [(t < hi
>> -                                       or: [t = hi and: [r3 < lo]])
>> -                                       and:
>> -                                               ["i.e. (t,r3) < (hi,lo)"
>> -                                               qlo := qlo - 1.
>> -                                               lo := lo - dnh.
>> -                                               lo < 0
>> -                                                       ifTrue:
>> -                                                               [hi := hi - 1.
>> -                                                               lo := lo + 256].
>> -                                               hi >= dh]]
>> -                                       whileTrue: [hi := hi - dh].
>> -                               qlo < 0
>> -                                       ifTrue:
>> -                                               [qhi := qhi - 1.
>> -                                               qlo := qlo + 16]].
>> -               "Subtract q*div from rem"
>> -               l := j - dl.
>> -               a := 0.
>> -               1 to: divDigitLength do:
>> -                       [:i |
>> -                       hi := (div digitAt: i)
>> -                                               * qhi.
>> -                       lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
>> -                                                       bitShift: 4) - ((div digitAt: i)
>> -                                                       * qlo).
>> -                       rem digitAt: l put: lo - (lo // 256 * 256).
>> -                       "sign-tolerant form of (lo bitAnd: 255)"
>> -                       a := lo // 256 - (hi bitShift: -4).
>> -                       l := l + 1].
>> -               a < 0
>> -                       ifTrue:
>> -                               ["Add div back into rem, decrease q by 1"
>> -                               qlo := qlo - 1.
>> -                               l := j - dl.
>> -                               a := 0.
>> -                               1 to: divDigitLength do:
>> -                                       [:i |
>> -                                       a := (a bitShift: -8)
>> -                                                               + (rem digitAt: l) + (div digitAt: i).
>> -                                       rem digitAt: l put: (a bitAnd: 255).
>> -                                       l := l + 1]].
>> -               quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
>> -                               + qlo].
>> -       rem := rem
>> -                               digitRshift: d
>> -                               bytes: 0
>> -                               lookfirst: dl.
>> -       ^ Array with: quo with: rem!
>>
>> Item was added:
>> + ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
>> + primDigitDiv: arg neg: ng
>> +       "Answer with an array of (quotient, remainder)."
>> +       | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
>> +       <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
>> +       arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
>> +       "TFEI added this line"
>> +       l := self digitLength - arg digitLength + 1.
>> +       l <= 0 ifTrue: [^ Array with: 0 with: self].
>> +       "shortcut against #highBit"
>> +       d := 8 - arg lastDigit highBitOfByte.
>> +       div := arg digitLshift: d.
>> +       divDigitLength := div digitLength + 1.
>> +       div := div growto: divDigitLength.
>> +       "shifts so high order word is >=128"
>> +       rem := self digitLshift: d.
>> +       rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
>> +       remDigitLength := rem digitLength.
>> +       "makes a copy and shifts"
>> +       quo := Integer new: l neg: ng.
>> +       dl := divDigitLength - 1.
>> +       "Last actual byte of data"
>> +       ql := l.
>> +       dh := div digitAt: dl.
>> +       dnh := dl = 1
>> +                               ifTrue: [0]
>> +                               ifFalse: [div digitAt: dl - 1].
>> +       1 to: ql do:
>> +               [:k |
>> +               "maintain quo*arg+rem=self"
>> +               "Estimate rem/div by dividing the leading to bytes of rem by dh."
>> +               "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
>> +               j := remDigitLength + 1 - k.
>> +               "r1 := rem digitAt: j."
>> +               (rem digitAt: j)
>> +                       = dh
>> +                       ifTrue: [qhi := qlo := 15
>> +                               "i.e. q=255"]
>> +                       ifFalse:
>> +                               ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.
>> +                               Note that r1,r2 are bytes, not nibbles.
>> +                               Be careful not to generate intermediate results exceeding 13
>> +                               bits."
>> +                               "r2 := (rem digitAt: j - 1)."
>> +                               t := ((rem digitAt: j)
>> +                                                       bitShift: 4)
>> +                                                       + ((rem digitAt: j - 1)
>> +                                                                       bitShift: -4).
>> +                               qhi := t // dh.
>> +                               t := (t \\ dh bitShift: 4)
>> +                                                       + ((rem digitAt: j - 1)
>> +                                                                       bitAnd: 15).
>> +                               qlo := t // dh.
>> +                               t := t \\ dh.
>> +                               "Next compute (hi,lo) := q*dnh"
>> +                               hi := qhi * dnh.
>> +                               lo := qlo * dnh + ((hi bitAnd: 15)
>> +                                                               bitShift: 4).
>> +                               hi := (hi bitShift: -4)
>> +                                                       + (lo bitShift: -8).
>> +                               lo := lo bitAnd: 255.
>> +                               "Correct overestimate of q.
>> +                               Max of 2 iterations through loop -- see Knuth vol. 2"
>> +                               r3 := j < 3
>> +                                                       ifTrue: [0]
>> +                                                       ifFalse: [rem digitAt: j - 2].
>> +                               [(t < hi
>> +                                       or: [t = hi and: [r3 < lo]])
>> +                                       and:
>> +                                               ["i.e. (t,r3) < (hi,lo)"
>> +                                               qlo := qlo - 1.
>> +                                               lo := lo - dnh.
>> +                                               lo < 0
>> +                                                       ifTrue:
>> +                                                               [hi := hi - 1.
>> +                                                               lo := lo + 256].
>> +                                               hi >= dh]]
>> +                                       whileTrue: [hi := hi - dh].
>> +                               qlo < 0
>> +                                       ifTrue:
>> +                                               [qhi := qhi - 1.
>> +                                               qlo := qlo + 16]].
>> +               "Subtract q*div from rem"
>> +               l := j - dl.
>> +               a := 0.
>> +               1 to: divDigitLength do:
>> +                       [:i |
>> +                       hi := (div digitAt: i)
>> +                                               * qhi.
>> +                       lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
>> +                                                       bitShift: 4) - ((div digitAt: i)
>> +                                                       * qlo).
>> +                       rem digitAt: l put: lo - (lo // 256 * 256).
>> +                       "sign-tolerant form of (lo bitAnd: 255)"
>> +                       a := lo // 256 - (hi bitShift: -4).
>> +                       l := l + 1].
>> +               a < 0
>> +                       ifTrue:
>> +                               ["Add div back into rem, decrease q by 1"
>> +                               qlo := qlo - 1.
>> +                               l := j - dl.
>> +                               a := 0.
>> +                               1 to: divDigitLength do:
>> +                                       [:i |
>> +                                       a := (a bitShift: -8)
>> +                                                               + (rem digitAt: l) + (div digitAt: i).
>> +                                       rem digitAt: l put: (a bitAnd: 255).
>> +                                       l := l + 1]].
>> +               quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
>> +                               + qlo].
>> +       rem := rem
>> +                               digitRshift: d
>> +                               bytes: 0
>> +                               lookfirst: dl.
>> +       ^ Array with: quo with: rem!
>>
>> Item was added:
>> + ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') -----
>> + digitDiv21: anInteger
>> +       "This is part of the recursive division algorithm from Burnikel - Ziegler
>> +       Divide a two limbs receiver by 1 limb dividend
>> +       Each limb is decomposed in two halves of p bytes (8*p bits)
>> +       so as to continue the recursion"
>> +
>> +       | p qr1 qr2 |
>> +       p := anInteger digitLength + 1 bitShift: -1.
>> +       p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false) collect: #normalize].
>> +       qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
>> +       qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger.
>> +       qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
>> +       ^qr2!
>>
>> Item was added:
>> + ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
>> + digitDiv32: anInteger
>> +       "This is part of the recursive division algorithm from Burnikel - Ziegler
>> +       Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
>> +       Each limb is made of p bytes (8*p bits).
>> +       This step transforms the division problem into multiplication
>> +       It must use the fastMultiply: to be worth the overhead costs."
>> +
>> +       | a2 b1 d p q qr r |
>> +       p := anInteger digitLength + 1 bitShift: -1.
>> +       (a2 := self butLowestNDigits: 2*p)
>> +               < (b1 := anInteger butLowestNDigits: p)
>> +               ifTrue:
>> +                       [qr := (self butLowestNDigits: p) digitDiv21: b1.
>> +                       q := qr first.
>> +                       r := qr last]
>> +               ifFalse:
>> +                       [q := (1 bitShift: 8*p) - 1.
>> +                       r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
>> +       d := q fastMultiply: (anInteger lowestNDigits: p).
>> +       r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
>> +       [r < 0]
>> +               whileTrue:
>> +                       [q := q - 1.
>> +                       r := r + anInteger].
>> +       ^Array with: q with: r
>> +       !
>>
>> Item was added:
>> + ----- Method: LargePositiveInteger>>digitDiv:neg: (in category 'private') -----
>> + digitDiv: anInteger neg: aBoolean
>> +       "If length is worth, engage a recursive divide and conquer strategy"
>> +       | qr |
>> +       (anInteger digitLength <= 256
>> +                       or: [self digitLength <= anInteger digitLength])
>> +               ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
>> +       qr := self abs recursiveDigitDiv: anInteger abs.
>> +       ^ aBoolean
>> +               ifTrue: [qr collect: #negated]
>> +               ifFalse: [qr]!
>>
>> Item was changed:
>>   ----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') -----
>>   printOn: aStream base: b
>>         "Append a representation of this number in base b on aStream.
>>         In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits."
>>
>> +       | halfDigits halfPower head tail nDigitsUnderestimate qr |
>> -       | halfDigits halfPower head tail nDigitsUnderestimate |
>>         "Don't engage any arithmetic if not normalized"
>>         (self digitLength = 0 or: [(self digitAt: self digitLength) = 0]) ifTrue: [^self normalize printOn: aStream base: b].
>>
>>         nDigitsUnderestimate := b = 10
>>                 ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is because (2 log)/(10 log)*4096 is slightly greater than 1233"]
>>                 ifFalse: [self highBit quo: b highBit].
>>
>>         "splitting digits with a whole power of two is more efficient"
>>         halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2.
>>
>>         halfDigits <= 1
>>                 ifTrue: ["Hmmm, this could happen only in case of a huge base b... Let lower level fail"
>>                         ^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)].
>>
>>         "Separate in two halves, head and tail"
>>         halfPower := b raisedToInteger: halfDigits.
>> +       qr := self digitDiv: halfPower neg: self negative.
>> +       head := qr first normalize.
>> +       tail := qr last normalize.
>> -       head := self quo: halfPower.
>> -       tail := self - (head * halfPower).
>>
>>         "print head"
>>         head printOn: aStream base: b.
>>
>>         "print tail without the overhead to count the digits"
>>         tail printOn: aStream base: b nDigits: halfDigits!
>>
>> Item was changed:
>>   ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') -----
>>   printOn: aStream base: b nDigits: n
>>         "Append a representation of this number in base b on aStream using n digits.
>>         In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two
>>         Should be invoked with: 0 <= self < (b raisedToInteger: n)"
>>
>> +       | halfPower half head tail qr |
>> -       | halfPower half head tail |
>>         n <= 1 ifTrue: [
>>                 n <= 0 ifTrue: [self error: 'Number of digits n should be > 0'].
>>
>>                 "Note: this is to stop an infinite loop if one ever attempts to print with a huge base
>>                 This can happen because choice was to not hardcode any limit for base b
>>                 We let Character>>#digitValue: fail"
>>                 ^aStream nextPut: (Character digitValue: self) ].
>>         halfPower := n bitShift: -1.
>>         half := b raisedToInteger: halfPower.
>> +       qr := self digitDiv: half neg: self negative.
>> +       head := qr first normalize.
>> +       tail := qr last normalize.
>> -       head := self quo: half.
>> -       tail := self - (head * half).
>>         head printOn: aStream base: b nDigits: n - halfPower.
>>         tail printOn: aStream base: b nDigits: halfPower!
>>
>> Item was added:
>> + ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
>> + recursiveDigitDiv: anInteger
>> +       "This is the recursive division algorithm from Burnikel - Ziegler
>> +       See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
>> +       Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
>> +       https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
>> +
>> +       | s m t a b z qr q i |
>> +       "round digits up to next power of 2"
>> +       s := anInteger digitLength.
>> +       m := 1 bitShift: (s - 1) highBit.
>> +       "shift so that leading bit of leading byte be 1, and digitLength power of two"
>> +       s := m * 8 - anInteger highBit.
>> +       a := self bitShift: s.
>> +       b := anInteger bitShift: s.
>> +
>> +       "Decompose a into t limbs - each limb have m bytes
>> +       choose t such that leading bit of leading limb of a be 0"
>> +       t := (a highBit + 1 / (m * 8)) ceiling.
>> +       z := a butLowestNDigits: t - 2 * m.
>> +       i := t - 2.
>> +       q := 0.
>> +       "and do a division of two limb by 1 limb b for each pair of limb of a"
>> +       [qr := z digitDiv21: b.
>> +       q := (q bitShift: 8*m) + qr first.      "Note: this naive recomposition of q cost O(t^2) - it is possible in O(t log(t))"
>> +       (i := i - 1) >= 0] whileTrue:
>> +               [z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)].
>> +       ^Array with: q with: (qr last bitShift: s negated)!
>>
>> Item was changed:
>>   ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
>>   sqrtRem
>>         "Like super, but use a divide and conquer method to perform this operation.
>>         See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
>>         https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
>>
>> +       | n  qr q s r sr high mid low |
>> -       | n  qr s r sr high mid low |
>>         n := self digitLength bitShift: -2.
>>         n >= 16 ifFalse: [^super sqrtRem].
>>         high := self butLowestNDigits: n * 2.
>>         mid := self copyDigitsFrom: n + 1 to: n * 2.
>>         low := self lowestNDigits: n.
>>
>>         sr := high sqrtRem.
>>         qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false.
>> +       q := qr first normalize.
>> +       s := (sr first bitShift: 8 * n) + q.
>> +       r := (qr last normalize bitShift: 8 * n) + low - q squared.
>> -       s := (sr first bitShift: 8 * n) + qr first.
>> -       r := (qr last bitShift: 8 * n) + low - qr first squared.
>>         r negative
>>                 ifTrue:
>>                         [r := (s bitShift: 1) + r - 1.
>>                         s := s - 1].
>>         sr at: 1 put: s; at: 2 put: r.
>>         ^sr
>>         !
>>
>> Item was changed:
>>   ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') -----
>>   squared
>>         "Eventually use a divide and conquer algorithm to perform the multiplication"
>>
>>         (self digitLength >= 400) ifFalse: [^self * self].
>> +       (self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
>> +       ^self squaredToom4!
>> -       (self digitLength >= 1600) ifFalse: [^self squaredKaratsuba].
>> -       ^self squaredToom3!
>>
>> Item was added:
>> + ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
>> + squaredToom4
>> +       "Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
>> +       See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
>> +       https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
>> +
>> +       | p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
>> +       "divide in 4 parts"
>> +       p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
>> +       a3 := self butLowestNDigits: p * 3.
>> +       a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
>> +       a1 := self copyDigitsFrom: p + 1 to: p * 2.
>> +       a0 := self lowestNDigits: p.
>> +
>> +       "Toom-4 trick: 7 multiplications instead of 16"
>> +       a02 := a0 - a2.
>> +       a13 := a1 - a3.
>> +       s0 := a0 squared.
>> +       s1 := (a0 fastMultiply: a1) bitShift: 1.
>> +       s2 := (a02 + a13) fastMultiply: (a02 - a13).
>> +       s3 := ((a0 + a1) + (a2 + a3)) squared.
>> +       s4 := (a02 fastMultiply: a13) bitShift: 1.
>> +       s5 := (a3 fastMultiply: a2) bitShift: 1.
>> +       s6 := a3 squared.
>> +
>> +       "Interpolation"
>> +       t2 := s1 + s5.
>> +       t3 := (s2 + s3 + s4 bitShift: -1) - t2.
>> +       s3 := t2 - s4.
>> +       s4 := t3 - s0.
>> +       s2 := t3 - s2 - s6.
>> +
>> +       "Sum the parts of decomposition"
>> +       ^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
>> +       +(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
>> +
>> + "
>> + | a |
>> + a := 770 factorial-1.
>> + a digitLength.
>> + [a * a - a squaredToom4 = 0] assert.
>> + [Smalltalk garbageCollect.
>> + [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
>> + [Smalltalk garbageCollect.
>> + [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
>> + "!
>>
>>
>