I tested this:
| x | x := SmallInteger maxVal raisedToInteger: 100. {Time millisecondsToRun: [100000 timesRepeat: [x byteShift: -200]]. Time millisecondsToRun: [100000 timesRepeat: [x bitShift: -1600]]. Time millisecondsToRun: [100000 timesRepeat: [x bitShift: -1597]]}. And got that: #(95 164 172) That means that case of exact byte boundary is not optimized for Rshift. As shown below, it is optimized for LShift: | x | x := SmallInteger maxVal raisedToInteger: 100. {Time millisecondsToRun: [100000 timesRepeat: [x byteShift: 200]]. Time millisecondsToRun: [100000 timesRepeat: [x bitShift: 1600]]. Time millisecondsToRun: [100000 timesRepeat: [x bitShift: 1597]]}. #(103 120 315) Notice that Rshift seems more efficient than Lshift on non byte boundary (nBits \\ 8 ~= 0)... Well, speed factor is not major (<2) but I would expect performance from these low level primitives... Ah, the byteShift: code is here if you wanna try: LargePositiveInteger>>byteShift: n "shift by bytes, which is faster than bits when n < 0" | shifted | n = 0 ifTrue: [^ self]. n > 0 ifTrue: [ shifted := self class new: self digitLength + n. shifted replaceFrom: n+1 to: self digitLength + n with: self startingAt: 1. ^ shifted]. self digitLength + n <= 0 ifTrue: [^0]. shifted := self class new: self digitLength + n. shifted replaceFrom: 1 to: self digitLength + n with: self startingAt: 1 - n. ^ shifted normalize Which could also be written: LargePositiveInteger>>byteShift: n "shift by bytes, which is faster than bits when n < 0" | shifted | n = 0 ifTrue: [^ self]. self digitLength + n <= 0 ifTrue: [^0]. shifted := self class new: self digitLength + n. shifted replaceFrom: (1 max: 1 + n) to: self digitLength + n with: self startingAt: (1 max: 1 - n). ^ shifted normalize |
nicolas cellier <ncellier <at> ifrance.com> writes:
> Notice that Rshift seems more efficient than Lshift on non byte boundary (nBits > \\ 8 ~= 0)... > x digitLength = 363, - so Rshift only has to shift 163 bytes, - versus 363 for Lshift... What is surprising is that byteShift: is not much different. - 95 ms for 16,300,000 bytes to copy, - 103 ms for 36,300,000 bytes to copy, That means time is not spent in copy. 100000 allocation of a LargeInt that's probably measuring some scavenging... I checked VW7.5 on same machine to get a reference: | x | x := SmallInteger maxVal raisedTo: 100. OrderedCollection new add: (Time millisecondsToRun: [100000 timesRepeat: [x bitShift: -1600]]); add: (Time millisecondsToRun: [100000 timesRepeat: [x bitShift: -1597]]); add: (Time millisecondsToRun: [100000 timesRepeat: [x bitShift: 1597]]); add: (Time millisecondsToRun: [100000 timesRepeat: [x bitShift: 1600]]); asArray VW: #(29 29 92 92) squeak: #(164 172 315 120) I'm surprised by the factor 3 in VW... But since more bytes are allocated in Lshift... The byteBounded Lshift is not far from VW. every other operation is definitly improvable. |
nicolas cellier a écrit :
> > > I checked VW7.5 on same machine to get a reference: > | x | > x := SmallInteger maxVal raisedTo: 100. > OrderedCollection new > add: (Time millisecondsToRun: [100000 timesRepeat: [x bitShift: -1600]]); > add: (Time millisecondsToRun: [100000 timesRepeat: [x bitShift: -1597]]); > add: (Time millisecondsToRun: [100000 timesRepeat: [x bitShift: 1597]]); > add: (Time millisecondsToRun: [100000 timesRepeat: [x bitShift: 1600]]); > asArray > VW: #(29 29 92 92) > squeak: #(164 172 315 120) > > I'm surprised by the factor 3 in VW... > But since more bytes are allocated in Lshift... > The byteBounded Lshift is not far from VW. > every other operation is definitly improvable. > > I just pushed http://bugs.squeak.org/view.php?id=7109 for a small improvment of bitShift:. I marked as Kernel, but should rather be VM. Ken, can you here me? Above test shows a little speed up (don't compare to above results, my machine is sloooower): Regular 3.10-1unix VM: #(503 550 933 489) Patched 3.10-1nice VM: #(357 416 673 495) VWNC 7.6 reference VM: #(124 138 347 344) Now that i exercized a bit with plugins, I will try a Karatsuba * implementation. The algorithm is very simple. However naive algorithm should still be used when the smaller Integer operand is sparse, like (1 << 1000 + 1)*(1 << 10000 - 1)... |
> I marked as Kernel, but should rather be VM. Ken, can you here me? ear? |
Free forum by Nabble | Edit this page |