This shows that the response from the inline form of #timesRepeat: is different from that of the message-send form:

    5 timesRepeat: [false]
    nil

    | block |
    block := [false].
    5 timesRepeat: block
    5

This shows that both the inline form and the message-send form are affected by large integers:

    | ms response |
    ms := (Time millisecondsToRun: [
        response := 2 * SmallInteger maxVal timesRepeat: []
    ]) // 2.
    Array with: ms with: response.
    #(69431 nil)

    | ms response block |
    block := [].
    ms := (Time millisecondsToRun: [
        response := 2 * SmallInteger maxVal timesRepeat: block
    ]) // 2.
    Array with: ms with: response.
    #(69539 1073741822)

Compare these averaged ms times with the ones above to see how much can be gained by tuning #timesRepeat: to avoid large integers:

    (Time millisecondsToRun: [
        2 timesRepeat: [SmallInteger maxVal timesRepeat: []]
    ]) // 2
    1349

    | block |
    block := [].
    (Time millisecondsToRun: [
        2 timesRepeat: [SmallInteger maxVal timesRepeat: block]
    ]) // 2
    3745

Add this method to optimize the #timesRepeat: message send:

    LargeInteger>>timesRepeat: aBlock
        "Optimization proposed to Cincom by ICE. -plb 2011.06.08"
        self // SmallInteger maxVal timesRepeat: [SmallInteger maxVal timesRepeat: [aBlock value]].
        self \\ SmallInteger maxVal timesRepeat: [aBlock value].

This shows that the #timesRepeat: message send is now much faster than the inline #timesRepeat: for large integers:

    | ms response block |
    block := [].
    ms := (Time millisecondsToRun: [
        response := 2 * SmallInteger maxVal timesRepeat: block
    ]) // 2.
    Array with: ms with: response.
    #(3645 1073741822)

The inline form could also be made faster by punting to a LargeInteger implementation when the receiver is not a SmallInteger. That optimization would be made in the VM and is likely an easy change. The return value should be fixed too. These tests were
done with a VW 7.5 image and VW 7.6 VM. The performance difference between integer types skews the results of typical performance comparison tools. More efficient code sometimes profiles slower as a side effect of higher estimated repeat multiples.

Paul Baumann

This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
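The proposed LargeInteger>>timesRepeat: splits a large count into full SmallInteger-maxVal chunks plus a remainder. As a sanity check on that arithmetic, here is a Python sketch (not the Smalltalk code itself; function names are hypothetical, and a small stand-in chunk size replaces SmallInteger maxVal) confirming that the chunked split performs exactly n evaluations:

```python
def times_repeat_chunked(n, m, block):
    """Evaluate block() exactly n times, in chunks of at most m.

    Mirrors the structure of the proposed method: n // m full chunks
    of m iterations, then n % m leftover iterations.
    """
    for _ in range(n // m):
        for _ in range(m):
            block()
    for _ in range(n % m):
        block()

# Count evaluations to confirm the chunking covers every iteration.
count = 0
def bump():
    global count
    count += 1

times_repeat_chunked(10007, 256, bump)  # 39 chunks of 256, plus 23 left over
print(count)  # 10007
```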
At some point I considered the timesRepeat: tuning, but I decided
against it. There are two factors driving this decision. One is what happens in the workspace. Consider the following:

    Time millisecondsToRun: [100000000 timesRepeat: [1 * 1 * 1 * 1 * 1]]
    1323

    "Comes with a warning against unoptimized blocks"
    | block |
    block := [1 * 1 * 1 * 1 * 1].
    Time millisecondsToRun: [100000000 timesRepeat: block]
    2191

    "Using the [block value] idiom so the outer block is optimized"
    | block |
    block := [1 * 1 * 1 * 1 * 1].
    Time millisecondsToRun: [100000000 timesRepeat: [block value]]
    1717

So, as you can see, if the block you are measuring is clean, then it can be optimized more than if the block is not clean. This is particularly important when you have huge repetition counts, because any evaluation overhead will be amplified by the repetition count as well.

The issue with the large integer optimization is that, well... how should it be implemented? Like this?

    LargePositiveInteger>>actualTimesRepeat: aBlock
        self // SmallInteger maxVal timesRepeat: [SmallInteger maxVal timesRepeat: [aBlock value]].
        self \\ SmallInteger maxVal timesRepeat: [aBlock value]

(note we have to avoid using the special timesRepeat: selector)

But this implementation defeats the purpose of large repetition counts because it cannot use the more optimized block evaluation mechanism. Consequently, measurements will be deceptive. Consider for example the above [1 * 1 * 1 * 1 * 1] block.

    Repetitions    Milliseconds
    100000000      ~1325
    200000000      ~2550
    300000000      ~3675
    400000000      ~5150
    500000000      ~6350

Clearly, for every 10^8 repetitions you get about 1300 milliseconds' worth of execution time. But what happens when we use the above method and run the code some more?

    Repetitions              Milliseconds
    SmallInteger maxVal      ~6875
    SmallInteger maxVal + 1  ~9600

So, yeah, the provided code is faster than using the inlined timesRepeat: implementation we have, but what happened there? Now we have a huge measurement skew induced by adding a single repetition.
So, even with the "optimization" in LargePositiveInteger, measurements are still significantly different depending on whether the receiver is a small integer or a large integer.

This brings us to the second factor, which also affects measurement tools, which might arguably be forced not to use clean blocks. So... do applications actually use code like this?

    largePositiveInteger timesRepeat: [business stuff]

Of course it's hard to tell from where I am sitting, but I've yet to see such a thing in the field. Thus, I am tempted to say that the above case corresponds to measurement code. If that's the case, then what matters is the evaluation efficiency of [business stuff]. And if [business stuff] takes very little time, then there are grounds for using large repetition counts; in turn that implies the need to minimize measurement overhead for the sake of measurement precision, and the requirement to reduce measurement overhead contradicts the large positive integer version of actualTimesRepeat: because we cannot use the more efficient block evaluation mechanism to implement it.

Moreover, measuring [business stuff] with huge multipliers does increase the measured overhead, no matter how small. Recall the table of evaluations of [1 * 1 * 1 * 1 * 1]:

    Repetitions    Milliseconds
    100000000      ~1325
    200000000      ~2550
    300000000      ~3675
    400000000      ~5150
    500000000      ~6350

Here is the table corresponding to a single loop unroll, i.e. we are measuring the evaluation of

    [1 * 1 * 1 * 1 * 1. 1 * 1 * 1 * 1 * 1]

but with half the repetition count each time (note that in the table below the repetitions list how many times the 4 multiplications were done; the repetition counts actually passed to timesRepeat: are half of what is listed).
We get these times:

    Repetitions (total)    Milliseconds
    100000000              ~1100
    200000000              ~2225
    300000000              ~3350
    400000000              ~4450
    500000000              ~5550

So, each set of 10^8 multiplication expressions takes ~1100ms instead of ~1300ms, just because we changed the measurement methodology. That's about 20% worth of skew! So what happens if we do 10 expressions per block and decimate the repetition count?

    Repetitions (total)    Milliseconds
    100000000              ~1000
    200000000              ~2025
    300000000              ~2975
    400000000              ~3975
    500000000              ~5025

So that's another 10% of skew, for a total of 30% of skew, compared to no loop unrolling at all. Of course, decimating further will provide diminishing returns, so I will stop here. The bottom line for me is that I might ignore a few percentage points pretending they are noise, but 30% is *a lot*.

Thus, because of the implementation issues with clean blocks in the workspace, and because of the induced overhead skew noted above, which is bound to affect every measurement (benchmark tool or not), I am not sure increasing the repetition count into the large integers is a good idea. Consequently, I am not convinced this apparently unnecessary and/or counterproductive case needs optimizing.

Finally, note that this issue becomes basically invisible on 64 bit images.

On 6/8/2011 8:18 AM, Paul Baumann wrote:
> This shows that the inline response from #timesRepeat: is different from
> that of the message-send form: ...
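The skew percentages quoted above can be reproduced from the approximate per-10^8 figures read off the tables (a quick arithmetic check in Python; the millisecond figures are the thread's rounded values, not fresh measurements):

```python
# ms per 10^8 body evaluations, read off the tables above (approximate),
# keyed by unroll factor: 1 = no unrolling, 2 = single unroll, 10 = ten
# expressions per block.
ms_per_1e8 = {1: 1300, 2: 1100, 10: 1000}

# Skew of the non-unrolled measurement relative to each unrolled baseline.
skew_vs_2 = (ms_per_1e8[1] - ms_per_1e8[2]) / ms_per_1e8[2]
skew_vs_10 = (ms_per_1e8[1] - ms_per_1e8[10]) / ms_per_1e8[10]

print(f"{skew_vs_2:.0%}")   # 18% -- rounded to "about 20%" in the thread
print(f"{skew_vs_10:.0%}")  # 30%
```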
>> Finally, note that this issue becomes basically invisible on 64 bit images.

Andres,

Yes. I agree with your other comments. I could show how to reduce the hit, but no need. It is easy enough to avoid LPI repeats in the profiling code I found it in. It isn't a big deal. Computers will grow fast enough that the LPI case would get hit more frequently in code like this, but installations will also grow into 64-bit images. I mentioned it because I suspected it might be easy to optimize in the VM. If it isn't, then no problem. Thanks for the thorough review.

BTW, the optimized form accepts floats and doubles that don't implement #timesRepeat:. It is good that it does, because I've encountered code that would break if it suddenly didn't. There is a performance hit for using the wrong type, though. If it had been possible to make the LPIs fast, then this would be a similar issue.

    Time millisecondsToRun: [536870911 timesRepeat: []].
    1245

    Time millisecondsToRun: [536870911.0 timesRepeat: []].
    16801

    Time millisecondsToRun: [536870911d timesRepeat: []].
    16140

    Time millisecondsToRun: [536870911d asInteger timesRepeat: []].
    1403

Hint:

    Time millisecondsToRun: [aLargePositiveInteger forRepeatLoop timesRepeat: []].

Regards,

Paul Baumann

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Andres Valloud
Sent: Wednesday, June 08, 2011 15:36
To: [hidden email]
Subject: Re: [vwnc] Tuning timesRepeat:

At some point I considered the timesRepeat: tuning, but I decided against it. ...
> Yes. I agree with your other comments. I could show how to reduce the
> hit, but no need. It is easy enough to avoid LPI repeats in the
> profiling code I found it in. It isn't a big deal. Computers will
> grow fast enough that the LPI would get hit more frequently in code
> like this, but installations will also grow into 64-bit images. I
> mentioned it because I suspected it might be easy to optimize in the
> VM. If it isn't, then no problem. Thanks for the thorough review.
> BTW, the optimized form accepts floats and doubles that don't
> implement #timesRepeat:. It is good that it does because I've
> encountered code that would break if it suddenly didn't.

At first sight, I think it's a matter of compiler optimization rather than VM optimization. If you look at the bytecodes produced for timesRepeat:, then you can see timesRepeat: is rewritten in terms of a [...] whileTrue: [...] construct. There's a new temp for a counter, and before each iteration there's a <= test against the timesRepeat: "receiver". Now, interestingly, floating point values and what not know how to compare themselves to small integers directly. For example,

    Time millisecondsToRun: [10000000 timesRepeat: [1 < 2.0d]]
    673

    Time millisecondsToRun: [10000000 timesRepeat: [2.0d > 1]]
    86

So, in your case, it might be useful to reverse the timesRepeat: check at the head of the loop so that the comparison receiver is the timesRepeat: receiver. In other words, if we look at

    1000d timesRepeat: [nil]

and debug the expression to find the decompiled bytecodes, we find:

    unboundMethod
        | t1 |
        t1 := 1.
        [t1 <= 1000.0d] whileTrue: [t1 := t1 + 1].
        ^nil

So if instead of that we had

    unboundMethod
        | t1 |
        t1 := 1.
        [1000.0d >= t1] whileTrue: [t1 := t1 + 1].
        ^nil

then your cases would run faster. For example,

    Time millisecondsToRun: [
        | t1 |
        t1 := 1.
        [t1 <= 10000000.0d] whileTrue: [t1 := t1 + 1].
        nil
    ]
    366

    Time millisecondsToRun: [
        | t1 |
        t1 := 1.
        [10000000.0d >= t1] whileTrue: [t1 := t1 + 1].
        nil
    ]
    95

The integer case seems to run just as fast regardless of the comparison order.
Now, of course, if the limit was a (small) integer instead of floating point, then the code would be even faster.

    Time millisecondsToRun: [
        | t1 |
        t1 := 1.
        [10000000 >= t1] whileTrue: [t1 := t1 + 1].
        nil
    ]
    31

What code is "sending" timesRepeat: to floating point values?

Andres.
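The two decompiled forms above differ only in which object receives the comparison; the iteration counts are identical. A quick Python analogue (Python has no Smalltalk-style coercion dispatch, so this checks only the loop semantics, not the speed difference) of both counter loops, including a float limit:

```python
def count_up_le(limit):
    # Analogue of: t1 := 1. [t1 <= limit] whileTrue: [t1 := t1 + 1]
    t1 = 1
    while t1 <= limit:
        t1 += 1
    return t1 - 1  # number of iterations performed

def count_up_ge(limit):
    # Reversed comparison receiver: [limit >= t1] whileTrue: [...]
    t1 = 1
    while limit >= t1:
        t1 += 1
    return t1 - 1

# Same iteration count either way, for integer or float limits.
print(count_up_le(1000.0), count_up_ge(1000.0))  # 1000 1000
print(count_up_le(1000), count_up_ge(1000))      # 1000 1000
```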
Andres wrote:
>> What code is "sending" timesRepeat: to floating point values?

The repeat count was the result of a calculation that included a float. Below is the fastest form that the compiler can optimize to while avoiding performance inconsistencies. Performance in the range of SmallInteger is about the same as the form you showed. It doubles the efficient range of integers while also efficiently handling other kinds of numbers. The code being repeated can fit neatly into the compiler-optimized whileTrue: argument.

    Time millisecondsToRun: [
        | times endAt t1 |
        times := SmallInteger maxVal * 2d.
        endAt := SmallInteger minVal.
        t1 := (times asInteger max: 0) + 1 + endAt.
        [endAt ~~ (t1 := t1 - 1)] whileTrue: ["compiler in-lines code here"].
        nil
    ].
    2947 2957 2926

Here is the range-adjusted performance of the form you showed (which slows outside SmallInteger):

    (Time millisecondsToRun: [
        | t1 times |
        times := SmallInteger maxVal.
        t1 := 1.
        [times >= t1] whileTrue: [t1 := t1 + 1].
        nil
    ]) * 2
    2948 2760 2802

When that same no-operation pattern is put into a method, it looks like this:

    ArithmeticValue>>otherTimesRepeat0: aBlock
        "Note that this doesn't actually do any work; it is to profile counting only."
        | endAt t1 |
        endAt := SmallInteger minVal.
        t1 := (self asInteger max: 0) + 1 + endAt.
        [endAt ~~ (t1 := t1 - 1)] whileTrue: [].
        ^nil

    (Time millisecondsToRun: [
        SmallInteger maxVal otherTimesRepeat0: [].
    ]) * 2.
    3014

This is what performance looks like when a block closure gets involved:

    ArithmeticValue>>otherTimesRepeat: aBlock
        | endAt t1 |
        endAt := SmallInteger minVal.
        t1 := (self asInteger max: 0) + 1 + endAt.
        [endAt ~~ (t1 := t1 - 1)] whileTrue: [aBlock value].
        ^nil

    (Time millisecondsToRun: [
        SmallInteger maxVal otherTimesRepeat: [].
    ]) * 2.
    6260

This shows that performance scales fine into large integers:

    Time millisecondsToRun: [
        SmallInteger maxVal * 2 otherTimesRepeat: [].
    ].
    6356

Having the compiler optimize the code to avoid the block closure is (of course) a large savings in this context. A fast and general way to enumerate a block closure without a point where performance degrades is with these methods:

    ArithmeticValue>>myTimesRepeat: aBlock
        "Optimization proposed to Cincom by ICE. -plb 2011.06.09"
        ^aBlock repeat: self.

    BlockClosure>>repeat: times
        "Optimization proposed to Cincom by ICE. -plb 2011.06.09"
        "Evaluate the receiver multiple times. The receiver is expected to be a zero argument block."
        | t1 resets |
        t1 := times asInteger max: 0.
        resets := t1 // 536870911.
        t1 := (t1 \\ 536870911) + 1.
        [0 ~~ (t1 := t1 - 1) or: [t1 := 536870911. (resets := resets - 1) ~~ -1]] whileTrue: [self value].
        ^nil

This is how performance compares to the current implementation using small integers:

    (Time millisecondsToRun: [
        | block |
        block := [].
        SmallInteger maxVal timesRepeat: block.
    ]) * 2
    8304 8252

    (Time millisecondsToRun: [
        | block |
        block := [].
        SmallInteger maxVal myTimesRepeat: block.
    ]) * 2.
    6794 7246

This is how performance compares into large integers:

    Time millisecondsToRun: [
        | block |
        block := [].
        SmallInteger maxVal * 2 timesRepeat: block.
    ].
    144143

    Time millisecondsToRun: [
        | block |
        block := [].
        SmallInteger maxVal * 2 myTimesRepeat: block.
    ].
    6902 6855

My point is that the problem can be fixed for both the compiler-optimized code and for the message-send form. The fix makes performance consistent without making it any slower. 64-bit will not fix the performance of repeats on floats and fractions. Granted, those are application bugs, but they have been known to happen.

Here is a little extra to think about with regard to return values and the simple block costs that you'd mentioned... Consider how useful it can be to repeat with injection:

    BlockClosure>>repeat: times inject: initialValue
        "Optimization proposed to Cincom by ICE. -plb 2011.06.09"
        "Evaluate the receiver multiple times. The receiver is expected to be a one argument block. The argument is the result of the previous execution of the receiver."
        | t1 resets last |
        t1 := times asInteger max: 0.
        resets := t1 // 536870911.
        t1 := (t1 \\ 536870911) + 1.
        last := initialValue.
        [0 ~~ (t1 := t1 - 1) or: [t1 := 536870911. (resets := resets - 1) ~~ -1]] whileTrue: [last := self value: last].
        ^last

Consider how passing context in as an argument to the block can keep the performance of a simple closure:

    BlockClosure>>repeat: times including: inc1 inject: initialValue
        "Optimization proposed to Cincom by ICE. -plb 2011.06.09"
        "Evaluate the receiver multiple times. The receiver is expected to be a two argument block. The first argument to the block can be used to keep the block simple. The last argument is the result of the previous execution of the receiver."
        | t1 resets last |
        t1 := times asInteger max: 0.
        resets := t1 // 536870911.
        t1 := (t1 \\ 536870911) + 1.
        last := initialValue.
        [0 ~~ (t1 := t1 - 1) or: [t1 := 536870911. (resets := resets - 1) ~~ -1]] whileTrue: [last := self value: inc1 value: last].
        ^last

The performance is consistent no matter the kind of number (that can convert to an integer). Performance stays consistent regardless of the number of repeats. Here is a closure that is passed context so that it remains simple:

    Time millisecondsToRun: [
        [:myself :last | myself] repeat: SmallInteger maxVal * 2 including: self inject: nil.
    ].
    6773 7218

Here is the same test, but the closure is not simple:

    Time millisecondsToRun: [
        [:myself :last | self] repeat: SmallInteger maxVal * 2 including: self inject: nil.
    ].
    10654 10698

The cost that closure adds to the repeat is equal to the cost of a #yourself message send:

    Time millisecondsToRun: [
        [:myself :last | myself yourself] repeat: SmallInteger maxVal * 2 including: self inject: nil.
    ].
    10631 10756

Regards,

Paul Baumann
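The repeat:inject: scheme above threads a running value through chunked evaluations so the repeat count never has to be a large integer. A Python analogue of that structure (a sketch only: `repeat_inject` is a hypothetical name, `maxval` stands in for SmallInteger maxVal and is made a parameter so the chunking path can be exercised with small counts):

```python
def repeat_inject(times, initial, block, maxval=536870911):
    """Evaluate block(last) repeatedly, threading the running value through,
    in chunks of at most maxval iterations (mirroring the proposed
    BlockClosure>>repeat:inject: structure)."""
    n = max(int(times), 0)           # analogue of: times asInteger max: 0
    resets, rem = divmod(n, maxval)  # full chunks, then the remainder
    last = initial
    for _ in range(resets):
        for _ in range(maxval):
            last = block(last)
    for _ in range(rem):
        last = block(last)
    return last

# The value threads through every evaluation; a float count is truncated,
# matching the thread's point that non-integer repeat counts are accepted.
print(repeat_inject(10, 0, lambda last: last + 1))            # 10
print(repeat_inject(10, 0, lambda last: last + 1, maxval=3))  # 10, via 3 chunks + 1
print(repeat_inject(5.9, 0, lambda last: last + 1))           # 5
```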