Hi, I recently published a Chronology-Core version for using high resolution clock. On my 2.7GHz core i5 MBP (2015) I get this: Time highResClockTicksPerMillisecond 2699824 => OK, consistent with 2.7GHz Time highResClock - Time highResClock * 1000000 // Time highResClockTicksPerMillisecond. -578 -563 => Huh, invoking a primitive takes that long (500 to 600 nanoseconds) But I can correct it. Let's try it: [10 factorial] bench. '14,000,000 per second. 71.2 microseconds per run.' => this is the reference result (1 to: 10) collect: [:i | | ticks | ticks := Time highResClock. 10 factorial. Time highResClock - ticks + + (Time highResClock - Time highResClock) "correction" * 1000000 // Time highResClockTicksPerMillisecond "get nanoseconds"]. #(1309 247 88 84 74 69 71 71 71 69) => OK, first runs are a bit long, but we get 70ns per run as the reference #(1977 191 143 148 142 120 122 122 117 120) => Oups??? Second run gives different results??? #(2239 180 143 143 142 116 117 117 117 114) => and third about the same than second... Any idea how to explain that? |
Le jeu. 25 avr. 2019 à 23:22, Nicolas Cellier <[hidden email]> a écrit :
sorry for the extra +, editing code in mail and not doing it is a recipe for not validating!
If I run 200 times, I sometimes get the two regimes separated by hickups #(1899 107 142 150 123 124 115 122 117 122 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 123 117 122 117 122 116 122 117 123 117 122 117 122 7950 128 117 122 120 118 9963 87 69 71 69 71 69 69 71 71 68 69 72 71 68 69 72 71 68 69 72 71 68 69 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 72 71 68 69 72 71 68 69 69 71 69 71 69 71 68 69 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71) |
Huh, in fact it's a bit different... Le jeu. 25 avr. 2019 à 23:28, Nicolas Cellier <[hidden email]> a écrit :
(1 to: 200) collect: [:i | Time highResClock - Time highResClock * 1000000 // Time highResClockTicksPerMillisecond]. #(-528 -96 -71 -61 -63 -61 -61 -61 -63 -61 -58 -61 -61 -61 -61 -61 -58 -58 -61 -57 -61 -58 -63 -61 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61 -61 -61 -61 -64 -58 -63 -61 -58 -61 -58 -61 -61 -61 -57 -61 -61 -58 -61 -61 -61 -58 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -65 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -63 -61 -58 -61 -58 -61 -58 -63 -61 -61 -61 -61 -61 -61 -63 -61 -58 -61 -57 -61 -58 -61 -58 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -57 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -63 -61 -63 -61 -58 -61 -57 -61 -61 -61 -61 -58 -61 -58 -63 -57 -61 -58 -61 -61 -58 -63 -58 -63 -58 -58 -61 -61 -61 -61 -63 -61 -58 -61 -58 -61 -61 -61 -61 -57 -61 -58) In fact it's 500ns at first eval (JIT or something), then it's only 60ns.
Huh, I messed up with printing the timing (because I modified benchFor: using µsec clock) [10 factorial] bench. '14,500,000 per second. 69 nanoseconds per run.' [] bench. '191,000,000 per second. 5.25 nanoseconds per run.' So this is including 5ns for evaluating block + incrementing counter + looping... tmp := []. (1 to: 50) collect: [:i || ticks | ticks := Time highResClock. tmp value. Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond]. #(1020 54 14 11 4 4 4 4 2 5 1 2 4 4 2 4 5 2 4 2 2 5 2 2 2 4 4 4 5 4 4 4 4 2 5 4 4 4 2 0 5 2 5 4 5 5 4 4 4 4) => it's about 5ns but here we are in the noise of correction (it varies between 57 and 63ns as we saw above) That's the limit os single evaluation with µ-bench | tmp | tmp := [10 factorial]. (1 to: 200) collect: [:i || ticks | ticks := Time highResClock. tmp value. Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond]. | tmp | tmp := [10 factorial]. (1 to: 100) collect: [:i || ticks | ticks := Time highResClock. tmp value. Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond]. #(3227 267 212 161 130 125 130 127 127 126 127 126 125 133 124 130 125 128 125 130 127 125 126 127 128 127 130 127 130 127 130 124 125 124 125 126 124 126 124 131 127 132 125 130 124 130 124 130 126 124 128 126 130 126 130 127 131 126 130 125 124 125 124 127 125 127 125 130 127 133 124 130 125 128 127 125 122 130 124 127 126 124 128 124 130 123 130 124 130 126 130 124 127 128 127 128 124 130 125 130) #(1663 208 97 94 96 118 97 94 94 94 96 94 93 93 94 93 94 93 93 94 94 93 93 96 93 94 93 94 91 94 93 93 94 93 94 94 93 93 96 94 93 93 94 91 93 93 94 93 93 94 94 93 93 94 93 94 91 93 91 77 94 93 94 94 93 93 96 94 93 93 94 91 93 93 94 93 93 94 94 93 93 94 93 94 91 93 91 77 77 93 93 77 80 94 80 94 93 93 94 91) two regimes again... and not the 70ns of bench this time...
|
70ns difference can easily come from cache misses. I think you'd get
more consistent results if you compiled a method and ran that instead of compiling a new block for each execution. The first two numbers are larger because the jit kicks in. IIRC no jit for the first run, jitting during the second run. All futher executions use the jitted code. Also, the "hickups" probably happen because your OS is running other processes as well. IIRC the highResClock primitive uses TSC on Intel CPUs, so it may not be as reliable as you'd expect it to be[1], especially on older CPUs (clock rate depends on CPU frequency, cores' clocks are not synced). Levente [1] https://en.wikipedia.org/wiki/Time_Stamp_Counter On Fri, 26 Apr 2019, Nicolas Cellier wrote: > Huh, in fact it's a bit different... > > Le jeu. 25 avr. 2019 à 23:28, Nicolas Cellier <[hidden email]> a écrit : > > > Le jeu. 25 avr. 2019 à 23:22, Nicolas Cellier <[hidden email]> a écrit : > Hi, > I recently published a Chronology-Core version for using high resolution clock. > > On my 2.7GHz core i5 MBP (2015) I get this: > > Time highResClockTicksPerMillisecond > > 2699824 > => OK, consistent with 2.7GHz > > Time highResClock - Time highResClock * 1000000 // Time highResClockTicksPerMillisecond. > > -578 -563 > => Huh, invoking a primitive takes that long (500 to 600 nanoseconds) > > > > (1 to: 200) collect: [:i | Time highResClock - Time highResClock * 1000000 // Time highResClockTicksPerMillisecond]. > > #(-528 -96 -71 -61 -63 -61 -61 -61 -63 -61 -58 -61 -61 -61 -61 -61 -58 -58 -61 -57 -61 -58 -63 -61 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61 -61 -61 -61 -64 -58 > -63 -61 -58 -61 -58 -61 -61 -61 -57 -61 -61 -58 -61 -61 -61 -58 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -65 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -63 -61 -58 -61 -58 -61 -58 -63 -61 -61 -61 -61 -61 -61 > -63 -61 -58 -61 -57 -61 -58 -61 -58 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -57 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -63 -61 -63 -61 -58 -61 -57 -61 -61 > -61 -61 -58 -61 -58 -63 -57 -61 -58 -61 -61 -58 -63 -58 -63 -58 -58 -61 -61 -61 -61 -63 -61 -58 -61 -58 -61 -61 -61 -61 -57 -61 -58) > In fact it's 500ns at first eval (JIT or something), then it's only 60ns. > > > But I can correct it. Let's try it: > > [10 factorial] bench. > > '14,000,000 per second. 71.2 microseconds per run.' > => this is the reference result > > Huh, I messed up with printing the timing (because I modified benchFor: using µsec clock) > > [10 factorial] bench. > '14,500,000 per second. 69 nanoseconds per run.' > > [] bench. > '191,000,000 per second. 5.25 nanoseconds per run.' > > So this is including 5ns for evaluating block + incrementing counter + looping... > > tmp := []. > (1 to: 50) collect: [:i || ticks | > ticks := Time highResClock. > tmp value. > Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond]. > > #(1020 54 14 11 4 4 4 4 2 5 1 2 4 4 2 4 5 2 4 2 2 5 2 2 2 4 4 4 5 4 4 4 4 2 5 4 4 4 2 0 5 2 5 4 5 5 4 4 4 4) > => it's about 5ns but here we are in the noise of correction (it varies between 57 and 63ns as we saw above) > That's the limit os single evaluation with µ-bench > > | tmp | > tmp := [10 factorial]. > (1 to: 200) collect: [:i || ticks | > ticks := Time highResClock. > tmp value. > Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond]. > > | tmp | > tmp := [10 factorial]. > (1 to: 100) collect: [:i || ticks | > ticks := Time highResClock. > tmp value. > Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond]. > > #(3227 267 212 161 130 125 130 127 127 126 127 126 125 133 124 130 125 128 125 130 127 125 126 127 128 127 130 127 130 127 130 124 125 124 125 126 124 126 124 131 127 132 125 130 124 130 124 130 126 124 128 126 130 126 130 > 127 131 126 130 125 124 125 124 127 125 127 125 130 127 133 124 130 125 128 127 125 122 130 124 127 126 124 128 124 130 123 130 124 130 126 130 124 127 128 127 128 124 130 125 130) > #(1663 208 97 94 96 118 97 94 94 94 96 94 93 93 94 93 94 93 93 94 94 93 93 96 93 94 93 94 91 94 93 93 94 93 94 94 93 93 96 94 93 93 94 91 93 93 94 93 93 94 94 93 93 94 93 94 91 93 91 77 94 93 94 94 93 93 96 94 93 93 94 91 93 > 93 94 93 93 94 94 93 93 94 93 94 91 93 91 77 77 93 93 77 80 94 80 94 93 93 94 91) > > two regimes again... and not the 70ns of bench this time... > > > (1 to: 10) collect: [:i | > | ticks | > ticks := Time highResClock. > 10 factorial. > Time highResClock - ticks + > > sorry for the extra +, editing code in mail and not doing it is a recipe for not validating! > + (Time highResClock - Time highResClock) "correction" > * 1000000 // Time highResClockTicksPerMillisecond "get nanoseconds"]. > > #(1309 247 88 84 74 69 71 71 71 69) > => OK, first runs are a bit long, but we get 70ns per run as the reference > > #(1977 191 143 148 142 120 122 122 117 120) > => Oups??? Second run gives different results??? > > #(2239 180 143 143 142 116 117 117 117 114) > => and third about the same than second... > Any idea how to explain that? > > > If I run 200 times, I sometimes get the two regimes separated by hickups > > #(1899 107 142 150 123 124 115 122 117 122 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 123 117 122 117 122 116 122 117 123 > 117 122 117 122 7950 128 117 122 120 118 9963 87 69 71 69 71 69 69 71 71 68 69 72 71 68 69 72 71 68 69 72 71 68 69 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 > 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 72 71 68 69 72 71 68 69 69 71 69 71 69 71 68 69 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 > 69 71 69 71) > > > |
Thanks Le ven. 26 avr. 2019 à 01:15, Levente Uzonyi <[hidden email]> a écrit : 70ns difference can easily come from cache misses. I think you'd get Yes, it makes sense
Yes, constant rate was the first question that came to my mind. But it's not really a problem: having the number of ticks is a good information per se, whatever the power saving, it tells how many cycles we use More problematic is un-synced drifting TSC on multi-core, because diffing makes no sense then if ever affinity changes... I suggest making it a Preference (inbox). Normally, the best feature of TSC is that they have very cheap access. Unfortunately, t's not that cheap thru the primitive. I re-ran today: [Time utcMicrosecondClock] bench. '20,500,000 per second. 48.7 nanoseconds per run.' [Time highResClock] bench. '22,800,000 per second. 43.8 nanoseconds per run.' and (1 to: 200) collect: [:i | Time highResClock - Time highResClock * 1000000 // Time highResClockTicksPerMillisecond] #(-5479 -203 -45 -41 -36 -39 -38 -38 -41 -41 -39 -42 -41 -41 -41 -41 -39 -41 -39 -41 -39 -39 -38 -39 -41 -38 -41 -41 -41 -41 -41 -41 -39 -41 -41 -39 -39 -41 -41 -39 -43 -43 -39 -36 -36 -39 -38 -38 -38 -36 -35 -36 -35 -35 -35 -35 -35 -36 -36 -35 -35 -35 -36 -35 -36 -35 -35 -36 -35 -35 -35 -35 -36 -35 -36 -35 -36 -35 -36 -35 -35 -36 -36 -35 -36 -35 -36 -35 -36 -36 -35 -35 -36 -35 -36 -36 -35 -35 -35 -36 -35 -35 -35 -35 -38 -36 -35 -36 -35 -35 -35 -35 -35 -36 -36 -35 -35 -35 -36 -35 -36 -36 -35 -36 -35 -35 -35 -35 -36 -35 -36 -35 -36 -35 -36 -35 -35 -36 -36 -35 -36 -35 -36 -35 -36 -36 -35 -35 -36 -36 -36 -36 -35 -35 -35 -35 -36 -35 -36 -35 -36 -36 -36 -35 -35 -36 -35 -35 -35 -36 -36 -35 -35 -35 -36 -35 -36 -36 -35 -36 -35 -35 -35 -35 -36 -35 -36 -35 -36 -35 -36 -35 -35 -36 -36 -35 -36 -35 -36 -35) which is somehow consistent The utcMicrosecondsClock is very poor on windows VM (1 ms resolution). So maybe the high res clock can still be an interesting add-on...
Levente [1] https://en.wikipedia.org/wiki/Time_Stamp_Counter On Fri, 26 Apr 2019, Nicolas Cellier wrote: > Huh, in fact it's a bit different... > > Le jeu. 25 avr. 2019 à 23:28, Nicolas Cellier <[hidden email]> a écrit : > > > Le jeu. 25 avr. 2019 à 23:22, Nicolas Cellier <[hidden email]> a écrit : > Hi, > I recently published a Chronology-Core version for using high resolution clock. > > On my 2.7GHz core i5 MBP (2015) I get this: > > Time highResClockTicksPerMillisecond > > 2699824 > => OK, consistent with 2.7GHz > > Time highResClock - Time highResClock * 1000000 // Time highResClockTicksPerMillisecond. > > -578 -563 > => Huh, invoking a primitive takes that long (500 to 600 nanoseconds) > > > > (1 to: 200) collect: [:i | Time highResClock - Time highResClock * 1000000 // Time highResClockTicksPerMillisecond]. > > #(-528 -96 -71 -61 -63 -61 -61 -61 -63 -61 -58 -61 -61 -61 -61 -61 -58 -58 -61 -57 -61 -58 -63 -61 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61 -61 -61 -61 -64 -58 > -63 -61 -58 -61 -58 -61 -61 -61 -57 -61 -61 -58 -61 -61 -61 -58 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -65 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -63 -61 -58 -61 -58 -61 -58 -63 -61 -61 -61 -61 -61 -61 > -63 -61 -58 -61 -57 -61 -58 -61 -58 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -57 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -63 -61 -63 -61 -58 -61 -57 -61 -61 > -61 -61 -58 -61 -58 -63 -57 -61 -58 -61 -61 -58 -63 -58 -63 -58 -58 -61 -61 -61 -61 -63 -61 -58 -61 -58 -61 -61 -61 -61 -57 -61 -58) > In fact it's 500ns at first eval (JIT or something), then it's only 60ns. > > > But I can correct it. Let's try it: > > [10 factorial] bench. > > '14,000,000 per second. 71.2 microseconds per run.' > => this is the reference result > > Huh, I messed up with printing the timing (because I modified benchFor: using µsec clock) > > [10 factorial] bench. > '14,500,000 per second. 69 nanoseconds per run.' > > [] bench. > '191,000,000 per second. 5.25 nanoseconds per run.' > > So this is including 5ns for evaluating block + incrementing counter + looping... > > tmp := []. > (1 to: 50) collect: [:i || ticks | > ticks := Time highResClock. > tmp value. > Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond]. > > #(1020 54 14 11 4 4 4 4 2 5 1 2 4 4 2 4 5 2 4 2 2 5 2 2 2 4 4 4 5 4 4 4 4 2 5 4 4 4 2 0 5 2 5 4 5 5 4 4 4 4) > => it's about 5ns but here we are in the noise of correction (it varies between 57 and 63ns as we saw above) > That's the limit os single evaluation with µ-bench > > | tmp | > tmp := [10 factorial]. > (1 to: 200) collect: [:i || ticks | > ticks := Time highResClock. > tmp value. > Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond]. > > | tmp | > tmp := [10 factorial]. > (1 to: 100) collect: [:i || ticks | > ticks := Time highResClock. > tmp value. > Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond]. > > #(3227 267 212 161 130 125 130 127 127 126 127 126 125 133 124 130 125 128 125 130 127 125 126 127 128 127 130 127 130 127 130 124 125 124 125 126 124 126 124 131 127 132 125 130 124 130 124 130 126 124 128 126 130 126 130 > 127 131 126 130 125 124 125 124 127 125 127 125 130 127 133 124 130 125 128 127 125 122 130 124 127 126 124 128 124 130 123 130 124 130 126 130 124 127 128 127 128 124 130 125 130) > #(1663 208 97 94 96 118 97 94 94 94 96 94 93 93 94 93 94 93 93 94 94 93 93 96 93 94 93 94 91 94 93 93 94 93 94 94 93 93 96 94 93 93 94 91 93 93 94 93 93 94 94 93 93 94 93 94 91 93 91 77 94 93 94 94 93 93 96 94 93 93 94 91 93 > 93 94 93 93 94 94 93 93 94 93 94 91 93 91 77 77 93 93 77 80 94 80 94 93 93 94 91) > > two regimes again... and not the 70ns of bench this time... > > > (1 to: 10) collect: [:i | > | ticks | > ticks := Time highResClock. > 10 factorial. > Time highResClock - ticks + > > sorry for the extra +, editing code in mail and not doing it is a recipe for not validating! > + (Time highResClock - Time highResClock) "correction" > * 1000000 // Time highResClockTicksPerMillisecond "get nanoseconds"]. > > #(1309 247 88 84 74 69 71 71 71 69) > => OK, first runs are a bit long, but we get 70ns per run as the reference > > #(1977 191 143 148 142 120 122 122 117 120) > => Oups??? Second run gives different results??? > > #(2239 180 143 143 142 116 117 117 117 114) > => and third about the same than second... > Any idea how to explain that? > > > If I run 200 times, I sometimes get the two regimes separated by hickups > > #(1899 107 142 150 123 124 115 122 117 122 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 123 117 122 117 122 116 122 117 123 > 117 122 117 122 7950 128 117 122 120 118 9963 87 69 71 69 71 69 69 71 71 68 69 72 71 68 69 72 71 68 69 72 71 68 69 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 > 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 72 71 68 69 72 71 68 69 69 71 69 71 69 71 68 69 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 > 69 71 69 71) > > > |
Free forum by Nabble | Edit this page |