For Case [1] I got an unrelated CI failure [2] for
NeoUUIDGeneratorTests>>#testTwoSameGenerator Checking this locally by running this a dozen times... 1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp testTwoSameGenerator ] Two distinct failures are apparent... 1. Takes longer than expected. self assert: (time2 - time1) abs < 100 last=1 ==> a time difference of 109 last=1 ==> a time difference of 102 last=1 ==> a time difference of 155 last=1 ==> a time difference of 105 Interesting that whenever this happens its on the first iteration. 2. Invalid consecutive ids. self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10). last=127 ==> TestFailure: Got 256 instead of 0. last=828 ==> TestFailure: Got 256 instead of 0. last=128 ==> TestFailure: Got 256 instead of 0. last=316 ==> TestFailure: Got 256 instead of 0. Maybe a problem with rollover? So instrumenting a bit: testTwoSameGenerator: n | uuid1 uuid2 consecutiveOk time1 time2 timeOk | .... "same generator, consequetive counters" consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10). "same generators, similar clock, within 100 milliseconds" time1 := self timeFromUUID: uuid1. time2 := self timeFromUUID: uuid2. timeOk := (time2 - time1) abs < 100. ( timeOk and: [ consecutiveOk ]) ifFalse: [ Transcript cr; show: 'fail n=', (n printPaddedWith: $0 to: 3) ; tab; show: ' time ', ((time2 - time1) printString) , '<100'; tab; show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=', (uuid2 at: 10) printString). ]. self assert: consecutiveOk. self assert: timeOk. From playground running... 1 to: 1000 do: [:n| NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n]. Transcript crShow: 'x'. Note the 'x' was not output due to one of the consecutiveOk or timeOk asserts triggered each run. fail n=001 time 104<100 uuid 160=160 fail n=269 time 2<100 uuid 256=0 fail n=001 time 152<100 uuid 166=166 fail n=117 time 3<100 uuid 256=0 fail n=001 time 123<100 uuid 128=128 fail n=278 time 3<100 uuid 256=0 fail n=001 time 145<100 uuid 230=230 fail n=673 time 1340<100 uuid 141=141 fail n=707 time 3<100 uuid 256=0 fail n=001 time 132<100 uuid 73=73 fail n=001 time 117<100 uuid 206=206 fail n=063 time 2<100 uuid 256=0 fail n=345 time 3<100 uuid 256=0 fail n=001 time 141<100 uuid 232=232 x x fail n=201 time 3<100 uuid 256=0 fail n=268 time 3<100 uuid 256=0 fail n=359 time 3<100 uuid 256=0 fail n=408 time 2<100 uuid 256=0 fail n=559 time 3<100 uuid 256=0 x fail n=139 time 3<100 uuid 256=0 fail n=854 time 7<100 uuid 256=0 x fail n=389 time 3<100 uuid 256=0 fail n=669 time 3<100 uuid 256=0 fail n=876 time 3<100 uuid 256=0 fail n=916 time 3<100 uuid 256=0 fail n=953 time 2<100 uuid 256=0 fail n=954 time 3<100 uuid 256=0 x fail n=148 time 3<100 uuid 256=0 fail n=741 time 3<100 uuid 256=0 x fail n=116 time 2<100 uuid 256=0 fail n=480 time 3<100 uuid 256=0 x After treating the rollover like this... consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10). running the last playground script gives... (Note the n=001 fail reoccurs after waiting a while between running the playground script again.) fail n=001 time 123<100 uuid 110=110 x fail n=345 time 1287<100 uuid 24=24 fail n=404 time 1313<100 uuid 125=125 x x x x x fail n=449 time 1290<100 uuid 24=24 x x x x x Thats looking better, so upping to 10000 iterations... fail n=0001 time 151<100 uuid 160=160 fail n=2977 time 1437<100 uuid 221=221 fail n=5498 time 1454<100 uuid 224=224 fail n=9007 time 1351<100 uuid 239=239 x fail n=3389 time 1381<100 uuid 223=223 fail n=7768 time 1365<100 uuid 82=82 fail n=9773 time 1395<100 uuid 164=164 x fail n=3325 time 1448<100 uuid 67=67 fail n=9930 time 1456<100 uuid 103=103 x fail n=0060 time 2914<100 uuid 159=159 fail n=1829 time 1438<100 uuid 32=32 fail n=7918 time 1436<100 uuid 182=182 x So the rollover looks cured. Now adding some more instrumentation to view GC Time and Time To Run and (see attached ST file) 0:00:00:00 TTR 0:00:00:00.001 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 x fail n=0001 time 115<100 uuid 157=157 GCTime 0 TTR 0:00:00:00 fail n=0001 time 2<100 uuid 255=0 GCTime 0 TTR 0:00:00:00 x fail n=0001 time 111<100 uuid 104=104 GCTime 0 TTR 0 x fail n=0001 time 120<100 uuid 204=204 GCTime 0 TTR 0 fail n=0001 time 1182<100 uuid 103=103 GCTime 1 TTR 1 x x x x x fail n=0001 time 3<100 uuid 255=0 GCTime 0 TTR 0 x x fail n=0001 time 1251<100 uuid 253=253 GCTime 2 TTR 1 x Now I wonder at the class comment saying... "the first 8 bytes are the *millisecond* clock value" but #clockOn: is using *microsecond* clock. Perhaps "self assert: (time2 - time1) abs < 100" is meant to be 100ms not 100us ?? cheers -ben NeoUUIDGeneratorTests-testTwoSameGenerator.st (1K) Download Attachment |
Hi Ben,
Thanks for looking into this, it was bothering me for quite some time ;-) https://pharo.fogbugz.com/f/cases/19453/NeoUUIDGenerator-testTwoSameGenerator-intermittent-CI-failure Yes, I think the counter roll-over is the problem, stupid of me to to have seen this. The timing problem might indeed be resolution dependent: 100us is probably to short sometimes. I will provide a slice with fixes. Thanks again ! Regards, Sven > On 11 Dec 2016, at 09:07, Ben Coman <[hidden email]> wrote: > > For Case [1] I got an unrelated CI failure [2] for > NeoUUIDGeneratorTests>>#testTwoSameGenerator > > Checking this locally by running this a dozen times... > 1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp testTwoSameGenerator ] > > Two distinct failures are apparent... > > 1. Takes longer than expected. > self assert: (time2 - time1) abs < 100 > last=1 ==> a time difference of 109 > last=1 ==> a time difference of 102 > last=1 ==> a time difference of 155 > last=1 ==> a time difference of 105 > Interesting that whenever this happens its on the first iteration. > > 2. Invalid consecutive ids. > self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10). > last=127 ==> TestFailure: Got 256 instead of 0. > last=828 ==> TestFailure: Got 256 instead of 0. > last=128 ==> TestFailure: Got 256 instead of 0. > last=316 ==> TestFailure: Got 256 instead of 0. > Maybe a problem with rollover? > > So instrumenting a bit: > testTwoSameGenerator: n > | uuid1 uuid2 consecutiveOk time1 time2 timeOk | > .... > "same generator, consequetive counters" > consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10). > "same generators, similar clock, within 100 milliseconds" > time1 := self timeFromUUID: uuid1. > time2 := self timeFromUUID: uuid2. > timeOk := (time2 - time1) abs < 100. > ( timeOk and: [ consecutiveOk ]) ifFalse: [ > Transcript cr; > show: 'fail n=', (n printPaddedWith: $0 to: 3) ; tab; > show: ' time ', ((time2 - time1) printString) , '<100'; tab; > show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=', (uuid2 at: 10) printString). > ]. > self assert: consecutiveOk. > self assert: timeOk. > > From playground running... > 1 to: 1000 do: [:n| > NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n]. > Transcript crShow: 'x'. > > Note the 'x' was not output due to one of the consecutiveOk or timeOk asserts triggered each run. > fail n=001 time 104<100 uuid 160=160 > fail n=269 time 2<100 uuid 256=0 > fail n=001 time 152<100 uuid 166=166 > fail n=117 time 3<100 uuid 256=0 > fail n=001 time 123<100 uuid 128=128 > fail n=278 time 3<100 uuid 256=0 > fail n=001 time 145<100 uuid 230=230 > fail n=673 time 1340<100 uuid 141=141 > fail n=707 time 3<100 uuid 256=0 > fail n=001 time 132<100 uuid 73=73 > fail n=001 time 117<100 uuid 206=206 > fail n=063 time 2<100 uuid 256=0 > fail n=345 time 3<100 uuid 256=0 > > Now a strange thing is that after commenting out those two asserts for consecutiveOk and timeOk, the fails at n=001 stop happening. > > fail n=001 time 141<100 uuid 232=232 > x > x > fail n=201 time 3<100 uuid 256=0 > fail n=268 time 3<100 uuid 256=0 > fail n=359 time 3<100 uuid 256=0 > fail n=408 time 2<100 uuid 256=0 > fail n=559 time 3<100 uuid 256=0 > x > fail n=139 time 3<100 uuid 256=0 > fail n=854 time 7<100 uuid 256=0 > x > fail n=389 time 3<100 uuid 256=0 > fail n=669 time 3<100 uuid 256=0 > fail n=876 time 3<100 uuid 256=0 > fail n=916 time 3<100 uuid 256=0 > fail n=953 time 2<100 uuid 256=0 > fail n=954 time 3<100 uuid 256=0 > x > fail n=148 time 3<100 uuid 256=0 > fail n=741 time 3<100 uuid 256=0 > x > fail n=116 time 2<100 uuid 256=0 > fail n=480 time 3<100 uuid 256=0 > x > > After treating the rollover like this... > consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10). > running the last playground script gives... > > (Note the n=001 fail reoccurs after waiting a while between running the playground script again.) > fail n=001 time 123<100 uuid 110=110 > x > fail n=345 time 1287<100 uuid 24=24 > fail n=404 time 1313<100 uuid 125=125 > x > x > x > x > x > fail n=449 time 1290<100 uuid 24=24 > x > x > x > x > x > > > Thats looking better, so upping to 10000 iterations... > fail n=0001 time 151<100 uuid 160=160 > fail n=2977 time 1437<100 uuid 221=221 > fail n=5498 time 1454<100 uuid 224=224 > fail n=9007 time 1351<100 uuid 239=239 > x > fail n=3389 time 1381<100 uuid 223=223 > fail n=7768 time 1365<100 uuid 82=82 > fail n=9773 time 1395<100 uuid 164=164 > x > fail n=3325 time 1448<100 uuid 67=67 > fail n=9930 time 1456<100 uuid 103=103 > x > fail n=0060 time 2914<100 uuid 159=159 > fail n=1829 time 1438<100 uuid 32=32 > fail n=7918 time 1436<100 uuid 182=182 > x > > So the rollover looks cured. > > Now adding some more instrumentation to view GC Time and Time To Run and (see attached ST file) > > 0:00:00:00 TTR 0:00:00:00.001 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 > x > fail n=0001 time 115<100 uuid 157=157 GCTime 0 TTR 0:00:00:00 > fail n=0001 time 2<100 uuid 255=0 GCTime 0 TTR 0:00:00:00 > x > fail n=0001 time 111<100 uuid 104=104 GCTime 0 TTR 0 > x > fail n=0001 time 120<100 uuid 204=204 GCTime 0 TTR 0 > fail n=0001 time 1182<100 uuid 103=103 GCTime 1 TTR 1 > x > x > x > x > x > fail n=0001 time 3<100 uuid 255=0 GCTime 0 TTR 0 > x > x > fail n=0001 time 1251<100 uuid 253=253 GCTime 2 TTR 1 > x > > > Now I wonder at the class comment saying... > "the first 8 bytes are the *millisecond* clock value" > but #clockOn: is using *microsecond* clock. > Perhaps "self assert: (time2 - time1) abs < 100" > is meant to be 100ms not 100us ?? > > cheers -ben > > [1] https://pharo.fogbugz.com/f/cases/19438 > [2] https://ci.inria.fr/pharo/job/Pharo-6.0-Issue-Validator/19471//artifact/validationReport.html > <NeoUUIDGeneratorTests-testTwoSameGenerator.st> |
>The timing problem might indeed be resolution dependent: 100us is probably to short sometimes.
Time microsecondClockValue gives the time in microseconds, but it does not guarantee microsecond resolution, on windows the resolution is still in milliseconds. ---------- | col | col := Array new: 30000 streamContents: [ :stream | 1 to: 30000 do: [ :x | stream nextPut: Time microsecondClockValue ] ]. col asSet "a Set(3658909536504622 3658909536503622 3658909536502622)"' --------- Best regards, Henrik -----Original Message----- From: Pharo-dev [mailto:[hidden email]] On Behalf Of Sven Van Caekenberghe Sent: Sunday, December 11, 2016 11:56 AM To: Pharo Development List <[hidden email]> Subject: Re: [Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator - intermittent CI failure Hi Ben, Thanks for looking into this, it was bothering me for quite some time ;-) https://pharo.fogbugz.com/f/cases/19453/NeoUUIDGenerator-testTwoSameGenerator-intermittent-CI-failure Yes, I think the counter roll-over is the problem, stupid of me to to have seen this. The timing problem might indeed be resolution dependent: 100us is probably to short sometimes. I will provide a slice with fixes. Thanks again ! Regards, Sven > On 11 Dec 2016, at 09:07, Ben Coman <[hidden email]> wrote: > > For Case [1] I got an unrelated CI failure [2] for > NeoUUIDGeneratorTests>>#testTwoSameGenerator > > Checking this locally by running this a dozen times... > 1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp > testTwoSameGenerator ] > > Two distinct failures are apparent... > > 1. Takes longer than expected. > self assert: (time2 - time1) abs < 100 > last=1 ==> a time difference of 109 > last=1 ==> a time difference of 102 > last=1 ==> a time difference of 155 > last=1 ==> a time difference of 105 > Interesting that whenever this happens its on the first iteration. > > 2. Invalid consecutive ids. > self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10). > last=127 ==> TestFailure: Got 256 instead of 0. > last=828 ==> TestFailure: Got 256 instead of 0. > last=128 ==> TestFailure: Got 256 instead of 0. > last=316 ==> TestFailure: Got 256 instead of 0. > Maybe a problem with rollover? > > So instrumenting a bit: > testTwoSameGenerator: n > | uuid1 uuid2 consecutiveOk time1 time2 timeOk | > .... > "same generator, consequetive counters" > consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10). > "same generators, similar clock, within 100 milliseconds" > time1 := self timeFromUUID: uuid1. > time2 := self timeFromUUID: uuid2. > timeOk := (time2 - time1) abs < 100. > ( timeOk and: [ consecutiveOk ]) ifFalse: [ > Transcript cr; > show: 'fail n=', (n printPaddedWith: $0 to: 3) ; tab; > show: ' time ', ((time2 - time1) printString) , '<100'; tab; > show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=', (uuid2 at: 10) printString). > ]. > self assert: consecutiveOk. > self assert: timeOk. > > From playground running... > 1 to: 1000 do: [:n| > NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n]. > Transcript crShow: 'x'. > > Note the 'x' was not output due to one of the consecutiveOk or timeOk asserts triggered each run. > fail n=001 time 104<100 uuid 160=160 > fail n=269 time 2<100 uuid 256=0 > fail n=001 time 152<100 uuid 166=166 > fail n=117 time 3<100 uuid 256=0 > fail n=001 time 123<100 uuid 128=128 > fail n=278 time 3<100 uuid 256=0 > fail n=001 time 145<100 uuid 230=230 > fail n=673 time 1340<100 uuid 141=141 > fail n=707 time 3<100 uuid 256=0 > fail n=001 time 132<100 uuid 73=73 > fail n=001 time 117<100 uuid 206=206 > fail n=063 time 2<100 uuid 256=0 > fail n=345 time 3<100 uuid 256=0 > > Now a strange thing is that after commenting out those two asserts for consecutiveOk and timeOk, the fails at n=001 stop happening. > > fail n=001 time 141<100 uuid 232=232 > x > x > fail n=201 time 3<100 uuid 256=0 > fail n=268 time 3<100 uuid 256=0 > fail n=359 time 3<100 uuid 256=0 > fail n=408 time 2<100 uuid 256=0 > fail n=559 time 3<100 uuid 256=0 > x > fail n=139 time 3<100 uuid 256=0 > fail n=854 time 7<100 uuid 256=0 > x > fail n=389 time 3<100 uuid 256=0 > fail n=669 time 3<100 uuid 256=0 > fail n=876 time 3<100 uuid 256=0 > fail n=916 time 3<100 uuid 256=0 > fail n=953 time 2<100 uuid 256=0 > fail n=954 time 3<100 uuid 256=0 > x > fail n=148 time 3<100 uuid 256=0 > fail n=741 time 3<100 uuid 256=0 > x > fail n=116 time 2<100 uuid 256=0 > fail n=480 time 3<100 uuid 256=0 > x > > After treating the rollover like this... > consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10). > running the last playground script gives... > > (Note the n=001 fail reoccurs after waiting a while between running the playground script again.) > fail n=001 time 123<100 uuid 110=110 > x > fail n=345 time 1287<100 uuid 24=24 > fail n=404 time 1313<100 uuid 125=125 > x > x > x > x > x > fail n=449 time 1290<100 uuid 24=24 > x > x > x > x > x > > > Thats looking better, so upping to 10000 iterations... > fail n=0001 time 151<100 uuid 160=160 > fail n=2977 time 1437<100 uuid 221=221 > fail n=5498 time 1454<100 uuid 224=224 > fail n=9007 time 1351<100 uuid 239=239 > x > fail n=3389 time 1381<100 uuid 223=223 > fail n=7768 time 1365<100 uuid 82=82 > fail n=9773 time 1395<100 uuid 164=164 > x > fail n=3325 time 1448<100 uuid 67=67 > fail n=9930 time 1456<100 uuid 103=103 > x > fail n=0060 time 2914<100 uuid 159=159 > fail n=1829 time 1438<100 uuid 32=32 > fail n=7918 time 1436<100 uuid 182=182 > x > > So the rollover looks cured. > > Now adding some more instrumentation to view GC Time and Time To Run > and (see attached ST file) > > 0:00:00:00 TTR 0:00:00:00.001 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 > x > fail n=0001 time 115<100 uuid 157=157 GCTime 0 TTR 0:00:00:00 > fail n=0001 time 2<100 uuid 255=0 GCTime 0 TTR 0:00:00:00 > x > fail n=0001 time 111<100 uuid 104=104 GCTime 0 TTR 0 > x > fail n=0001 time 120<100 uuid 204=204 GCTime 0 TTR 0 > fail n=0001 time 1182<100 uuid 103=103 GCTime 1 TTR 1 > x > x > x > x > x > fail n=0001 time 3<100 uuid 255=0 GCTime 0 TTR 0 > x > x > fail n=0001 time 1251<100 uuid 253=253 GCTime 2 TTR 1 > x > > > Now I wonder at the class comment saying... > "the first 8 bytes are the *millisecond* clock value" > but #clockOn: is using *microsecond* clock. > Perhaps "self assert: (time2 - time1) abs < 100" > is meant to be 100ms not 100us ?? > > cheers -ben > > [1] https://pharo.fogbugz.com/f/cases/19438 > [2] > https://ci.inria.fr/pharo/job/Pharo-6.0-Issue-Validator/19471//artifac > t/validationReport.html > <NeoUUIDGeneratorTests-testTwoSameGenerator.st> |
> On 11 Dec 2016, at 12:49, Henrik Nergaard <[hidden email]> wrote: > >> The timing problem might indeed be resolution dependent: 100us is probably to short sometimes. > Time microsecondClockValue gives the time in microseconds, but it does not guarantee microsecond resolution, on windows the resolution is still in milliseconds. > > ---------- > | col | > > col := Array new: 30000 streamContents: [ :stream | > 1 to: 30000 do: [ :x | stream nextPut: Time microsecondClockValue ] > ]. > > col asSet "a Set(3658909536504622 3658909536503622 3658909536502622)"' > --------- Cool test, very sad result ;-) > Best regards, > Henrik > > -----Original Message----- > From: Pharo-dev [mailto:[hidden email]] On Behalf Of Sven Van Caekenberghe > Sent: Sunday, December 11, 2016 11:56 AM > To: Pharo Development List <[hidden email]> > Subject: Re: [Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator - intermittent CI failure > > Hi Ben, > > Thanks for looking into this, it was bothering me for quite some time ;-) > > https://pharo.fogbugz.com/f/cases/19453/NeoUUIDGenerator-testTwoSameGenerator-intermittent-CI-failure > > Yes, I think the counter roll-over is the problem, stupid of me to to have seen this. > > The timing problem might indeed be resolution dependent: 100us is probably to short sometimes. > > I will provide a slice with fixes. > > Thanks again ! > > Regards, > > Sven > >> On 11 Dec 2016, at 09:07, Ben Coman <[hidden email]> wrote: >> >> For Case [1] I got an unrelated CI failure [2] for >> NeoUUIDGeneratorTests>>#testTwoSameGenerator >> >> Checking this locally by running this a dozen times... >> 1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp >> testTwoSameGenerator ] >> >> Two distinct failures are apparent... >> >> 1. Takes longer than expected. >> self assert: (time2 - time1) abs < 100 >> last=1 ==> a time difference of 109 >> last=1 ==> a time difference of 102 >> last=1 ==> a time difference of 155 >> last=1 ==> a time difference of 105 >> Interesting that whenever this happens its on the first iteration. >> >> 2. Invalid consecutive ids. >> self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10). >> last=127 ==> TestFailure: Got 256 instead of 0. >> last=828 ==> TestFailure: Got 256 instead of 0. >> last=128 ==> TestFailure: Got 256 instead of 0. >> last=316 ==> TestFailure: Got 256 instead of 0. >> Maybe a problem with rollover? >> >> So instrumenting a bit: >> testTwoSameGenerator: n >> | uuid1 uuid2 consecutiveOk time1 time2 timeOk | >> .... >> "same generator, consequetive counters" >> consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10). >> "same generators, similar clock, within 100 milliseconds" >> time1 := self timeFromUUID: uuid1. >> time2 := self timeFromUUID: uuid2. >> timeOk := (time2 - time1) abs < 100. >> ( timeOk and: [ consecutiveOk ]) ifFalse: [ >> Transcript cr; >> show: 'fail n=', (n printPaddedWith: $0 to: 3) ; tab; >> show: ' time ', ((time2 - time1) printString) , '<100'; tab; >> show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=', (uuid2 at: 10) printString). >> ]. >> self assert: consecutiveOk. >> self assert: timeOk. >> >> From playground running... >> 1 to: 1000 do: [:n| >> NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n]. >> Transcript crShow: 'x'. >> >> Note the 'x' was not output due to one of the consecutiveOk or timeOk asserts triggered each run. >> fail n=001 time 104<100 uuid 160=160 >> fail n=269 time 2<100 uuid 256=0 >> fail n=001 time 152<100 uuid 166=166 >> fail n=117 time 3<100 uuid 256=0 >> fail n=001 time 123<100 uuid 128=128 >> fail n=278 time 3<100 uuid 256=0 >> fail n=001 time 145<100 uuid 230=230 >> fail n=673 time 1340<100 uuid 141=141 >> fail n=707 time 3<100 uuid 256=0 >> fail n=001 time 132<100 uuid 73=73 >> fail n=001 time 117<100 uuid 206=206 >> fail n=063 time 2<100 uuid 256=0 >> fail n=345 time 3<100 uuid 256=0 >> >> Now a strange thing is that after commenting out those two asserts for consecutiveOk and timeOk, the fails at n=001 stop happening. >> >> fail n=001 time 141<100 uuid 232=232 >> x >> x >> fail n=201 time 3<100 uuid 256=0 >> fail n=268 time 3<100 uuid 256=0 >> fail n=359 time 3<100 uuid 256=0 >> fail n=408 time 2<100 uuid 256=0 >> fail n=559 time 3<100 uuid 256=0 >> x >> fail n=139 time 3<100 uuid 256=0 >> fail n=854 time 7<100 uuid 256=0 >> x >> fail n=389 time 3<100 uuid 256=0 >> fail n=669 time 3<100 uuid 256=0 >> fail n=876 time 3<100 uuid 256=0 >> fail n=916 time 3<100 uuid 256=0 >> fail n=953 time 2<100 uuid 256=0 >> fail n=954 time 3<100 uuid 256=0 >> x >> fail n=148 time 3<100 uuid 256=0 >> fail n=741 time 3<100 uuid 256=0 >> x >> fail n=116 time 2<100 uuid 256=0 >> fail n=480 time 3<100 uuid 256=0 >> x >> >> After treating the rollover like this... >> consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10). >> running the last playground script gives... >> >> (Note the n=001 fail reoccurs after waiting a while between running the playground script again.) >> fail n=001 time 123<100 uuid 110=110 >> x >> fail n=345 time 1287<100 uuid 24=24 >> fail n=404 time 1313<100 uuid 125=125 >> x >> x >> x >> x >> x >> fail n=449 time 1290<100 uuid 24=24 >> x >> x >> x >> x >> x >> >> >> Thats looking better, so upping to 10000 iterations... >> fail n=0001 time 151<100 uuid 160=160 >> fail n=2977 time 1437<100 uuid 221=221 >> fail n=5498 time 1454<100 uuid 224=224 >> fail n=9007 time 1351<100 uuid 239=239 >> x >> fail n=3389 time 1381<100 uuid 223=223 >> fail n=7768 time 1365<100 uuid 82=82 >> fail n=9773 time 1395<100 uuid 164=164 >> x >> fail n=3325 time 1448<100 uuid 67=67 >> fail n=9930 time 1456<100 uuid 103=103 >> x >> fail n=0060 time 2914<100 uuid 159=159 >> fail n=1829 time 1438<100 uuid 32=32 >> fail n=7918 time 1436<100 uuid 182=182 >> x >> >> So the rollover looks cured. >> >> Now adding some more instrumentation to view GC Time and Time To Run >> and (see attached ST file) >> >> 0:00:00:00 TTR 0:00:00:00.001 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 >> x >> fail n=0001 time 115<100 uuid 157=157 GCTime 0 TTR 0:00:00:00 >> fail n=0001 time 2<100 uuid 255=0 GCTime 0 TTR 0:00:00:00 >> x >> fail n=0001 time 111<100 uuid 104=104 GCTime 0 TTR 0 >> x >> fail n=0001 time 120<100 uuid 204=204 GCTime 0 TTR 0 >> fail n=0001 time 1182<100 uuid 103=103 GCTime 1 TTR 1 >> x >> x >> x >> x >> x >> fail n=0001 time 3<100 uuid 255=0 GCTime 0 TTR 0 >> x >> x >> fail n=0001 time 1251<100 uuid 253=253 GCTime 2 TTR 1 >> x >> >> >> Now I wonder at the class comment saying... >> "the first 8 bytes are the *millisecond* clock value" >> but #clockOn: is using *microsecond* clock. >> Perhaps "self assert: (time2 - time1) abs < 100" >> is meant to be 100ms not 100us ?? >> >> cheers -ben >> >> [1] https://pharo.fogbugz.com/f/cases/19438 >> [2] >> https://ci.inria.fr/pharo/job/Pharo-6.0-Issue-Validator/19471//artifac >> t/validationReport.html >> <NeoUUIDGeneratorTests-testTwoSameGenerator.st> > > > |
In reply to this post by Henrik Nergaard-2
On Sun, Dec 11, 2016 at 7:49 PM, Henrik Nergaard <[hidden email]> wrote: >The timing problem might indeed be resolution dependent: 100us is probably to short sometimes. So the majority of the time you'll assert: 3622 - 3622 < 100 which passes, but other times you'll cross the boundary and you'll assert: ...4622 - ...3622 < 100, which fails. The important question is, what resolution of time is important for UUIDs? If 10ms is sufficient, then ...4622 - ...3622 < 10000 easily passes on even the slowest computer..
col asSet size "==>16135" btw2, For a challenge I'm lining up to improve the VM clock "sometime", maybe end of Feb (if it proves within my capability no one else gets to it first).
Cool. Ping me from the slice. cheers -ben
|
> On 11 Dec 2016, at 16:50, Ben Coman <[hidden email]> wrote: > > > > On Sun, Dec 11, 2016 at 7:49 PM, Henrik Nergaard <[hidden email]> wrote: > >The timing problem might indeed be resolution dependent: 100us is probably to short sometimes. > Time microsecondClockValue gives the time in microseconds, but it does not guarantee microsecond resolution, on windows the resolution is still in milliseconds. > > Useful to know. But regardless of the resolution, the units are still microseconds. > So the majority of the time you'll assert: 3622 - 3622 < 100 > which passes, > but other times you'll cross the boundary and you'll assert: ...4622 - ...3622 < 100, > which fails. > > The important question is, what resolution of time is important for UUIDs? > If 10ms is sufficient, then ...4622 - ...3622 < 10000 easily passes on even the slowest computer.. I don't think it is terribly important, but if #microsecondClockValue has such a bad resolution on Windows (as hendrik showed), that is a general problem. It just not good. > ---------- > | col | > > col := Array new: 30000 streamContents: [ :stream | > 1 to: 30000 do: [ :x | stream nextPut: Time microsecondClockValue ] > ]. > > col asSet "a Set(3658909536504622 3658909536503622 3658909536502622)"' > > btw I forgot to mention I'm on 32-bit Linux. So thankfully I get... > col asSet size "==>16135" > > btw2, For a challenge I'm lining up to improve the VM clock "sometime", maybe end of Feb (if it proves within my capability no one else gets to it first). > https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/36 > > > > -----Original Message----- > From: Pharo-dev [mailto:[hidden email]] On Behalf Of Sven Van Caekenberghe > Sent: Sunday, December 11, 2016 11:56 AM > To: Pharo Development List <[hidden email]> > Subject: Re: [Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator - intermittent CI failure > > Hi Ben, > > Thanks for looking into this, it was bothering me for quite some time ;-) > > https://pharo.fogbugz.com/f/cases/19453/NeoUUIDGenerator-testTwoSameGenerator-intermittent-CI-failure > > Yes, I think the counter roll-over is the problem, stupid of me to to have seen this. > > The timing problem might indeed be resolution dependent: 100us is probably to short sometimes. > > I will provide a slice with fixes. > > Cool. Ping me from the slice. > cheers -ben > > > Thanks again ! > > Regards, > > Sven > > > On 11 Dec 2016, at 09:07, Ben Coman <[hidden email]> wrote: > > > > For Case [1] I got an unrelated CI failure [2] for > > NeoUUIDGeneratorTests>>#testTwoSameGenerator > > > > Checking this locally by running this a dozen times... > > 1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp > > testTwoSameGenerator ] > > > > Two distinct failures are apparent... > > > > 1. Takes longer than expected. > > self assert: (time2 - time1) abs < 100 > > last=1 ==> a time difference of 109 > > last=1 ==> a time difference of 102 > > last=1 ==> a time difference of 155 > > last=1 ==> a time difference of 105 > > Interesting that whenever this happens its on the first iteration. > > > > 2. Invalid consecutive ids. > > self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10). > > last=127 ==> TestFailure: Got 256 instead of 0. > > last=828 ==> TestFailure: Got 256 instead of 0. > > last=128 ==> TestFailure: Got 256 instead of 0. > > last=316 ==> TestFailure: Got 256 instead of 0. > > Maybe a problem with rollover? > > > > So instrumenting a bit: > > testTwoSameGenerator: n > > | uuid1 uuid2 consecutiveOk time1 time2 timeOk | > > .... > > "same generator, consequetive counters" > > consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10). > > "same generators, similar clock, within 100 milliseconds" > > time1 := self timeFromUUID: uuid1. > > time2 := self timeFromUUID: uuid2. > > timeOk := (time2 - time1) abs < 100. > > ( timeOk and: [ consecutiveOk ]) ifFalse: [ > > Transcript cr; > > show: 'fail n=', (n printPaddedWith: $0 to: 3) ; tab; > > show: ' time ', ((time2 - time1) printString) , '<100'; tab; > > show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=', (uuid2 at: 10) printString). > > ]. > > self assert: consecutiveOk. > > self assert: timeOk. > > > > From playground running... > > 1 to: 1000 do: [:n| > > NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n]. > > Transcript crShow: 'x'. > > > > Note the 'x' was not output due to one of the consecutiveOk or timeOk asserts triggered each run. > > fail n=001 time 104<100 uuid 160=160 > > fail n=269 time 2<100 uuid 256=0 > > fail n=001 time 152<100 uuid 166=166 > > fail n=117 time 3<100 uuid 256=0 > > fail n=001 time 123<100 uuid 128=128 > > fail n=278 time 3<100 uuid 256=0 > > fail n=001 time 145<100 uuid 230=230 > > fail n=673 time 1340<100 uuid 141=141 > > fail n=707 time 3<100 uuid 256=0 > > fail n=001 time 132<100 uuid 73=73 > > fail n=001 time 117<100 uuid 206=206 > > fail n=063 time 2<100 uuid 256=0 > > fail n=345 time 3<100 uuid 256=0 > > > > Now a strange thing is that after commenting out those two asserts for consecutiveOk and timeOk, the fails at n=001 stop happening. > > > > fail n=001 time 141<100 uuid 232=232 > > x > > x > > fail n=201 time 3<100 uuid 256=0 > > fail n=268 time 3<100 uuid 256=0 > > fail n=359 time 3<100 uuid 256=0 > > fail n=408 time 2<100 uuid 256=0 > > fail n=559 time 3<100 uuid 256=0 > > x > > fail n=139 time 3<100 uuid 256=0 > > fail n=854 time 7<100 uuid 256=0 > > x > > fail n=389 time 3<100 uuid 256=0 > > fail n=669 time 3<100 uuid 256=0 > > fail n=876 time 3<100 uuid 256=0 > > fail n=916 time 3<100 uuid 256=0 > > fail n=953 time 2<100 uuid 256=0 > > fail n=954 time 3<100 uuid 256=0 > > x > > fail n=148 time 3<100 uuid 256=0 > > fail n=741 time 3<100 uuid 256=0 > > x > > fail n=116 time 2<100 uuid 256=0 > > fail n=480 time 3<100 uuid 256=0 > > x > > > > After treating the rollover like this... > > consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10). > > running the last playground script gives... > > > > (Note the n=001 fail reoccurs after waiting a while between running the playground script again.) > > fail n=001 time 123<100 uuid 110=110 > > x > > fail n=345 time 1287<100 uuid 24=24 > > fail n=404 time 1313<100 uuid 125=125 > > x > > x > > x > > x > > x > > fail n=449 time 1290<100 uuid 24=24 > > x > > x > > x > > x > > x > > > > > > Thats looking better, so upping to 10000 iterations... > > fail n=0001 time 151<100 uuid 160=160 > > fail n=2977 time 1437<100 uuid 221=221 > > fail n=5498 time 1454<100 uuid 224=224 > > fail n=9007 time 1351<100 uuid 239=239 > > x > > fail n=3389 time 1381<100 uuid 223=223 > > fail n=7768 time 1365<100 uuid 82=82 > > fail n=9773 time 1395<100 uuid 164=164 > > x > > fail n=3325 time 1448<100 uuid 67=67 > > fail n=9930 time 1456<100 uuid 103=103 > > x > > fail n=0060 time 2914<100 uuid 159=159 > > fail n=1829 time 1438<100 uuid 32=32 > > fail n=7918 time 1436<100 uuid 182=182 > > x > > > > So the rollover looks cured. > > > > Now adding some more instrumentation to view GC Time and Time To Run > > and (see attached ST file) > > > > 0:00:00:00 TTR 0:00:00:00.001 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 > > x > > fail n=0001 time 115<100 uuid 157=157 GCTime 0 TTR 0:00:00:00 > > fail n=0001 time 2<100 uuid 255=0 GCTime 0 TTR 0:00:00:00 > > x > > fail n=0001 time 111<100 uuid 104=104 GCTime 0 TTR 0 > > x > > fail n=0001 time 120<100 uuid 204=204 GCTime 0 TTR 0 > > fail n=0001 time 1182<100 uuid 103=103 GCTime 1 TTR 1 > > x > > x > > x > > x > > x > > fail n=0001 time 3<100 uuid 255=0 GCTime 0 TTR 0 > > x > > x > > fail n=0001 time 1251<100 uuid 253=253 GCTime 2 TTR 1 > > x > > > > > > Now I wonder at the class comment saying... > > "the first 8 bytes are the *millisecond* clock value" > > but #clockOn: is using *microsecond* clock. > > Perhaps "self assert: (time2 - time1) abs < 100" > > is meant to be 100ms not 100us ?? > > > > cheers -ben > > > > [1] https://pharo.fogbugz.com/f/cases/19438 > > [2] > > https://ci.inria.fr/pharo/job/Pharo-6.0-Issue-Validator/19471//artifac > > t/validationReport.html > > <NeoUUIDGeneratorTests-testTwoSameGenerator.st> > > > > |
Free forum by Nabble | Edit this page |