some stupid failures


some stupid failures

Nicolas Cellier
 
Hi all,
sometimes a build fails because of just one test...

Here: https://travis-ci.com/github/OpenSmalltalk/opensmalltalk-vm/jobs/468407844
(a squeak.stack.v3 build)

RenderBugz
 ✗ #testSetForward (7ms)
TestFailure: Block evaluation took more than the expected 0:00:00:00.004
RenderBugz(TestCase)>>assert:description:
RenderBugz(TestCase)>>should:notTakeMoreThan:
RenderBugz(TestCase)>>should:notTakeMoreThanMilliseconds:
RenderBugz>>shouldntTakeLong:
RenderBugz>>testSetForward ...shouldntTakeLong: [ t forwardDirection: 180.0 .
self assert: ( t forwardDirection = 0.0 )  ]
RenderBugz(TestCase)>>performTest

4ms, really? On C.I. infrastructure, anything can happen...
Do we really want to keep this kind of test?
We could eventually keep such tests once startup performance is known (see the
isLowerPerformance discussion on squeak-dev), but in the interim I
suggest we neutralize this specific test in Smalltalk-CI.
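
For what it's worth, one blunt way to neutralize it would be to drop the selector in a pre-test script; smalltalkCI's exclusion lists would be the cleaner route, but as far as I know those work per class or category rather than per selector. A rough sketch only, using the class and selector from the trace above:

"Rough sketch, not something smalltalkCI does by itself: remove the
flaky test method before the suite runs (e.g. from a pre-test script)."
(Smalltalk at: #RenderBugz ifAbsent: [ nil ])
	ifNotNil: [ :testClass | testClass removeSelector: #testSetForward ]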

Re: [squeak-dev] some stupid failures

Ron Teitelbaum
 
Seems more like a warning than a failure.

All the best,

Ron Teitelbaum  

On Tue, Jan 5, 2021 at 3:22 AM Marcel Taeumel <[hidden email]> wrote:
Hi Nicolas.

Do we really want to keep this kind of test?

Such benchmarks (and benchmark-like tests) should at least average over several runs and only fail as a test if something actually got slower on average. Or something like that. A single misbehaving run should not be the reason for such a test failure.

Maybe we can tweak #should:notTakeMoreThan: to evaluate the block several times? But then it cannot fail early on as it is doing now ... Hmmm...

Best,
Marcel
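
A rough sketch of what that averaging variant could look like, assuming a hypothetical selector that is not in the image (it would give up the early abort that #should:notTakeMoreThan: currently provides):

should: aBlock notTakeMoreThanMilliseconds: budget averagedOver: runCount
	"Hypothetical TestCase extension: time the block over runCount runs
	and only fail when the average exceeds the budget (in milliseconds)."
	| totalMs |
	totalMs := 0.
	runCount timesRepeat: [
		totalMs := totalMs + (Time millisecondsToRun: aBlock) ].
	self
		assert: totalMs / runCount <= budget
		description: 'Block averaged ', (totalMs / runCount) asFloat printString,
			' ms over ', runCount printString, ' runs (budget: ', budget printString, ' ms)'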


Re: [squeak-dev] some stupid failures

Nicolas Cellier
 
Here is another source of frequent C.I. failures:

MCMethodDefinitionTest

 ✗ #testLoadAndUnload (20255ms)

TestFailure: Test timed out

Presumably not a lean and mean test...
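
To check locally whether that test is inherently slow or just slow on the Travis workers, a quick workspace snippet (class and selector taken from the report above):

"Quick local check, not a fix: time the single test case."
| ms |
ms := Time millisecondsToRun: [
	(MCMethodDefinitionTest selector: #testLoadAndUnload) run ].
Transcript show: 'testLoadAndUnload took ', ms printString, ' ms'; cr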


Re: [squeak-dev] some stupid failures

Nicolas Cellier
 
Yet another one (stack.v3)

SUnitToolBuilderTests
837fef_b498

 ✗ #testHandlingNotification (18863ms)


Re: [squeak-dev] some stupid failures

Nicolas Cellier
 
And the fun of it: each time I retry, I see a different random failure...

#########################
# 1 tests did not pass: #
#########################

CompiledMethodTest
16ccae_ca85

 ✗ #testCopyWithTrailerBytes (11332ms)


Re: [squeak-dev] some stupid failures

Nicolas Cellier
 
Hmm, for the sake of documenting the randomly failing tests, here are
two others:

######################################################
# Squeak-4.6 on Travis CI (2361.31)                  #
# 3396 Tests with 2 Failures and 0 Errors in 158.13s #
######################################################

#########################
# 2 tests did not pass: #
#########################

PureBehaviorTest
8401de_4bcf

 ✗ #testMethodCategoryReorganization (20517ms)

SecureHashAlgorithmTest
b63682_4bcf

 ✗ #testEmptyInput (12145ms)
