CI Hiccups ...


CI Hiccups ...

Torsten Bergmann
While our general build of Pharo 7 is (more often than not) green

https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/development/

I've noticed more and more that the CI builds for the PRs to Pharo 7, even for trivial changes, are
often broken and not green

https://github.com/pharo-project/pharo/pulls

which is annoying and leaves one with the feeling of having broken the builds.

As always, I think getting a more stable CI depends on time, resources, and priorities. But does anyone know
whether this is a general issue or something that is already being worked on?

Thx
T.

Re: CI Hiccups ...

Marcus Denker-4
Hello,

Yes… I try to make notes about all failing tests so we can detect patterns.

It seems most of them are related to timing issues that come from the fact that
the CI runs on a virtualised box…
Others are related to the fact that we run multiple instances in parallel on the
same machine and they try to open the same port (this happens less often).
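
To make the timing point concrete, here is a minimal sketch (not from this thread) of how a timing-sensitive assertion inside a test method can be made more tolerant on a slow, virtualised CI box: poll for the expected condition with a deadline instead of sleeping a fixed amount of time. The recorder object and its wasDelivered predicate are purely illustrative names, not existing Pharo classes.

    | deadline delivered |
    deadline := DateAndTime now + 10 seconds.
    delivered := false.
    "Poll every 50 ms until the condition holds or the deadline passes."
    [ delivered or: [ DateAndTime now > deadline ] ] whileFalse: [
        delivered := recorder wasDelivered.
        (Delay forMilliseconds: 50) wait ].
    self assert: delivered description: 'data was not delivered within 10 seconds'

On a loaded machine the worst case simply takes longer instead of failing, while on a fast machine the loop exits almost immediately.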

Another problem is that the VM sometimes crashes (related to the font-rendering bug).

We really need to try to find a way to stabilise the CI.

        Marcus


Re: CI Hiccups ...

Juraj Kubelka


> On Jan 26, 2018, at 09:13, Marcus Denker <[hidden email]> wrote:
>
> Hello,
>
> Yes… I try to make notes about all failing tests so we can detect patterns.
>

Will you share the notes?

> It seems most of them are related to timing issues that come from the fact that
> the CI runs on a virtualised box…
> Others are related to the fact that we run multiple instances in parallel on the
> same machine and they try to open the same port (this happens less often).

That part can be fixed by using port 0 in the corresponding test cases; the OS then assigns a free port.
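
A minimal sketch of the idea at the plain Socket level (assuming Pharo's Socket API with listenOn:backlogSize: and localPort, and Zinc's ZnServer on:/start/stop; verify against the versions shipped with Pharo 7):

    | probe freePort server |
    "Bind to port 0 so the OS picks a free port, read it back, then release it."
    probe := Socket newTCP.
    probe listenOn: 0 backlogSize: 4.
    freePort := probe localPort.
    probe closeAndDestroy.
    "Start the test server on the port we just obtained."
    server := ZnServer on: freePort.
    server start.
    "... run the client under test against 'http://localhost:', freePort printString ..."
    server stop.

There is a small window between closing the probe socket and starting the real server in which another process could grab the port; a test server that accepts port 0 directly and reports the actually bound port avoids that race entirely.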

Cheers,
Juraj



Re: CI Hiccups ...

Marcus Denker-4


> On 26 Jan 2018, at 14:42, Juraj Kubelka <[hidden email]> wrote:
>
> Will you share the notes?

Here are some from today's reviews:

testPatch – MacOS32.Zinc.Tests.ZnClientTests
        Failed to start server on port 1719. Is there one already?
       
       
testTwiceDeliveredDataSholdBeDetected – MacOS32.GT.EventRecorder.Tests.Core.GTEventRecorderTest
testTwiceDeliveredDataSholdBeDetected – Windows32.GT.EventRecorder.Tests.Core.GTEventRecorderTest
testDeliverNow3 – Windows32.GT.EventRecorder.Tests.Core.GTEventRecorderTest
testDeliverNow2 – Windows32.GT.EventRecorder.Tests.Core.GTEventRecorderTest
testAddCollector3 – Windows32.GT.EventRecorder.Tests.Core.GTEventRecorderTest
testNotDeliveredDataShouldBeResent – Windows32.GT.EventRecorder.Tests.Core.GTEventRecorderTest


testGetPharoVersion – MacOS32.Zinc.Zodiac.ZnHTTPSTests

testExecuteOnceAfterSchedulingMultipleTimes – MacOS32.OmbuTests.OmDeferrerTest



Re: CI Hiccups ...

Stephane Ducasse-3
In reply to this post by Torsten Bergmann
Yes, right now it is not easy to find the problems. I spent time reviewing
your tearDown fixes, and it is annoying to get broken tests
when normally nothing should be red.


Re: CI Hiccups ...

Marcus Denker-4
In reply to this post by Marcus Denker-4


More interesting data points:

-> At the end of the week (Friday to Sunday), most of the failures happened because the macOS slave was not
    in a good state. I killed the virtual server and re-created it on Sunday evening.
-> After that, we got quite a lot of successful green CI runs. As this is Sunday, my guess
is that the load on the CI infrastructure is fairly low.

==> What we need to do:
1) Dead infrastructure cases need to be detected somehow.
2) CI infrastructure overload seems to be the root of the problem to some extent.

        Marcus


Re: CI Hiccups ...

Stephane Ducasse-3
Thanks Marcus.
I was wondering whether we should buy a fast machine just for the slaves.
It would be money well invested, because we lose too much time on this.

stef


Re: CI Hiccups ...

Juraj Kubelka
In reply to this post by Marcus Denker-4
Hi Marcus,

Thanks for the report. I have opened an issue: https://pharo.manuscript.com/f/cases/21173/Some-test-cases-fail-occasionally 
I will check them this week. 

Cheers,
Juraj
