Analysis of CI issues and call to action

Analysis of CI issues and call to action

Max Leske
Hi

I’ve been poking at the issue validator to figure out why it’s behaving so strangely. When I run these tests locally, either the first part of the validation runs fine and the validation proceeds to run the tests, or it fails after about one minute due to a timeout (which isn’t actually one, see below).
I found two things which may or may not be related:

1. Many times (but not always) the forked images signal the scheduled delays upon startup.
2. Some of the forked images do not (always) get a new session (it’s like the startup code doesn’t even run… so weird).

AFAICT, both always occur together. The main effect of this is that timeout blocks get evaluated when there’s really no timeout (the timeout is set to 100 minutes for rules and 20 minutes for forked images). A timeout is treated as a validation error which is reflected in the output:

Validation Errors: a CISUnitTestsRule: Timeout (after 0:01:40:00) occured while loading "Can not create a subclass of TestCase"
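
To make the mechanism concrete, here is a small Python model of a timeout guard whose delay is signalled early. This is only a sketch of the behaviour described above, not the Pharo implementation: the names Delay and value_within are made up for illustration, and the assumption is that a waiter cannot tell "the duration elapsed" apart from "somebody signalled the delay".

```python
import threading
import time


class Delay:
    """Toy stand-in for a scheduled delay: a timer that can also be signalled early."""

    def __init__(self, seconds):
        self.seconds = seconds
        self._signal = threading.Event()

    def wait(self):
        # Returns when the duration elapses OR when the delay is signalled.
        # The waiter cannot tell the two cases apart.
        self._signal.wait(self.seconds)

    def signal(self):
        self._signal.set()


def value_within(block, delay, on_timeout):
    """Run block; if the delay returns before block finishes, answer on_timeout()."""
    wake = threading.Event()
    result = []

    def worker():
        result.append(block())
        wake.set()

    def watchdog():
        delay.wait()
        wake.set()

    threading.Thread(target=worker, daemon=True).start()
    threading.Thread(target=watchdog, daemon=True).start()
    wake.wait()
    return result[0] if result else on_timeout()


def slow_block():
    time.sleep(1.0)
    return "tests ran"


if __name__ == "__main__":
    delay = Delay(seconds=100 * 60)  # nominally 100 minutes
    delay.signal()                   # spurious signal, as at image startup
    started = time.monotonic()
    outcome = value_within(slow_block, delay, lambda: "Timeout")
    print(outcome, "after", round(time.monotonic() - started, 2), "seconds")
```

With the spurious signal the guard reports a timeout almost immediately, even though the nominal limit is 100 minutes. Without it, the same call simply answers the block's value.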

A different effect is that the image worker cannot distinguish the forked image from the parent image since the session hasn’t changed. This can (potentially) lead to the parent image waiting for the forked image, which is waiting for itself (thinking it is the parent image), and the whole thing hangs.
Note that “forked” in this context actually means an image that is a copy, saved with #backupAs: and run with OSProcess (see ImageWorker).
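
The parent/fork confusion can be modelled in a few lines of Python. This is a sketch under assumptions, not ImageWorker's actual code: it assumes the worker remembers the session before saving the copy and later compares it with the current session. The class Image and all of its methods are hypothetical.

```python
class Image:
    """Toy model of an image; `session` is supposed to be replaced on every startup."""

    def __init__(self):
        self.session = object()
        self.saved_session = None

    def remember_session(self):
        # Assumed step: the worker records the session before saving the copy.
        self.saved_session = self.session

    def backup_copy(self):
        # The copy is a snapshot, so it carries the same session and remembered value.
        clone = Image()
        clone.session = self.session
        clone.saved_session = self.saved_session
        return clone

    def start_up(self, renew_session=True):
        # Normal startup creates a new session; the bug is modelled by skipping this.
        if renew_session:
            self.session = object()

    def believes_it_is_parent(self):
        return self.session is self.saved_session


if __name__ == "__main__":
    parent = Image()
    parent.remember_session()

    healthy_fork = parent.backup_copy()
    healthy_fork.start_up()
    print(healthy_fork.believes_it_is_parent())  # False

    broken_fork = parent.backup_copy()
    broken_fork.start_up(renew_session=False)
    print(broken_fork.believes_it_is_parent())   # True
```

When the session is not renewed, both images conclude that they are the parent, which is the situation in which each side can end up waiting for the other.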


I think both issues need to be investigated urgently. They are not only a problem for the CI but have implications for every user. I have opened two sub-cases of https://pharo.fogbugz.com/f/cases/17778/Monkey-now-fails-all-the-time to track both of these issues:

https://pharo.fogbugz.com/f/cases/17814/Delays-in-forked-images-get-signaled-upon-startup
https://pharo.fogbugz.com/f/cases/17815/Session-isn-t-always-updated-correctly-in-forked-images



Please, if you have a little spare time (and a lot of patience), help us to get rid of these issues.


To run the code locally, you can use the tracker image (with some extra logging and missing information filled in by me) found here: https://www.dropbox.com/s/7b6hbsb8gjxagxc/tracker_image.zip?dl=0 or get your own fresh copy from here: https://ci.inria.fr/pharo/view/5.0-Analysis/job/Pharo-5.0-Issue-Tracker-Image/
Then run it from the console like so:

    ./pharo ~/Downloads/Pharo-5/Pharo-5.0-Issue-Tracker-Image.image ci slice --issue=17809
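
Since the failure is intermittent, it may help to repeat the run and count how often a validation error shows up. The following Python helper is a hypothetical sketch: the command line is the one given above, but the function names are invented, and it is an assumption that the "Validation Errors:" text appears on stdout or stderr.

```python
import os
import subprocess


def slice_command(vm, image, issue):
    """Build the command line shown above for a given VM, image and issue number."""
    return [vm, os.path.expanduser(image), "ci", "slice", f"--issue={issue}"]


def count_validation_errors(command, attempts, run=subprocess.run):
    """Run the command `attempts` times; count runs whose output reports a validation error."""
    failures = 0
    for _ in range(attempts):
        completed = run(command, capture_output=True, text=True)
        # Assumption: the report is printed to stdout or stderr.
        output = (completed.stdout or "") + (completed.stderr or "")
        if "Validation Errors:" in output:
            failures += 1
    return failures


if __name__ == "__main__":
    command = slice_command(
        "./pharo",
        "~/Downloads/Pharo-5/Pharo-5.0-Issue-Tracker-Image.image",
        17809,
    )
    print(count_validation_errors(command, attempts=5), "of 5 runs reported validation errors")
```

The `run` parameter exists only so the loop can be tried out with a stand-in instead of a real VM.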

I’ve tested this with

CoInterpreter VMMaker.oscog-eem.1722 uuid: e5c44d63-ba75-4cd1-bf4e-c92c4232bbfe Mar 11 2016
StackToRegisterMappingCogit VMMaker.oscog-eem.1722 uuid: e5c44d63-ba75-4cd1-bf4e-c92c4232bbfe Mar 11 2016
https://github.com/pharo-project/pharo-vm.git Commit: ded69213d3b73ffbda27417c19cb75fd6b325c82 Date: 2016-03-11 17:48:52 +0100 By: Esteban Lorenzano <[hidden email]> Jenkins build #572

on OS X 10.11.3 (El Capitan).


Cheers,
Max

Re: Analysis of CI issues and call to action

stepharo
***THANKS A LOT, MAX!!!***

In reply to Max Leske's message of 13/3/16 17:07.