Why does the test runner show red when I correct a test?

Re: Why does the test runner show red when I correct a test?

Tim Mackinnon
I would agree that grey is better than red, but I personally think we’re being too pedantic about this, particularly when doing TDD and coding in the debugger. If I’m writing straightforward tests and like to see a red failure first (by deliberately returning false, a subclass responsibility, or -1) and then correct that failure in the debugger to return the correct result, then it’s tedious to have to run the test again (in fact, it feels odd). On the rare occasion that I got it wrong, it will show up when I run all the tests in the next phase.

I accept that others may see this the other way around - but I’m a more optimistic guy. This said - maybe we make it an option (or an easy code switch) - I’d default it to the optimistic TDD mode personally. My CI server will give me the full lowdown.

Tim

> On 15 Nov 2017, at 21:50, Sean P. DeNigris <[hidden email]> wrote:
>
> Richard Sargent wrote
>> I would go a little further. Any method modified by the developer during
>> the course of running a test voids the ability to claim the test
>> succeeded.
>> Likewise, for any object edited in an inspector.
>
> That makes sense to me.
>
>
>
> -----
> Cheers,
> Sean
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>



Re: Why does the test runner show red when I correct a test?

Prof. Andrew P. Black
While we are discussing colors, what should we do about a test that does not make any assertions at all?

A couple of years ago, a smart student who was working on a testing dialect for me decided that such tests should be in a new category all of their own.  Later I simplified things and just made these tests fail.  I am pretty convinced now that this is the right behaviour.  It certainly suits my purposes (teaching TDD); if the student omits to make any assertions, they have a failing test.  The message is: "Failure: test made no assertions".

Currently, such a test is green in Pharo.  In 2013, I filed a bug report on a bunch of tests of #printOn: on collections that made no assertions.  (The test writer didn’t understand how streams worked, and made assertions for each element of the empty string.)  I was reminded of this just this week, because those tests have yet to be fixed.  I suspect that if they had been yellow, rather than green, then they would have been fixed before now.

I plan to fix those tests on Friday, but I also wonder about changing the behaviour of the testing framework.  What do you think?

        Andrew
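
To make the proposal concrete, here is a minimal sketch of a test case that counts its own assertions and fails when the count is zero. It is written in Python rather than Pharo, so it is only a model of the idea and not SUnit code; the names `CountingTestCase`, `assert_true`, `NoAssertionsMade` and `PrintingTests` are made up for illustration.

```python
class NoAssertionsMade(AssertionError):
    """Raised when a test body finishes without making any assertion."""


class CountingTestCase:
    """Hypothetical miniature test case that counts its own assertions."""

    def __init__(self):
        self.assertion_count = 0
        self.last_message = ""

    def assert_true(self, condition, message="assertion failed"):
        # Every assertion is counted, whether it passes or not.
        self.assertion_count += 1
        if not condition:
            raise AssertionError(message)

    def run(self, test_method_name):
        """Run one test method and return 'passed', 'failed' or 'error'."""
        self.assertion_count = 0
        self.last_message = ""
        try:
            getattr(self, test_method_name)()
            if self.assertion_count == 0:
                # The test ran to the end but never checked anything.
                raise NoAssertionsMade("Failure: test made no assertions")
        except AssertionError as failure:
            self.last_message = str(failure)
            return "failed"
        except Exception as error:
            self.last_message = repr(error)
            return "error"
        return "passed"


class PrintingTests(CountingTestCase):
    def test_loop_over_empty_string(self):
        # Mirrors the bug described above: the loop body never runs,
        # so no assertion is ever made.
        for character in "":
            self.assert_true(character == "x")

    def test_real_assertion(self):
        self.assert_true("abc".upper() == "ABC")
```

With this model, `PrintingTests().run("test_loop_over_empty_string")` reports a failure instead of silently going green, while `test_real_assertion` still passes.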

 

Re: Why does the test runner show red when I correct a test?

Richard Sargent
Administrator
A test that asserts nothing has only asserted that the code ran through without throwing an error. :-)

I like your proposal that such a test is an inherently failing test.


(Of course, that will result in the student adding a single assertion at the end, of the form "self assert: true"!)





Re: Why does the test runner show red when I correct a test?

Stephane Ducasse-3
In reply to this post by Prof. Andrew P. Black
Hi Andrew

I like your idea.
It is fun, and at least we can spot such tests.
Sometimes we will have to write self assert: true, because simply running
some code can also be considered a test.
What I do not like are empty test methods, because they are green.

stef



Re: Why does the test runner show red when I correct a test?

Stephane Ducasse-3
BTW, I like grey for a method that was edited in the debugger,
because red is not good: the proof is that when I rerun my test, it is green.
Saying "I do not know, you have to rerun" is a good solution.
I like it.

I always got frustrated with the red.



Re: Why does the test runner show red when I correct a test?

Ben Coman
In reply to this post by Richard Sargent


On 16 November 2017 at 00:20, Richard Sargent <[hidden email]> wrote:
>> I think that setting the button back to gray is a good behaviour.
>>  - it is the same thing that happens once you modify a method (which is what is happening during debugging)
>>  - it explicitly says "please rerun the test because you may have introduced side effects"
>
> I would go a little further. Any method modified by the developer during the course of running a test voids the ability to claim the test succeeded. Likewise, for any object edited in an inspector.

So to summarise the viewpoints as I understand them, consider an interrupted test that later runs to completion
and is then...

A. incorrectly marked red (or grey) [false negative]
==> frequent during TDD
==> need to manually rerun test *every* time
==> large extra effort for developer

B. incorrectly marked green [false positive]
==> infrequent  (presumed) 
==> error picked up anytime test is run again, or when test group is run 
==> small extra effort for developer 
==> developer may be aware when they make suspect changes that undermine the test result, and can judge whether to run it again.

C. test is automatically rerun a second time
==> infrequently some tests run too long for this to be practical

A and B are like the philosophical difference between engineers and scientists.  i.e. engineers deal with approximations** that make the design process more efficient.  
For C, I still don't fully understand the concrete problem.  How much time do such tests take?  

Anyway, I prefer early efficiency with late bound correctness, so my vote would be to avoid A.
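
The three options above can be modelled as a small policy switch. This is only a sketch in Python, not Pharo test runner code: the names `Policy`, `Colour` and `colour_after_interrupted_run` are invented here, option A is assumed to mean grey rather than red, and a resumed run that still fails is assumed to be red under every policy.

```python
from enum import Enum


class Colour(Enum):
    GREEN = "green"
    RED = "red"
    GREY = "grey"


class Policy(Enum):
    PESSIMISTIC = "A"  # never claim success after an interrupted run
    OPTIMISTIC = "B"   # trust the outcome of the resumed run
    RERUN = "C"        # run the test again and trust only that result


def colour_after_interrupted_run(policy, resumed_run_passed, rerun_test=None):
    """Decide the colour of a test that was interrupted in the debugger.

    resumed_run_passed: whether the run completed successfully after the
    developer intervened. rerun_test: a callable returning True or False,
    needed only for Policy.RERUN.
    """
    if not resumed_run_passed:
        # Assumption: a run that still fails is red whatever the policy.
        return Colour.RED
    if policy is Policy.PESSIMISTIC:
        return Colour.GREY
    if policy is Policy.OPTIMISTIC:
        return Colour.GREEN
    if policy is Policy.RERUN:
        if rerun_test is None:
            raise ValueError("Policy.RERUN needs a callable that reruns the test")
        return Colour.GREEN if rerun_test() else Colour.RED
    raise ValueError(f"unknown policy: {policy!r}")
```

Making the policy an explicit parameter is also one way to read the earlier suggestion of an option or easy code switch: the default can be debated separately from the mechanism.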


> In other words, if the test run was interrupted, it cannot be considered successful. Keep things simple. (as simple as possible)

There are costs associated with perfect correctness, i.e. costs associated with both false negatives and false positives.
The alternative approach is to consider the consequence of a temporary false positive.
I'd vote for making things as simple and *efficient* as possible.

cheers -ben

 
**A mathematician, a scientist, and an engineer are given the task of finding how high a particular red rubber ball will bounce when dropped from a given height onto a given surface.
The mathematician derives the elasticity of the ball from its chemical makeup, derives the equations to determine how high it will bounce and calculates it.
The physicist takes the ball into the lab, measures its elasticity, and plugs the variables into a formula.
The engineer looks it up in his red rubber ball book.


 


On Wed, Nov 15, 2017 at 2:49 AM, Guillermo Polito <[hidden email]> wrote:


On Wed, Nov 15, 2017 at 11:41 AM, Denis Kudriashov <[hidden email]> wrote:

2017-11-15 11:08 GMT+01:00 Guillermo Polito <[hidden email]>:
On Wed, Nov 15, 2017 at 11:06 AM, Denis Kudriashov <[hidden email]> wrote:


2017-11-15 11:00 GMT+01:00 Guillermo Polito <[hidden email]>:
And just putting it back to gray? As "not run"?

We can implement any logic. 
Personally I need current behaviour.

But it is not about you personally. It is about implementing the behaviour that is most common and least strange for newcomers.

To know what the most common case is, people should give their personal opinions.
And in this thread only Richard was against the current logic.

But you're assuming here that:
 - people who are not reading this email do not care and don't have a say
 - so people who are not subscribed to the mailing list don't care
 - and that includes newbies

Our role as experienced guys is not only to look after "our" best defaults, but also after the defaults of people without experience.

I think that setting the button back to gray is a good behaviour.
 - it is the same thing that happens once you modify a method (which is what is happening during debugging)
 - it explicitly says "please rerun the test because you may have introduced side effects"

Unless you make the debugger more intelligent, you cannot be sure that the result you obtained at the end of the test is really reproducible. Moreover, to be able to make such an assumption you would have to be an expert who understands how the underlying framework behaves.
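
The "back to gray once you modify a method" behaviour can be sketched as a result store that forgets what it knew whenever code changes. This is a hypothetical Python model, not the actual Pharo implementation: `ResultBoard` and its methods are invented names, and the sketch assumes no dependency tracking, so any modification discards every stored result.

```python
class ResultBoard:
    """Hypothetical store of the last known colour of each test."""

    def __init__(self):
        self._colours = {}

    def record(self, test_name, passed):
        # Remember the outcome of a completed, uninterrupted run.
        self._colours[test_name] = "green" if passed else "red"

    def colour_of(self, test_name):
        # Tests that were never run, or whose result was invalidated, are grey.
        return self._colours.get(test_name, "grey")

    def method_modified(self, selector):
        # Assumption: without dependency tracking we cannot tell which tests
        # the changed method affects, so every stored result is discarded.
        self._colours.clear()
```

Grey here means "not known", which is the claim the runner can actually support after the code under test has changed.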
 
 
 
 

On Wed, Nov 15, 2017 at 10:44 AM, Denis Kudriashov <[hidden email]> wrote:
2017-11-15 1:49 GMT+01:00 Sean P. DeNigris <[hidden email]>:
Ben Coman wrote
> Or it could go to Amber, half-way between green & red to mean probably
> correct.

Ha ha.

Again, it seems that just automatically rerunning the test immediately after
a human-manipulated run and setting the color based on that second run
addresses all points on both sides, no?

Except that sometimes we are debugging a slow test, and running it a second time automatically after "proceed" may not be appropriate.
We are talking about a single test run. If the user has any doubts about the result, it is his responsibility to rerun the test. The user knows what he is doing when he debugs and fixes the test. No intelligence is required here.

And anyway, the current fix just makes the behaviour consistent with debugging from an explicit breakpoint/halt. In that case the result was always in sync with the debug session.





--

   

Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr


Web: http://guillep.github.io

Phone: +33 06 52 70 66 13







Re: Why does the test runner show red when I correct a test?

Richard Sargent
Administrator
Ben, I think you understand the dilemma. 

My philosophy is to avoid claims that are known to be not provably true. i.e. don't colour it green, because the test runner cannot claim that knowledge.

I am agnostic to whether it should be left uncoloured or coloured red to document the fact that it failed (as in was unable to run to completion without a problem). I truly don't care.

Our profession has a tendency to make claims which aren't provably true. It is one of the many reasons we have poor software. So I am adamant about avoiding that particular mistake. And, I am adamant about dissuading others from the same kinds of mistakes.

Write software which conveys exactly what it can claim without Deus ex machina intervention. The test failed. It was unable to run to completion without an error. It doesn't matter whether that particular error was corrected or not. The software running the test is incapable of determining whether the change that allowed the test to run to completion was code, data editing, or something else and it is incapable of knowing whether whatever allowed the test to finish is a correct fix for the error.

Be adamant about what your software can know to be true. Equally important, be adamant about your own approach to problems and solutions. Gödel had some good advice.
