Re: [Pharo-project] SUnit Time out

Re: [Pharo-project] SUnit Time out

Chris Muller-3
(Copying squeak-dev too).

I'm not sold on the whole test timeout thing.  When I run tests, I
want to know the answer to the question, "is the software working?"

Putting a timeout on tests trades a slower, but definitive, "yes" or
"no" for a supposedly-faster "maybe".  But is getting a "maybe" back
really faster?  I've just incurred the cost of running a test suite,
but left without my answer.  I get a "maybe", what am I supposed to do
next?  Find a faster machine?  Hack into the code to fiddle with a
timeout pragma?  That's not faster..

But, the reason given for the change was not for running tests
interactively (the 99% case); rather, all tests from the beginning of
time are now saddled with a timeout for the 1% case:

  "The purpose of the timeout is to catch issues like infinite loops,
unexpected user input etc. in automated test environments."

If tests are supposed to be quick (and deterministic) anyway, wouldn't
an infinite loop or user-input be caught the first time the test was
run (interactively)?  Seriously, when we make software changes, we
run the tests interactively first, and then the purpose of the night-time
automated test environment is to catch regressions on the merged
code..

In that case, the high-level test-controller which spits out the
results could and should be responsible for handling "unexpected user
input" and/or putting in a timeout, not each and every last test
method..

IMO, we want short tests, so let's just write them to be short.  If
they're too long, then the encouragement to shorten them comes from
our own impatience of running them interactively.  Running them in
batch at night requires no patience, because we're sleeping, and
besides, the batch processor should take responsibility for handling
those rare scenarios at a higher-level..

Regards,
  Chris


On Sat, May 29, 2010 at 2:53 AM, stephane ducasse
<[hidden email]> wrote:

> Hi guys
>
> in Squeak andreas introduced the idea of test time out
> Do you think that this is interesting?
>
> Stef
>
> SUnit
> -----
> All test cases now have an associated timeout after which the test is considered failed. The purpose of the timeout is to catch issues like infinite loops, unexpected user input etc. in automated test environments. Timeouts can be set on an individual test basis using the <timeout: seconds> tag or for an entire test case by implementing the #defaultTimeout method.
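
For reference, the two hooks described in that note look roughly like
this in a test class. A sketch: only <timeout:> and #defaultTimeout come
from the release note; the method names and bodies here are made up:

  testSomethingSlow
    "Per-test override: allow this one test two minutes
    instead of the default."
    <timeout: 120>
    self assert: true

  defaultTimeout
    "Per-class override: answer the timeout, in seconds,
    applied to every test in this case."
    ^60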


Re: [Pharo-project] SUnit Time out

Travis Griggs-4
Well put.

Sent from my iPhone



Re: [Pharo-project] SUnit Time out

Andreas.Raab
In reply to this post by Chris Muller-3
Hi Chris -

Let me comment on this from a more general point of view first, before
going into the specifics. I've spent the last five years building a
distributed system and during this time I've learned a couple of things
about the value of timeouts :-) One thing that I've come to understand
is that *no* operation is unbounded. We may leisurely talk about "just
wait until it's done" but the reality is that regardless of what the
operation is we never actually wait forever. At some point we *will*
give up no matter what you may think. This is THE fundamental point
here. Everything else is basically haggling about what the right timeout is.

As for the right timeout, the second fundamental thing to understand is
that if there's a question of whether the operation "maybe" completed,
then your timeout is too short. Period. The timeout's value is not to
indicate that "maybe" the operation completed; it is there to say
unequivocally that something caused it to not complete and that it DID fail.

Obviously, introducing timeouts will create some initial false
positives. But it may be interesting to be a bit more precise about what
we're talking about. To do this I instrumented TestRunner to measure the
time it takes to run each test and then ran all the tests in 4.2 to see
where that leads us. As you might expect, the distribution is extremely
uneven. Out of 2681 tests run, 2588 execute in < 500 msecs (approx. 1800
execute with no measurable time); 2630 execute in less than one second,
leaving a total of 51 that take more than a second, and only three tests
actually take longer than 5 seconds; they are all tagged as such.
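
The instrumentation amounts to something along these lines; a sketch
built on stock SUnit protocol, not necessarily the exact code used:

  | timings |
  timings := Dictionary new.
  (TestCase allSubclasses reject: [:cls | cls isAbstract]) do: [:cls |
    cls buildSuite tests do: [:test |
      timings at: test printString
        put: (Time millisecondsToRun: [test run])]].
  "slowest tests first"
  timings associations asSortedCollection: [:a :b | a value > b value]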

As you can see, the vast majority of tests have a "safety margin" of 10x
or more between the time the test usually takes and its timeout value.
Generally speaking, this margin is sufficient to compensate for "other"
effects that might rightfully delay the completion of the test in time.
If you have tests that commonly vary by 10x I'd be interested in finding
out more about what makes them so unpredictable.

So if your question is "are my timeouts too tight" one thing we could do
is to introduce the 10x as a more or less general guideline for
executing tests, and perhaps add a Transcript notifier if a test's run
time ever comes within 1/3rd of its specified timeout value (i.e.,
indicating that something in the nature of the test has changed that
should be reflected in its timeout). This would give you ample warning
that you need to adjust your test even if it isn't (yet) failing on the
timeout.
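
A sketch of such a notifier, say in the runner; the #timeoutForTest
accessor for the effective timeout (in seconds) is an assumption here:

  runNoting: aTest
    "Run aTest and warn on the Transcript when it consumes more
    than a third of its timeout budget."
    | budget msecs |
    budget := aTest timeoutForTest * 1000.  "assumed accessor; seconds -> msecs"
    msecs := Time millisecondsToRun: [aTest run].
    msecs * 3 >= budget ifTrue:
      [Transcript show: aTest printString, ' used ', msecs printString,
        ' of ', budget printString, ' msecs allowed'; cr]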

That said, a couple of concrete comments to your post:

On 5/30/2010 11:52 AM, Chris Muller wrote:
> (Copying squeak-dev too).
>
> I'm not sold on the whole test timeout thing.  When I run tests, I
> want to know the answer to the question, "is the software working?"

Correct.

> Putting a timeout on tests trades a slower, but definitive, "yes" or
> "no" for a supposedly-faster "maybe".  But is getting a "maybe" back
> really faster?  I've just incurred the cost of running a test suite,
> but left without my answer.  I get a "maybe", what am I supposed to do
> next?  Find a faster machine?  Hack into the code to fiddle with a
> timeout pragma?  That's not faster..

See above. If you're thinking "maybe", then the timeout is too short.

> But, the reason given for the change was not for running tests
> interactively (the 99% case); rather, all tests from the beginning of
> time are now saddled with a timeout for the 1% case:

As the data shows, this is already the case. It may be interesting to
note that so far there were a total of 5 (five) places that had to be
adjusted in Squeak. One was a general place (the default timeout for the
decompiler tests) and four were individual methods. Considering that
computers usually don't become slower over time, it seems unlikely that
further adjustments will be necessary here. So the bottom line is that
the changes required aren't exactly excessive.

>    "The purpose of the timeout is to catch issues like infinite loops,
> unexpected user input etc. in automated test environments."
>
> If tests are supposed to be quick (and deterministic) anyway, wouldn't
> an infinite loop or user-input be caught the first time the test was
> run (interactively)?  Seriously, when we make software changes, we
> run the tests interactively first, and then the purpose of the night-time
> automated test environment is to catch regressions on the merged
> code.

These changes are largely intended for automated integration testing. I
am hoping to automate the tests for community supported packages to a
point where there will be no user in front of the system. Even if there
were, it's not clear whether that person could fix the issue immediately,
or whether the entire process would be stuck because nobody can fix the
problem at hand right away, in which case the tests will never run to
completion and never produce any useful result.

So the idea here is not that unit tests are *only* to catch regressions
in previously manually tested (combinations of) code. The idea is to
catch interaction and integration bugs, and to be able to produce a result
even if there is no user to watch the particular combination of packages
being loaded together in this particular form.

Perhaps that is our problem here? It seems to me that you're taking a
view that says unit tests are exclusively for regression testing and
consequently there is no way a previously successful test would suddenly
become unsuccessful in a way that makes it time out ... but you know,
having written this sentence, it makes no sense to me. If we knew
beforehand that tests fail only in particular known ways, we wouldn't
have to run them to begin with. The whole idea of running the tests is to
catch *unexpected* situations, and as a consequence there is value in
capturing these situations instead of hanging and producing no useful
result.

> In that case, the high-level test-controller which spits out the
> results could and should be responsible for handling "unexpected user
> input" and/or putting in a timeout, not each and every last test
> method..

Do you have such a "high-level test-controller"? Or do you mean a human
being spending their time watching the tests run to completion? If the
former, I'm curious as to how it would differ from what I did. If the
latter, are you volunteering? ;-)

> IMO, we want short tests, so let's just write them to be short.  If
> they're too long, then the encouragement to shorten them comes from
> our own impatience of running them interactively.  Running them in
> batch at night requires no patience, because we're sleeping, and
> besides, the batch processor should take responsibility for handling
> those rare scenarios at a higher-level..

The goal for the timeouts is *not* to cause you to write shorter tests.
If you're looking at it this way you're looking at it from the wrong
angle. Up your timeout to whatever you feel is sensible to have trust in
the results of the tests. As I said earlier, I'm quite happy to discuss
the default timeout; it's simply that with some 95% coverage on a 10x
safety margin it feels to me that we're playing it safe enough for the
remaining cases to have explicit timeouts.

Cheers,
   - Andreas




Re: [Pharo-project] SUnit Time out

Chris Muller-3
Thanks for clarifying your goals w.r.t. introducing the timeout.  I
think that's important because, as I've said, legacy tests that live
in external packages are affected.

I read your whole note a few times, and one part in particular stuck
out to me as a potentially useful use-case for test-case timeout:

> These changes are largely intended for automated integration testing. I am
> hoping to automate the tests for community supported packages to a point
> where there will be no user in front of the system.

If, by this, you mean you want to simply have a headless Squeak image
running:

  [ true ] whileTrue:
    [ loadLatestPackageCombinations.
    runTestSuite.
    mailResultsToSqueakDev ]

THEN, that brings us down to only haggling about the default timeout,
although I still would prefer to handle the timeout at a higher level..

If, however, this isn't the goal, then I still don't seem to have
grasped what I sense is some key point.. or my own concerns weren't
properly understood.  If so, let me try one more time.  :)

> done" but the reality is that regardless of what the operation is we never
> actually wait forever. At some point we *will* give up no matter what you
> may think. This is THE fundamental point here. Everything else is basically
> haggling about what the right timeout is.

Of course we would "give up" after an unreasonable amount of time.  In
either case, there is something to interrogate, either a live looping
test-runner machine, or a static report of test results with one or
more that say, "timed out".

In the former case, we have a bevy of useful information (e.g., which
test is it trying to run?  How much memory is the test image using
right now?  Can I Alt+. interrupt it and get even more information?)

In the latter case, there is no choice but to start at square 1:  Try
to recreate the problem.  (What if it works?)

Personally, I would always prefer to deal with the former case rather than the latter..

> As for the right timeout, the second fundamental thing to understand is
> that if there's a question of whether the operation "maybe" completed,
> then your timeout is too short. Period. The timeout's value is not to
> indicate that "maybe" the operation completed; it is there to say
> unequivocally that something caused it to not complete and that it DID fail.

I didn't understand this.  There is no question about "maybe
completed".  We know if a test times out then it _didn't_ complete.
The "maybe" I referred to was about the core question:  whether the
underlying software being tested can be used or not.  "Maybe" it
could, then again, maybe it shouldn't.  It sounds like we agree, a
timeout would *have* to be regarded as a failure.

> Obviously, introducing timeouts will create some initial false positives.

You mean false negatives?  If we are saying that we must treat a
timeout as failure, and failure is "negative", then a timeout would be a
false negative or a true negative...?

> But it may be interesting to be a bit more precise on what we're talking
> about. To do this I instrumented TestRunner to measure the time it takes to
> run each test and then ran all the tests in 4.2 to see where that leads us.
> As you might expect, the distribution is extremely uneven. Out of 2681 tests
> run  2588 execute in < 500 msecs (approx. 1800 execute with no measurable
> time);  2630 execute in less than one second, leaving a total of 51 that
> take more than a second and only three tests actually take longer than 5
> seconds and they are all tagged as such.

That's fine for the 4.2 tests, but there are hundreds of tests in
external packages.  With a mere 5-second default, many will need to be
updated with a pragma.  But then we're talking about a branch in the
package because that won't be backward compatible with 3.9, will it?

> As you can see the vast majority of tests have a "safety margin" of 10x or
> more between the time the test usually takes and its timeout value.
> Generally speaking, this margin is sufficient to compensate for "other"
> effects that might rightfully delay the completion of the test in time.

I can see that jacking up the timeout may tend to reduce the number of
false negatives (at the expense of potentially longer wait times!),
but when they do occur, we have no useful information whatsoever.  Not
even certainty whether the underlying software is usable or not, because
it could be a false negative.

> If
> you have tests that commonly vary by 10x I'd be interested in finding out
> more about what makes them so unpredictable.

Well, again, it's not just about randomness in the tests but also
about external factors: CPU speed, current system load, etc.

> So if your question is "are my timeouts too tight" one thing we could do is
> to introduce the 10x as a more or less general guideline for executing
> tests,

Ok, with that kind of margin, the message I'm getting from you is that
this isn't about making a human have to wait.  We just want to make sure
we "get some kind of report"?

>> But, the reason given for the change was not for running tests
>> interactively (the 99% case); rather, all tests from the beginning of
>> time are now saddled with a timeout for the 1% case:
>
> As the data shows, this is already the case. It may be interesting to note
> that so far there were a total of 5 (five) places that had to be adjusted in
> Squeak.

I'm not worried about the built-in tests; recall I acknowledged that I
can "almost understand" a forced timeout in the context of an
open-source project where people are all contributing their portions
and no one else wants to be "held up" because of one persons tests
looping.

My concern is more about the impact to legacy external packages..

>  One was a general place (the default timeout for the decompiler
> tests) and four were individual methods. Considering that computers usually
> don't become slower over time, it seems unlikely that further adjustments
> will be necessary here.

Well, they do..  It's not just a function of time, but of who's running
it, and on which machine.  We all have different machines.  Maybe
someone wants to test on an iPhone that might be considerably slower
than the original desktop on which the timeout was specified...

> So the bottom line is that the changes required
> aren't exactly excessive.

That depends on how many test methods I have, whether I also want the
package to run in 3.9, and whether, to be included as a Community
Supported Package, I have to put in a pragma to do that..  (unless I'm
mistaken about pragmas working in 3.9).

Bottom line:  Today Magma runs on 3.9 - 4.2 + Pharo.  Some of Magma's
tests necessarily take several minutes.

Question:  Can Magma be a CSP and still retain this wide compatibility?

> These changes are largely intended for automated integration testing. I am
> hoping to automate the tests for community supported packages to a point
> where there will be no user in front of the system.
>
> Even if there were, it's
> not clear whether that person could fix the issue immediately, or whether
> the entire process would be stuck because nobody can fix the problem at
> hand right away, in which case the tests will never run to completion and
> never produce any useful result.

Who is "that person" and what is their role?

> begin with. The whole idea of running the tests is to catch *unexpected*
> situations, and as a consequence there is value in capturing these
> situations instead of hanging and producing no useful result.

To me, "timed out" is what is not useful.  To find a hanging machine
that can be interrogated is much more useful.

>> In that case, the high-level test-controller which spits out the
>> results could and should be responsible for handling "unexpected user
>> input" and/or putting in a timeout, not each and every last test
>> method..
>
> Do you have such a "high-level test-controller"? Or do you mean a human
> being spending their time watching the tests run to completion? If the
> former, I'm curious as to how it would differ from what I did. If the
> latter, are you volunteering? ;-)

I meant the former.  It differs from what you did in that it preserves
legacy compatibility, and the legacy deterministic property of testing.
To handle the automated test server case, I would handle the on-timeout:
from a much higher place, and therefore it would not be for individual
tests, but for the whole suite.  Information about the last running
test would be sufficient for me, especially if we're talking about all
of the other disadvantages I've mentioned for fine-grained timeouts..
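
Concretely, something like this around the whole run; a sketch, where
#valueWithin:onTimeout: is the standard block protocol and
MyPackageTests plus the reporting are illustrative:

  | suite result |
  suite := MyPackageTests buildSuite.  "illustrative test case class"
  result := nil.
  [result := suite run]
    valueWithin: 30 minutes
    onTimeout: [Transcript show: 'suite timed out'; cr].
  result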

 - Chris


Re: [Pharo-project] SUnit Time out

Casey Ransberger-2
Usually in a test, "false positive" is when the test thinks it found a
bug, but there's actually something wrong with the test.  "False negative"
usually means that a test erroneously passed when it shouldn't have.  Of
course, I am probably speaking a regional dialect which may be somewhat
rooted in Seattle, WA test culture. :)





Re: [Pharo-project] SUnit Time out

hernan.wilkinson
In reply to this post by Chris Muller-3
I completely agree.






Re: [Pharo-project] SUnit Time out

David T. Lewis
In reply to this post by Andreas.Raab
On Tue, Jun 01, 2010 at 09:36:48PM -0700, Andreas Raab wrote:
>
> These changes are largely intended for automated integration testing. I
> am hoping to automate the tests for community supported packages to a
> point where there will be no user in front of the system.

I've run into one issue for externally supported packages that need
to work on older images. The <timeout: 30> method annotation works
very well, but is not supported on all images. I put SUnit-dtl.79
in the inbox as a possible solution.
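
One portable alternative, sketched here (not necessarily what
SUnit-dtl.79 does), is to keep per-class timeouts in #defaultTimeout,
which timeout-less SUnit versions simply never send, instead of using
the pragma:

  defaultTimeout
    "Answer this case's timeout in seconds. SUnit versions without
    timeout support never send this message, so the override is
    harmless on older images."
    ^300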

Dave