The Inbox: SUnit-cmm.116.mcz


The Inbox: SUnit-cmm.116.mcz

commits-2
Chris Muller uploaded a new version of SUnit to project The Inbox:
http://source.squeak.org/inbox/SUnit-cmm.116.mcz

==================== Summary ====================

Name: SUnit-cmm.116
Author: cmm
Time: 23 May 2019, 9:39:05.797827 pm
UUID: ecce7092-6268-42a0-87d1-cbdea912ea68
Ancestors: SUnit-pre.115

The Result of a Test which is tagged as an #expectedFailure IS a failure if the test passed.
	Likewise, when a should: of a test which is tagged as an expectedFailure fails, the TestResult itself is considered passed, not failed.
	This fixes the SUnit browser to display these TestResults accordingly, so that it no longer falsely lists these "failed" Results among the failures, since they were expected to fail.

=============== Diff against SUnit-pre.115 ===============

Item was changed:
  ----- Method: TestCase>>assert: (in category 'accessing') -----
  assert: aBooleanOrBlock
 
+ aBooleanOrBlock value = self shouldPass ifFalse: [self signalFailure: 'Assertion failed']
- aBooleanOrBlock value ifFalse: [self signalFailure: 'Assertion failed']
  !

Item was changed:
  ----- Method: TestResult>>unexpectedPasses (in category 'accessing') -----
  unexpectedPasses
+ ^ failures select: [:each | each shouldPass not] !
- ^ passed select: [:each | each shouldPass not] !
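
In practice, the change plays out like this (a minimal sketch; the test selectors and the #knownBrokenFeatureWorks helper below are hypothetical). A test whose selector is listed in #expectedFailures answers false to #shouldPass, so the new #assert: compares the asserted value against that expectation:

   expectedFailures
        "Listing the selector makes #shouldPass answer false for it."
        ^ #(testKnownBrokenFeature)

   testKnownBrokenFeature
        "Under the new #assert:, a false value agrees with shouldPass = false, so
        nothing is signalled; a true value now signals 'Assertion failed',
        surfacing the unexpected pass."
        self assert: self knownBrokenFeatureWorks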



Re: The Inbox: SUnit-cmm.116.mcz

Jakob Reschke
On Fri, 24 May 2019 at 04:39, <[hidden email]> wrote:

>   ----- Method: TestCase>>assert: (in category 'accessing') -----
>   assert: aBooleanOrBlock
>
> +       aBooleanOrBlock value = self shouldPass ifFalse: [self signalFailure: 'Assertion failed']
> -       aBooleanOrBlock value ifFalse: [self signalFailure: 'Assertion failed']
>                         !


Hmm, what if a test case has multiple assertions and only one of them fails under certain conditions, which is why the test is expected to fail under those conditions? If so far only the last of many assertions fails, so that the complete test method still runs, then after this change the first assertion will fail right away (because it is true, contrary to shouldPass), won't it? It might hide some errors, such as exceptions raised in the middle of the assertions.
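
To make the concern concrete, a sketch (all selectors hypothetical) of such a test registered in #expectedFailures, where only the final assertion is the known-broken one:

   testMixedExpectations
        self assert: 1 + 1 = 2.                   "true, but shouldPass is false, so
                                                   this now signals a failure at once"
        self exerciseCodeThatMightRaise.          "never reached; an error here stays hidden"
        self assert: self knownBrokenFeatureWorks "the assertion actually expected to fail"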



Re: The Inbox: SUnit-cmm.116.mcz

Chris Muller-3
Right.  Expected failures should be put into their own #test.. method
and not mixed with assertions that are expected to pass.  Otherwise,
the SUnit domain won't be able to properly express to you the state of
those tests, as you said.
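
Concretely, the split looks like this (hypothetical selectors), with only the second selector listed in #expectedFailures:

   testInterpretationWeImplemented
        "Ordinary assertions, expected to pass."
        self assert: self implementedInterpretationHolds

   testInterpretationWeRejected
        "Holds only the known-failing expectation; listed in #expectedFailures."
        self assert: self rejectedInterpretationHolds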

Let me zoom out to the context of why I happened to need
#expectedFailure in the first place.  While implementing GraphQL
document validation code, I came across an apparent "self
contradiction" in the spec.  In one case, it says to handle scenario
XYZ one way, but in another place, it says to handle XYZ differently.
So I have to choose to implement one way or the other, and whichever
way I *don't* I want to document as an expectedFailure.

It actually ended up being a family of about 3 or 4 these cases, and I
got tired of seeing them in the SUnit browser, distracting me by
making me think they were something I needed to investigate...




Re: The Inbox: SUnit-cmm.116.mcz

Jakob Reschke
On Sun, 26 May 2019 at 20:56, Chris Muller <[hidden email]> wrote:
> Right.  Expected failures should be put into their own #test.. method
> and not mixed with assertions that are expected to pass.  Otherwise,
> the SUnit domain won't be able to properly express to you the state of
> those tests, as you said.

Makes sense. Hope everyone is aware of that for the future use of the feature.
 
PS.

> I came across an apparent "self contradiction" in the spec.

Whew I tried to read "self contradiction" as a Smalltalk expression at first... :-)



Re: The Inbox: SUnit-cmm.116.mcz

Chris Muller-4
On Sun, May 26, 2019 at 2:20 PM Jakob Reschke <[hidden email]> wrote:
> On Sun, 26 May 2019 at 20:56, Chris Muller <[hidden email]> wrote:
>> Right.  Expected failures should be put into their own #test.. method
>> and not mixed with assertions that are expected to pass.  Otherwise,
>> the SUnit domain won't be able to properly express to you the state of
>> those tests, as you said.
>
> Makes sense. Hope everyone is aware of that for the future use of the feature.

And for present use, too, since mixing them presents the same issue today.  As the test developer is required to specify expectedFailures at the #test..method level, they'll probably be naturally inclined to put only that group of should:'s in there.  They're generally rare, exceptional cases.

 
> PS.
>
>> I came across an apparent "self contradiction" in the spec.
>
> Whew I tried to read "self contradiction" as a Smalltalk expression at first... :-)

That sounds like an interesting behavior, we should implement that on Object!   :-) 




Re: The Inbox: SUnit-cmm.116.mcz

marcel.taeumel
Hi Chris,

yes, tests should be as modular and concise as possible -- only checking for "one thing" in each test. However, #assert: is *not* the level of granularity one should look at to judge a certain test's modularity. That is, there can be more than one #assert: in a test and the test can still be regarded as testing "one thing".

So... I do not quite agree with your statement here ... but I think you mean the right thing:

> Expected failures should be put into their own #test.. method
> and not mixed with assertions that are expected to pass. Otherwise,
> the SUnit domain won't be able to properly express to you the state of
> those tests, as you said.

One should not compose tests according to pass-or-not but rather following domain/project-specific reasons.

Then, I know about some tests that put a guard in the beginning of a test to provide a useful "failure message" to the user before running into some other assertion failures.

Still not so sure about this hook, though, ... hmmm...

Best,
Marcel





Re: The Inbox: SUnit-cmm.116.mcz

Chris Muller-3
Hi Marcel,

This change simply fixes the display for exclusion (or inclusion)
of "expectedFailures" in the list, and nothing more.  It does *not*
change the granularity of assert: at all.

The way tests are written is unchanged, and still up to the test
developer.  If you don't use #expectedFailures, you can ignore this
completely.  That's the only use-case being fixed here.

> yes, tests should be as modular and concise as possible -- only checking for "one thing" in each test. However, #assert: is *not* the level of granularity one should look at to judge a certain test's modularity. That is, there can be more than one #assert: in a test and still the test be regarded as testing "one thing".
>
> So... I do not quite agree with your statement here ... but I think you mean the right thing:
>
> > Expected failures should be put into their own #test.. method
> > and not mixed with assertions that are expected to pass. Otherwise,
> > the SUnit domain won't be able to properly express to you the state of
> > those tests, as you said.
>
> One should not compose tests according to pass-or-not but rather following domain/project-specific reasons.

Agree, and they still are composed for domain-specific reasons.  But
expectedFailures are a subset *within* a domain-specific reason that
not only *should* (IMO), but *need* to be separated out, because SUnit
provides registration of expectedFailures ONLY at the #test... method
level.

So nothing is changed w.r.t. this issue you raise about mixing
expectedFailures with expectedPasses.  Doing so today would result in
the same ambiguity within the single test, with or without this
change.  Like, what would you think if you came upon this?

      testMethodRegisteredAsAnExpectedFailure
            self assert: (falseExpression).
            self assert: (trueExpression)

The test is registered as an expectedFailure, so regardless of whether
you classify that TestResult as "1 unexpected pass" or "1 expected
failure", either way you have hidden information from yourself about
the OTHER test in that method.  This change has nothing to do with
that.

> Then, I know about some tests that put a guard in the beginning of a test to provide a useful "failure message" to the user before running into some other assertion failures.

expectedFailures is a feature of SUnit designed to handle use cases
like the one I described.

> Still not so sure about this hook, though, ... hmmm...

It's not a "hook" so much as a correction to whether a particular
failed assertion should be expected or not.

Best,
  Chris


Re: The Inbox: SUnit-cmm.116.mcz

marcel.taeumel
Hi Chris,

> testMethodRegisteredAsAnExpectedFailure
>    self assert: (falseExpression). "A"
>    self assert: (trueExpression). "B"

Implemented like this, one should consider this as *one test* -- regardless of the number of assertions inside. Still, one might want to refactor it and swap the two lines to simplify debugging. However, that's only useful if the semantic order of the two assertions got mixed up in the first place; it might not have anything to do with the actual cause of the failure.

Still, one could *add* another test that focuses on the failing assertion -- if possible:

testMethodRegisteredAsAnotherExpectedFailure
   self assert: (falseExpression). "A"

Makes it easier for the programmer to figure out what to fix first in the future. Both tests would be expected failures, though.

> But expectedFailures are a subset *within* a domain-specific reason that
> not only *should* (IMO), but *need* to be separated out, because SUnit
> provides registration of expectedFailures ONLY at the #test... method
> level.

Hmm... no. Given that expected failures work at the level of a #test... method, programmers can just work with that. Actually, SUnit relies on the programmer to figure out the best granularity for both passing and failing tests. There is no need to do some analysis that goes sub-method here. 

***

Anyway, what you propose here introduces a *new* feature to SUnit. It does *not fix* an existing one. :-) Regarding that proposal, I think that #shouldPass should not be called as frequently during test runs as it would be in every #assert: call. I consider #shouldPass kind of a documentation used for tools such as the test runner. Your proposed change in TestResult >> #unexpectedPasses renders the code practically unreadable:

unexpectedPasses
  ^ failures select: [:each | each shouldPass not]

What does it say? Well, somehow, a "pass" got stored in the variable "failures" and if you happen to know that #shouldPass can sort out those hidden passes... way too much magic.

This was your initial fallacy for this proposal:

> The Result of a Test which is tagged as an #expectedFailure IS a failure if the test passed.

No, it is not. :-) If anything, the programmer could be happy that an expected failure passes. It is *not a failure*. Maybe something third, which we don't have yet. Serendipity? Happy accident? Not a failure.

Best,
Marcel




Re: The Inbox: SUnit-cmm.116.mcz

Chris Muller-4
Hey Marcel,

I think we either disagree about what #expectedFailures is for, or one of us forgot (possibly me).  When I "zoomed out" earlier in the thread to explain my real-world use-case, it was an implicit invitation for alternative solutions.  Would you mind helping me out with one, then?

Because I assume we agree on these fundamental requirements of SUnit..

  1) Allow the user to know the result of running tests via a gross glance from across the room (e.g., if I see "green", I know all is good).
  2) Be able to interrogate the result of running tests *programmatically* (e.g., "myResult failures isEmpty"...?).
  3) Be able to document functionality *expected* to fail (independent of "should"), while keeping the above requirements in line with these expectations.

But by saying that expectedFailures is just "an FYI list off-to-the-side for documentation" -- and treating them the same as regular failures -- you basically neuter all the above basic capabilities of SUnit.  We're forcing it to report false negatives *100% of the time*, which means it'll *never report green*.
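
Requirement 2 as a sketch (MyProjectTest is a hypothetical test class; #suite, #run, and TestResult's #defects are existing SUnit messages):

   | result |
   result := MyProjectTest suite run.
   result defects isEmpty
        ifTrue: [Transcript show: 'green'; cr]
        ifFalse: [Transcript show: result printString; cr]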

>> But expectedFailures are a subset *within* a domain-specific reason that
>> not only *should* (IMO), but *need* to be separated out, because SUnit
>> provides registration of expectedFailures ONLY at the #test... method
>> level.
>
> Hmm... no. Given that expected failures work at the level of a #test... method, programmers can just work with that. Actually, SUnit relies on the programmer to figure out the best granularity for both passing and failing tests. There is no need to do some analysis that goes sub-method here.

Despite the "no", I actually agree with everything you said after that...  I just think that "figuring out the best granularity" involves doing "sub-method" analysis automatically, without even realizing it.
 
> Your proposed change in TestResult >> #unexpectedPasses renders the code practically unreadable:
>
> unexpectedPasses
>   ^ failures select: [:each | each shouldPass not]
>
> What does it say?  Well, somehow, a "pass" got stored in the variable "failures" and if you happen to know that #shouldPass can sort out those hidden passes... way too much magic.

The fact that you put "pass" in quotes above means, I think, that deep down you agree with me :) that going against the programmer's *declared expectations* is NOT a pass, which is why it's rightly found in the failures collection.


> This was your initial fallacy for this proposal:
>
>> The Result of a Test which is tagged as an #expectedFailure IS a failure if the test passed.
>
> No, it is not. :-) If anything, the programmer could be happy that an expected failure passes. It is *not a failure*. Maybe something third, which we don't have yet. Serendipity? Happy accident? Not a failure.

You can't possibly believe that and articulate a rational reason for the existence of #expectedFailures.

LOL, and it wouldn't be "happy", because you wouldn't even know it happened!  The SUnit browser would just be the same yellow it always is, with a same-looking smattering of "expectedFailures" listed at the top, maybe even with one REAL failure mixed in so that the size of the list looks just right, so you won't even bother manually checking the actual testMethodNames which failed to make sure they're truly ALL the "expected" ones.  It could be years later that another programmer finds it sitting in "expectedFailures" and gets confused about why it's in there.

Please help me to rediscover the capabilities of SUnit that made it actually useful.

 - Chris



Re: The Inbox: SUnit-cmm.116.mcz

marcel.taeumel
Hi Chris,

if an unexpected pass were treated as a failure, *you would encourage* tests like this:

testSomething
   self assert: 5 / 0.

expectedFailures
   ^ #(testSomething)

A programmer's goal should always be to write passing tests. An expected failure should only be temporary. If that expected failure passes at some point without the programmer noticing and updating #expectedFailures, that's not a bad thing. It's progress. The programmer will update #expectedFailures eventually.

The list of expected failures is just an add-on to your test suite to document not-yet-working stuff so that fellow programmers do not treat those failing tests as some new regression. 

If you treat expected failures as something more inherent in your test suite, you would encourage tests like the one sketched above. I wouldn't do that.

Best,
Marcel




Re: The Inbox: SUnit-cmm.116.mcz

marcel.taeumel
Hi Chris.

> Would you mind helping me out with one then?

What are the challenges exactly you are trying to tackle here? Can you provide more context?

Best,
Marcel




Re: The Inbox: SUnit-cmm.116.mcz

Frank Shearar-3
In reply to this post by marcel.taeumel

On Wed., May 29, 2019, 00:16 Marcel Taeumel, <[hidden email]> wrote:

> A programmer's goal should always be to write passing tests. An expected failure should only be temporary. If that expected failure passes at some point without the programmer noticing and updating #expectedFailures, that's not a bad thing. It's progress. The programmer will update #expectedFailures eventually.
>
> The list of expected failures is just an add-on to your test suite to document not-yet-working stuff so that fellow programmers do not treat those failing tests as some new regression.

But where is the incentive for a programmer to fix a test and remember to update #expectedFailures?

On the other hand, there might be tests that flake, but no one has gotten around to deflaking them, so we suppress them with #expectedFailures: when a test passes 80% of the time, we don't want to fail builds the other 20% of the time. (OK, really we do, but not many of us are paid to keep the build green.)

frank





Re: The Inbox: SUnit-cmm.116.mcz

marcel.taeumel
It seems that the concept of expected failures is hard to grasp. Other communities seem to struggle as well. See StackOverflow. A skip list might be better, maybe via a pragma in the test itself. Not sure. Looking at all those #expectedFailures in the image, there are many not-so-temporary uses of it, but rather platform-specific conditions etc. Oh, well...

- auto skip list that documents platform-specific dependencies? (unix, network, ...)
- manual skip list that indicates work-in-progress tests?
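
For instance, a manual skip list might be sketched with a method pragma (the <skipOnPlatform:> pragma and #socketReuseWorks are hypothetical; SUnit already reads test-method pragmas such as <timeout:>):

   testSocketReuse
        <skipOnPlatform: 'Win32'>
        "Would run, and be expected to pass, everywhere except the skipped platform."
        self assert: self socketReuseWorks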

At least this would avoid the discussion about unexpected passes ... sigh...

Best,
Marcel






Re: The Inbox: SUnit-cmm.116.mcz

Chris Muller-3
In reply to this post by marcel.taeumel
> if an unexpected pass would be treated as a failure, *you would encourage* tests like this:
>
> testSomething
>    self assert: 5 / 0.
>
> expectedFailures
>    ^ #(testSomething)

It appears that has already happened, even without this change.
But nobody noticed.  Check out UTF8EdgeCaseTest>>#testOverlongNUL.  Is
this a case of misunderstanding what expectedFailures was for?  Or
did it eventually get working, so the test now passes but we
forgot to remove it?  That would mean the "documentation" has been
wrong for a long time.  If only we could have gotten some sort of
"notification" whenever an expectedFailure starts working again,
hmmm....  ;)
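
Such stale entries could even be hunted with a one-line workspace expression, using the message from the diff at the top of this thread (a sketch; it runs the whole suite):

   (UTF8EdgeCaseTest suite run) unexpectedPasses
        "answers the tests that passed although they are
         registered as expected failures"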

> A programmer's goal should always be to write passing tests. An expected failure should only be temporary. If that expected failure passes at some point without the programmer noticing and updating #expectedFailures, that's not a bad thing. It's progress. The programmer will update #expectedFailures eventually.
>
> The list of expected failures is just an add-on to your test suite to document not-yet-working stuff so that fellow programmers do not treat those failing tests as some new regression.

What do fellow programmers do with those expectedFailures then?  The
way you make it sound, they're supposed to ignore them.  That's
the trouble: they've muscled their way into the tool, disrupting the
lists and tainting the *real* output status fellow programmers
actually care about.  These are core SUnit outputs!

The way we're using expectedFailures now is so limited, yet so
intrusive.  We could "document not-yet-working stuff" anywhere...
in comments, or even in an external project-plan document..

OR, we could tweak SUnit so they integrate seamlessly:

   - as a tucked-away documentation list of expected failures,
   - with users still reminded of its existence by the totals output ("4 expected failures"),
   - and yet, not "in-the-way" of the real work,
   - and getting proper green/yellow/red status restored.

Best,
  Chris


Re: The Inbox: SUnit-cmm.116.mcz

Chris Muller-3
In reply to this post by marcel.taeumel
> >  Would you mind helping me out with one then?
>
> What are the challenges exactly you are trying to tackle here? Can you provide more context?

I had to think about this -- I think, abstractly, the case comes up
when whether something "should" (or "should not") work conflicts with
whether we want or expect it to work.  We can look as close as the
trunk image for a good example.

   SocketTest>>#testSocketReuse

This test is expected to fail on Windows.  For sake of argument, let's
pretend that the reason is because there is a security hole in Windows
(surely not!  ;)  ) and so we, as developers of Squeak, commented out
windows support in the VM because we *absolutely positively want this
to fail on Windows* until we're sure the coast is clear.

This is why it's not necessarily serendipitous for it to unknowingly
start "passing".  We want to know, and stay in control.

On the flip side, the situation could be that we want it to work, but
due to the core design of Windows, it won't ever be supported, so we know
it'll _never_ pass.

But that doesn't mean we want to write code that says

    isWindows
            ifTrue: [self should: [...] raise: Error]
            ifFalse: [self assert: ...]

because "should raise" expresses that we think it "should", when we wish it
didn't.  We feel it "should work" but it simply doesn't.

For either story, our current implementation of expectedFailures
prevents Windows users from participating in a proper SUnit
experience.

 - Chris


Re: The Inbox: SUnit-cmm.116.mcz

David T. Lewis
On Wed, May 29, 2019 at 04:07:11PM -0500, Chris Muller wrote:

> > >  Would you mind helping me out with one then?
> >
> > What are the challenges exactly you are trying to tackle here? Can you provide more context?
>
> I had to think about this -- I think,
> abstractly, the case comes up when something that "should" (or "should
> not") work, is in conflict with whether we want it or expect it to
> work.  We can look as close as the trunk image for a good example.
>
>    SocketTest>>#testSocketReuse
>
> This test is expected to fail on Windows.  For sake of argument, let's
> pretend that the reason is because there is a security hole in Windows
> (surely not!  ;)  ) and so we, as developers of Squeak, commented out
> windows support in the VM because we *absolutely positively want this
> to fail on Windows* until we're sure the coast is clear.

This is a test for platform-specific functions that are known to be
unavailable when running on a Windows VM. We expect it to fail when
running on Windows. SocketTest>>expectedFailures makes this explicit.

If Squeak is running on a Windows VM, this test will fail, and the
failure is expected. It is an expected failure.

That seems reasonable to me.

If it is not clear what an "expected failure" is, then a good method
comment in TestCase>>expectedFailures might help. And if we cannot
agree on the method comment, then it's a sure bet that we should not
be fiddling with the implementation.

Dave



Re: The Inbox: SUnit-cmm.116.mcz

marcel.taeumel
Taking a look at the following existing methods, everything seems to be already there:

TestResult >> #failures
TestResult >> #failureCount
TestResult >> #defects

There is no need to change anything. The message #failures correctly combines #unexpectedFailures and #unexpectedPasses. The message #defects combines #errors and #failures.
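
For reference, a sketch of how those combinations might read (the exact method bodies in a given image may differ; #asArray just ensures the collections concatenate):

   failures
        "Everything that contradicts the declared expectations."
        ^ self unexpectedFailures asArray , self unexpectedPasses asArray

   defects
        ^ self errors asArray , self failures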

The use of the instance variables failures, errors, and passed is an implementation detail that can stay as it is for this use case.

@Chris: Why did it matter that those instance variables refer to tests the current way?

Best,
Marcel
