Chris Muller uploaded a new version of SUnit to project The Inbox:
http://source.squeak.org/inbox/SUnit-cmm.116.mcz

==================== Summary ====================

Name: SUnit-cmm.116
Author: cmm
Time: 23 May 2019, 9:39:05.797827 pm
UUID: ecce7092-6268-42a0-87d1-cbdea912ea68
Ancestors: SUnit-pre.115

The Result of a Test which is tagged as an #expectedFailure IS a failure if the test passed. Likewise, when a should: of a test which is tagged as an expectedFailure fails, the TestResult itself is considered passed, not failed. This fixes the SUnit browser to display these TestResults accordingly, so that it does not falsely include these "failed" Results in the list, because they were already expected to fail.

=============== Diff against SUnit-pre.115 ===============

Item was changed:
  ----- Method: TestCase>>assert: (in category 'accessing') -----
  assert: aBooleanOrBlock

+ 	aBooleanOrBlock value = self shouldPass ifFalse: [self signalFailure: 'Assertion failed']
- 	aBooleanOrBlock value ifFalse: [self signalFailure: 'Assertion failed']
  !

Item was changed:
  ----- Method: TestResult>>unexpectedPasses (in category 'accessing') -----
  unexpectedPasses
+ 	^ failures select: [:each | each shouldPass not] !
- 	^ passed select: [:each | each shouldPass not] !
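For context on the diff above: the modified assert: relies on TestCase>>#shouldPass, which SUnit derives from the #expectedFailures list. Roughly -- the exact methods in the image may differ slightly -- the relationship looks like this:

	TestCase >> expectedFailures
		"Answer the selectors of the tests that are expected to fail. Default: none."
		^ #()

	TestCase >> shouldPass
		"Answer false when this test case's selector is registered as an expected failure."
		^ (self expectedFailures includes: testSelector) not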
On Fri., 24 May 2019 at 04:39, <[hidden email]> wrote:
Hmm, what if a test case has multiple assertions and only one of them fails under certain conditions, which is why the test is expected to fail under those conditions? If, so far, only the last of many assertions fails, so that the complete test method runs, then after this change the first assertion will fail right away (because it is true, contrary to shouldPass), won't it? It might hide some errors, like exceptions raised in the middle of the assertions.
Right. Expected failures should be put into their own #test.. method
and not mixed with assertions that are expected to pass. Otherwise, the SUnit domain won't be able to properly express to you the state of those tests, as you said.

Let me zoom out to the context of why I happened to need #expectedFailure in the first place. While implementing GraphQL document validation code, I came across an apparent "self contradiction" in the spec. In one case, it says to handle scenario XYZ one way, but in another place, it says to handle XYZ differently. So I have to choose to implement one way or the other, and whichever way I *don't* implement, I want to document as an expectedFailure. It actually ended up being a family of about 3 or 4 of these cases, and I got tired of seeing them in the SUnit browser, distracting me by making me think they were something I needed to investigate...

On Sun, May 26, 2019 at 9:01 AM Jakob Reschke <[hidden email]> wrote:
>
> On Fri., 24 May 2019 at 04:39, <[hidden email]> wrote:
>>
>>
>> ----- Method: TestCase>>assert: (in category 'accessing') -----
>> assert: aBooleanOrBlock
>>
>> + aBooleanOrBlock value = self shouldPass ifFalse: [self signalFailure: 'Assertion failed']
>> - aBooleanOrBlock value ifFalse: [self signalFailure: 'Assertion failed']
>> !
>
> Hmm, what if a test case has multiple assertions and only one of them fails under certain conditions, which is why the test is expected to fail under those conditions? If, so far, only the last of many assertions fails, so that the complete test method runs, then after this change the first assertion will fail right away (because it is true, contrary to shouldPass), won't it? It might hide some errors, like exceptions raised in the middle of the assertions.
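To illustrate the use case Chris describes above (all class, selector, and helper names below are hypothetical, not his actual code), the spec reading that was *not* implemented would be registered like this:

	GraphQLDocumentValidationTest >> expectedFailures
		"The spec describes scenario XYZ in two contradictory ways; we implemented the other reading."
		^ #(testScenarioXYZAlternativeReading)

	GraphQLDocumentValidationTest >> testScenarioXYZAlternativeReading
		"Kept to document the contradiction; fails under the reading we did implement."
		self assert: (self validator isValidDocument: self scenarioXYZDocument)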
On Sun., 26 May 2019 at 20:56, Chris Muller <[hidden email]> wrote:
> Right. Expected failures should be put into their own #test.. method

Makes sense. Hope everyone is aware of that for the future use of the feature.

PS.
> I came across an apparent "self contradiction"

Whew, I tried to read "self contradiction" as a Smalltalk expression at first... :-)
On Sun, May 26, 2019 at 2:20 PM Jakob Reschke <[hidden email]> wrote:
And for present use, too, since mixing them would present the same issue today as well. As the test developer is required to specify expectedFailures at the #test.. method level, they'll probably be naturally inclined to put only that group of should:'s in there. They're generally rare, exceptional cases.
That sounds like an interesting behavior; we should implement that on Object! :-)
Hi Chris,

yes, tests should be as modular and concise as possible -- only checking for "one thing" in each test. However, #assert: is *not* the level of granularity one should look at to judge a certain test's modularity. That is, there can be more than one #assert: in a test and still the test be regarded as testing "one thing".

So... I do not quite agree with your statement here ... but I think you mean the right thing:

> Expected failures should be put into their own #test.. method
> and not mixed with assertions that are expected to pass. Otherwise,
> the SUnit domain won't be able to properly express to you the state of
> those tests, as you said.

One should not compose tests according to pass-or-not but rather following domain/project-specific reasons.

Then, I know about some tests that put a guard in the beginning of a test to provide a useful "failure message" to the user before running into some other assertion failures.

Still not so sure about this hook, though, ... hmmm...

Best,
Marcel
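A small, made-up example of Marcel's point -- two checks that still test "one thing", namely that #copy answers an equal but not identical collection:

	testCopyAnswersEqualButNotIdenticalCollection
		| original copy |
		original := OrderedCollection withAll: #(1 2 3).
		copy := original copy.
		self assert: copy = original.
		self deny: copy == original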
Hi Marcel,
This change simply fixes the display for exclusion (or inclusion) of "expectedFailures" in the list, and nothing more. It does *not* change the granularity of assert: at all. The way tests are written is unchanged, and still up to the test developer. If you don't use #expectedFailures, you can ignore this completely. That's the only use-case being fixed here.

> yes, tests should be as modular and concise as possible -- only checking for "one thing" in each test. However, #assert: is *not* the level of granularity one should look at to judge a certain test's modularity. That is, there can be more than one #assert: in a test and still the test be regarded as testing "one thing".
>
> So... I do not quite agree with your statement here ... but I think you mean the right thing:
>
> > Expected failures should be put into their own #test.. method
> > and not mixed with assertions that are expected to pass. Otherwise,
> > the SUnit domain won't be able to properly express to you the state of
> > those tests, as you said.
>
> One should not compose tests according to pass-or-not but rather following domain/project-specific reasons.

Agree, and they still are composed for domain-specific reasons. But expectedFailures are a subset *within* a domain-specific reason that not only *should* (IMO), but *need* to be separated out, because SUnit provides registration of expectedFailures ONLY at the #test... method level.

So nothing is changed w.r.t. this issue you raise about mixing expectedFailures with expectedPasses. Doing so today would result in the same ambiguity within the single test, with or without this change. Like, what would you think if you came upon this?

	testMethodRegisteredAsAnExpectedFailure
		self assert: (falseExpression).
		self assert: (trueExpression)

The test is registered as an expectedFailure, so regardless of whether you classify that TestResult as "1 unexpected pass" or "1 expected failure", either way you have hidden information from yourself about the OTHER assertion in that method. This change has nothing to do with that.

> Then, I know about some tests that put a guard in the beginning of a test to provide a useful "failure message" to the user before running into some other assertion failures.

expectedFailures is a feature of SUnit designed to handle use cases like the one I described.

> Still not so sure about this hook, though, ... hmmm...

It's not a "hook" as much as a correction to whether a particular failed assertion should be expected or not.

Best,
Chris
Hi Chris,

> testMethodRegisteredAsAnExpectedFailure
> 	self assert: (falseExpression). "A"
> 	self assert: (trueExpression). "B"

Implemented like this, one should consider this as *one test* -- regardless of the number of assertions inside. Still, one might want to refactor it and swap the two lines to simplify debugging. However, that's only useful if the semantic order of both assertions got mixed up in the first place. It might not have anything to do with the actual cause of the failure. Still, one could *add* another test that focuses on the failing assertion -- if possible:

	testMethodRegisteredAsAnotherExpectedFailure
		self assert: (falseExpression). "A"

Makes it easier for the programmer to figure out what to fix first in the future. Both tests would be expected failures, though.

> But expectedFailures are a subset *within* a domain-specific reason that
> not only *should* (IMO), but *need* to be separated out, because SUnit
> provides registration of expectedFailures ONLY at the #test... method
> level.

Hmm... no. Given that expected failures work at the level of a #test... method, programmers can just work with that. Actually, SUnit relies on the programmer to figure out the best granularity for both passing and failing tests. There is no need to do some analysis that goes sub-method here.

***

Anyway, what you propose here introduces a *new* feature to SUnit. It does *not fix* an existing one. :-)

Regarding that proposal, I think that #shouldPass should not be called during test runs as frequently as in an #assert: call. I consider #shouldPass kind of a documentation used for tools such as the test runner.

Your proposed change in TestResult >> #unexpectedPasses renders the code practically unreadable:

	unexpectedPasses
		^ failures select: [:each | each shouldPass not]

What does it say? Well, somehow, a "pass" got stored in the variable "failures", and if you happen to know that #shouldPass can sort out those hidden passes... way too much magic.

This was your initial fallacy for this proposal:

> The Result of a Test which is tagged as an #expectedFailure IS a failure if the test passed.

No, it is not. :-) If anything, the programmer could be happy that an expected failure passes. It is *not a failure*. Maybe something third, which we don't have yet. Serendipity? Happy accident? Not a failure.

Best,
Marcel
Hey Marcel,

I think we either disagree about what #expectedFailures is for, or one of us forgot (possibly me). When I "zoomed out" earlier in the thread to explain my real-world use-case, it was an implicit invitation for alternative solutions. Would you mind helping me out with one then? Because I assume we agree on these fundamental requirements of SUnit..

1) Allow the user to know the result of running tests via a gross glance from across the room (e.g., if I see "green", I know all is good).

2) Be able to interrogate the result of running tests *programmatically* (e.g., "myResult failures isEmpty"...?).

3) Be able to document functionality *expected* to fail (independent of "should"), while keeping the above requirements in line with these expectations.

..but saying that expectedFailures is just "an FYI list off-to-the-side for documentation" -- and treating them the same as regular failures -- basically neuters all the above basic capabilities of SUnit. We're forcing it to report false negatives *100% of the time*, which means it'll *never report green*.
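As a concrete sketch of requirement 2 (the test class name here is hypothetical):

	| result |
	result := MyProjectTest suite run.
	(result failures isEmpty and: [result errors isEmpty])
		ifTrue: [Transcript show: 'green'; cr]
		ifFalse: [Transcript show: result printString; cr]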
Despite the "no", I actually agree with everything you said after that... I just think that "figuring out the best granularity" involves doing "sub-method" analysis automatically, without even realizing it.
The fact that you put "pass" in quotes above means, deep down, I think you agree with me :) that going against the programmer's *declared expectations* is NOT a pass, which is why it's rightly found in the failures collection.
You can't possibly believe that and articulate a rational reason for the existence of #expectedFailures. LOL, and it wouldn't be "happy" because you wouldn't even know it happened! The SUnit browser would just be the same yellow it always is, with a same-looking smattering of "expectedFailures" listed at the top, maybe even with one REAL failure mixed in, so that the size of the list looks just right, and so you won't even bother manually checking the actual testMethodNames which failed to make sure they're truly ALL the "expected" ones. It could be years later when another programmer would find it sitting in "expectedFailures" and get confused about why it's in there.

Please help me to rediscover the capabilities of SUnit that made it actually useful.

- Chris
Hi Chris,

if an unexpected pass would be treated as a failure, *you would encourage* tests like this:

	testSomething
		self assert: 5 / 0.

	expectedFailures
		^ #(testSomething)

A programmer's goal should always be to write passing tests. An expected failure should only be temporary. If that expected failure passes at some point without the programmer noticing and updating #expectedFailures, that's not a bad thing. It's progress. The programmer will update #expectedFailures eventually.

The list of expected failures is just an add-on to your test suite to document not-yet-working stuff so that fellow programmers do not treat those failing tests as some new regression. If you treat expected failures as something more inherent in your test suite, you would encourage tests like the one sketched above. I wouldn't do that.

Best,
Marcel
Hi Chris.

> Would you mind helping me out with one then?

What are the challenges exactly you are trying to tackle here? Can you provide more context?

Best,
Marcel
On Wed., May 29, 2019, 00:16 Marcel Taeumel, <[hidden email]> wrote:
But where is the incentive for a programmer to fix a test and remember to update #expectedFailures?

On the other hand, there might be tests that flake, but no one has gotten around to deflaking them, so we suppress the test with #expectedFailures; when it passes 80% of the time, we want to not fail builds the other 20% of the time. (OK, really we do, but not many of us are paid to keep the build green.)

frank
It seems that the concept of expected failures is hard to grasp. Other communities seem to struggle as well. See StackOverflow.

A skip list might be better, maybe via a pragma in the test itself. Not sure. Looking at all those #expectedFailures in the image, there are many not-so-temporary uses of it, but rather platform-specific conditions etc. Oh, well...

- auto skip list that documents platform-specific dependencies? (unix, network, ...)
- manual skip list that indicates work-in-progress tests?

At least this would avoid the discussion about unexpected passes ... sigh...

Best,
Marcel
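No such pragma exists in SUnit today; purely as a hypothetical sketch of what Marcel's skip-via-pragma idea could look like (the pragma name, the test class, and the helper are all invented here):

	MyPlatformTest >> testUnixDomainSockets
		<skip: 'requires a Unix VM'>
		self assert: self unixSocketsWork

A modified runner would then collect these pragmas and exclude the annotated tests from the suite, rather than reporting them as failures.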
In reply to this post by marcel.taeumel
> if an unexpected pass would be treated as a failure, *you would encourage* tests like this:
>
> 	testSomething
> 		self assert: 5 / 0.
>
> 	expectedFailures
> 		^ #(testSomething)

It appears that this has already happened, even without this change. But nobody noticed. Check out UTF8EdgeCaseTest>>#testOverlongNUL. Is this a case of misunderstanding what expectedFailures was for? Or did it eventually get working, so the test now passes, but we forgot to remove it? That would mean the "documentation" has been wrong for a long time. If only we could have gotten some sort of "notification" whenever an expectedFailure would start working again, hmmm.... ;)

> A programmer's goal should always be to write passing tests. An expected failure should only be temporary. If that expected failure passes at some point without the programmer noticing and updating #expectedFailures, that's not a bad thing. It's progress. The programmer will update #expectedFailures eventually.
>
> The list of expected failures is just an add-on to your test suite to document not-yet-working stuff so that fellow programmers do not treat those failing tests as some new regression.

What do fellow programmers do with those expectedFailures then? The way you make it sound, they're supposed to ignore them. That's the trouble: they've muscled their way into the tool, disrupting the lists and tainting the *real* output status fellow programmers actually care about. These are core SUnit outputs!

The way we're using expectedFailures now is so limited, yet so intrusive. We could "document not-yet-working stuff" anywhere... comments, or even an external project-plan document.. OR, we could tweak SUnit so they integrate seamlessly:

- as a tucked-away documentation list of expected failures,
- with users still reminded of its existence by the totals output ("4 expected failures"),
- and yet, not "in-the-way" of the real work,
- and getting proper green/yellow/red status restored.

Best,
Chris
In reply to this post by marcel.taeumel
> > Would you mind helping me out with one then?
> > What are the challenges exactly you are trying to tackle here? Can you provide more context?

I had to think about this -- I think, abstractly, the case comes up when something that "should" (or "should not") work is in conflict with whether we want it or expect it to work. We can look as close as the trunk image for a good example.

	SocketTest>>#testSocketReuse

This test is expected to fail on Windows. For the sake of argument, let's pretend that the reason is because there is a security hole in Windows (surely not! ;) ) and so we, as developers of Squeak, commented out Windows support in the VM because we *absolutely positively want this to fail on Windows* until we're sure the coast is clear. This is why it's not necessarily serendipitous for it to unknowingly start "passing". We want to know, and stay in control.

On the flip side, the situation could be that we want it to work, but due to the core design of Windows, it won't ever be supported, so we know it'll _never_ pass. But that doesn't mean we want to write code that says

	isWindows
		ifTrue: [ should raise ]
		ifFalse: [ self assert: ... ]

because "should raise" expresses that we think it "should", when we wish it didn't. We feel it "should work" but it simply doesn't.

For either story, our current implementation of expectedFailures prevents Windows users from participating in a proper SUnit experience.

- Chris
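For reference, the platform-conditional registration Chris is referring to looks roughly like this (a sketch; the actual SocketTest>>#expectedFailures in trunk may differ in detail):

	SocketTest >> expectedFailures
		^ Smalltalk platformName = 'Win32'
			ifTrue: [#(testSocketReuse)]
			ifFalse: [#()]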
On Wed, May 29, 2019 at 04:07:11PM -0500, Chris Muller wrote:
> > > Would you mind helping me out with one then?
> >
> > What are the challenges exactly you are trying to tackle here? Can you provide more context?
>
> I had to think about this -- I think,
> abstractly, the case comes up when something that "should" (or "should
> not") work, is in conflict with whether we want it or expect it to
> work. We can look as close as the trunk image for a good example.
>
> SocketTest>>#testSocketReuse
>
> This test is expected to fail on Windows. For sake of argument, let's
> pretend that the reason is because there is a security hole in Windows
> (surely not! ;) ) and so we, as developers of Squeak, commented out
> windows support in the VM because we *absolutely positively want this
> to fail on Windows* until we're sure the coast is clear.

This is a test for platform-specific functions that are known to be unavailable when running on a Windows VM. We expect it to fail when running on Windows. SocketTest>>expectedFailures makes this explicit.

If Squeak is running on a Windows VM, this test will fail, and the failure is expected. It is an expected failure. That seems reasonable to me.

If it is not clear what an "expected failure" is, then a good method comment in TestCase>>expectedFailures might help. And if we cannot agree on the method comment, then it's a sure bet that we should not be fiddling with the implementation.

Dave
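Along the lines Dave suggests, such a comment could read something like this (just a sketch, not the comment currently in the image):

	TestCase >> expectedFailures
		"Answer an Array of the selectors of tests that are expected to fail,
		e.g. because they document a known limitation or a platform-specific
		restriction. Subclasses override this; by default, every test is
		expected to pass."
		^ #()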
Taking a look at the following, existing methods, everything seems to be already there:

	TestResult >> #failures
	TestResult >> #failureCount
	TestResult >> #defects

There is no need to change anything. The message #failures correctly combines #unexpectedFailures and #unexpectedPasses. The message #defects combines #errors and #failures. The use of the instance variables failures, errors, and passed is an implementation detail that can stay as it is for this use case.

@Chris: Why did it matter that those instance variables refer to tests the current way?

Best,
Marcel
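Following Marcel's description, the combination works conceptually like this (a rough sketch; the methods in the image may be written differently):

	TestResult >> failures
		^ OrderedCollection new
			addAll: self unexpectedFailures;
			addAll: self unexpectedPasses;
			yourself

	TestResult >> defects
		^ OrderedCollection new
			addAll: self errors;
			addAll: self failures;
			yourself

So on the tools side, "aTestResult defects isEmpty" is already enough to decide between green and not-green.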