Bert Freudenberg wrote:
> I actually side with Ralph on this one. It is very satisfying to see
> the test runner turn green. With tests in the image that you cannot
> fix, you will never get this satisfaction.

Yes you can: you select "all standard tests", apply the "known issues" filter, run the tests, and you get all green for the tests that are supposed to be green. There are also filters for tests that are not expected to work on this platform/VM release, etc.

I personally would like to fill the (or should I say an) image with 100's of test stubs saying: we need a test for this and for that.

Keith
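[Editor's note: a minimal Smalltalk sketch of the filtering idea Keith describes, assuming a hypothetical MyTestCase class and a 'known issues' method category; the improved TestRunner's actual filter API is not shown in this thread.]

    "Build the full suite, then keep only the tests whose method
     category is not 'known issues'; the remaining bar is the one
     that is supposed to be green."
    | all filtered |
    all := MyTestCase buildSuiteFromSelectors.
    filtered := TestSuite new.
    all tests do: [:each |
        ((each class whichCategoryIncludesSelector: each selector)
                = #'known issues')
            ifFalse: [filtered addTest: each]].
    filtered run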
In reply to this post by Bert Freudenberg
I agree too. It was not our intention to ship 3.9 with broken tests, but we had to stop somewhere and nobody really stepped up to help (except one person, whose name I do not remember, and we tried to harvest his fixes).

Now in VW a guy in our team has extended SUnitToo to support failing tests, and I think that this is a nice alternative.

PS: I would avoid putting code on Mantis as changesets; publish packages on SqueakSource instead, since they are easier to deal with. Create one package, FailingTest, so that if someone wants to have a look at fixing a test he can load all of them in one shot.

On 29 January 2007, at 18:23, Bert Freudenberg wrote:
> I actually side with Ralph on this one. It is very satisfying to
> see the test runner turn green. With tests in the image that you
> cannot fix, you will never get this satisfaction.
In reply to this post by keith1y
> I personally would like to fill the (or should I say an) image with
> 100's of test stubs saying: we need a test for this and for that.

It would be a nice way to invite people to participate in the effort. We have been pushing test writing for years now... and we should continue.

Stef
In reply to this post by Bert Freudenberg
On Jan 29, 2007, at 9:23 AM, Bert Freudenberg wrote:
> I actually side with Ralph on this one. It is very satisfying to
> see the test runner turn green. With tests in the image that you
> cannot fix, you will never get this satisfaction.

True, the green bar is very satisfying. But Ralph's reasons go a bit deeper. The purpose of a test suite is to provide immediate feedback during development. Running the suite is like asking "how am I doing?" A green bar means "fine," and a red bar means "STOP! You just introduced a bug!" The value of a test suite is that it can let us know that we've introduced a bug at the moment it happens. That's the best time to fix it, because the person who understands it best is right there and has all the information he needs.

On the other hand, a test suite is *not* a bug database. That's what Mantis is for. "Visibility" of known bugs doesn't get them fixed, it just makes them overshadow the unknown bugs we'll introduce going forward.

Colin
In reply to this post by stephane ducasse
stephane ducasse wrote:
> I agree too. It was not our intention to ship 3.9 with broken tests,
> but we had to stop and nobody really stepped up to help (except one
> person, whose name I do not remember, and we tried to harvest his
> fixes).
>
> Now in VW a guy in our team extended SUnitToo to support failing
> tests, and I think that this is a nice alternative.
>
> PS: I would avoid putting code on Mantis as changesets; publish
> packages on SqueakSource, since this is easier to deal with. Create
> one package FailingTest; this way, if someone wants to have a look
> at fixing a test, he can load all of them in one shot.
>
> On 29 January 2007, at 18:23, Bert Freudenberg wrote:
>
>> I actually side with Ralph on this one. It is very satisfying to see
>> the test runner turn green. With tests in the image that you cannot
>> fix, you will never get this satisfaction.

If I follow the dominating logic in this thread, the process to incorporate a patch could be:

1) Make a subclass of KnownBug instead of TestCase for the bug test.
2) Write or load a patch.
3) Check non-regression of the whole TestCase suite (green bar).
4) Reject the patch, or go to 2), if the TestCase suite bar is red.
5) Check that the targeted KnownBug is corrected.
6) Reject the patch, or go to 2), if the bug test is red.
7) Accept the patch and move the test from the KnownBug to the TestCase hierarchy.

As you can see, you will need an image holding both KnownBugs and TestCases during the patch process. In this image, KnownBugs must not turn the TestCase suite bar red, so the SUnit framework SHALL deal with this feature. The alternative is to use two images, one for the bugfix and the other for non-regression tests, but with the risk of having the images diverge. Whatever you use (protocol, subclass, or any other flag somewhere in the image), you just need a classification that you can change easily.

If the SUnit framework deals with this, the question raised is: why should we remove KnownBugs from the image?

- For cosmetic reasons? (It cannot be a commercial reason: as already said, knowing some big companies' success, it's obviously not so hard to sell bugs.)
- For cleaning reasons, i.e. having a small image? I cannot believe that the KnownBugs suite is as big as the TestCase suite, so you gain relatively little.

Alternatively, why should we keep KnownBugs in the image?

- To encourage people to fix them? People look more often in their image than in Mantis, don't they? Scanning the mailing list, you will see that a lot of people tried to make these lights turn green in the 3.9 process, not always successfully, it's true; some bugs are tough.

If we keep them in the image, they must be in a package maintained with SqueakMap and Monticello so that people can share. I totally agree with Steph. If we do not, we have the Mantis database. Mantis is rich, but the information is rather scattered... It contains old bugs already fixed, complete and incomplete fixes, enhancements, feature requests and non-bugs. It is not convenient enough, and we need an improved way to load all known bugs at once. I am sure Steph and Marcus can confirm that.

However, if one wants to fix a bug, he had better keep the link to the Mantis database, as it is the place to read other programmers' discussion, bug tests and eventual (partial) fix attempts... So this is the only thing that annoys me in Steph's proposition. How to reconcile these two forms? It must be noted that a side effect of the 3.9 red light has been that bugs were reported several times in Mantis, with several incompatible or incomplete patches. Use an automaton with some special tags inserted in Mantis comments, processed within Squeak? A human-usable hyperlink to Mantis from within Squeak?

Nicolas
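[Editor's note: a minimal Smalltalk sketch of the KnownBug idea from the steps above. KnownBug is a hypothetical class name from this thread, not an existing SUnit class, and the exclusion logic is an assumption about how a runner might use it.]

    "Declare a home for tests that document open bugs."
    TestCase subclass: #KnownBug
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Tests-KnownBugs'.

    "A runner wanting a green bar builds its suite from everything
     outside the KnownBug hierarchy; a bug fixer runs KnownBug alone."
    | regressionClasses |
    regressionClasses := TestCase allSubclasses reject:
        [:each | each == KnownBug or: [each inheritsFrom: KnownBug]].

Once a fix passes, step 7) is just a reparenting: change the test class's superclass from KnownBug back to TestCase.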
In reply to this post by Colin Putney
On 2007 January 29 16:58, Colin Putney wrote:
> On Jan 29, 2007, at 9:23 AM, Bert Freudenberg wrote:
>> I actually side with Ralph on this one. It is very satisfying to
>> see the test runner turn green. With tests in the image that you
>> cannot fix, you will never get this satisfaction.
>
> True, the green bar is very satisfying. But Ralph's reasons go a bit
> deeper. The purpose of a test suite is to provide immediate feedback
> during development. Running the suite is like asking "how am I
> doing?" A green bar means "fine," and a red bar means "STOP! You
> just introduced a bug!" The value of a test suite is that it can let
> us know that we've introduced a bug at the moment it happens.

Yes. I also think this is important when tests are automated (e.g. using a test server): add a change to the image (install a new version of an MCZ, load a changeset) and run the tests. If all succeed, good; if something fails, a bug was introduced. Unless something in the system keeps track of bugs that are "expected to fail" and compares them to what just happened, it is important to have all tests succeed for auto-testing to work.

Milan

> That's the best time to fix it, because the person who understands
> it best is right there and has all the information he needs.
>
> On the other hand, a test suite is *not* a bug database. That's what
> Mantis is for. "Visibility" of known bugs doesn't get them fixed, it
> just makes them overshadow the unknown bugs we'll introduce going
> forward.
>
> Colin
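[Editor's note: a minimal sketch of the comparison Milan describes, in Squeak Smalltalk. The expected-failure list and the MyTestCase class are assumptions; SUnit itself keeps no such list.]

    "Run a suite on the test server and report only regressions,
     i.e. failures that are not on the expected-failure list."
    | expected result regressions |
    expected := Set withAll: #(testKnownIssue1 testKnownIssue2). "hypothetical"
    result := MyTestCase buildSuiteFromSelectors run.
    regressions := result failures
        reject: [:each | expected includes: each selector].
    regressions isEmpty
        ifFalse: [Transcript show: 'New failures introduced!'; cr]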
Milan Zimmermann wrote:
> On 2007 January 29 16:58, Colin Putney wrote:
>> On Jan 29, 2007, at 9:23 AM, Bert Freudenberg wrote:
>>> I actually side with Ralph on this one. It is very satisfying to
>>> see the test runner turn green. With tests in the image that you
>>> cannot fix, you will never get this satisfaction.
>> True, the green bar is very satisfying. But Ralph's reasons go a
>> bit deeper. The purpose of a test suite is to provide immediate
>> feedback during development. Running the suite is like asking "how
>> am I doing?" A green bar means "fine," and a red bar means "STOP!
>> You just introduced a bug!"

A test suite that takes too long to run will not fulfil this goal. The solution, which already exists, is to categorise the tests, so that a suite of short tests can be run more frequently, in less than 2 minutes. If you have this categorisation, then it is trivial to have a category of 'known issues', which you simply do not run if you want to see your green light.

The improved TestRunner times each test, and can automatically sort the long from the short, the network-using tests from the non-network-using tests, etc. Not forgetting that there are other categories of tests needed in order to get that all-hallowed green bar, such as: tests that I would not expect to work on my platform, tests that will not work on the VM that I am using, and tests that will not work in this version of the image.

Pulling your known-issues tests into a separate class is not a good solution, because then you lose the ability to subclass from TestCase sensibly.

Pulling your known issues into a separate package is not a good solution, because then you lose the context for those tests, since they belong in the same context as those tests which pass. You need that context if you are ever going to fix them.

Finally, breaking things up physically, rather than tagging things 'mentally' so to speak, ruins any kind of smooth workflow. Write test, fix test becomes: write test in one place; when it works, move it to another place; debug it again in the new context.

To try the improved TestRunner, try Installer fixBug: 5639.

best regards

Keith
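[Editor's note: a minimal sketch of the timing idea, assuming a hypothetical MyTestCase class and an arbitrary 2000 ms threshold; Keith's actual TestRunner code is not shown in the thread.]

    "Time each test and collect the slow ones into their own suite,
     so that the quick green-bar run can exclude them."
    | slow |
    slow := TestSuite new.
    MyTestCase buildSuiteFromSelectors tests do: [:each |
        (Time millisecondsToRun: [each run]) > 2000
            ifTrue: [slow addTest: each]].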
> Pulling your known-issues tests into a separate class is not a good
> solution, because then you lose the ability to subclass from
> TestCase sensibly.
>
> Pulling your known issues into a separate package is not a good
> solution, because then you lose the context for those tests, since
> they belong in the same context as those tests which pass. You need
> that context if you are ever going to fix them.
>
> Finally, breaking things up physically, rather than tagging things
> 'mentally' so to speak, ruins any kind of smooth workflow. Write
> test, fix test becomes: write test in one place; when it works,
> move it to another place; debug it again in the new context.

I have done this for a long time. Putting nonworking tests into separate classes and packages is in fact perfectly fine. My experience is that your arguments are wrong.

Moving code around in Smalltalk is very easy. It is almost as easy to move a method from one class to another as it is to press a button. It takes MUCH longer to figure out whether you want to move it. Tests should be independent of each other, so moving one should make no difference to debugging.

The big advantage of SUnit is that it is extremely simple. Simple things work. Sometimes complicated things work, too, but it is harder!

-Ralph Johnson
In reply to this post by Ralph Johnson
Ralph,
I agree that simpler is often better, which is precisely why many of my (Dolphin) tests are not simple, in the sense that they use helper methods, many of which appear to belong in the TestCase hierarchy. If I were to follow your suggestion, I would have to duplicate those methods to get the tests to run at all.

Dolphin's SUnitBrowser is a great tool, but I don't use it very often. It is great for figuring out whether all is well, or for finding failing tests. For work on a given failing test or group of tests, I use doits. Most of my methods have a comment that looks something like

    ThisOrThatTestCase prod: #testSomething.

#prod: arranges for setup and teardown, and either goes straight to a walkback on error/halt, or does _nothing_ if all is well. Such comments frequently find their way into domain classes, so it becomes easy to restart a failing test.

Finally, I have TestCase class>>runTestsReferencing:, allowing scripts such as

    TestCase runTestsReferencing: #toddsBug.
    TestCase runTestsReferencing: #realizeGizmo:using:.

where #toddsBug is a symbol strewn throughout the code for my cash cow. Todd reported it, so it was only fair to immortalize him. #realizeGizmo:using: is fictitious; there are several like-named methods that I test along with Todd's bug. I fixed it a couple of years ago, but still find the tests useful to ensure that it stays fixed.

I am confident that you could adapt or extend this idea to give you the flexibility you seek while leaving the test case hierarchy, packaging, etc. intact.

Bill

Wilhelm K. Schwab, Ph.D.
University of Florida
Department of Anesthesiology
PO Box 100254
Gainesville, FL 32610-0254
Email: [hidden email]
Tel: (352) 846-1285
FAX: (352) 392-7029
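[Editor's note: Bill does not show his Dolphin implementation, so the following is a guessed Squeak equivalent built from standard reflection; the selector matches his description, but the body is an assumption.]

    runTestsReferencing: aSymbol
        "Class-side on TestCase. Run every test method whose literal
         frame references aSymbol (e.g. a tag such as #toddsBug)."
        | suite |
        suite := TestSuite new.
        self withAllSubclasses do: [:cls |
            cls selectors do: [:sel |
                ((sel beginsWith: 'test')
                    and: [(cls compiledMethodAt: sel) hasLiteral: aSymbol])
                        ifTrue: [suite addTest: (cls selector: sel)]]].
        ^ suite run

Note that #hasLiteral: only sees symbols compiled into the method's literals, which is exactly why the tag must be strewn throughout the code as #toddsBug rather than sitting in a comment.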
Bill Schwab wrote:
> Ralph,
>
> I agree that simpler is often better, which is precisely why many of
> my (Dolphin) tests are not simple, in the sense that they use helper
> methods, many of which appear to belong in the TestCase hierarchy.
> If I were to follow your suggestion, I would have to duplicate those
> methods to get the tests to run at all.

On my last big project, I had 500 tests for each simulation, with 3 or 4 subsequent versions of the simulation. The versioning of the tests was easily implemented by subclassing, with modified tests for modified behaviour. Not only that, but it was extremely common for one test to invoke another: the first test might log the user in, the second might log the user in and change the configuration, the third might log in and trigger an alarm test, etc., following a decision tree as to what a user might do with the system.

I don't see why tests should be independent of one another. It may be an ideal, but I guess it depends upon what it is you are testing exactly. I think that there are scenarios, such as tracing a deep decision tree, where it is a good idea for tests to be dependent upon each other, even just for pragmatic reasons.

With so much in the class hierarchy, including extra functionality such as time-outs in my TestCasePlus class, moving an arbitrary test out to another class just would not work. Tests are rarely just isolated items. Of course you can argue that they should be, but that's not the way it goes in practice, for me at least.

> Most of my methods have a comment that looks something like
>
>     ThisOrThatTestCase prod: #testSomething.

I did that too. Wouldn't it be good if the browser knew how to invoke a test with a button?

> I have TestCase class>>runTestsReferencing:, allowing scripts such as
>
>     TestCase runTestsReferencing: #toddsBug.
>     TestCase runTestsReferencing: #realizeGizmo:using:.

I settled for having TestCases explicitly declare membership in a test suite, then being able to run a suite:

    TestCase suite: #tl1Version1Suite

> where #toddsBug is a symbol strewn throughout the code for my cash
> cow. Todd reported it, so it was only fair to immortalize him.

I simply used 'kph todo' and 'kph mod' etc. strewn throughout my code.

> I am confident that you could adapt or extend this idea to give you
> the flexibility you seek while leaving the test case hierarchy,
> packaging, etc. intact.
>
> Bill

The TestRunner improvements that I table allow definition of suites by:

a) method name match
b) method category match
c) method source containing a literal symbol or flag

so we've got your ideas covered. (Just need to add pragmas!)

best regards

Keith
In reply to this post by Ralph Johnson
Keith,
Running tests from the browser is a fine idea. Can a test case be a member of more than one suite in your approach?

I suspect that most of the time I run more tests than I really need to, which is fine to a point. Early in trying to fix something, I will lock in on one test and #prod: it until it passes anyway, so there is little harm in snaring some extra tests. With that said, some bugs come down to a struggle among some stubborn test methods (e.g. any three will pass, but not all of them), at which point it is nice to be able to tweak the set that runs.

I got started doing this kind of testing in part because Dolphin's SUnit browser did not handle selections as well as I wanted. It would run selected tests, but the selection would get clobbered by the resets, and (my fault, I realize) it was too easy to run all vs. running the selected methods. It started as a compromise, but I quickly grew to like using the browser for getting the big picture and then using doits to drive debugging of failing tests. I _think_ the runner I see in Squeak does a good job of selection handling, but I have not used it enough to give a good judgement.

Bill

Wilhelm K. Schwab, Ph.D.
University of Florida
Department of Anesthesiology
PO Box 100254
Gainesville, FL 32610-0254
Email: [hidden email]
Tel: (352) 846-1285
FAX: (352) 392-7029
Dear Bill,
You ask whether a test can be in more than one suite: absolutely. First of all, classes explicitly define and publish their own suites. You could have:

    #allStandardTests defined as all methods matching 'include:test*'
    #testsBeingWorkedOn defined as methods in method category 'include@wip'

and thus some tests may be members of both groups. I shall append the relevant information from the class description for you.

Keith

------------------

More flexible suite building API:

Asking an abstract class for #suite: will build the entire suite for all of its concrete subclasses. Asking a concrete class for #suite: will just return the suite for that one class.

Suites are defined as class methods of a TestCase class. Example:

    MyTestCase>>myTestSuite
        "all 'test*' methods, but not those in category 'long tests',
         or tests flagged with #BrokenTest"
        ^ #( 'include:test*' 'exclude@long tests' 'exclude#BrokenTest' )

Suites are obtained via either a) a selector which references a defined suite, b) an explicit string defining a match, or c) an array of a & b:

    TestCase suite: #myTestSuite.
    TestCase suite: 'include:test*'.
    TestCase suite: #( 'include:test*' 'include:longtest*' ).

The array can be used to combine other suites. Example:

    myTestSuiteWithLongTests
        ^ #( #myTestSuite 'include@long tests' )

Published Suites API:

#publishedSuites provides a list of test suites published by each TestCase class that can be picked in the TestRunner. This provides a natural UI for selecting from a range of testing scenarios, e.g. tests for different products, releases, platforms, performance-monitoring tests, long tests, tests needing additional services (db).

Published Filters API:

#publishedFilters provides a list of filters that can be picked in the TestRunner. This provides a natural UI for including/excluding groups of tests from a suite.

    publishedFilters
        ^ #( #filterOutBrokenTests )

    filterOutBrokenTests
        ^ 'exclude#BrokenTest'
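[Editor's note: the filter strings above combine three conventions: a name match after 'include:'/'exclude:', a category match after '@', and a literal flag after '#'. A guessed sketch of applying one such token follows, since Keith's own parser is not shown; the selector and parsing are assumptions.]

    matches: token class: cls selector: sel
        "Answer whether sel of cls passes a single filter token,
         using the token syntax from the examples above."
        (token beginsWith: 'include:')
            ifTrue: [^ (token allButFirst: 8) match: sel].
        (token beginsWith: 'exclude@')
            ifTrue: [^ ((cls whichCategoryIncludesSelector: sel)
                        = (token allButFirst: 8) asSymbol) not].
        (token beginsWith: 'exclude#')
            ifTrue: [^ ((cls compiledMethodAt: sel)
                        hasLiteral: (token allButFirst: 8) asSymbol) not].
        ^ true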
In reply to this post by Elod Kironsky
Couldn't we remove the failing tests and put them in a sort of "unstable", since presumably they are like a task list of things that we want to work on at some point?

There should be no failing tests in a stable release. If you *want* your test to fail, then just reverse the assertion so it is a green test when it works the way it should. And for tests that fail if something is not present, can't the test check whether whatever it is actually is present first?

Of course you could keep the failing tests in the stable image if you can put them in a selection like "future work" or something that doesn't show up unless you turn it on. Otherwise, deleting sounds OK to me. For the sanity of any release team, red *has to* mean broken.

> From: Elod Kironsky <[hidden email]>
> Subject: Re: improving the quality of the image
> Date: Mon, 29 Jan 2007 11:47:14 +0100
>
> Philippe Marschall wrote:
>> 2007/1/29, Elod Kironsky <[hidden email]>:
>>> Philippe Marschall wrote:
>>>> 2007/1/26, Bert Freudenberg <[hidden email]>:
>>>>> On Jan 26, 2007, at 16:03, Philippe Marschall wrote:
>>>>>> 2007/1/26, Ralph Johnson <[hidden email]>:
>>>>>>> One of my goals for 3.10 is to improve the quality of the
>>>>>>> image. Our first release (coming soon!) will have only green
>>>>>>> tests, and each following release will have only green tests.
>>>>>> How does removing failing tests improve the quality?
>>>>> Woa, where does that hostility come from? There is another way
>>>>> to ensure all tests are green, besides removing the failing ones.
>>>> What hostility? I could not see why this improves the quality
>>>> because to me the first step to fix a problem is to admit that
>>>> you have a problem. Failing tests are pointers to problems for me.
>>>> Removing failing tests because they can not be fixed today or
>>>> tomorrow looked to me like an attempt to hide a problem. So I
>>>> asked and now I know the reason why it was done.
>>>>
>>>> Philippe
>>> Philippe, where did you read that failing tests will be removed?
>>> "First release will have only green tests" means that all tests
>>> remain and will pass, not fail. There will be no test removal at
>>> all! I'm pretty sure you misunderstood something.
>> http://bugs.impara.de/view.php?id=5527
>>
>> Philippe
> Sorry Philippe, then I have to agree with you and join Goran's
> proposition to classify the tests; removing them is not a good
> solution I think.
>
> Elod
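[Editor's note: a minimal sketch of the assertion-reversal idea, with a made-up bug number, class and method; this is a convention, not an existing SUnit feature.]

    testBug1234
        "Hypothetical: while bug 1234 is open, deny the correct
         behaviour so the bar stays green. Once the bug is fixed this
         test turns red, reminding us to restore the real assertion."
        self deny: (MyParser parse: 'x := 1') isWellFormed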
> when it works the way it should. And for tests that fail if
> something is not present, can't the test check whether whatever it
> is actually is present first?

So a "to do" or a "known issue" test is a test that fails when the code that works correctly is not present. ;-) So it is only a specific instance of a more generalised case.

Categorisation handles all of this and more, and at least in the beginning my early implementation used fewer methods and less code than the existing hard-coded mechanism.

Keith