improving the quality of the image


Re: improving the quality of the image

keith1y
Bert Freudenberg wrote:
> I actually side with Ralph on this one. It is very satisfying to see
> the test runner turn green. With tests in the image that you cannot
> fix, you will never get this satisfaction.
Yes you can: you select "all standard tests", select the "known issues"
filter, run the tests, and you get all green for the tests that are
supposed to be green. There are also filters for tests that are not
expected to work on this platform, VM release, etc.
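
(Using the suite-and-filter syntax from the TestRunner package Keith
details later in this thread, such a filtered run might look like
this:)

    "Run the standard tests, keeping known-broken ones out of the bar."
    (TestCase suite: #( #allStandardTests 'exclude#BrokenTest' )) run.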

I personally would like to fill the (or should I say an) image with
hundreds of test stubs saying: we need a test for this, and for that.

Keith



Re: improving the quality of the image

stephane ducasse
In reply to this post by Bert Freudenberg
I agree too. It was not our intention to ship 3.9 with broken tests,
but we had to stop, and nobody really stepped up to help (except one
person, whose name I do not remember; we tried to harvest his fixes).

Now in VW a guy on our team has extended SUnitToo to support failing
tests, and I think that is a nice alternative.

PS: I would avoid putting code on Mantis as changesets; publish
packages on SqueakSource instead, since they are easier to deal with.
Create one package, FailingTest; that way, if someone wants to have a
look at fixing a test, he can load all of them in one shot.
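
(A sketch of that one-shot load, using Keith's Installer tool seen
later in this thread; the SqueakSource project and package names here
are assumed:)

    Installer ss
        project: 'FailingTest';
        install: 'FailingTest'.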


On 29 Jan 07, at 18:23, Bert Freudenberg wrote:

> I actually side with Ralph on this one. It is very satisfying to  
> see the test runner turn green. With tests in the image that you  
> cannot fix, you will never get this satisfaction.



Re: improving the quality of the image

stephane ducasse
In reply to this post by keith1y
>
> I personally would like to fill the (or should I say an) image with  
> 100's of test stubs saying , we need a test for this and for that.

It would be a nice way to invite people to participate in the effort.
We have been pushing test writing for years now... and we should
continue.

Stef


Re: improving the quality of the image

Colin Putney
In reply to this post by Bert Freudenberg

On Jan 29, 2007, at 9:23 AM, Bert Freudenberg wrote:

> I actually side with Ralph on this one. It is very satisfying to  
> see the test runner turn green. With tests in the image that you  
> cannot fix, you will never get this satisfaction.

True, the green bar is very satisfying. But Ralph's reasons go a bit  
deeper. The purpose of a test suite is to provide immediate feedback  
during development. Running the suite is like asking "how am I
doing?" A green bar means "fine," and a red bar means, "STOP! You  
just introduced a bug!" The value of a test suite is that it can let  
us know that we've introduced a bug at the moment it happens. That's  
the best time to fix it, because the person who understands it best  
is right there and has all the information he needs.

On the other hand, a test suite is *not* a bug database. That's what  
Mantis is for. "Visibility" of known bugs doesn't get them fixed, it  
just makes them overshadow the unknown bugs we'll introduce going  
forward.

Colin


Re: improving the quality of the image

Nicolas Cellier-3
In reply to this post by stephane ducasse
stephane ducasse wrote:

> I agree too. It was not our intention to ship 3.9 with broken tests,
> but we had to stop, and nobody really stepped up to help (except one
> person, whose name I do not remember; we tried to harvest his fixes).
>
> Now in VW a guy on our team has extended SUnitToo to support failing
> tests, and I think that is a nice alternative.
>
> PS: I would avoid putting code on Mantis as changesets; publish
> packages on SqueakSource instead, since they are easier to deal with.
> Create one package, FailingTest; that way, if someone wants to have a
> look at fixing a test, he can load all of them in one shot.
>
>
> On 29 Jan 07, at 18:23, Bert Freudenberg wrote:
>
>> I actually side with Ralph on this one. It is very satisfying to see
>> the test runner turn green. With tests in the image that you cannot
>> fix, you will never get this satisfaction.
>

If I follow the dominating logic in this thread, the process to
incorporate a patch could be:
1) make the bug test a subclass of KnownBug instead of TestCase.
2) write or load a patch.
3) check non-regression of the whole TestCase suite (green bar).
4) reject the patch, or go to 2), if the TestCase suite bar is red.
5) check that the targeted KnownBug is corrected.
6) reject the patch, or go to 2), if the bug test is red.
7) accept the patch and move the test from the KnownBug to the TestCase
hierarchy.

As you can see, you will need an image holding both KnownBugs and
TestCases during the patch process.
In this image, KnownBugs must not turn the TestCase suite bar red.
So the SUnit framework SHALL deal with this feature.
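
(A minimal sketch of such support, assuming the #expectedFailures hook
that later versions of SUnit provide; it is not in the 3.9 image under
discussion:)

    TestCase subclass: #KnownBug
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Tests-KnownBugs'

    KnownBug >> expectedFailures
        "Every test in this hierarchy is a known bug: a runner that
        honours this hook counts its failures as expected, so they do
        not turn the bar red, and a test that unexpectedly passes
        stands out for reclassification."
        ^ self class testSelectors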

An alternative is to use two images, one for bugfixing and the other
for non-regression testing, but with the risk of having the images
diverge.

Whatever you use, a protocol or a subclass or any other flag somewhere
in the image, you just need a classification that you can change easily.

If the SUnit framework deals with this, the question raised is: why
should we remove KnownBugs from the image?
- for cosmetic reasons? (it cannot be a commercial reason: as already
said, knowing some big companies' success, it's obviously not so hard
to sell bugs)
- for cleaning reasons? having a small image... I cannot believe that
the KnownBugs suite is as big as the TestCase suite, so you gain
relatively little.

Alternatively, why should we keep KnownBugs in the image?
- to encourage people to fix them? people look more often in their
image than in Mantis, don't they? Scanning the mailing list, you will
see that a lot of people tried to make these lights turn green during
the 3.9 process, not always successfully, it's true; some bugs are
stubborn.

If we keep it in the image, it must be a package maintained with
SqueakMap and Monticello so that people can share. I totally agree with
Steph.

If we do not, we have the Mantis database.
Mantis is rich, but the information is rather scattered...
It contains old bugs already fixed, complete and incomplete fixes,
enhancements, feature requests and non-bugs.

It is not convenient enough, and we need an improved way to load all
known bugs at once. I am sure Steph and Marcus can confirm that.

However, if one wants to fix a bug, he had better keep the link to the
Mantis database, as it is the place to read other programmers'
discussions, bug tests and eventual (partial) fix attempts...

So this is the only thing that annoys me in Steph's proposition.
How do we reconcile these two forms?

It must be noted that a side effect of the 3.9 red lights has been that
bugs were reported several times in Mantis, with several incompatible
or incomplete patches...

Use an automaton with some special tags inserted in Mantis comments,
processed within Squeak? A human-usable hyperlink to Mantis from within
Squeak?

Nicolas



Re: improving the quality of the image

Milan Zimmermann-2
In reply to this post by Colin Putney
On 2007 January 29 16:58, Colin Putney wrote:

> On Jan 29, 2007, at 9:23 AM, Bert Freudenberg wrote:
> > I actually side with Ralph on this one. It is very satisfying to
> > see the test runner turn green. With tests in the image that you
> > cannot fix, you will never get this satisfaction.
>
> True, the green bar is very satisfying. But Ralph's reasons go a bit
> deeper. The purpose of a test suite is to provide immediate feedback
> during development. Running the suite is like asking "how am I
> doing?" A green bar means "fine," and a red bar means, "STOP! You
> just introduced a bug!" The value of a test suite is that it can let
> us know that we've introduced a bug at the moment it happens.

Yes; I also think this is important when tests are automated (e.g.
using a test server): add a change to the image (install a new version
of an MCZ, load a changeset) and run the tests. If all succeed, good;
if something fails, a bug was introduced. Unless something in the
system keeps track of bugs that are "expected to fail" and compares
them to what just happened, it is important to have all tests succeed
for auto-testing to work.
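
(A sketch of that server-side check, using stock SUnit calls; the
reaction to a regression is left hypothetical:)

    | result |
    result := TestCase buildSuite run.
    result hasPassed ifFalse:
        [Transcript show: 'Regression: ', result printString; cr.
        "e.g. refuse to publish the new version of the image"].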

Milan

> That's
> the best time to fix it, because the person who understands it best
> is right there and has all the information he needs.
>
> On the other hand, a test suite is *not* a bug database. That's what
> Mantis is for. "Visibility" of known bugs doesn't get them fixed, it
> just makes them overshadow the unknown bugs we'll introduce going
> forward.
>
> Colin


Re: improving the quality of the image

keith1y
Milan Zimmermann wrote:

> On 2007 January 29 16:58, Colin Putney wrote:
>  
>> On Jan 29, 2007, at 9:23 AM, Bert Freudenberg wrote:
>>    
>>> I actually side with Ralph on this one. It is very satisfying to
>>> see the test runner turn green. With tests in the image that you
>>> cannot fix, you will never get this satisfaction.
>>>      
>> True, the green bar is very satisfying. But Ralph's reasons go a bit
>> deeper. The purpose of a test suite is to provide immediate feedback
>> on during development. Running the suite is like asking "how am I
>> doing?" A green bar means "fine," and a red bar means, "STOP! You
>>    
Yes, except that the current suite of tests takes ages to run, and so
does not fulfil this goal.

The solution, which already exists, is to categorise the tests, so that
a suite of short tests can be run more frequently, in less than 2
minutes.

If you have this categorisation then it is trivial to have a category of
'known issues', which you simply do not run if you want to see your
green light.

The improved TestRunner times each test, and can automatically sort the
long from the short, the network-using tests from the non-network-using
tests, etc.
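
(The real timing lives in the package loaded below; as a sketch of the
idea, timing one test in stock Squeak might look like this, with
MyTestCase and #testSomething standing in for any test:)

    | ms |
    ms := Time millisecondsToRun:
        [(MyTestCase selector: #testSomething) runCase].
    ms > 2000 ifTrue: ["classify this as a long test"].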

Not forgetting that there are other categories of tests needed in order
to get that hallowed green bar, such as: tests that I would not expect
to work on my platform, tests that will not work on the VM that I am
using, and tests that will not work in this version of the image.

Pulling your known-issue tests into a separate class is not a good
solution, because then you lose the ability to subclass from TestCase
sensibly.

Pulling your known issues into a separate package is not a good
solution either, because then you lose the context for those tests,
since they belong in the same context as the tests which pass. You need
that context if you are ever going to fix them.

Finally, breaking things up physically, rather than tagging things
'mentally' so to speak, ruins any kind of smooth workflow. Write test,
fix, test becomes: write the test in one place; when it works, move it
to another place; debug it again in the new context.

To try the improved TestRunner:

    Installer fixBug: 5639.

best regards

Keith





Re: improving the quality of the image

Ralph Johnson
> Pulling your known-issue tests into a separate class is not a good
> solution, because then you lose the ability to subclass from TestCase
> sensibly.
>
> Pulling your known issues into a separate package is not a good
> solution either, because then you lose the context for those tests,
> since they belong in the same context as the tests which pass. You
> need that context if you are ever going to fix them.
>
> Finally, breaking things up physically, rather than tagging things
> 'mentally' so to speak, ruins any kind of smooth workflow. Write test,
> fix, test becomes: write the test in one place; when it works, move it
> to another place; debug it again in the new context.

I have done this for a long time.  Putting nonworking tests into
separate classes and packages is in fact perfectly fine.  My
experience is that your arguments are wrong.

Moving code around in Smalltalk is very easy.  It is almost as easy to
move a method from one class to another as it is to press a button.
It takes MUCH longer to figure out whether you want to move it.  Tests
should be independent of each other, so moving it should make no
difference to debugging.

The big advantage of SUnit is that it is extremely simple.  Simple
things work.  Sometimes complicated things work, too, but it is
harder!

-Ralph Johnson


Re: improving the quality of the image

Schwab,Wilhelm K
In reply to this post by Ralph Johnson
Ralph,

I agree that simpler is often better, which is precisely why many of my
(Dolphin) tests are not simple, in the sense that they use helper
methods, many of which appear to belong in the TestCase hierarchy.  If I
were to follow your suggestion, I would have to duplicate those methods
to get the tests to run at all.

Dolphin's SUnitBrowser is a great tool, but I don't use it very often.
It is great for figuring out whether all is well, or for finding failing
tests.  For work on a given failing test or group of tests, I use doits.
 Most of my methods have a comment that looks something like

    ThisOrThatTestCase prod:#testSomething.

#prod: arranges for setup and teardown, and either goes straight to a
walkback on error/halt, or does _nothing_ if all is well.  Such comments
frequently find their way into domain classes, so it becomes easy to
restart a failing test.  Finally, I have TestCase
class>>runTestsReferencing: allowing scripts such as

  TestCase runTestsReferencing:#toddsBug.
  TestCase runTestsReferencing:#realizeGizmo:using:.
  ...

where #toddsBug is a symbol strewn throughout the code for my cash cow.
Todd reported it, so it was only fair to immortalize him.
#realizeGizmo:using: is fictitious; there are several like-named methods
that I test along with Todd's bug.  I fixed it a couple of years ago,
but still find the tests useful to ensure that it stays fixed.
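
(These helpers are Bill's own Dolphin code; a rough Squeak-flavoured
sketch of what they might look like, with selectors and behaviour
assumed from his description:)

    TestCase class >> prod: aSelector
        "Run one test with setUp/tearDown; errors and failures open a
        walkback, and a passing test does nothing."
        (self selector: aSelector) runCase

    TestCase class >> runTestsReferencing: aSymbol
        "Run every test method whose literals mention aSymbol."
        | suite |
        suite := TestSuite new.
        self withAllSubclasses do: [:cls |
            cls isAbstract ifFalse:
                [cls testSelectors do: [:sel |
                    ((cls compiledMethodAt: sel) hasLiteral: aSymbol)
                        ifTrue: [suite addTest: (cls selector: sel)]]]].
        ^ suite run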

I am confident that you could adapt or extend this idea to give you the
flexibility you seek while leaving the test case hierarchy, packaging,
etc. intact.

Bill


Wilhelm K. Schwab, Ph.D.
University of Florida
Department of Anesthesiology
PO Box 100254
Gainesville, FL 32610-0254

Email: [hidden email]
Tel: (352) 846-1285
FAX: (352) 392-7029



Re: improving the quality of the image

keith1y
Bill Schwab wrote:
> Ralph,
>
> I agree that simpler is often better, which is precisely why many of my
> (Dolphin) tests are not simple, in the sense that they use helper
> methods, many of which appear to belong in the TestCase hierarchy.  If I
> were to follow your suggestion, I would have to duplicate those methods
> to get the tests to run at all.
>  
On my last big project, I had 500 tests for each simulation, with 3 or
4 subsequent versions of the simulation. The versioning of the tests
was easily implemented by subclassing, with modified tests for modified
behaviour.

Not only that, but it was extremely common for one test to invoke
another: the first test might log the user in, the second might log the
user in and change the configuration, the third might log in and
trigger an alarm test... etc., following a decision tree of what a user
might do with the system.

I don't see why tests should be independent of one another. It may be
an ideal, but I guess it depends upon what it is you are testing
exactly. I think that there are scenarios, such as tracing a deep
decision tree, where it is a good idea for tests to be dependent upon
each other, even just for pragmatic reasons.

With so much in the class hierarchy, including extra functionality such
as time-outs in my TestCasePlus class, moving an arbitrary test out to
another class just would not work. Tests are rarely just isolated items.
Of course you can argue that they should be, but that's not the way it
goes in practice, for me at least.

>  Most of my methods have a comment that looks something like
>
>     ThisOrThatTestCase prod:#testSomething.
>  
I did that too. Wouldn't it be good if the browser knew how to invoke a
test with a button?

> class>>runTestsReferencing: allowing scripts such as
>
>   TestCase runTestsReferencing:#toddsBug.
>   TestCase runTestsReferencing:#realizeGizmo:using:.
>  
I settled for having TestCases explicitly declare membership in a test
suite, then being able to run a suite:

    TestCase suite: #tl1Version1Suite

> where #toddsBug is a symbol strewn throughout the code for my cash cow.
>  Todd reported it, so it was only fair to immortalize him.
>  
I simply used 'kph todo' and 'kph mod' etc. strewn throughout my code.
> I am confident that you could adapt or extend this idea to give you the
> flexibility you seek while leaving the test case hierarchy, packaging,
> etc. intact.
>
> Bill
>  
The TestRunner improvements that I have tabled allow definition of
suites by:

a) method name match
b) method category match
c) method source containing a literal symbol or flag.

So we've got your ideas covered. (Just need to add pragmas!)

best regards

Keith


Re: improving the quality of the image

Schwab,Wilhelm K
In reply to this post by Ralph Johnson
Keith,

Running tests from the browser is a fine idea.  Can a test case be a
member of more than one suite in your approach?  I suspect that most of
the time, I run more tests than I really need to run, which is fine to a
point.  Early in trying to fix something, I will lock in on one test and
#prod: that until it passes anyway, so there is little harm in snaring
some extra tests.  With that said, some bugs work down to a struggle
among some stubborn test methods (e.g. any three will pass but not all
of them), at which point it is nice to be able to tweak the set that
runs.

I got started doing this kind of testing in part because Dolphin's
SUnit browser did not handle selections as well as I wanted.  It would
run selected tests, but the selection would get clobbered with the
resets, and (my fault I realize) it was too easy to run all vs. running
the selected methods.  It started as a compromise, but I quickly grew to
like using the browser for getting the big picture and then using doits
to drive debugging of failing tests.  I _think_ the runner I see in
Squeak does a good job of selection handling, but I have not used it
enough to give a good judgement.

Bill




Re: improving the quality of the image

keith1y
Dear Bill,

You ask whether a test can be a member of more than one suite:
absolutely.

First of all, classes explicitly define and publish their own suites.

You could have:

#allStandardTests defined as all methods matching 'include:test*'
#testsBeingWorkedOn defined as methods in method category 'include@wip'

thus some tests may be members of both groups.

I shall append the relevant information from the class description for
you.

Keith

------------------
More flexible suite building API:

Asking an abstract class for #suite: will build the entire suite for all
of its concrete subclasses.
Asking a concrete class for #suite: will just return the suite for that
one class.

Suites are defined as class methods on a TestCase class.
Example:

MyTestCase class >> myTestSuite
        "all 'test*' methods, but not those in category 'long tests',
        or tests flagged with #BrokenTest"
        ^ #( 'include:test*' 'exclude@long tests' 'exclude#BrokenTest' )
 
Suites are obtained via:
    a) a selector which references a defined suite, or
    b) an explicit string defining a match, or
    c) an array of a & b.

    TestCase suite: #myTestSuite.        
    TestCase suite: 'include:test*'.
    TestCase suite: #( 'include:test*' 'include:longtest*').

The array can be used to combine other suites.
Example:
    myTestSuiteWithLongTests
        ^ #( #myTestSuite 'include@long tests' )

Published Suites API

#publishedSuites provides a list of test suites, published by each
TestCase class, that can be picked in the TestRunner. This provides a
natural UI for selecting from a range of testing scenarios, e.g. tests
for different products, releases, platforms, performance-monitoring
tests, long tests, tests needing additional services (db).

Published Filters API

#publishedFilters provides a list of filters that can be picked in the
TestRunner. This provides a natural UI for including/excluding groups of
tests from a suite.
 
publishedFilters
    ^#( #filterOutBrokenTests )

filterOutBrokenTests
    ^ 'exclude#BrokenTest'
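
(A hypothetical extension in the same style, for the platform-specific
cases mentioned earlier in the thread; the filter name and the #NonUnix
flag are made up:)

publishedFilters
    ^ #( #filterOutBrokenTests #filterOutNonUnixTests )

filterOutNonUnixTests
    "Exclude tests whose source is flagged with #NonUnix."
    ^ 'exclude#NonUnix'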





Re: improving the quality of the image

J J-6
In reply to this post by Elod Kironsky
Couldn't we remove the failing tests and put them in a sort of
"unstable" package, since presumably they are like a task list of
things that we want to work on at some point?

There should be no failing tests in a stable release.  If you *want*
your test to fail, then just reverse the assertion, so it is a green
test when it works the way it should.  And for tests that fail if
something is not present, can't the test check whether whatever it is
actually is present first?

Of course you could keep the failing tests in the stable image if you
can put them in a selection like "future work" or something that
doesn't show up unless you turn it on.  Otherwise deleting sounds OK to
me.  For the sanity of any release team, red *has to* mean broken.
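
(A sketch of such a reversed assertion; the selector and the probe for
the broken behaviour are made up:)

    testMantisBug1234StillBroken
        "Green while the known bug is present; goes red as soon as the
        bug is fixed, prompting us to restore the real assertion."
        self deny: self buggyOperationGivesCorrectResult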


>From: Elod Kironsky <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>
>Subject: Re: improving the quality of the image
>Date: Mon, 29 Jan 2007 11:47:14 +0100
>
>Philippe Marschall wrote:
>>2007/1/29, Elod Kironsky <[hidden email]>:
>>>Philippe Marschall wrote:
>>> > 2007/1/26, Bert Freudenberg <[hidden email]>:
>>> >>
>>> >> On Jan 26, 2007, at 16:03 , Philippe Marschall wrote:
>>> >>
>>> >> > 2007/1/26, Ralph Johnson <[hidden email]>:
>>> >> >> One of my goals for 3.10 is to improve the quality of the
>>> >> >> image.  Our first release (coming soon!) will have only green
>>> >> >> tests, and each following release will have only green tests.
>>> >> >
>>> >> > How does removing failing tests improve the quality?
>>> >>
>>> >> Whoa, where does that hostility come from? There is another way to
>>> >> ensure all tests are green, besides removing the failing ones.
>>> >
>>> > What hostility? I could not see why this improves the quality because
>>> > to me the first step to fix a problem is to admit that you have a
>>> > problem. Failing tests are pointer to problems for me. Removing
>>> > failing tests because they can not be fixed today or tomorrow looked
>>> > to me like an attempt to hide hide a problem. So I asked and now I
>>> > know the reason why it was done.
>>> >
>>> > Philippe
>>> >
>>>Philippe, where did you read that failing tests will be removed?
>>>"First release will have only green tests" means that all tests
>>>remain and will pass, not fail. There will be no test removal at all!
>>>I'm pretty sure you misunderstood something.
>>
>>http://bugs.impara.de/view.php?id=5527
>>
>>Philippe
>>
>Sorry Philippe, then I have to agree with you and join Goran's
>proposition to classify the tests; removing them is not a good
>solution, I think.
>
>Elod
>




Re: improving the quality of the image

keith1y

> when it works the way it should.  And for tests that fail if
> something is not present, can't the test check whether whatever it is
> actually is present first?

So a "to do", or "known issue", test is a test that fails when the code
that works correctly is not present. ;-) So it is only a specific
instance of a more generalised case.

Categorisation handles all of this and more, and at least in the
beginning my early implementation used fewer methods and less code than
the existing hard-coded mechanism.
Keith

