Who Tests The Tests?

Mar 12, 2007 tdd testing

Leon Bambrick (aka SecretGeek) has started a series on Agile methodologies and Test Driven Development (TDD) in which he brings up his own various hidden objections to TDD in order to see if his prejudices can be overcome.

One of the questions he asks is an age-old argument against TDD: who tests the tests? Leon sees potential for a stack overflow: given that the tests are code and that, according to TDD, code should be tested, shouldn’t there be tests for the tests?

The short answer is that the code tests the tests, and the tests test the code.

Huh?

Testing Atomic Clocks

Let me start with an analogy. Suppose you are travelling with an atomic clock. How would you know that the clock is calibrated correctly?

One way is to ask your neighbor with an atomic clock (because everyone carries one around) and compare the two. If they both report the same time, then you have a high degree of confidence they are both correct.

If they are different, then you know one or the other is wrong.

So in this situation, if the only question you are asking is, “Is my clock giving the correct time?”, then do you really need a third clock to test the second clock and a fourth clock to test the third? Not at all. Stack Overflow avoided!

Principle of Triangulation

This really follows from the principle of triangulation. Why do sailors without electronic navigation systems bring three sextants with them on board a ship?

With one sextant, you could rely on the manufacturer's testing to assume its measurements are correct, but wear and tear over time (not unlike the wear and tear a codebase suffers) might make the measurements slightly off.

If you take measurements with two sextants, then you have enough information to decide whether both are measuring accurately or one is not. However, if they disagree, you still cannot tell which measurement is the correct one.

So we take a third sextant out. The two sextants that take measurements most closely together are most likely correct. Accurate enough to cross the Atlantic.
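
As a rough sketch of that rule, here is one way to pick the two readings that agree most closely and average them. The function names and the sample readings are made up for illustration, and a real navigator would do something far more careful.

```python
from itertools import combinations


def closest_pair(readings):
    """Return the two readings that differ the least from each other."""
    return min(combinations(readings, 2), key=lambda pair: abs(pair[0] - pair[1]))


def best_estimate(readings):
    """Average the two readings that agree most closely."""
    a, b = closest_pair(readings)
    return (a + b) / 2


# Hypothetical readings in degrees; the third sextant has drifted.
readings = [41.2, 41.3, 43.0]
print(closest_pair(readings))
print(best_estimate(readings))
```

Agreement between two instruments does not prove either is right, of course. It only makes a shared error less likely than a single one.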


Comments

15 responses

Rob Conery • March 12th, 2007
It's asymptotic - the more tests you write, the better chances of catching and limiting bugs. I don't think you can triangulate this kind of thing since you'll never have a datum - a complete reference that you can base your other assumptions on (no North-going Zax to counter your South-bound one).
The best you can do, IMHO, is to mix your tests up with your leftover spam n' eggs, throw them at the bottom of your life raft, and hope they fill as many holes as possible before you sink under the weight of ridiculous analogies.
Haacked • March 12th, 2007
> I don't think you can triangulate this kind of thing
Well that's the thing with all analogies. They only go so far. Still, the point of analogies is to look for the truth and ignore the noise.
It isn't exactly like triangulation, but it's very similar. Focus on my atomic clock example. If your unit tests fail, then you know something is wrong. If they both pass, you have more confidence things are happening smoothly. But you don't know.
Also, another point I forgot to mention is that tests overlap somewhat too. For example, if you have method A that calls method B, and you wrote unit tests for both methods, then in a manner of speaking, the test for method B sort of serves as a check for method A. I.e., if method B's test fails and method A's test doesn't, you have something to double check.
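A minimal sketch of that overlap, with hypothetical functions standing in for method A and method B. Here `total` plays the role of method A and `parse_price` the role of method B; neither comes from any real codebase.

```python
import unittest


def parse_price(text):
    """'Method B': turn a string such as ' $2.50 ' into a float."""
    return float(text.strip().lstrip("$"))


def total(prices):
    """'Method A': relies on parse_price for every item."""
    return sum(parse_price(p) for p in prices)


class ParsePriceTest(unittest.TestCase):
    def test_strips_whitespace_and_dollar_sign(self):
        self.assertEqual(parse_price(" $2.50 "), 2.5)


class TotalTest(unittest.TestCase):
    def test_sums_parsed_prices(self):
        # This exercises parse_price too, so it overlaps with the test above.
        self.assertEqual(total(["$1.00", "$2.50"]), 3.5)
```

Run it with `python -m unittest`. If `ParsePriceTest` goes red while `TotalTest` stays green, one of the two tests is probably not checking what you think it is.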
rams • March 12th, 2007
I don't understand all this opposition to TDD. Does it stem from the fact that it's "new", not many people know it or practice it?
Whenever I pitch TDD to fellow developers, the immediate response is "I can write tests that pass, so what good is it". I just respond "You have missed the point" and move on.
TDD is not an end-all solution to preventing bugs in your software. It's just another tool that aids in improving the confidence in your code. In a typical waterfall model, you have to write and document all unit test cases before you code. And nobody does that. Also, you have to manually run through the scenarios.
Only a few people write perfect code. For the rest of us, it's write, run, test, bomb, debug, refactor. Each refactor will need to be run through all the scenarios. If the scenarios are in your head, it's highly likely that 2 weeks later you won't remember half of them. This is where TDD helps. And yes, TDD does add to your schedule.
I am all for any methodology that helps me improve my code and increase the confidence level, provided it does not come with too much overhead.
Yes, seeing the green lights is not the end, your code still has to work when it's all put together as a system.
J. Irvine • March 12th, 2007
To extend the analogy to the point of breaking, checking a second atomic clock might be good enough in most cases, but sometimes weird things like, say, the Daylight Savings Time laws changing, will screw the whole thing up. Thanks to the recent change here in the US, I had to update all three of my atomic clocks by hand, and I'll have to change them again in three weeks when they try to jump ahead another hour.
(Of course, you could argue that the guys who did the firmware for the clocks are to blame for this, not the laws changing.)
The point being that you can't always account for weird edge cases in your TDD tests. You might just get results that look good but that are actually wrong. In other words, just because the returns look like the proper values doesn't mean the code arrived at them in the right way. You still need to do system testing.
Haacked • March 12th, 2007
@rams: Good point. The alternative to TDD was that developers were supposed to write up a spec and manually unit test every method. They were supposed to, but never did.
@J. Irvine - good point. TDD is not a panacea as rams says, and no respectable TDD person claims that. I can't stress this enough. There are 2 primary benefits of TDD.
1. Improve the design and quality of your code
2. Repeatable automated regression tests of your code.
That doesn't mean #2 will catch everything. But believe me, if you find a bug, you write a test that should pass but fails because of the bug, and then fix the bug. You'll pretty much never encounter that bug again.
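To make that workflow concrete, here is a small sketch. The `average` function and its empty-list bug are invented for illustration. The test was written first, to fail against the buggy version, and it now stays in the suite for good.

```python
def average(values):
    """Mean of a list of numbers.

    The hypothetical original divided by len(values) unconditionally,
    which raised ZeroDivisionError for an empty list. That was the bug.
    """
    if not values:
        return 0.0  # assumed desired behavior for empty input
    return sum(values) / len(values)


def test_average_of_empty_list_is_zero():
    # Regression test: failed before the fix, passes after it.
    assert average([]) == 0.0


def test_average_of_typical_values():
    assert average([2, 4, 6]) == 4.0


test_average_of_empty_list_is_zero()
test_average_of_typical_values()
```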
lb • March 13th, 2007
One possible question that can be asked of this approach is:
"Why not just make a more reliable sextant in the first place?"
In which case you see amazing things when you look at the economics of the situation.
Say the sextant costs ten dollars to manufacture at its current level of quality.
Now strengthen, reinforce, improve, lighten, and refine every aspect of the sextant. The cost climbs steeply, yet the reliability might only double.
But by using two or three sextants at once -- the cost obviously goes up in a linear way. While the reliability goes up by an order of magnitude for every new sextant.
Say the unreliability was 1/1000 --> for two sextants the unreliability is 1/1000 * 1/1000.... for three you have one in a billion.... to get these same results by improving just one sextant would be giga-normously insanely prohibitively expensive.
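The arithmetic can be checked in a few lines. This sketch assumes the sextants fail independently of one another, which is the assumption the multiplication rests on, and it uses the 1/1000 figure from the comment; the function name is made up.

```python
from fractions import Fraction


def chance_all_fail(per_unit_failure, units):
    """Probability that every unit is wrong at once, assuming independent failures."""
    return per_unit_failure ** units


unreliability = Fraction(1, 1000)
for count in (1, 2, 3):
    print(count, chance_all_fail(unreliability, count))
```

If the failures are correlated (say, all three sextants got wet in the same storm), the real number is worse than the product suggests.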
Haacked • March 13th, 2007
That's a really great point!
Not to mention, how exactly would you ensure the reliability of your extremely reliable sextant? Wouldn't you compare it to another sextant?
That's closer to software dev, no? We're building the sextant. How do we know the sextant is working correctly? Compare it with another.
But we've really stretched this analogy too far. ;)
Dan Hounshell • March 13th, 2007
As I was reading along I thought you were going to bring your triangulation analogy to the point of other types of tests - possibly acceptance testing in the form of Fit(nesse). I think the code, basic unit tests, and client defined acceptance testing really form the trifecta we're aiming for.
Additionally, the "triangulation" principle applies to almost every field imaginable. Having recently completed some small construction jobs around the house, I well know that two points may make a straight line but it might not be the straight line you intended it to be. Especially when cutting drywall or plywood, always opt for that third point of reference!
michaud • March 13th, 2007
The tests you write in TDD should (mostly) be based on the information you get from the information analyst. Of course you extrapolate atomic values and states within the implementation, but in the end the tests that are based on that information are the test for the test.
Joshua Flanagan • March 21st, 2007
The TDD process "red, green, refactor" includes calibration steps to specifically address this problem.
1) Red: Write a test that checks for the desired functionality. Run it against code that does not yet implement that functionality. The test should fail, verifying that the test can recognize an invalid state.
2) Green: Implement just enough code so that the test passes - "green". This can mean something as simple as returning a hardcoded value. This step is to ensure your test will correctly recognize a valid state.
3) Refactor: Modify the implementation of the code to be more flexible. Continue re-running the test as you change the implementation, making sure that you always stay "green".
It is important to note that you are only changing your implementation code in these steps - not your test code. If you change your test code, you will need to re-calibrate.
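A compact sketch of those three steps, using a made-up `is_even` example. The check stays fixed while three successive implementations are run against it; `run` is just a tiny stand-in for a real test runner.

```python
def check_is_even(is_even):
    """The test. It does not change between the three steps."""
    assert is_even(4) is True


def run(check, implementation):
    """Report 'red' if the check fails against the implementation, else 'green'."""
    try:
        check(implementation)
    except (AssertionError, NotImplementedError):
        return "red"
    return "green"


def is_even_unimplemented(n):
    # Step 1 (red): the functionality does not exist yet.
    raise NotImplementedError


def is_even_hardcoded(n):
    # Step 2 (green): just enough to pass, here a hardcoded value.
    return True


def is_even_general(n):
    # Step 3 (refactor): a more general implementation, still green.
    return n % 2 == 0


for impl in (is_even_unimplemented, is_even_hardcoded, is_even_general):
    print(impl.__name__, run(check_is_even, impl))
```

Seeing red in step 1 is the calibration: it shows the check is able to fail. The hardcoded version passing is also a hint that a second case, such as an odd number, belongs in the test.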
Jeremy D. Miller -- The Shade • March 22nd, 2007
Ian Cooper recently wrote about the Ratio of test code to production code. Ian is saying that he's
Miggro • December 25th, 2010
Hi Phil,
Being in a situation with just half a QA on the project, a limited number of unit tests and limited resources, I totally get "The Glow" on "lean" and "triangulation".
Keeping in mind the goal of the sailing trip, one could argue that all sextants (unit tests) should agree. Failure of any one of them shouldn't distract from the main goal... just get there.
To identify the failing unit test one could test the bigger picture... navigating by the stars and the wind, and actually seeing islands en route, should indicate which sextant is the best one.
One could argue that the correct test for a unit test is the test that validates the outcome of all its parts. This principle could then be chained on a higher level to validate the ultimate goal of the tests.
Still, this doesn't validate the ultimate goal... While some tests may report success, some sub-tests may still report results that are incorrect, even though up to now their results have been sufficient to meet business needs.
Conclusion: One could test a test for eternity, but only the latest insight can validate it all.