I don’t measure test coverage

I believe that test coverage is an unhelpful measure, for two main reasons. First, test coverage often gives false positives and false negatives. Second, too many badly written tests will slow a project down (that is, higher coverage can mean lower habitability).

It is relatively easy to write tests that exercise code without actually checking it for correctness. So high test coverage does not necessarily correlate with correctness and completeness. Conversely, low automated test coverage does not imply that the software is untested, nor does it imply the presence of defects.

It is also relatively easy to write tests that enshrine bad design. In codebases with high coupling and low cohesion, high test coverage probably means that those poor designs are difficult to improve without breaking tests. For example, in code that is procedural, incohesive and highly coupled, tests that simply “mock out” every one of an object’s dependencies (read: use a mock library to stub out the dependencies) will likely increase the cost of changing the design later; whereas tests that have been used to drive roles into the design will ultimately help to make the code easier to change — but only if the fundamental design structures in the code mirror those in the domain.
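
As a rough sketch of that contrast, consider the Python below. Every class name here (`InvoiceMailer`, `InMemoryInvoices`, `PlainFormatter`, `RecordingOutbox`) is hypothetical and exists only for this example. The first test mocks every dependency and pins down each call between collaborators; the second plugs simple objects into the same roles and checks only the outcome.

```python
from unittest.mock import Mock


class InvoiceMailer:
    """Hypothetical class: mails the latest invoice total to a customer."""

    def __init__(self, invoices, formatter, outbox):
        self.invoices = invoices
        self.formatter = formatter
        self.outbox = outbox

    def send(self, customer_id):
        invoice = self.invoices.find_latest(customer_id)
        body = self.formatter.render(invoice)
        self.outbox.deliver(invoice["email"], body)


def test_mocks_every_dependency():
    # Each collaborator is a mock, and each internal call is asserted.
    # Any change to how the work is split between them breaks this test.
    invoices, formatter, outbox = Mock(), Mock(), Mock()
    invoice = {"email": "a@example.com", "total": 42}
    invoices.find_latest.return_value = invoice
    formatter.render.return_value = "Total: 42"

    InvoiceMailer(invoices, formatter, outbox).send(7)

    invoices.find_latest.assert_called_once_with(7)
    formatter.render.assert_called_once_with(invoice)
    outbox.deliver.assert_called_once_with("a@example.com", "Total: 42")


class InMemoryInvoices:
    """Simple stand-in for the 'invoice source' role."""

    def __init__(self, invoices):
        self.invoices = invoices

    def find_latest(self, customer_id):
        return self.invoices[customer_id]


class PlainFormatter:
    """Real, trivial formatter used directly in the test."""

    def render(self, invoice):
        return f"Total: {invoice['total']}"


class RecordingOutbox:
    """Hand-written fake for the 'outbox' role; records what was sent."""

    def __init__(self):
        self.sent = []

    def deliver(self, address, body):
        self.sent.append((address, body))


def test_checks_outcome_through_roles():
    # Only the observable result is asserted: what ended up in the outbox.
    outbox = RecordingOutbox()
    invoices = InMemoryInvoices({7: {"email": "a@example.com", "total": 42}})

    InvoiceMailer(invoices, PlainFormatter(), outbox).send(7)

    assert outbox.sent == [("a@example.com", "Total: 42")]
```

Both tests cover `InvoiceMailer.send` equally. The difference is in what they lock in: the first fixes the exact conversation between objects, while the second would survive, say, folding the formatting into the mailer, as long as the same message is delivered.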

So it is my belief that good design trumps test coverage, and I therefore prefer to focus on design measures first. (More on those later…) Note that I am not saying that automated testing is bad, nor indeed that high automated test coverage is bad. I am saying that high test coverage is not a good measure of software habitability.

8 thoughts on “I don’t measure test coverage”

  1. I agree. I stopped relying on code coverage a long time ago when I found mutation analysis. Mutation analysis is a far stronger and more stable metric than code coverage because it:
    1. Subsumes code coverage criteria (especially statement and branch)
    2. Can show if I’m missing any tests
    3. And of the tests that I have, it can tell me if they are adequate and what quality they have.

    Have you tried mutation analysis?

    • Yes, I use mutation testing when I have to ensure complete automated coverage, for example when I write a gem that I think other people will want to use or maintain. But I still don’t use it as a general measure of any quality of the code itself.

  2. I believe that exercise is helpful, but not necessarily a measure of good health. An intense focus on exercise might detract from other elements of good health, like eating well or seeing the doctor. You can also hurt yourself if you do exercise incorrectly or too often. People have been hit by buses while jogging, you know! Therefore, I’ll still exercise, sometimes, when I think it’s good, but I’ll do the other things well. Note, I’m not saying exercise is bad, nor indeed that regular exercise is bad. I’m just saying it’s not a measure of good health.

    That’s what I hear from you. You’re taking one aspect of an overall goal, and then saying that one aspect isn’t **everything** and there are more things to consider. But this insight does not change any of the facts – exercise is good, and testing your code is good, too.

    You can come up with a lot of edge cases where tests aren’t necessarily great, just as you can come up with lots of edge cases where exercise isn’t needed… but these facts do not change the overall (correct) generalization that when a programmer starts to write code, it would be better to test the code than not. The “actionable” result from the fact gathering is: You’ll Probably Need To Write Those Tests.

    I’ll agree with one respect: I don’t see a lot of value in test coverage as a metric, but only because I tend to work with developers that test their code. If you have a group of programmers who test their code, the question of test coverage never comes up. Our focus is on delivery and preventing errors and regressions, and we only get into test coverage when we do not succeed. I’ve found that debates about the value of test coverage come up in places where programmers don’t want to write tests, and they’re grasping for reasons why.

    • @Darren: I’m only saying that test coverage is a misleading metric, and giving a few reasons why I’ve seen that to be the case in real life. Testing is good. And it can be done badly. Just like exercise :)

      • Misleading? I think it depends on what you’re looking for.

        Example: Let’s say I have a mission-critical class in my app, and the test coverage tells me that there are NO tests over it. My impression is… there are no tests on that class. It sounds like the test coverage is giving me some accurate information there.

        If there are tests over it, I assume that the tests executed the code. It doesn’t mean the code is well-written, that the design is sound, or that the tests are good. It just means the code was executed, which has to give me **some** sort of confidence in it, right? Versus not running the code at all? At least that I’m not throwing a debugging exception or something else that is obviously broken? That’s what I’d take from the test coverage… more facts! :)

        I know there are nuts everywhere, but I think most people who take testing seriously have the same impression of test coverage. We don’t think it claims what you seem to think it claims, we just take it as a fact and act accordingly.

  3. I agree that measuring coverage can be harmful, and prefer not to do it unless I have a specific use for the information (e.g. detecting zero-coverage areas for later review.)
    I’ve seen low-maturity projects where coverage measurements were given fairly high profile, so developers started pursuing high coverage as a goal. This resulted in a lot of rapidly-produced tests that weren’t very well thought out, often weren’t great tests, and often locked in the existing bad designs, exactly as Kevin describes.

  4. Pingback: Choosing what to measure | silk and spinach

  5. I agree in part, since code coverage does not guarantee good test quality (worst case, I can get 100 percent coverage without a single *assertion*). But I think that structural code coverage is at least useful for detecting unexecuted portions of code.
