I don’t measure test coverage

I believe that test coverage is an unhelpful measure. I have two main reasons behind this belief: Firstly, test coverage often gives false positives and false negatives; and secondly, too many badly written tests will slow a project down (that is, higher coverage can mean lower habitability).

It is relatively easy to write tests that exercise code without actually checking it for correctness. So high test coverage does not necessarily correlate with correctness and completeness. Conversely, low automated test coverage does not imply that the software is untested, nor does imply the presence of defects.

It is also relatively easy to write tests that enshrine bad design. In codebases with high coupling and low cohesion, high test coverage probably means that those poor designs are difficult to improve without breaking tests. For example, in code that is procedural, incohesive and highly coupled, tests that simply “mock out” every one of an object’s dependencies (read: use a mock library to stub out the dependencies) will likely increase the cost of changing the design later; whereas tests that have been used to drive roles into the design will ultimately help to make the code easier to change — but only if the fundamental design structures in the code mirror those in the domain.

So it is my belief that good design trumps test coverage, and I therefore prefer to focus on design measures first. (More on those later…) Note that I am not saying that automated testing is bad, nor indeed that high automated test coverage is bad. I am saying that high test coverage is not a good measure of software habitability.

TDD: three easy mistakes

I run introductory training in test-driven development quite frequently these days. And each time I do, I find the same basic mistakes cropping up every time, even among teams who already claim to practice TDD. Here are the three mistakes I see most often:

1. Starting with edge cases (empty string, null parameter) or error cases:

Imagine you have to test-drive the design of an object that can count the number of occurrences of every word in a string. I will often see some or all of these tests written before any others:

public class WordCounterTest {

  public void wordCounterCanBeCreated() {
    assertNotNull(new WordCounter());

  public void nullInputCausesExeption() {
    new WordCounter().count(null);

  public void emptyInputGivesEmptyOutput() {
    Map<String, Integer> actual = new WordCounter().count("");
    assertEquals(new HashMap<String, Integer>(), actual);


These feel easy to write, and give a definite feeling of progress. But that is all they give: a feeling of progress. These tests really only prove to ourselves that we can write a test.

When written first like this, they don’t deliver any business value, nor do they get us closer to validating the Product Owner’s assumptions. When we finally get around to showing him our work and asking “Is this what you wanted?”, these tests turn out to be waste if he says “Actually, now I see it, I think I want something different”.

And if the Product Owner decides to continue, that is the time for us to advise him that we have some edge cases to consider. Very often it will turn out to be much easier to cope with those edge cases now, after the happy path is done. Some of them may now already be dealt with “for free”, as it were, simply by the natural shape of the algorithm we test drove. Others may be easy to implement by adding a Decorator or modifying the code. Conversely, if we had started with the edge cases, chances are we had to work around them while we built the actual business value — and that will have slowed us down even more.

So start with tests that represent business value:

public void singleWordIsCounted() {
  Map<String, Integer> expected = new HashMap<String, Integer>();
  expected.put("happy", 2);
  assertEquals(expected, new WordCounter().count("happy happy"));

This way you will get to ask the Product Owner that vital question sooner, and he will invest less before he knows whether he wants to proceed. And you will have a simpler job to do, both while developing the happy path, and afterwards when you come to add the edge cases.

2. Writing tests for invented requirements:

You may think that your solution will decompose into certain pieces that do certain things, and so you begin by testing one of those and building upwards from there.

For example, in the case of the word counter we may reason along the following lines: “We know we’ll need to split the string into words, so let’s write a test to prove we can do that, before we continue to solve the more difficult problem”. And so we write this as our first test:

public void countWords() {
  assertEquals(2, new WordCounter().countWords("happy monday"));

No-one asked us to write a method that counts the words, so yet again we’re wasting the Product Owner’s time. Equally bad, we’ve invented a new requirement on our object’s API, and locked it in place with a regression test. If this test breaks some time in the future, how will someone looking at this code in a few months’ time cope with that: A test is failing, but how does he know that it’s only a scaffolding test, and should have been deleted long ago?

So start at the outside, by writing tests for things that your client or user actually asked for.

3. Writing a dozen lines of code in order to get the next test to pass:

When the bar is red and the path to green is long, TDD beginners often soldier on, writing an entire algorithm just to get one test to pass. This is highly risky, and also highly stressful. It is also not TDD.

Suppose you have these tests:

public void singleWordIsCounted() {
  assertEquals("happy=1", new WordCounter().counts("happy"));

public void repeatedWordIsCounted() {
  assertEquals("happy=2", new WordCounter().counts("happy happy"counts));

And suppose you have been writing the simplest possible thing that works, so your code looks like this:

public class WordCounter {
  public String counts(String text) {
    if (text.contains(" "))
      return "happy=2";
    return "happy=1";

Now imagine you picked this as the next test:

public void differentWords() {
  assertEquals("happy=1 monday=1", new WordCounter().counts("happy monday"));

This is a huge leap from the current algorithm, as any attempt to code it up will demonstrate. Why? Well, the code duplicates the tests at this point (“happy” occurs as a fixed string in several places), so we probably forgot the REFACTOR step! It is time to remove the duplication before proceeding; if you can’t see it, try writing a new test that is “closer” to the current code:

public void differentSingleWordIsCounted() {
  assertEquals("monday=1", new WordCounter().counts("monday"));

We can now make this simpler set of tests pass easily, effectively by removing the duplication between the code and the tests:

public class WordCounter {
  public String counts(String text) {
    String[] words = text.split(" ");
    return words[0] + "=" + words.length;

After making this relatively simple change, we have now test-driven part of the algorithm with which we struggled earlier. At this point we can try the previous test again; and this time if it is still too hard, we may wish to ask whether our chosen result format is helping or hindering…

So if you notice that you need to write or change more than 3-4 lines of code in order to get to green, STOP! Revert back to green. Now either refactor your code in the light of what just happened, so as to make that test easier to pass, or pick a test closer to your current behaviour and use the new test to force you to do that refactoring.

The step from red bar to green bar should be fast. If it isn’t, you’re writing code that is unlikely to be 100% tested, and which is prone to errors. Choose tests so that the steps are small, and make sure to refactor ALL of the duplication away before writing the next test, so that you don’t have to code around it whilst at the same time trying to get to green.

A testing strategy

The blog post Cucumber and Full Stack Testing by @tooky sparked a very interesting Twitter conversation, during the course of which I realised I had fairly clear views on what tests to write for a web application. Assuming an intention to create (or at least work towards creating) a hexagonal architecture, here are the tests I would ideally aim to have at some point:

  • A couple of end-to-end tests that hit the UI and the database, to prove that we have at least one configuration in which those parts join up. These only need to be run rarely, say on CI and maybe after each deployment.
  • An integration test for each adapter, proving that the adapter meets its contract with the domain objects AND that it works correctly with whatever external service it is adapting. This applies to the views-and-controllers pairings too, with the service objects in the middle hexagon stubbed or mocked as appropriate. These will each need to run when the adapters or the external services change, which should be infrequent once initial development of an adapter has settled out.
  • Unit tests for each object in the middle hexagon, in which commands issued to other objects are mocked (see @sandimetz‘s testing rules, for which I have no public link). And for every mocked interaction, a contract test proving that the mocked object really would respond as per the mocked interaction (see @jbrains‘s Integrated tests are a scam articles). These will be extremely fast, and will be run every few seconds during the TDD cycle.

I’ve never yet reached this goal, but that’s what I’m striving for when I create tests. It seems perfectly adequate to me, given sufficient discipline around the creation of the contract tests. Have I missed anything? Would it give you confidence in your app?

Why I don’t use spork

Spork is great. And so is guard and its family of plugins. Early this year I spent a while converting all of my rails projects to use spork, and we even had a team standard tmux setup that ran spork in one of the start-up screens. So every time we saved a file, guard/spork ran some of our specs. We even had growl/notify pop up a little message telling us the result, so we didn’t have to go to the trouble of switching screen to find out. How very efficient!

But now I’ve turned spork off, and here’s why: Spork solves the wrong problem.

We enabled guard/spork because our specs were slow, and during the few months we had spork, those specs became even slower. But we never noticed. Most of the time guard/spork guessed correctly which specs to run, and we became quite good at configuring guard to re-load Rails when critical files changed. So we were reasonably comfortable, and our specs ran quickly most of the time. But our design was getting worse every day.

Slow specs are a sign of too much coupling, and in the case of a Rails app that usually means not enough separation between the domain objects and the adapters. So we’ve stopped running spork, so that we can feel the pain of the 15-second Rails load time, and so that we can feel the pain of all that coupling. And slowly we are making the app’s specs faster. Many spec files now run in under a hundredth of a second, and more will follow as we gradually peel domain code out from the Rails infrastructure. We don’t need to test the Rails components, and we have a very good suite of cucumber features that act as end-to-end integration tests. So our specs should be fast, and only need to tell us that our own objects each do their thing. That’s what we are now working towards, and that’s the pot of gold that spork was hiding from us.

Faster Rails controller specs

One of the Rails apps I work on has this:

$ rspec spec
Finished in 61.82 seconds
475 examples, 0 failures

61 seconds!  (And on top of that I have to wait another 15 seconds for Rails load; that’s a whole other story, and I hope to come back to that in a future post.) A quick dig into the 61 seconds reveals this:

$ rspec spec/controllers
Finished in 22.82 seconds
114 examples, 0 failures

It happens that almost every controller action in our app needs the user to be logged in, so every controller spec needs that too. We’re using Devise, so every spec has something equivalent to this:

before :each do
  sign_in Factory(:user)

This implies hits on the database to create the user object, record the IP address and last sign-in time, etc etc. We can do much better by using a mock_model for the User, and stubbing out Warden‘s authentication completely.

First, in the interests of DRYness, I made some helper methods to do some of the jobs that factory_girl does:

module RandomHelpers
  def random_number() SecureRandom.random_number(10000); end
  def random_id() random_number; end
  def random_name() SecureRandom.hex(20); end

Next, a method that creates a test double representing the User who will be logged in:

module ControllerHelpers
  def user_double(attrs = {})
    user_attrs = {
      :first_name => random_name,
      :last_name => random_name,
      :authenticatable_salt => 'x'
    mock_model(User, user_attrs)

This is the place to fill the User double with default attribute values. Our app requires every User to have a first and last name, so I’ve stubbed these with random strings; they can be overridden by the caller if specific values are needed for any test. In your own app, change this method to set any mandatory attributes your user model needs. Note also the stubbed authenticatable_salt() method, which wasn’t required when mocking earlier versions of Devise.

I also need a method that logs a user in:

module ControllerHelpers
  def stub_sign_in_with(user)
    request.env['warden'] = double(Warden,
                                   :authenticate => user,
                                   :authenticate! => user,
                                   :authenticate? => true)
    return user

This replaces Warden with a test double, and Devise is none the wiser.
I also create a method to glue these together for the simplest (and most common) case:

module ControllerHelpers
  def stub_sign_in(attrs = {})
    stub_sign_in_with user_double(attrs)

(Note that this returns the User double; you’ll see why that’s useful in a moment.)

Finally, I configure the specs so they have access to these helper methods. To achieve that, spec/spec_helper.rb needs to look something like this:

Dir[Rails.root.join("spec/support/**/*.rb")].each {|f| require f}

RSpec.configure do |config|
  config.include RandomHelpers
  config.include Devise::TestHelpers, :type => :controller
  config.include ControllerHelpers, :type => :controller

Putting all this together allows me to write controller specs with:

before do

or, if I need access to the user double:

let(:user) { stub_sign_in }

And the payoff?

$ rspec spec/controllers
Finished in 7.36 seconds
114 examples, 0 failures

I saved as whole 15 seconds! There’s still some work to do in my specs, but stubbing out Warden has made a massive difference to every test run that involves this project’s controller microtests.

Note that the above has been tested only with the following gem versions:

  • rails 3.0.3
  • devise 1.3.4
  • warden 1.0.4
  • rspec-core 2.6.4
  • rspec-mocks 2.6.0
  • rspec-rails 2.6.1
  • factory_girl 1.3.3

A condensed gist for the above code is here.

microtests for exceptions

Over on the Dev@Pulse blog Jim raises some interesting points about how and when to microtest the messages in thrown exceptions. Jim’s discussion got me thinking about how I solve the same problem myself, and on reflection I discovered I use a slightly different tactic than Jim’s:

I write one test for each important piece of information that must be conveyed by the exception. For example, there might be a test that checks the exception’s message for the incorrect business value that caused the error; and maybe another that checks for the name of the throwing object. Etc etc.

Here’s a simple example, based on some of Reek‘s microtests:

context 'when the parser fails' do
  before :each do
    @parser_error = 'Error message'
    parser = mock('parser')
    @source_name = 'source/file/path.rb'
    @src = SourceCode.new('def unused() end', @source_name, parser)
  it 'raises a SyntaxError' do
    lambda {@src.syntax_tree}.should raise_exception(SyntaxError)
  it 'reports the source name' do
    lambda {@src.syntax_tree}.should raise_exception(SyntaxError, /#{@source_name}/)
  it 'reports the parser error message' do
    lambda {@src.syntax_tree}.should raise_exception(SyntaxError, /#{@parser_error}/)

This scheme has two advantages: First, there are fewer direct dependencies on incidental text in the error message, which means the microtests are less brittle, and hence less likely to be affected by simple changes of wording. And second, the tests clearly indicate which parts of the error are “important” – so if one of them later fails I get a reminder that I’ve lost information, and thus a gentle prod to re-consider the most recent code change.

Everyone writes code that throws exceptions, so I’m sure these aren’t the only tactics for microtesting them; what do you do?