Where do conditionals come from?

Given that we want to reduce the number of conditional branches in our code, I wonder whether it is possible to catalogue all of the reasons they exist, so that we might then in turn list a few ways to eliminate each category.

So to that end I’ve listed out those origins of conditionals that I could think of. Is this a fool’s errand? Possibly, but let’s give it a try anyway. Please comment if you can spot any that I have missed, and let’s see if we can grow a “complete” list…

  1. Checking a value returned to me from code I own
    This is the essence of the Null Check smell, and more generally Connascence of Meaning. For example, see this (intentionally bad) code from William C. Wake’s Refactoring Workbook:

    public class Report {
      public static void report(Writer out, List<Machine> machines, Robot robot) throws IOException
      {
        //...
    
        out.write("Robot");
        if (robot.location() != null)
          out.write(" location=" + robot.location().name());
    
        if (robot.bin() != null)
          out.write(" bin=" + robot.bin());
    
        //...
      }
    }
    
  2. Checking a value returned to me from code I do not own
    For example, in Rails projects we often see code of this kind:

    if User.exists?(:user_id => current_user.id)
      # ...
    else
      # ...
    end
    
  3. Checking a parameter passed to me by code I own
    Whenever I write a method that accepts a Boolean parameter, I open myself up to the Control Couple smell. Here’s an example from Piotr Solnica’s blog:

    def say(sentence, loud = false)
      if loud
        puts sentence.upcase
      else
        puts sentence
      end
    end
    
  4. Checking a parameter passed to me by code I do not own
    Here’s an example from the jQuery documentation:

    <script>
    var xTriggered = 0;
    $( "#target" ).keypress(function( event ) {
      if ( event.which == 13 ) {
         event.preventDefault();
      }
      xTriggered++;
      var msg = "Handler for .keypress() called " + xTriggered + " time(s).";
      $.print( msg, "html" );
      $.print( event );
    });
    </script>
    
  5. Checking my own state or attributes
    Here’s an (intentionally bad) example from Martin Fowler’s Refactoring:

    class Employee {
      private int _type;
      private int _monthlySalary;
      private int _commission;
      private int _bonus;
      static final int ENGINEER = 0;
      static final int SALESMAN = 1;
      static final int MANAGER = 2;
    
      Employee (int type) {
        _type = type;
      }
    
      int payAmount() {
        switch (_type) {
          case ENGINEER:
            return _monthlySalary;
          case SALESMAN:
            return _monthlySalary + _commission;
          case MANAGER:
            return _monthlySalary + _bonus;
          default:
            throw new RuntimeException("Incorrect Employee");
        }
      }
    }
    
  6. Checking a value I set previously in this method
    For example:

    public String format(List<String> items) {
      StringBuffer s = new StringBuffer();
      boolean first = true;
      for (String item : items) {
        if (first)
          first = false;
        else
          s.append(", ");
        s.append(item);
      }
      return s.toString();
    }
    
  7. Checking the type of an exception thrown by code I own
    Here’s an example from the PHP language documentation:

    <?php
    function inverse($x) {
        if (!$x) {
            throw new Exception('Division by zero.');
        }
        return 1/$x;
    }
    
    try {
        echo inverse(5) . "\n";
        echo inverse(0) . "\n";
    } catch (Exception $e) {
        echo 'Caught exception: ',  $e->getMessage(), "\n";
    }
    
    // Continue execution
    echo "Hello World\n";
    ?>
    
  8. Checking the type of an exception thrown by code I do not own
    We often find code such as this in Rails controllers:

    def delete
      schedule_id = params[:scheduleId]
      begin
        Schedules.delete(schedule_id)
      rescue ActiveRecord::RecordNotFound
        render :json => "record not found"
      rescue ActiveRecord::ActiveRecordError
        # handle other ActiveRecord errors
      rescue # StandardError
        # handle most other errors
      rescue Exception
        # handle everything else
      end
      render :json => "ok"
    end
    

Is this list anything like complete? Can it ever be?

I wonder if a useful next step might be to write down some general strategies for removing each of these kinds of conditional…?

A problem with Primitive Obsession

By an odd coincidence, a number of the teams I work with have been looking at the Primitive Obsession code smell recently as part of the software habitability and craftsmanship training programmes I am currently running. As we have explored legacy code looking for examples of the smell, I’ve noticed a tendency to enthusiastically label every primitive in any method signature as smelly. This certainly demonstrates Primitive Obsession, but mostly on the part of the reader of the code, rather than in the code itself.

I so wish Fowler and Beck hadn’t used this particular name, because it seems to spread as much confusion as enlightenment. In my book I tried renaming it to Open Secret, but that was a very small voice lost in the noise. So hear my plea: please, please stop using the term Primitive Obsession — let’s talk about Connascence of Meaning instead.

I think it is important to recognise that not every primitive in a method signature represents (bad) Connascence of Meaning: Firstly, the problem is significantly weaker when the method is private (ie. it can only be called from inside the same object). That’s because connascence becomes stronger with distance. Inside a single class we expect a lot of connascence, because otherwise the class is probably not a cohesive unit of responsibility. It’s a fairly weak smell when two methods in a class both know how to interpret the meaning of a primitive.

And secondly, recall that this smell is mostly about the way domain concepts are represented and understood. If the primitive is passed straight through our code without being parsed in any way, again the smell is much weaker because we are not attempting to intuit any Meaning for ourselves. For example, if you’re working in a domain in which time is always given to you as a long, and you never do anything with it except store it or pass it on, then there is certainly Connascence of Meaning between the libraries you’re using, but I would argue that there’s probably none in your own code.
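
For instance (a contrived Ruby sketch with invented names), a class like this stores and forwards the long without ever interpreting it:

class AuditTrail
  def initialize(clock, store)
    @clock = clock
    @store = store
  end

  # The timestamp arrives as a long (epoch milliseconds, say) and goes
  # straight into the store; this class never parses or interprets it.
  def record(event)
    @store.save(event, @clock.current_time_millis)
  end
end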

In general I’m finding that connascence is better than code smells as a tool for discussing coupling. And in particular, it seems to me that Primitive Obsession creates the wrong impression and makes the concept harder to learn.

On boolean externalities

This week @avdi wrote a really interesting blog post in which he expounds the woe induced by trying to debug a stack of methods that return booleans to each other. I couldn’t get Disqus to load on the post itself, so my response is below. Go ahead and read Avdi’s article now; take your time, and think about what you would change in his code; I’ll wait here until you’re ready.

A lot of the comments on That Twitter suggest that this can be “fixed” by returning things that are more complex than simple booleans, including switching to a different language to make that easier. I think most of them are missing the point. I think this is a representation problem. Let me explain.

The code as it stands uses several booleans — I counted around six, and I suspect there are more in the code branches we weren’t shown. In order to figure out whether the user is allowed to view an episode, the code effectively rummages through the user’s past, looking for a confluence of events that combine together to give the user that permission. And so when the code doesn’t do what was expected, poor Avdi then has to do the same, only using the debugger or logging statements. The problem therefore, it seems to me, is that the current state of the user is only represented indirectly.

It seems to me that this code breaks the second rule of Simple Design, in that the domain is not represented faithfully by these booleans. I would be willing to bet that not every one of the 2^n possible combinations of values of these booleans is possible. Indeed, it seems likely that there are only a handful of states in the lifecycle of a user. So to use a group of booleans to represent these states is to misrepresent the structure of the domain: The representation permits many more possibilities than can actually occur, and leaves the job of figuring out the current state to a bunch of other methods scattered through a couple of different objects. Another way to express this might be to say that the code shows Connascence of Meaning (or Algorithm), because numerous methods need to know something about how the user’s state is represented.

So my solution would be to create an explicit state transition model for users, and then to represent that in the code using a state field on the user. It would then be possible to produce errors such as “User X cannot do Y because s/he is in state Z”. It also opens up the possibility of using the State pattern, so that the user would hold a reference to an object representing its current state. Such an object might know how and when the user arrived in this state, and have code that validates the user’s state changes. It might even provide the valid onward transitions as, say, Command objects.
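
As a minimal sketch of the shape I have in mind (Ruby; the Subscriber, state and episode names are all invented for illustration, since I haven’t seen the rest of Avdi’s code), each lifecycle state becomes an object that answers the permission question for itself:

class Subscriber
  attr_reader :state

  def initialize(state)
    @state = state
  end

  # No boolean-rummaging: the permission question is delegated to whichever
  # state object the subscriber currently holds.
  def can_view?(episode)
    state.can_view?(episode)
  end
end

class ActiveSubscription
  def initialize(since)
    @since = since
  end

  def can_view?(episode)
    episode.published_after?(@since)
  end
end

class LapsedSubscription
  def initialize(lapsed_on)
    @lapsed_on = lapsed_on
  end

  def can_view?(episode)
    episode.published_before?(@lapsed_on)
  end
end

An unexpected “no” then points at exactly one named state, instead of at a trail of booleans scattered across several objects; and each state object is a natural home for validating onward transitions.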

So that’s my approach, and I’ve been finding it more and more useful recently to solve modelling problems such as this. The question is: is this applicable in Avdi’s case; does the code we haven’t seen support this design…?

Sinatra/Heroku microservices

I spent some of this weekend working on a side project. It began as a monolithic Sinatra app, then became two apps, and then finally the design settled on five microservices. At some point in this evolutionary journey I had a separate git repository for each microservice. But some of them wanted to share code, so I also had yet another git repository for each shared library, with the shared code included via git submodules.

This was a nightmare to work with, particularly as all of these git repositories were evolving simultaneously. So I wondered whether I could have a single git repository containing the code for all five microservices. But then how could I deploy them as separate Heroku apps? The answer lies in this Heroku article about managing multiple environments:

For simplicity, imagine we have a single git repository containing two Sinatra apps, stored in app_a.rb and app_b.rb. First, create a Heroku app for each:

$ heroku create --remote app_a
...
$ heroku create --remote app_b
...

Now set an environment variable in each, telling it which of the two apps it is:

$ heroku config:set WHICH_APP=app_a --remote app_a
$ heroku config:set WHICH_APP=app_b --remote app_b

And finally, in config.ru, I configure Rack to start whichever app matches the environment variable set on each instance:

require 'rubygems'

service = ENV['WHICH_APP']

if ['app_a', 'app_b'].include?(service)
  require File.join(File.dirname(__FILE__), service)
  run Sinatra::Application
else
  abort "Unknown microservice '#{service}'"
end

Now, when I push to Heroku:

$ git push app_a master
$ git push app_b master

each Heroku app will launch a different Sinatra app.

Cyclomatic method complexity is a scam

Thomas McCabe’s 1976 paper A Complexity Measure [pdf] suggests that we can “measure” the complexity of a program by counting the branch points. Later authors “show” that McCabe’s number correlates strongly with program size, and that it is therefore worthless. Still later authors [pdf] show that McCabe complexity is much lower (typically halved) for programs written in object-oriented languages. And although they don’t describe or assess the “quality” of their code samples, this latest “result” does suggest that complexity might be a useful tool in evaluating the habitability of a program.

(The ironic quote marks are there as a shorthand, to indicate that I question the science in these papers. In particular, no account was taken of design or coding style. In terms of habitability, though, the last study does offer a little chink of light.)

However, something irks me about the complexity measuring tools I’ve tried: they all work at the method level.

McCabe’s work describes program complexity. Not method (or function) complexity. His reasoning, if it is valid at all, is valid only for the entire call graph of a program. Such a thing is difficult to calculate in programs written using modern languages, due to late binding, polymorphism and/or dynamic typing, and may even be impossible without dynamic analysis. Indeed, a running program is likely to have different call graphs in different test scenarios, rendering the cyclomatic complexity of the entire program a moot point. So it is quite natural to re-cast the measure in terms of methods, purely in order to get something potentially useful.

But here’s the rub: how should we then roll those method complexities up so that we have an idea of the complexity of bigger things, such as classes or programs? I contend that we can’t, meaningfully.

Note that each method has at least one path through it, but two methods do not necessarily mean there are two code paths. Suppose I have this Ruby method, which has cyclomatic complexity 1:

def total_price
  base_price * 1.20
end

I may wish to extract a method to wrap the tax value:

def total_price
  base_price * tax_rate
end

def tax_rate
  1.20
end

Now I have two methods, each of which has “cyclomatic” complexity 1. The total complexity appears to have increased, and yet the program’s call graph is no more complex. Thus any tool that sums method complexities will penalise the Extract Method refactoring, because the tool is now using invalid graph theory.

True cyclomatic complexity is meaningful only (if at all) in the context of the call graph of the entire program. Anything that counts branch points for individual methods (and then adds 1) is not calculating “cyclomatic” complexity, and runs a high risk of penalising useful refactorings.

How to measure habitability

I have argued that in order to measure the habitability of a codebase, we should focus on attempting to measure how well the code conforms to the Once And Only Once rule. So what does that rule mean? It means that every important concept in the domain is represented (named) explicitly — not buried inside another concept (Once), and not spread out in the space between several other concepts (Only Once).

Once:

How can we write a tool that will detect when a concept is buried inside another? Well, let’s flip that question around: How can we tell when a concept contains another buried inside it? I’m not sure that pure size is a reliable guide, because design concepts have no standard size. Some will require only a few lines of code, while others will need dozens. But it is probably not unreasonable to assume that a file which is significantly larger than the others in a codebase is probably harbouring too much responsibility.

What about complexity? It is usually true that a class with many conditionals or many levels of indentation is asking to be simplified. And very often the best response to over-use of branching is to break out the Strategy pattern, which gives explicit names to two or more of the responsibilities hidden inside the original class.
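
As a rough sketch of what that gives us (Ruby, with a pricing example invented for illustration), the branching version hides two responsibilities inside one anonymous method, while the Strategy version gives each of them a name:

# Before: two pricing rules hide behind a conditional.
def price(order)
  if order.trade_customer?
    order.total * 0.9
  else
    order.total
  end
end

# After: each rule is reified as a named strategy; the branch disappears.
class TradePricing
  def price(order)
    order.total * 0.9
  end
end

class RetailPricing
  def price(order)
    order.total
  end
end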

Lopez et al, in Relevance of the Cyclomatic Complexity Threshold for the Java Programming Language [pdf], propose that the McCabe complexity of object-oriented software is lower (typically less than half) than that of procedural software. While their study does not appear to take the degree of habitability into account, it seems suggestive to me that some measure of branching might be a reasonable reflection of whether a class is pregnant with unnamed concepts.

Efferent coupling [pdf] may also be indicative here. It seems intuitively correct that a class that has to import/require many others is probably doing too many things.

Only Once:

How could we write a tool that will detect when a concept is spread out in the space between two or more others? Well, again the presence of branches may be an indication of Connascence of Meaning (think of the Null Check and Special Case code smells). Primitive Obsession is also a symptom of Connascence of Meaning, but I suspect we could only detect that automatically in statically typed languages (without actually running the code).

Again, efferent coupling may be indicative. When two classes both import/require another, there may be some Connascence of Algorithm going on — or at least a missing abstraction wrapping the imported stuff.
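
To make this concrete, here is the kind of crude probe I have in mind (a Ruby sketch, purely illustrative; a real tool would want a proper parser rather than regular expressions). It counts branch keywords and require statements per file, as rough proxies for branching and efferent coupling:

# Naive habitability probe: branch keywords and requires per Ruby file.
BRANCHES = /\b(if|unless|elsif|case|when|while|until)\b/

Dir.glob('lib/**/*.rb').each do |path|
  source        = File.read(path)
  branch_count  = source.scan(BRANCHES).size
  require_count = source.scan(/^\s*require\b/).size
  puts format('%-40s branches=%3d requires=%3d', path, branch_count, require_count)
end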

So while I have no evidence, my instincts tell me that we may be able to get a decent hint at habitability by finding some workable combination of coupling and branching measures. Soon I will begin exploring these ideas with code samples in a variety of languages, to see whether I can work up a metric (or a system of metrics) that can predict whether I will like the code…

Choosing what to measure

Suppose we want metrics that help us write “better” code — what does “better” mean? I’m going to fall back on Extreme Normal Form, more commonly known as the four rules of Simple Design. These say that our priorities, in order, when choosing what to improve are that the code should:

  1. be correct and complete
  2. express the design clearly
  3. express every concept once and only once
  4. be small

Let’s agree to call code that meets these criteria “habitable code”, in that it is easier to understand and maintain than uninhabitable code. How can we design a metric that rewards the creation of habitable code?

Now it seems to me that most teams have processes in place for ensuring correctness and completeness. These might be requirements traceability procedures, suites of (manual or automated) tests, QA procedures, user testing, focus groups, and so on. The first rule of Simple Design helps to ensure that we are only working on valid code, but doesn’t help us differentiate “better” code. Only the remaining three rules can help us with that. In a sense, the first rule can be taken as a “given” when our projects are set up with appropriate QA processes and procedures in place. I will therefore not consider test coverage when searching for a metric (indeed I believe that test coverage is an unhelpful measure in the context of code habitability).

Let’s move on to the second rule of Simple Design. Joe Rainsberger re-casts this rule as No bad names, because in the huge majority of current programming languages the names we give to variables, functions, classes etc are our primary (often only) means of expression in the code. I believe it is impossible to directly measure how well code expresses intent, or whether the names chosen are “good” in any aesthetic sense. But the beauty, coherence and honesty of the names chosen by the programmer is only one aspect of applying this rule. Implicit in the statement “No bad names” is the notion that the programmer has chosen to give names to the “correct” set of entities. That is, choosing the things to reify and name in the code is at least as important as, if not more important than, choosing the best name to give those things.

This is where rules 2 and 3 begin to trip over each other (and perhaps why some authors place them in the opposite order). Consider for example the Primitive Obsession code smell. This occurs when two different classes communicate using (messages containing parameters of) primitive types (string, integer, array etc). The primitive(s) used for the message parameter(s) represent some domain concept that must be known to both classes; but this code is smelly because both classes must also understand the particular representation chosen for that concept in this particular message. The two classes exhibit Connascence of Meaning, and the third rule of Simple Design is violated because the representation of the domain concept is understood in two different places in the code. But we could also consider this to be a violation of rule 2, because that domain concept has not been reified into a class and named — we have a missing name.
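
To make that concrete, here is a small Ruby sketch (the date-range domain is invented for illustration). Before reification, every caller and callee must understand how a date range is encoded in a raw String; afterwards, that knowledge lives in exactly one place and the concept has a name that can appear in method signatures:

require 'date'

# A reified date range: the "YYYY-MM-DD..YYYY-MM-DD" encoding is understood
# only here, instead of by every method that passes the range around.
class DateRange
  attr_reader :from, :to

  def self.parse(text)
    from, to = text.split('..')
    new(Date.parse(from), Date.parse(to))
  end

  def initialize(from, to)
    @from = from
    @to = to
  end
end

range = DateRange.parse("2024-01-01..2024-03-31")
# Callers now use range.from and range.to, and never see the string format.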

Based solely on this type of logic, I am therefore going to boldly assert that any metric that gives a good indication of the degree to which rule 3 is followed, will also be a good indication of how well rule 2 is followed (allowing for the fact that we cannot measure aesthetics). So as a proxy measure I claim we can focus solely on the Once And Only Once rule as a measure of the habitability of our code.

I don’t measure test coverage

I believe that test coverage is an unhelpful measure. I have two main reasons behind this belief: Firstly, test coverage often gives false positives and false negatives; and secondly, too many badly written tests will slow a project down (that is, higher coverage can mean lower habitability).

It is relatively easy to write tests that exercise code without actually checking it for correctness. So high test coverage does not necessarily correlate with correctness and completeness. Conversely, low automated test coverage does not imply that the software is untested, nor does it imply the presence of defects.

It is also relatively easy to write tests that enshrine bad design. In codebases with high coupling and low cohesion, high test coverage probably means that those poor designs are difficult to improve without breaking tests. For example, in code that is procedural, incohesive and highly coupled, tests that simply “mock out” every one of an object’s dependencies (read: use a mock library to stub out the dependencies) will likely increase the cost of changing the design later; whereas tests that have been used to drive roles into the design will ultimately help to make the code easier to change — but only if the fundamental design structures in the code mirror those in the domain.

So it is my belief that good design trumps test coverage, and I therefore prefer to focus on design measures first. (More on those later…) Note that I am not saying that automated testing is bad, nor indeed that high automated test coverage is bad. I am saying that high test coverage is not a good measure of software habitability.

Not on the backlog: a response

Ron Jeffries recently wrote a great article in which he recommends how teams should dig their way out of thickets of technical debt:

We take the next feature that we are asked to build, and instead of detouring around all the weeds and bushes, we take the time to clear a path through some of them. Maybe we detour around others. We improve the code where we work, and ignore the code where we don’t have to work.

I wholeheartedly agree with this approach, and with the advice and analysis in Ron’s article. But where Ron says “Simples!”, I don’t agree: for most teams there are a few significant hurdles to overcome:

  • The team has to recognise the situation it is in, and that a new approach is required henceforward.
  • The team needs to discover the courage to take the time to invest in this change of approach.
  • The team needs to develop an understanding of what better code looks like.
  • The team needs to learn how to create better code out of bad code.

None of these things is trivial. All of them require the whole team to go on a journey, probably to learn new skills, and above all to have the courage to make the change.

Don’t put refactoring stories on the backlog. Instead, invest in learning what habitable code looks like, how to write habitable code, and how to improve the habitability of existing code. That’s what will make you go fast; that’s what will get you out of the thickets.

TDD: three easy mistakes

I run introductory training in test-driven development quite frequently these days. And each time I do, I find the same basic mistakes cropping up every time, even among teams who already claim to practice TDD. Here are the three mistakes I see most often:

1. Starting with edge cases (empty string, null parameter) or error cases:

Imagine you have to test-drive the design of an object that can count the number of occurrences of every word in a string. I will often see some or all of these tests written before any others:

public class WordCounterTest {

  @Test
  public void wordCounterCanBeCreated() {
    assertNotNull(new WordCounter());
  }

  @Test(expected=IllegalArgumentException.class)
  public void nullInputCausesException() {
    new WordCounter().count(null);
  }

  @Test
  public void emptyInputGivesEmptyOutput() {
    Map<String, Integer> actual = new WordCounter().count("");
    assertEquals(new HashMap<String, Integer>(), actual);
  }

}

These feel easy to write, and give a definite feeling of progress. But that is all they give: a feeling of progress. These tests really only prove to ourselves that we can write a test.

When written first like this, they don’t deliver any business value, nor do they get us closer to validating the Product Owner’s assumptions. When we finally get around to showing him our work and asking “Is this what you wanted?”, these tests turn out to be waste if he says “Actually, now I see it, I think I want something different”.

And if the Product Owner decides to continue, that is the time for us to advise him that we have some edge cases to consider. Very often it will turn out to be much easier to cope with those edge cases now, after the happy path is done. Some of them may already be dealt with “for free”, as it were, simply by the natural shape of the algorithm we test-drove. Others may be easy to implement by adding a Decorator or modifying the code. Conversely, if we had started with the edge cases, chances are we would have had to work around them while we built the actual business value, and that would have slowed us down even more.

So start with tests that represent business value:

@Test
public void singleWordIsCounted() {
  Map<String, Integer> expected = new HashMap<String, Integer>();
  expected.put("happy", 2);
  assertEquals(expected, new WordCounter().count("happy happy"));
}

This way you will get to ask the Product Owner that vital question sooner, and he will invest less before he knows whether he wants to proceed. And you will have a simpler job to do, both while developing the happy path, and afterwards when you come to add the edge cases.

2. Writing tests for invented requirements:

You may think that your solution will decompose into certain pieces that do certain things, and so you begin by testing one of those and building upwards from there.

For example, in the case of the word counter we may reason along the following lines: “We know we’ll need to split the string into words, so let’s write a test to prove we can do that, before we continue to solve the more difficult problem”. And so we write this as our first test:

@Test
public void countWords() {
  assertEquals(2, new WordCounter().countWords("happy monday"));
}

No-one asked us to write a method that counts the words, so yet again we’re wasting the Product Owner’s time. Equally bad, we’ve invented a new requirement on our object’s API and locked it in place with a regression test. If this test breaks at some point in the future, how will someone looking at the code in a few months’ time cope? A test is failing, but how can he know that it was only a scaffolding test and should have been deleted long ago?

So start at the outside, by writing tests for things that your client or user actually asked for.

3. Writing a dozen lines of code in order to get the next test to pass:

When the bar is red and the path to green is long, TDD beginners often soldier on, writing an entire algorithm just to get one test to pass. This is highly risky, and also highly stressful. It is also not TDD.

Suppose you have these tests:

@Test
public void singleWordIsCounted() {
  assertEquals("happy=1", new WordCounter().counts("happy"));
}

@Test
public void repeatedWordIsCounted() {
  assertEquals("happy=2", new WordCounter().counts("happy happy"counts));
}

And suppose you have been writing the simplest possible thing that works, so your code looks like this:

public class WordCounter {
  public String counts(String text) {
    if (text.contains(" "))
      return "happy=2";
    return "happy=1";
  }
}

Now imagine you picked this as the next test:

@Test
public void differentWords() {
  assertEquals("happy=1 monday=1", new WordCounter().counts("happy monday"));
}

This is a huge leap from the current algorithm, as any attempt to code it up will demonstrate. Why? Well, the code duplicates the tests at this point (“happy” occurs as a fixed string in several places), so we probably forgot the REFACTOR step! It is time to remove the duplication before proceeding; if you can’t see it, try writing a new test that is “closer” to the current code:

@Test
public void differentSingleWordIsCounted() {
  assertEquals("monday=1", new WordCounter().counts("monday"));
}

We can now make this simpler set of tests pass easily, effectively by removing the duplication between the code and the tests:

public class WordCounter {
  public String counts(String text) {
    String[] words = text.split(" ");
    return words[0] + "=" + words.length;
  }
}

After making this relatively simple change, we have now test-driven part of the algorithm with which we struggled earlier. At this point we can try the previous test again; and this time if it is still too hard, we may wish to ask whether our chosen result format is helping or hindering…

So if you notice that you need to write or change more than 3-4 lines of code in order to get to green, STOP! Revert to green. Now either refactor your code in the light of what just happened, so as to make that test easier to pass, or pick a test closer to your current behaviour and use the new test to force you to do that refactoring.

The step from red bar to green bar should be fast. If it isn’t, you’re writing code that is unlikely to be 100% tested, and which is prone to errors. Choose tests so that the steps are small, and make sure to refactor ALL of the duplication away before writing the next test, so that you don’t have to code around it whilst at the same time trying to get to green.