Conditionals on the edge

As you know, I have a thing about conditional code. Most conditionals are duplicates, and the only “genuine” conditionals are at the system boundaries, where they test external state and input information. But I discovered recently that, even at the edges, not every conditional is necessary…

Here’s (a drastically simplified version of) some code I wrote a while back. (It’s from a Rails app, but the message in this post applies to any technology and any kind of system input.)

class Users::RegistrationsController < Devise::RegistrationsController
  def create
    user = create_new_user_account
    if params[:invitation_token]
      sign_in_as(user)
      redirect_to Invitation.find(params[:invitation_token]).shared_content
    else
      redirect_to new_user_session_path
    end
  end
end

This route handles incoming user registrations, and it has a conditional branch to cater for the case in which the new user was invited, via email, by an existing user. If the new user arrived by clicking a link in an email we do one set of things; whereas we do a different set of things if the user got here via the ‘Register now’ button on the home page. Perfectly reasonable, I thought. Sure, it’s conditional code; but it’s sitting on the system boundary and therefore it’s fine, yes? No.

This conditional check represents duplication. Somewhere else in my application already knew, at some time in the recent past, which route should be taken through this method when it was finally invoked. When the app sent that email invitation, it was in a sense delegating to the recipient the task of supplying user account details. The conditional branch that handles the invitation is a callback from across a technology boundary, and the inviting code already knew to expect it.

The duplication also has a nasty side-effect: high complexity. This method is an order of magnitude harder to understand and to test(-drive) because of the conditional construct. (And you’re reading a version from which I’ve elided all manner of error handling.) Oh, and by forcing two routes into the same method I also made it violate the Single Responsibility Principle.

Happily the duplication is really easy to remove. The code that sends the email invitation knows which path it wants the registration code to take, so I just need to create a new input route for “invitation acceptance” and change the email invitation to link to it. As a result I’ll have two controller methods, each with simpler logic than their predecessor:

class Users::RegistrationsController < Devise::RegistrationsController
  def create
    user = create_new_user_account
    redirect_to new_user_session_path
  end
end

 

class InvitationAcceptancesController < Devise::RegistrationsController
  def create
    user = create_new_user_account
    sign_in_as(user)
    redirect_to Invitation.find(params[:invitation_token]).shared_content
    end
  end
end

So I have now removed the duplication, and the controller code is simpler as a result. These controller methods are also much easier to test(-drive) than the earlier code. And as a final bonus, I can design the user interface to this route with more knowledge, maybe adding some details of the invitation so that the incoming potential user feels a little more at home. In short, now that each code path has its own method (and class) I can pay it more attention, and it will flourish as a result.

Now, having refactored this code, I take away a few points for the future:

  • Where appropriate and possible, model system outputs as delegations into a space that your code can’t reach or control; and model the resulting future input as a response.
  • Where a system input is due to the system having earlier delegated a responsibility across its boundary, make sure the return route is differentiated from all others.
  • Differentiated incoming routes have code that is easier to read and simpler to test.
  • When an input route has a single use, the user experience can be tailored and enriched.
  • Not every conditional at the system boundary tests external state.

the wrong duplication

Here’s something I’ve seen many times in code right across the spectrum, and which I caught myself doing just the other day. It seems to occur particularly often in codebases that weren’t TDD’d, and which haven’t been kept lean with regular refactoring…

Imagine a codebase in which one particular text string occurs frequently. The string may be some kind of message header, for example, or an XML tag (both of these examples are usually accompanied by another repeated string, representing the message footer or the closing tag). As the codebase grows, you notice the string popping up more and more frequently, until the “duplication” bell goes off. What to do?

It turns out we’ve all been taught the answer: Create a manifest constant (a #define or a static final String or whatever, depending on the language), thereby giving the common string a name, and replace all instances of the string by use of the constant. There’s even a name for this refactoring in Martin Fowler’s catalogue: Replace Magic Number With Symbolic Constant. Duplication sorted, right? Wrong!
Continue reading

duplicated expressions again

A couple of weeks ago I wrote about my search for tools for detection of duplicated expressions. In the interim, I’m pleased to report that Neil at Integility has fixed the Simian plugin for Eclipse. Thanks Neil!

Also Hugh Sasse wrote and pointed me at the ruby quiz he submitted to solve a similar problem: to write a Ruby program that will analyse a text for duplicate strings and then create another Ruby program that will output that original text. Not quite what I was looking for, but some of the solutions to Hugh’s challenge are neat. Thanks for the link, Hugh.

I think, though, that I’m looking more for something that understands (ie. parses) the source code, so that it can make reasonable decisions about what constitutes an expression and what constitutes duplication. And wouldn’t it be great if it actually performed ExtractMethod on the duplicates it found…!

keep the tests simple and readable

Last week I posted about using tests as a failsafe. Silly me, I should have known just how many people would comment out the failing tests! So what makes it easier to say “oh, let’s just comment them out so we can get on with our real work”, instead of fixing the tests? I’ve seen it happen many times, and I’ve done it myself – but why?

I think time is the driving force: if we try to fix the tests we’ll miss our deadline. Why? In every case I’ve seen, those failing tests were also refactoring no-go areas. The test code is in such a mess that either it would take an age to fix, or there is no-one left on the project who understands it. So it becomes much easier (and therefore quicker) to comment them out than it would be to fix them.

Why? How did the tests get into that state? In general, it seems to me that there are two main reasons: either the team didn’t know they had to keep the tests adaptable and maintainable; or the “unit” tests actually exercise chunks that are way too big.

Why? What causes the tests to be too chunky? I think in most cases the tests were written after the production code, when its too late to make the design decisions that lead to de-coupling and testability. In fact, whenever I see big chunky unit tests, or tests that are not malleable, I view them as a signpost to smelly production code: too much coupling, or leaked responsibilities, or inappropriate abstractions.

So it pays to work hard on the “ilities” of the test code. Keep the tests simple to setup; keep them independent from each other; remove duplication from them; and have each test check a very few things. (Oh, and don’t over-use mocks … but that’s for another day.) Refactor them just as you would refactor production code, and just as frequently. Because then they are less likely to break unexpectedly in response to application changes that are apparently unrelated. And then maybe you’ll be able to use them as a quality control…

detection of duplicated expressions

This week I’ve been looking at tools for detecting code duplication – specifically Simian and CPD. I didn’t try CPD in the end, because although it supports Ruby, CPD is bundled with PMD, which doesn’t. I did try Simian, and was impressed by the results. (I’m sure I would have been even more impressed if the alleged Eclipse plugin was still available.)

But I’m still searching for the tool I really wanted: something that can detect duplicated expressions. For example, here’s part of a class from William Wake’s Refactoring Workbook (ch14):

hexagonal soup

So if we can completely remove all duplicate conditionals, what will our code be like? Well the only remaining conditionals (if statements, switch statements etc) will be those that test something that’s set or defined outside of the system, ie. something that’s part of the runtime environment. The two or more code branches selected by such a conditional will tend to consist entirely of the creation/execution of different sets of objects. Each of these sets of objects must by definition ‘know’ what event or state occurred, probably because it was passed parameters that embody the environment’s state or prior decisions. And so within those invoked object sets there need be no conditional code at all.

This situation now dovetails neatly with Alistair Cockburn’s hexagonal architecture pattern…

The conditional statements themselves, because they form part of the response of the system to it’s environment, should be in the outer hexagon, in what Alistair calls the ‘transformers’ (and what I tend to call ‘adapters’). And if there really is no duplication of conditional statements, the inner hexagon (ie. the application and domain objects in Alistair’s model) will contain no conditional logic whatever! The inner hexagon is purely a soup of objects waiting to be invoked and parameterised in response to external stimuli.

So here’s a very simple rule you can apply to any system: Look at every conditional statement and check that it’s sitting in one of the adaptors connecting to the environment. If it isn’t it must be duplicating some knowledge that the system already holds elsewhere…

if…

A couple of years ago I was working on a project in which a significant portion (say 30%) of the codebase was legacy code. By which I mean that it was written by a chap who had just left the team. He was a ‘clever’ programmer – his code did the job, and the rest of us were afraid to touch it. The coding style (this was a Java project) was intensely algorithmic. It involved a small number of huge classes, each method of which took parameters to configure their behaviour. No-one could touch it because the inter-relationships between the (large) pieces were too opaque, and none of us was brave enough to even begin the process of refactoring towards a more ‘object-oriented’ solution. (One thing that would now ring alarm bells with me: the code failed Mike Feathers’ test, in that it was well-nigh impossible to write unit tests for any piece of the code. In fact, most of the classes couldn’t be instantiated outside of this sub-system’s highly complex nutrient soup.)
Continue reading

avoid boolean parameters

You are tempted to create a method with a boolean argument. The caller will set the argument to true or false depending on which code path he wishes the called method to take. This is an extremely convenient way to add a small degree of configurability to an algorithm, and means that both paths through the called method can share setup and cleanup code.

class MyClass

    void dosomething(boolean followPathA) {
        // setup
        if (followPathA)
            // pathA
        else
            // pathB
        // cleanup
    }

However, the use of a boolean argument reduces readability. For example, in a method such as
workspace.createFile(name, true), what does “false” mean? It is much more intention-revealing to call a method such as workspace.createTempFile(name). Furthermore, boolean arguments always create duplication, either of code or of responsibility, and often of a very insidious nature. This is because the called method contains a runtime test to decide between two different code paths, despite thet fact that the calling method already knows which branch it requires! The duplication not only wastes a little runtime, it also spreads knowledge and responsibility further than is necessary. The resulting code is more closely coupled than the design it represents. These effects are magnified exponentially each time the boolean argument is passed on in a further method call.

Therefore avoid the need for a boolean argument by creating two versions of the method, in which the meaning of the boolean is reflected in the names of the new methods. For example, replace

    File createFile(String name, boolean istemp)

by

    File createFile(String name)
    File createTempFile(String name)

If the resulting new methods have similar logic, extract the duplicate code into shared methods that can be called by each.

(Here’s a fun story on a related theme.)