Connascence of Meaning

In the previous article I wrote a test, made it pass, and then refactored away the strongest coupling. That coupling took the form of some Connascence of Value between the test and the Checkout. Later, after the excitement of publishing the post had died away, I realised there was still some non-trivial connascence in the code. Today it’s time to fix that.

Here is the test I finished with last time:

public class CheckoutTests {

  @Test
  public void basicPrices() {
    int price = randomPrice();
    Checkout checkout = new Checkout();
    checkout.scan("A", price);
    assertEquals(price, checkout.currentBalance());
  }
}

First, let’s review the connascence in this code. (And please let me know if I have missed any this time too!)

  1. Connascence of Name: The test knows the names of the methods to call on the checkout object. This is level 1 (of 9) on the connascence scale — the weakest and least damaging form of coupling.
  2. Connascence of Type: The test knows which class to instantiate, and it knows the types of the various method parameters. This is level 2 on the scale, and is thus also relatively benign.
  3. Connascence of Meaning: Both classes know that we are representing monetary values using ints, and products using strings. This is level 3 on the scale; while still relatively harmless, it is worth removing before the code gets too much bigger.

I think it is worth taking a moment to examine why I feel that this is Connascence of Meaning and not purely Connascence of Type. Yes, the test and the Checkout have to agree on the types of the parameters and return values, but I think there’s more: both also have to agree on what those values mean. Does that int represent pence, pounds, euros? Could that product string in future become a guid, or a barcode? The test and the Checkout are coupled by their shared knowledge of how those domain concepts (monetary values and product codes) are represented and interpreted. If I let that knowledge proliferate, it could become difficult to change either of them. I may incur subtle bugs if either is interpreted or represented in a different way somewhere in the application in future. This is the Ariane 5 bug in waiting.

(I clearly remember the pain of encountering a large .NET financial application in which monetary values were represented as ints, decimals or floats in various areas of the code. The compiler “helped” by casting between these representations, so the inevitable bugs were only discoverable at runtime.)
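To make the hazard concrete, here is a small sketch. The class names PriceList, Receipt and MeaningDemo are hypothetical and exist only for illustration. Both sides agree on the type (an int), but they disagree about what the number means, and the compiler has no way to notice:

```java
// Hypothetical illustration: the type matches, the meaning does not.
class PriceList {
    // Assumption: whoever wrote this stores prices in pence.
    static int priceOfA() {
        return 250; // intended as 250 pence, i.e. 2 pounds 50
    }
}

class Receipt {
    // Assumption: whoever wrote this believed the int held whole pounds.
    static String format(int amount) {
        return "GBP " + amount + ".00";
    }
}

class MeaningDemo {
    public static void main(String[] args) {
        // Compiles without complaint, yet shows 250 pounds for a 2.50 item.
        System.out.println(Receipt.format(PriceList.priceOfA()));
    }
}
```

Nothing here fails at compile time; the disagreement only shows up when someone reads the receipt.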

So although I have “only” Connascence of Meaning, I want to nip it in the bud early. The cost of doing that is low right now, and the cost of not doing it could escalate quickly later. So, on with the motley…

I’ll deal with the money first.

[Figure: cov3]

As before, I can weaken the connascence by bringing the “ends” of the coupling together in one place. But this time I cannot easily inject a value into the Checkout, because the problem here is one of types. So instead, I create a new type and hide the int inside it. Money will be the only class that knows how monetary values are stored:

[Figure: cov4]

The change to the test is simple:

public class CheckoutTests {

  @Test
  public void basicPrices() {
    Money price = randomPrice();
    Checkout checkout = new Checkout();
    checkout.scan("A", price);
    assertEquals(price, checkout.currentBalance());
  }

  private Money randomPrice() {
    int pence = new Random().nextInt(1000);
    return Money.fromPence(pence);
  }
}

Similarly the checkout now uses Money objects instead of ints:

public class Checkout {
  private Money balance;

  public Checkout scan(String sku, Money price) {
    balance = price;
    return this;
  }

  public Money currentBalance() {
    return balance;
  }
}

And finally, here is the new Money class itself:

public class Money {
  private int pence;

  private Money(int pence) {
    this.pence = pence;
  }

  public static Money fromPence(int pence) {
    return new Money(pence);
  }
}

Note that I clearly document the meaning of the parameter to Money’s factory method, while the actual constructor is hidden from view. This gives me more control over how Money objects are created, and helps keep the internal representation private.

Next, I will move on to do something very similar with product codes. But before I do that I just want to expand on (ie. recycle) the current test so that it can cater for multiple items:

@Test
public void basicPrices() {
  Money priceOfA = randomPrice();
  Checkout checkout = new Checkout();
  Money priceOfB = randomPrice();
  checkout.scan("A", priceOfA).scan("B", priceOfB);
  assertEquals(priceOfA.add(priceOfB), checkout.currentBalance());
}

This simple change forces Money to acquire a wee bit more richness:

public class Money {
  public static final Money ZERO = new Money(0);
  private int pence;

  private Money(int pence) {
    this.pence = pence;
  }

  public static Money fromPence(int pence) {
    return new Money(pence);
  }

  public Money add(Money other) {
    return new Money(pence + other.pence);
  }

  @Override
  public boolean equals(Object other) {
    // Guard against null and foreign types before casting
    if (!(other instanceof Money)) return false;
    Money m = (Money) other;
    return pence == m.pence;
  }

  @Override
  public int hashCode() {
    return Integer.hashCode(pence);
  }
}

I like this. By fixing some Connascence of Meaning I have “discovered” a domain concept, and it has quickly fleshed out with a perfectly sensible set of behaviours. This would be perfect if only Java allowed operator overloading…

Now I take a look at those strings that represent the products. Arguably the Connascence of Meaning here is less severe, because as yet the Checkout isn’t using the product code passed to it. I could leave this one until I have tests that force Checkout to check that string. I decide to do just that, because I can’t predict when that will be, or how the code will turn out in that uncertain future.

So, to summarise, I fixed Connascence of Meaning by hiding the choice of data representation for monetary values. If I later decide to switch to pounds with decimal places representing the pence, I only have to change the Money class. The rest of the application is isolated from such change.
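As a rough sketch of that claim, here is what Money might look like if the private representation changed from int pence to BigDecimal pounds. The choice of BigDecimal is an assumption made for this example; the article does not say how the switch would be done. The public surface (ZERO, fromPence, add, equals) is unchanged, so neither the test nor the Checkout would need to be touched:

```java
import java.math.BigDecimal;

// Sketch only: same public surface as the Money class above,
// different private representation (pounds held as a BigDecimal).
final class Money {
    public static final Money ZERO = new Money(BigDecimal.ZERO);
    private final BigDecimal pounds; // e.g. 1.25 means one pound and 25 pence

    private Money(BigDecimal pounds) {
        this.pounds = pounds;
    }

    // Callers still speak in pence; the conversion is hidden in here.
    public static Money fromPence(int pence) {
        return new Money(BigDecimal.valueOf(pence, 2));
    }

    public Money add(Money other) {
        return new Money(pounds.add(other.pounds));
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Money)) return false;
        // compareTo ignores scale, so 1.5 and 1.50 count as equal
        return pounds.compareTo(((Money) other).pounds) == 0;
    }

    @Override
    public int hashCode() {
        // Strip trailing zeros so that equal amounts hash alike
        return pounds.stripTrailingZeros().hashCode();
    }
}
```

The existing call sites, such as Money.fromPence(pence) and priceOfA.add(priceOfB), compile against this version unchanged.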

This feels like a lot of work for just one test. Was it justified? Well, first I note that it is two tests, because I recycled one. Secondly, I have named a domain concept and the very simple test I have has already caused it to flesh out with some behaviours. I am happy with this code, and I am comfortable that it satisfies Extreme Normal Form. I also expect that some people will disagree…

Connascence of Value

Connascence is a way of describing the coupling between different parts of a codebase. And because it classifies the relative strength of that coupling, connascence can be used as a tool to help prioritise what should be refactored first.

For example, let’s tackle @pragdave’s classic Back to the Checkout kata in Java. My first test checks that we can scan a single item and calculate the total correctly:

public class CheckoutTests {

  @Test
  public void basicPrices() {
    Checkout checkout = new Checkout();
    checkout.scan("A");
    assertEquals(50, checkout.currentBalance());
  }
}

Now I make it pass in the simplest way I can think of:

class Checkout {
  public int currentBalance() {
    return 50;
  }

  public Checkout scan(String item) { return this; }
}

Clearly these two classes are now coupled (if they weren’t, the test wouldn’t pass). But is that coupling good or bad?

I can see four kinds of connascence between the test and the production code:

  1. Connascence of Name, because the test knows the names of the methods to call on the checkout object. This is level 1 (of 9) on the connascence scale — the weakest and least damaging form of coupling.
  2. Connascence of Type, because the test knows which class to instantiate. This is level 2 on the scale, and is thus also relatively benign.
  3. Connascence of Meaning, because both classes know that we are representing monetary values using ints. (I missed this first time around — d’oh!)
  4. Connascence of Value, because both the test and the Checkout know the price of item “A”:

[Figure: cov1]

The Connascence of Value here means that the tests will break if I change the price of item “A”; I definitely wouldn’t want to release this into production.

Connascence of Value is level 8 on the scale of 9 types of connascence. The scale defines seven weaker forms of coupling, and only one more serious kind. I can use that model to help me prioritise the Refactor step in my TDD cycle: Connascence of Value is a serious problem, and should be fixed before I do anything else. The only question is: how?

The first thing I note is that connascence is weaker with proximity, which means that either of the following options would be preferable:

[Figure: cov2]

Thus, if I can move knowledge of the price of “A” so that only one of my classes has it, then the effects of the coupling are greatly diminished.

I can get some help from SOLID here, because the Dependency Inversion Principle also tells me that this code has a problem. The DIP says that we should depend on abstractions, not on details. And yet here I have a test that only works due to its knowledge of one of the details inside the production code.

The DIP (and @jbrains) also tells me what to do next: I should move the detail up towards the tests. That means I need to change the Checkout so that the test injects the value 50 via a parameter. I could pass it in via the scan method:

public class CheckoutTests {

  @Test
  public void basicPrices() {
    Checkout checkout = new Checkout();
    checkout.scan("A", 50);
    assertEquals(50, checkout.currentBalance());
  }
}

Alternatively I could inject it via the Checkout’s constructor:

public class CheckoutTests {

  @Test
  public void basicPrices() {
    Checkout checkout = new Checkout(50);
    checkout.scan("A");
    assertEquals(50, checkout.currentBalance());
  }
}

Either way, I have now removed the Connascence of Value between the Checkout and the test: I can change the price of item “A” by changing only one method.
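For the first option, the production side might look something like the sketch below. The body is an assumption, since the article only shows the tests at this point. What matters is that the Checkout no longer contains the number 50 anywhere:

```java
// Sketch: the Checkout holds no price knowledge of its own;
// it only accumulates whatever the caller hands in.
class Checkout {
    private int balance = 0;

    public Checkout scan(String item, int price) {
        balance += price; // the price arrives from the caller (here, the test)
        return this;      // returning this lets calls be chained
    }

    public int currentBalance() {
        return balance;
    }
}
```

With this in place the price of item “A” lives only in the test method.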

The worst of the coupling is now gone, but I can do better. There is still Connascence of Value, albeit very localized, within that test method. Is it worth fixing?

I like my tests to be expressive and easy to read. I wouldn’t want to extract the value 50 to a constant, for example, because I would then have to scan up and down through the test class to discover exactly what the test was doing. But equally, that magic value 50 makes me a little nervous. Does it have business significance? Not in this case, and a new team member might not pick that up.

In cases such as this I like my tests to use random values, to help ensure that the code under test hasn’t made any unfortunate assumptions. So I replace that 50 with a call to a random price generator:

public class CheckoutTests {
  @Test
  public void basicPrices() {
    Checkout checkout = new Checkout();
    checkout.scan("A", randomPrice());
    assertEquals(randomPrice(), checkout.currentBalance());
  }
}

But now the test is broken again, and that Connascence of Value is the guilty party, telling us that the two values need to be the same. I fix it by replacing the Connascence of Value by Connascence of Name:

public class CheckoutTests {
  @Test
  public void basicPrices() {
    Checkout checkout = new Checkout();
    int priceOfA = randomPrice();
    checkout.scan("A", priceOfA);
    assertEquals(priceOfA, checkout.currentBalance());
  }
}

To summarise, I find connascence useful in guiding my refactoring efforts during the TDD cycle. In this case, I weakened the coupling between the code and test by pushing details up the call stack; then I removed the Connascence of Value altogether by replacing it with Connascence of Name.
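The randomPrice() helper is not shown in this post. A minimal sketch, assuming prices are whole pence and that an upper bound of 1000 is as good as any, might be:

```java
import java.util.Random;

class RandomPrices {
    private static final Random RANDOM = new Random();

    // Hypothetical helper: an arbitrary price in pence, from 0 to 999.
    // The bound carries no business meaning; it only keeps values small.
    static int randomPrice() {
        return RANDOM.nextInt(1000);
    }
}
```

In the test class this would more likely be a private method; it is a separate class here only so that the sketch stands alone.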

In the next post I tackle more of the connascence in this code.

Date arithmetic

This morning I got up at eight minutes past six. So what, you ask? Well, that means I got out of bed at 06:08 10/12/14*, which is a very nice arithmetic progression. That is, today’s date is a series of numbers with a constant difference (in this case, the constant difference is 2).

Question: Which dates (and times, if you wish) next year will form arithmetic progressions? And which, if any, will form a geometric progression (in which each term after the first is found by multiplying its predecessor by a fixed constant)?

*Unless you live in the US — in which case, pretend today is October 12th.
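For anyone who would rather let a machine do the searching, here is one possible sketch. It assumes the day/month/two-digit-year reading used above and ignores the time of day. The class and method names are made up for this example:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class DateProgressions {
    // True when day, month and two-digit year have a constant difference.
    static boolean isArithmetic(LocalDate d) {
        int day = d.getDayOfMonth(), month = d.getMonthValue(), yy = d.getYear() % 100;
        return month - day == yy - month;
    }

    // True when day, month and two-digit year have a constant ratio:
    // month / day == yy / month, cross-multiplied to stay in integers.
    static boolean isGeometric(LocalDate d) {
        int day = d.getDayOfMonth(), month = d.getMonthValue(), yy = d.getYear() % 100;
        return month * month == day * yy;
    }

    // Every date in the given year that forms an arithmetic progression.
    static List<LocalDate> arithmeticDatesIn(int year) {
        List<LocalDate> found = new ArrayList<>();
        for (LocalDate d = LocalDate.of(year, 1, 1); d.getYear() == year; d = d.plusDays(1)) {
            if (isArithmetic(d)) found.add(d);
        }
        return found;
    }
}
```

Calling DateProgressions.arithmeticDatesIn(2015) lists the candidates for next year; a similar loop over isGeometric answers the second half of the question.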

TDD for teams

I strongly suspect that TDD for teams is different than TDD for individuals.

There’s a proverb in software development to the effect that:

“TDD is a design technique, not a testing technique”

I agree. But that doesn’t mean it’s the only design technique we need. And it also doesn’t mean that everyone will use it in the same way or get the same results with it. For example, take a look at the approaches used by Seb Rose, Ron Jeffries and Alistair Cockburn to solving the Letter Diamond kata. (Click on their names to read their blog posts, then hop back here if you still have any energy left.) They each tackled the same set of requirements in completely different ways. Each used TDD, and yet their designs were completely different.

In fact, while I was drafting this post, George Dinwiddie had a go too, and Ron made a second attempt. So now we have 5 different design approaches to one toy kata from 4 developers, all in the space of one weekend. (And there are probably more that I haven’t heard about.) Of course, it would have been weird if they had all produced identical designs. We are all different, and that’s definitely a good thing. But I worry that this can cause problems for teams doing TDD.

A couple of years ago I remember doing a performance kata in which I paired with Mark Kirschstein to tackle the supermarket checkout kata. My role was to write the tests and Mark’s was to make them pass. At the beginning of the session I made the bold claim that I could get Mark to implement the checkout in a way he had never seen before, with a design that he had not conceived. The audience, and Mark, were skeptical. They were used to thinking of the problem as framed by the tests in Dave Thomas’ original problem statement. And so they were expecting the first test to be something like this:

public class CheckoutTests {
  @Test
  public void oneA() {
    Checkout checkout = new Checkout();
    checkout.scan("A");
    assertEquals(30, checkout.total());
  }
}

But in fact my first test was this:

public class CheckoutTests2 implements ScannerListener {
  int priceReported = 0;
  String productReported = null;

  @Test
  public void oneA() {
    Scanner scanner = new Scanner(this);
    scanner.scan(new SKU("A"));
    assertEquals(30, priceReported);
    assertEquals("A", productReported);
  }

  public void itemScanned(String product, int price) {
    productReported = product;
    priceReported = price;
  }
}

(expressed using the SelfShunt pattern, as anyone who has attended any of my training courses will recognise immediately). Mark, to his surprise, was gradually led to creating a checkout implementation based on notifications and listeners, and with no getters or setters.
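The production code that this first test pulls into existence is not shown here. One possible shape, offered purely as a sketch, is below. ScannerListener, Scanner and SKU are the names used in the test; the price map and the code field are assumptions made for this example:

```java
import java.util.HashMap;
import java.util.Map;

// The listener interface implied by the test above.
interface ScannerListener {
    void itemScanned(String product, int price);
}

// Hypothetical value type wrapping the product code.
class SKU {
    final String code;

    SKU(String code) {
        this.code = code;
    }
}

// Sketch: the scanner tells its listener about each item; nobody asks it for a total.
class Scanner {
    private final ScannerListener listener;
    private final Map<String, Integer> prices = new HashMap<>();

    Scanner(ScannerListener listener) {
        this.listener = listener;
        prices.put("A", 30); // assumed price list, hard-coded for this sketch
    }

    void scan(SKU sku) {
        // Unknown product codes are not handled in this sketch.
        listener.itemScanned(sku.code, prices.get(sku.code));
    }
}
```

Note that there is no total() or currentBalance() to call: anything that wants a running total has to listen for it.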

[Supplementary challenge: implement the supermarket checkout kata without conditionals!]

While this was (I hope) great theatre, there’s a deeper message in all of this when it comes to whole teams working on non-trivial problems: If a team is to produce software that exhibits internal consistency and is understandable by all of its developers, then somehow these individual differences must be surrendered or homogenized in some way. Somehow the team must create designs — and a design style — that everyone agrees on.

Does it make sense for teams to operate as isolated pairs, each programming away on their specific tasks, without regard for how their designs will integrate with those of their team-mates? I see too many teams doing just that; ditching design sessions on the basis of reading TDD books and blogs in which a single person or pair did all of the thinking. I see far too many codebases in which personal style is the major design force; where the same domain concept is implemented in two or more radically different ways; where duplication is thus very hard to spot, and even harder to remove.

Perhaps we need more published examples of team-based TDD, showing techniques for creating and sharing Just Enough Design Up Front™.

XP includes the key practices of Coding Standard and System Metaphor; are they enough to solve the problem? How can pairs Refactor Mercilessly if there is no team consensus as to what constitutes “good” and “consistent”?

What does your team do to balance the needs of “enough design” and “too much design up front”?

Happy numbers again: spoilers

If you’ve tried the Happy numbers kata you may have noticed a couple of things about the algorithm. Firstly,

13 => 3² + 1² => 10 => 1

is the same as

31 => 1² + 3² => 10 => 1.

That is, happiness and unhappiness are preserved under all permutations of the digits. So if you have calculated the status of 134, you also know the status of 143, 314, 341, 413 and 431!

And secondly, zeroes have no effect on the outcome. So 103 is just as happy as 13; and similarly 1300, 1000300000 and 30010 are all happy.

In a sense, that means it is not meaningful to ask how many numbers can be examined in 5 seconds, say — because the answer is always infinite! So let’s redefine the kata:

Write a program to report the number of happy numbers in the range [1, n]. What is the largest n you can deal with in 5 seconds?

Some interesting questions now come to mind:

  • Where does the time go? Is it faster to calculate the happiness of a number than it is to create all permutations of its digits?
  • What is the best data structure for holding the results? Can a recursive or functional algorithm compete with a procedural approach?
  • What is the best way to deal with numbers containing zeroes?
  • Are the answers to these questions dependent on programming language?
  • Is this really an algorithm on the integers, or is it an algorithm on lists of digits?

Finally, something quite bizarre. The observations above regarding zeroes and digit permutations mean that we can acquire the answer to the challenge while examining fewer integers. How many fewer?

My rough calculations suggest that we only need to examine 54 of the integers below 100 in order to have complete information about every number in the range 1-100. That’s 54%. But we only need to examine 219 of the first 1000, which is 21.9%. To cover 1-10000 we only need to examine 714 numbers, and for 1-100000 we only need to examine 2001 numbers (ie. 2% of them)! It would appear that this algorithm gets significantly faster as we move to larger and larger data sets. How much difference does this make to what your implementation can achieve?
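One way to check figures like these is to give every integer a canonical “signature” (its non-zero digits in ascending order) and count how many distinct signatures occur in a range. The sketch below does exactly that by brute force. The names are invented for this example, and it makes no attempt to be fast:

```java
import java.util.HashSet;
import java.util.Set;

class HappySignatures {
    // Canonical form of n: its non-zero digits in ascending order.
    // 314, 143 and 4031 all map to "134".
    static String signature(int n) {
        int[] counts = new int[10];
        for (; n > 0; n /= 10) counts[n % 10]++;
        StringBuilder sb = new StringBuilder();
        for (int digit = 1; digit <= 9; digit++) {
            for (int i = 0; i < counts[digit]; i++) sb.append(digit);
        }
        return sb.toString();
    }

    // How many distinct signatures occur among 1..limit inclusive?
    static int distinctSignatures(int limit) {
        Set<String> seen = new HashSet<>();
        for (int n = 1; n <= limit; n++) seen.add(signature(n));
        return seen.size();
    }
}
```

Two numbers with the same signature are equally happy, so a solution only needs to compute happiness once per signature.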

The happy numbers kata

This weekend my 9 year old son was given the following homework:

Choose a two-digit number (eg. 23), square each digit and add them together. Keep repeating this until you reach 1 or the cycle carries on in a continuous loop. If you reach 1 then the number you started with is a “happy number”.

Can you find all the happy numbers between 1 and 100?

For example, 31 is happy because

31 => 3² + 1² => 10 => 1² + 0² => 1

But 4 is not happy because

4 => 16 => 37 => 58 => 89 => 145 => 42 => 20 => 4

and we have a cycle.

I thought this might make an interesting code kata, so while he beavered away with pencil and paper I set to writing a little program to do the job. Why not give it a try yourself?

  • What tests, if any, did you write? How many times did your stack overflow? What data structure(s) do you need?
  • Did you use recursion? Or loops? Or filters? Or something else?
  • Did you manage to eliminate all of the duplication? Can you eliminate all of the conditionals? Is any of this altered by your choice of programming language?
  • How long does it take to run? At what range size (eg. 1-100, 1-1000, 1-10000) does it take more than, say, 5 seconds to run? Does it run in linear time? Polynomial time? What mathematical tricks can you find to speed it up?
  • What will you do differently next time?

First time through, I wrote no tests. My stack overflowed three times. I used a dictionary to hold the results, with a simple array for cycle detection and a recursive algorithm. I made no attempt to eliminate either duplication or conditionals, and I used none of the arithmetic tricks that my son found while doing the exercise by hand.
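For comparison, here is a minimal sketch of the basic check in Java. It is not the program described above: it uses a loop and a set for cycle detection rather than recursion and a dictionary, and the names are made up for this example.

```java
import java.util.HashSet;
import java.util.Set;

class HappyNumbers {
    // One step of the homework rule: square each digit and add them up.
    static int sumOfSquaredDigits(int n) {
        int sum = 0;
        for (; n > 0; n /= 10) {
            int digit = n % 10;
            sum += digit * digit;
        }
        return sum;
    }

    // Follow the chain until it reaches 1 (happy) or revisits a value (a cycle, so not happy).
    static boolean isHappy(int n) {
        Set<Integer> seen = new HashSet<>();
        while (n != 1 && seen.add(n)) {
            n = sumOfSquaredDigits(n);
        }
        return n == 1;
    }
}
```

Because it loops rather than recursing, this version cannot overflow the stack; it also remembers nothing between calls, so there is plenty of room for the speed-ups asked about above.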

Next time, however, will be different…

Supplementary (for the mathematicians among you):

Are there any numbers for which the series neither terminates at 1 nor eventually repeats forever?

On paperboys, newsagents and exceptions

A few days ago I asked whether an exception should be thrown when the customer cannot pay the paper-boy. The responses (thank you!) ranged widely across the spectrum between Yes and various shades of No, and are well worth taking a few minutes to read before we go any further. What follows is my take on the code…

TL;DR — replace the exception by giving the paper-boy a coping strategy

Let’s think about the real-world (domain) model for a minute:

Firstly, I note that I have been that customer on many occasions. I’m frequently not at home when the paper-boy or the Avon lady or the window-cleaner comes around (honest!). I would expect that non-payment is statistically likely to occur on pretty much every collection round.

Now, presumably the paper-boy is on his round, calling on the newsagent’s customers. He may be only collecting money, or he may also be delivering papers. As he goes from house to house everything runs smoothly, in that every customer so far has paid their bill; he has a bulging bag of money, and has marked ‘Paid’ against every customer on the first page of his little Accounts book. But the next customer doesn’t have the funds to pay! The paper-boy hasn’t been told what to do, and wasn’t expecting this eventuality at all. He panics and runs screaming back to the newsagent.

The paper-boy runs into the arms of the newsagent, who sees that someone couldn’t pay. Thankfully the boy still has the money he has collected thus far. But he also has hold of the last customer’s wallet. And in his panic he is so incoherent that he can’t say which customer couldn’t pay [note that the exception has no constructor parameters]. At this point we have to make some assumptions about where the newsagent might be standing:

If he is back at the shop, then the paper-boy has run so far that his round must now be assumed to be abandoned; the newspapers (and the bills) of any remaining customers will need to be dealt with on another round. Alternatively, the newsagent might be accompanying the boy on his round, telling him which house to visit next; in this case he can simply calm the boy down, deal with the matter himself, and the pair can then proceed with the round.

Neither of these alternatives seems entirely satisfactory. In the first case, it seems unreasonable that neither the boy nor the newsagent would learn how to cope with non-payment (which must surely happen regularly). Even if the boy wasn’t told about the possibility before his first ever round, he wouldn’t continue behaving that way forever. And no sensible newsagent would be happy to have to re-start the boy’s rounds after each non-payment.

But in the second case, why have a dog and bark yourself? Why is the newsagent accompanying the boy, unless this is a training exercise? Again, not a scenario that is likely to be oft repeated.

I assume therefore that, in the real world, the boy is sent on the round because the newsagent wants to stay and mind the store. The newsagent wants the boy to collect his customers’ dues (and perhaps deliver papers too), without the newsagent having to break off his work (and potentially finish the round himself, or send another boy to do that later). It’s likely therefore that the boy will have been given instructions such as “If anyone isn’t at home, or can’t pay, just make a note in the book. We’ll see them next week, or send an invoice, but don’t you worry about that for now.” That is, the newsagent gives the boy a coping strategy. The boy’s responsibility is to mark ‘Paid’ or ‘Not paid’ against each customer, and to collect money where possible. The newsagent will reconcile these notes against his accounts later in the day and decide what to do, both with the money and with the non-payers.

So, back to the code. Throwing this exception forces some other part of our code (presumably the Newsagent) to cope with it; the catch clause will be an example of a “type 7” conditional in my little catalogue. I don’t believe it models the domain at all faithfully. Alternatively, we could avoid the exception if the paper-boy returned a status code. But that seems to me to be pretty much the same as the scenario in which the newsagent accompanies the boy on his round (and it leaves the Newsagent having to implement a “type 1” conditional too).

Instead, I agree with Paul D’Ambra: I think we should give the Paperboy a coping strategy: Equip him with code to call when the customer cannot pay. And in the interests of symmetry we might go further, having the boy update the accounts directly as he goes.

There are numerous ways to express this in code. For example, we could implement this using the Observer pattern, either with a directly coupled listener:

class Paperboy {
  private int collected_amount;
  private Newsagent newsagent;

  public void collect_money(Customer customer, int due_amount) {
    int amount_received = customer.payWhatYouCanAfford(due_amount);
    collected_amount += amount_received;
    newsagent.customerPaid(amount_received, customer);
  }
}

Or with a decoupled event notification mechanism:

class Paperboy
  def collect_money(customer, due_amount)
    amount_received = customer.pay_what_you_can_afford(due_amount)
    @collected_amount += amount_received
    EventBus.announce :customerPaid, amount: amount_received, customer: customer
  end
end

Another conditional avoided. (And a domain more faithfully modelled, in my opinion.)
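To see the directly coupled version hang together, here is a fuller sketch with the collaborators filled in. The Customer’s wallet handling and the Newsagent’s ledger are assumptions made for this example, and the method names have been switched to Java’s usual camelCase:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Customer {
    private final String name;
    private int cash;

    Customer(String name, int cash) {
        this.name = name;
        this.cash = cash;
    }

    String name() {
        return name;
    }

    // Hands over as much of the amount due as the wallet allows, possibly nothing.
    int payWhatYouCanAfford(int dueAmount) {
        int payment = Math.min(cash, dueAmount);
        cash -= payment;
        return payment;
    }
}

class Newsagent {
    private final Map<String, Integer> receipts = new LinkedHashMap<>();

    // The "note in the book": record what each customer paid, to be reconciled later.
    void customerPaid(int amountReceived, Customer customer) {
        receipts.put(customer.name(), amountReceived);
    }

    int receivedFrom(String customerName) {
        return receipts.get(customerName);
    }
}

class Paperboy {
    private int collectedAmount;
    private final Newsagent newsagent;

    Paperboy(Newsagent newsagent) {
        this.newsagent = newsagent;
    }

    void collectMoney(Customer customer, int dueAmount) {
        int amountReceived = customer.payWhatYouCanAfford(dueAmount);
        collectedAmount += amountReceived;
        newsagent.customerPaid(amountReceived, customer);
    }

    int collectedAmount() {
        return collectedAmount;
    }
}
```

A customer with an empty wallet simply pays nothing, the round carries on, and the shortfall shows up in the Newsagent’s ledger rather than as an exception.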

Designing an error case

The other day I stumbled upon a seven year old blog post by @dan_menges called Misunderstanding the Law of Demeter. It’s a great post, and it includes a nice code sample for the classic “paperboy” example of Demeter violation:

class Wallet
  attr_accessor :cash
end

class Customer
  has_one :wallet
end

class Paperboy
  def collect_money(customer, due_amount)
    if customer.wallet.cash < due_amount
      raise InsufficientFundsError
    else
      customer.wallet.cash -= due_amount
      @collected_amount += due_amount
    end
  end
end

I love examples like this, and I like Dan’s improved design later in the post (go read that now). The reason I like this example is because it contains more meat, and therefore more context, than the standard “isn’t customer.getWallet().getCash(amount) terrible!” examples we often see (I’m as guilty of that as anyone). And in that extra context lies an interesting design question…

First though, I’m going to translate Dan’s code from Ruby into Java, so that we can see past some of the syntactic sugar to what’s really happening:

class Wallet {
  public int cash;
}

class Customer {
  public Wallet wallet;
}

class Paperboy {
  private int collected_amount;

  public void collect_money(Customer customer, int due_amount) {
    if (customer.wallet.cash < due_amount)
      throw new InsufficientFundsError();
    customer.wallet.cash -= due_amount;
    collected_amount += due_amount;
  }
}

(This version isn’t identical to the Ruby, which has getter and setter methods for the cash field in the Wallet, but it’s close enough for our purposes.)

Anyway, back to the point. What interests me most about Dan’s code is the InsufficientFundsError. As soon as I saw that exception being thrown, I stopped to think about this code for a good few minutes. And so I have a question for you:

Would you throw that exception there?
If you would, why?
And if you wouldn’t, why not?

Feel free to answer in the comments here, or write your own post and link to it from the comments. I’ll post my thoughts in a few days.

Where is the database?

I have just watched an interesting conversation between Martin Fowler and Badri Janakiraman about #hexagonalrails, and in particular about the role of databases. The central question in the discussion is whether the database should be considered outside or inside the domain. While watching, I realised I had had similar thoughts in 2005!

In recent years I have considered databases to be always outside the domain. I can definitely see the attraction of an “always present” domain model, but I think it is conflating different points of view, and misunderstanding the point of Hexagonal Architecture. I was wrong in 2005 :)

The comments on the video are very interesting, particularly those by Alistair Cockburn. Specifically he makes two key points:

  1. [There is no] debate of whether the persistence is in or out in HA, it is out. So you should say you chose not to use that piece of HA, not that you used it but brought the db inside.
  2. the purpose of HA is the configurable link to db

By forcing ourselves to keep the database outside of the domain we respect the hexagonal symmetry, and this is the only way to guarantee complete separation of concerns. The choice of Active Record or Data Mapper then becomes a decision about how to implement the “configurable database” port/adapter.
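As a small sketch of what “configurable” might mean in code, the example below uses a port, one adapter and a piece of domain logic. All three names (AccountStore, InMemoryAccountStore, Deposits) are invented for this illustration and do not come from the video or the comments:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Port: what the domain needs from persistence, stated in domain terms.
interface AccountStore {
    void save(String accountId, int balancePence);

    Optional<Integer> balanceOf(String accountId);
}

// Adapter: one interchangeable implementation of the port. A database-backed
// adapter (Active Record or Data Mapper style) would implement the same interface.
class InMemoryAccountStore implements AccountStore {
    private final Map<String, Integer> rows = new HashMap<>();

    public void save(String accountId, int balancePence) {
        rows.put(accountId, balancePence);
    }

    public Optional<Integer> balanceOf(String accountId) {
        return Optional.ofNullable(rows.get(accountId));
    }
}

// Domain code depends only on the port, never on a concrete database.
class Deposits {
    private final AccountStore store;

    Deposits(AccountStore store) {
        this.store = store;
    }

    void deposit(String accountId, int pence) {
        int current = store.balanceOf(accountId).orElse(0);
        store.save(accountId, current + pence);
    }
}
```

Swapping the in-memory adapter for a real database then becomes a wiring decision made outside the domain.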