Yesterday Corey Haines posted yet another great video interview with JB Rainsberger. This time the topic under discussion was the Primitive Obsession code smell — what it is, how to recognise it, why it is a problem, what to do about it, and what you’ll see when you’ve fixed it.
I agree with JB — this is a difficult concept to communicate when teaching, and I think part of that comes from the name Primitive Obsession. In fact, after a vast amount of deliberation, Bill Wake and I proposed the alternative name Open Secret in our book Refactoring in Ruby. Our reasoning is that, as JB points out in the video, the essence of this smell is the fact that the lack of an encapsulating abstraction pushes knowledge and responsibilities and behaviour out into the open; whereas the representation of that knowledge should be a secret. It’s not so much the primitives that are the problem, but that they represent a domain abstraction — and that one chosen representation of that concept is visible to, and necessarily understood by, all and sundry.
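To make the "open secret" concrete, here is a small invented sketch (the names and the money example are mine, not from the book). When a price is passed around as a raw Float, every caller must know how money is represented; wrapping it in a tiny class makes that representation a secret again:

```ruby
# Before: a price is a raw Float, so every caller knows the representation.
# The "secret" -- how money is represented and formatted -- is out in the open.
def display_price(amount)
  format("$%.2f", amount)
end

def add_tax(amount)
  (amount * 1.2).round(2)
end

# After: the primitive is encapsulated, and knowledge about money
# lives in exactly one place.
class Money
  def initialize(amount)
    @amount = amount
  end

  def with_tax(rate)
    Money.new((@amount * (1 + rate)).round(2))
  end

  def to_s
    format("$%.2f", @amount)
  end
end

price = Money.new(10.0)
puts price.with_tax(0.2)  # => $12.00
```

Callers of Money no longer need to know whether it holds a Float, an integer number of pence, or something else entirely; that knowledge has stopped being common property.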
Does that rename help explain this code smell?
A couple of weeks ago I discovered I had missed dozens of smells in a piece of code I thought I knew well. And today I discovered yet another smell in the same code, this time a really big one. What’s wrong with me?
I have a bunch of code, code that I wrote a while back. I revisited it a couple of weeks ago, because there was a smell in the back of my mind and I wanted to scratch it. (Equating problems in code with bad smells has always struck me as an awkward metaphor, and somehow I can sense this post being more awkward than most.) However, the remedy for the smell that had been gnawing at me was large and complex, so before tackling it I thought I would look for anything simpler to tackle first. Low-hanging fruit to get my coding muscles warmed up. I put on my code reviewer’s hat and within a few minutes I had found around a dozen other smells. As is my wont, I marked them in the code with <SMELL> comments, and pretty soon there were long sections of the code with a comment on every line!
I had begun this exercise feeling fairly confident that the smell I knew about was pretty much the only smell in the code. But I came away wondering why I had missed all those others. One person, one set of knowledge about smells; two different hats, two very different perceptions of the code. Why?
I now believe this is a case, not of specific anosmia (the inability to detect certain smells, aka “smell blindness”), but of not listening to the code. I had one specific refactoring in mind, and my entire view of the code was distorted by the lens of that goal. But the code wanted something different; and many of these newly-discovered smells were clearly more serious and more urgent than mine. Note to self: let the code set the direction.
And so I thought that was that. I had a list of smells to tackle, and I sat down today to pick one and fix it. But as I was reading through, looking for a candidate, yet another smell hit me square between the eyes. This one wasn’t in the comments, because I hadn’t seen it during my review. And yet it was everywhere (I reckon 10-15% of the code is infected). What’s going on? Again, it can’t be anosmia, because I did detect it – albeit only on the third time around. And I don’t believe this time I was blinded by any preconceived ideas, because I really had no idea what I would be tackling today.
So it must be some kind of layering effect: different smells are visible at different mental distances from the code. Today I was more distant, more detached, than on either previous occasion, and so perhaps I saw more. Maybe by placing the smell comments in the code, and out of my head, I had gained some detachment. Perhaps in doing that I had freed my mind from some of its preconceptions; and the two week break had then allowed it to see even more. When I had the goal smell in my mind it seemed like knowledge, but only when I deferred to the code itself did the true situation become apparent.
I believe the moral of this story is that I’m a better programmer when I have Beginner’s Mind.
Martin Fowler’s Mocks aren’t Stubs article presents a balanced comparison of two different styles of unit testing. According to Martin’s taxonomy I’m the type of chap who prefers “state-based” testing over “interaction-based” testing – that is, I prefer using stubs instead of mocks. And I increasingly encounter a downside of mocks that isn’t mentioned by Martin:
Suppose you’re working with legacy code. There are only a few tests; and the tests you already have are all large and complex, because the objects you’re testing were never designed with de-coupling in mind. You have to make a change to one or more of these objects, and you want to proceed test-first. The approach I’ve seen with increasing frequency (probably due to the easy availability of NMock) is to “mock out” a few of the closely-coupled objects. This allows the TDDer to get on with the work, but does nothing to alleviate the underlying problem of the smelly design.
Here, mocks are being misused as “smart” stubs. I would much prefer to have seen the TDDer spend a little extra time refactoring the production code so that these tests – and all future tests – became easier and quicker to write. (The many excellent techniques in Michael Feathers’ Working Effectively with Legacy Code will help here.) So when I see mock objects being used as the easy way out of a tight corner, I seek instead to refactor away those nasty design smells.
So I’m sitting pairing for the first time on a project that’s been going for a couple of years. We tweak a little code and then run the tests. I’m impressed that they all run in ten seconds or so. But some of them fail, and the bar turns red. “That’s ok,” says my partner. “Some of the tests fail at random every so often.”
We hit Run again, and again we get a red bar. “Like I said. Must be another random failure.” He’s smiling and trying to be confident, but the expression on my face is making him a little unsure.
So we open up the test list, which has a little coloured dot alongside each test. Most of the dots are green, but three are red. My partner scans the list. “Yep, those are the ones that always fail.”
“Sorry – these are the three that randomly fail occasionally. Nothing to worry about.” He moves to get on with the next little refactoring step on our to-do list.
“How do you know they failed for the reason you expect? How do you know we didn’t break them in a new way? Worse still: how do you know whether those expected failures are masking something we just broke, but which isn’t being exercised when the random failure occurs?”
“We don’t. But everyone’s looked at those tests. They’re impossible to understand, and no-one knows how to fix them. We’d be here all day if we tried, and we’ve got this other stuff to do.”
In My Favourite Smells Chris Wheeler tells an all-too-familiar story of code in which the use of primitives (such as strings) obscures the true underlying design. Then in Primitive Obsession James Shore echoes the same thought:
“Primitive Obsession is one of my favorite smells as well: it’s easy to spot, easy to fix, and yields some really great designs when thoroughly stamped on.”
Well I just thought I’d add my “me too!” to the list…
I actually view this smell as a subclass of feature envy, in the sense that a primitive obsession often becomes apparent when a section of code spends too much time manipulating a primitive. When I said recently that my approach to refactoring legacy code is based on removing feature envy, I subconsciously included fixing primitive obsession too.
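A small invented example of the overlap (the phone-number domain is mine, chosen for illustration): code that spends its time picking apart a String exhibits both smells at once, because it envies features the primitive cannot offer.

```ruby
# Before: the caller manipulates the raw string -- feature envy
# directed at a primitive.
def area_code(phone)
  phone.split("-").first
end

# After: wrap the primitive, and the envy disappears along with it.
class PhoneNumber
  def initialize(digits)
    @digits = digits
  end

  def area_code
    @digits.split("-").first
  end
end

puts PhoneNumber.new("020-7946-0018").area_code  # => 020
```

Fixing the envy (moving the behaviour next to the data) and fixing the obsession (introducing the abstraction) turn out to be the same refactoring.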
I wonder how much of what I write here is based on mental models that I haven’t surfaced yet? Probably most of it…
In Pseudo-OO Design Dadi Ingolfsson talks about that all-too-common phenomenon – a design in which the code and the data are separate.
“This relates to the Microsoft-style of a DAL (Data Access Layer), a BLL (Business Logic Layer) and a UI, where the data that travels through the layers might be, for instance, dumb Customer objects or DataSet objects. The objects in these different layers are very procedural in nature as they usually only operate on data in objects that are sent to them rather than the data being their own. It’s what Fowler called an Anemic Domain Model. This is not OO design even though many programmers seem to think code is OO if it has classes and objects.”
By coincidence, this week I found myself workshopping agile programming with a team whose legacy C++ system is just like this. My approach to gradually improving the situation is to look for instances of the feature envy smell. Bit by bit, as we find each of these smells, we move the code into the objects where it belongs. Bit by bit, encapsulation improves. Bit by bit, the domain model emerges in the code. And very often, as we introduce abstractions to improve testability, bit by bit a hexagonal architecture emerges too.
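The shape of each small step can be sketched like this (the Order example and its names are hypothetical, stated here only to show the move):

```ruby
# Before: an anemic data object, with the behaviour living elsewhere.
Order = Struct.new(:quantity, :unit_price)

# Feature envy: all the work here is done on Order's data,
# in a "business logic" function outside Order.
def order_total(order)
  order.quantity * order.unit_price
end

# After: move the behaviour to where the data lives.
class RichOrder
  def initialize(quantity, unit_price)
    @quantity = quantity
    @unit_price = unit_price
  end

  def total
    @quantity * @unit_price
  end
end

puts RichOrder.new(3, 5).total  # => 15
```

Each such move is tiny, but repeated across a codebase it is exactly how the encapsulation, the domain model, and eventually the architecture emerge.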
Why is this important? Why is it bad to separate the code from the data? Because it stops the code changing at the pace of the business. Distributed behaviour and state mean that each piece of business knowledge is also distributed throughout the system. Which means that any new or changed requirement will cost more to implement.