thoughts on feature envy

I’m currently debating with myself on how and where to take Reek next. Specifically, I’m thinking about the FeatureEnvy / UtilityFunction class of code smell and their relation with classes and class methods. I’ve begun unravelling some thoughts over on the ruby-reek mailing list, and I thought I’d seek your opinion too. Much of what follows may turn out to be complete tripe…

There are always two views of any code, imo: specification-time and run-time. Code smells are, I think, (very nearly) 100% in the specification-time space; they deal with the code as written: does this name communicate well, is this code fragment coherent with those around it, will these two fragments change at different times, and so on; aspects of the specification that don’t affect behaviour. Classes are the means by which we describe the behaviour of runtime objects. It’s nice and sexy that in Ruby we also have runtime objects representing those classes, but I don’t necessarily see them as being the same beasts. One is a specification, the other is a runtime mechanism.

So one view of FeatureEnvy and UtilityFunction is that they help to improve the cohesiveness of the specification, by encouraging us to move code fragments closer to the state/data on which they operate. And this is where it begins to get hazy for me…

Consider a “helper” method that is only called by the methods of a single class, but which makes no use of the state of that class’s instances; where should its specification live? Shoulld it live with the class that uses it, or should it move towards the data on which it operates? Usually such methods exist in order to help simplify the caller’s interaction with something — I’ll posit that such methods are usually Adapters. So which is worse — have the adapter live with the adaptee or with the only client? Should it only move to (or towards) the adaptee when a second client needs it, perhaps?

One might argue that moving it to the adaptee creates a case of SpeculativeGeneralisation; in which case it should remain with the only client until it is duplicated by some other class. On the other hand, moving the fragment towards the adaptee can help to encapsulate the adaptee better, often allowing it to expose less to the outside world. All of which seems to imply that we can’t determine the best response to seeing Feature Envy without looking at other smells, current and future.

One approach would be to not treat FeatureEnvy / UtilityFunction as smells, but instead to look at the adaptee and call foul if we see a getter or setter; and in the absence of getters and setters, wait for duplication to show us what code needs to be shared, and hence moved. I think I may try living with that approach for a month and see what happens…

Reflections on shotgun surgery

I’ve just finished refactoring the code of Reek to (almost) eliminate a large dose of Shotgun Surgery, so I thought the time was ripe for a little reflection on what occurred and why.

What is Shotgun Surgery?

This is a code smell that occurs when you have to change loads of existing code in order to make a single change to the overall codebase.

In this case I wanted to change the reports generated by Reek’s spec matchers so that test failures are described more succinctly. To do so I had to change 7 classes, adding 4 methods (three of which all had the same name) and modifying several others. I pushed quickly to GREEN, at which point the code looked like I had driven a bulldozer through it from one end to the other and back again.

target-335029_1280

Ok, that happens all the time. Why is it a problem?

Because it creates drag. I had known for a while that certain kinds of change in this codebase would be painful to make, and the pain I imagined was subconsciously steering me away from making those kinds of change. I had been spending my time making changes that I knew were easier, putting off tackling the no-go areas of the codebase.

Furthermore, as I was tackling this smell I discovered a few defects in the code. They were edge cases, but they were definitely bugs. They had festered there in the dark corners where I had been afraid to go; and for all I know they had tripped up some new user of Reek and put them off forever.

And perhaps worst of all, it indicates that there’s a single design decision whose implementation (and consequences) is distributed throughout the codebase. So any similar or related decisions are going to be just as difficult and messy, and pretty soon they’ll all pile up together and render the code completely unmaintainable.

Why did it get so bad?

I think the main reason is that I’m not particularly well tuned to noticing Shotgun Surgery. (I believe I’m not alone in this.) We write a test, and it takes a bulldozer to get to green. We get to green, and we make a few localized refactorings to tidy up some of the methods we touched. But we often don’t see the bigger picture. “Hey, I just had to touch 6 classes to get that done; there’s a design issue here.” I’m sure I had previously failed to notice this or related smells on several previous occasions.

I had also failed to listen to my inner demon. I had “seen” several pieces of the eventual puzzle many weeks earlier, but I hadn’t addressed them because they didn’t seem serious. For example, I knew that some classes violated one of my central tenets of a good hexagonal architecture. But I told myself that was just a guideline — I shouldn’t push the code towards that shape unless the code told me it wanted to go. It turned out that the code had been screaming at me for some time!

How did I fix it?

The first, and most important, step is to identify the decision that hadn’t been properly and uniquely represented in the code. The aim is to refactor the code so that if I wished to change that decision again I would only need to make a single change to a single class. This is the hardest part of fixing Shotgun Surgery, and it’s a design step.

In this case I wanted to create a different kind of report, and I wanted to do it in the spec matcher. But quiet reports were currently a special case within full reports, implemented by reading an attribute from a global variable containing the user’s preferences. And the report itself was created way over the other side of the system. In a lazily evaluated cache.

I won’t bore you with the details. I decided to do this one “by the book”, inverting dependencies one by one until the Report object was created once, at the identified decision point in the code. Over the course of a dozen hours I changed all of the above 7 classes again; I introduced 4 new classes; and at the close of play none of the methods I had touched in the original change still remained.

Could this cost have been avoided?

I think this is a natural process — reasonably well-refactored code suddenly receives a change request that goes against the grain, so there’s some origami required before the old and the new sit comfortably together. That much is inevitable.

But I’m sure I missed fainter warning signs earlier in the code’s life. Had I responded to them, this week’s change might have been much less traumatic. So in future I will try to be much more vigilant: the next time a test causes me to change more than one class, I’ll sort it out right there and then.

factory method in ruby

During my refactoring homework last evening I noticed a little tug-of-war between two different coding styles, and after a restless night I’ll try to analyse here what was going on…

Deep inside Reek is a Source class, whose instances are responsible for converting Ruby code into abstract syntax trees for later examination by the various code smell detectors. Source code can come from many different places, and so Source has a number of Factory Methods like this:

class Source
  def self.from_io(ios)
    new(ios.readlines.join)
  end

  def self.from_s(code)
    new(code)
  end

  def self.from_f(file)
    from_path(file.path)
  end

  def self.from_path(filename)
    new(IO.readlines(filename).join)
  end

  # ...
end

(The factory methods also do some other stuff, which I’ve elided for clarity.)

As the refactoring exercise proceeded, I found that one of these methods came to be called from only one place. So I considered inlining it, and that’s when the tug-of-war began. Part of me resisted inlining the method, even though that’s what the code seemed to want. I argued that the Factory Methods were “potentially useful”, that they represented the Source abstraction and helped to document it; the code pulled the opposite way. (My wife was sitting across the room, so I kept this argument quiet inside my head; maybe that’s part of my problem.) Anyway, to cut a long story short, at the end of my refactoring session the code had changed to (something very similar to) this:

class File
  def to_source
    Reek::Source.new(self.path)
  end
end

class IO
  def to_source
    Reek::Source.new(self.readlines.join)
  end
end

class String
  def to_source
    Reek::Source.new(self)
  end
end

The Gang of Four didn’t name this pattern, so I’ll call these methods “Converters”. (Please tell me if this pattern has a more well-known name.) They’re just like Factory Methods, but they sit on the class being converted from instead of on the class being converted to.

The argument between me and the code was between the forces in favour of each pattern. In favour of the original Factory Methods:

  • In C++/Java — the world in which the GoF did their work, and where my roots lie — this is the only place to put the methods;
  • the dependencies point from Source to its origins (File, String, IO et al), which seems quite natural;
  • the compiler checks that the Factory Methods are called with the right types of input object;
  • all the ways to create a Source object are together, so they are easy to find and compare.

In favour of Converters:

  • We could later introduce new ways to create a Source without opening up that class;
  • they are instance methods, and hence just a little more natural, testable and “object-oriented”;
  • methods such as to_s, to_a, to_i are idiomatic in Ruby, which enhances the Communication value of this approach;
  • you need a File (for example) in order to call the Converter. (Actually, I’m not sure this counts as an advantage, because it prevents duck typing.)

On balance, I feel the C++/Java form of Factory Method has many advantages. But I switched the code to use the Converter form, mainly to see what would happen. I could easily switch back again now I’ve seen the forces laid out for comparison in this blog post…

the “anchor adapter”

Warning: academic theorizing and hypothesizing follow. Oh, and half-baked pontification.

I just finished refactoring reek to drive in a major new chunk of functionality (configuration files) which I’ll release soon, when I’ve had time for some thorough testing.

The refactoring needed to accommodate the change was huge, occupying much of my free time over the course of two months. Pretty much the whole of the tool’s original architecture has been revised. Why so big and so complex? Because the original code relied heavily on constants and class methods; they helped me get the early versions written quickly, but they represented a significant barrier to long-term flexibility. I’ve been wondering why that should be; why do constants and class methods stand in the way of adaptable test-driven code?

I think the answer lies in viewing the application through the lens of Hexagonal Architecture. Let me explain…

It seems to me that constants, global variables, classes, class methods, etc all live in a space that’s “anchored” to the runtime environment, which is itself a singleton. Anything anchored to that singleton is going to hinder the independence and isolatedness of unit tests, and also reduce the application’s flexibility in well-known ways. So far so standard. Now, suppose we model the singleton as a notional point that is external to the application. Hexagonal Architecture tells us we must access the singleton via an Adapter — in this case, an Adapter provided by the programming language and/or runtime. I’ll refer to the singleton as the application’s Anchor, and therefore claim that it is accessed through language features in an Anchor Adapter.

Now, I believe that the Domain Middle should not depend directly on Adapters. So any code that makes direct use of the Anchor Adapter must therefore be considered outside of the Domain Middle, and hence part of an Adapter — and hence also inherently outside the space where unit tests live comfortably.

Which is why constants and class methods add friction to unit testing.

Or rather: This model fits nicely with my penchant for Hexagonal Architecture, and lets me justify my unease at testing in and around class methods. And probably adds nothing to our understanding of software development.

too much TemplateMethod

I’ve been refactoring a lot during the festive break, and I’ve noticed that in many cases it was more difficult than I would have liked. Today I think I figured out the reason for that: I use the TemplateMethod pattern too much.

When I see a duplicated algorithm, it seems that my natural tendency is to push up the skeleton into a superclass. This creates an inheritance relationship within the algorithm, which in turn makes it harder to change. Later, when I do need to change the algorithm, I have to change the superclass and all of the subclasses at the same time. For example, one particular superclass in reek contained three or four template methods, which made the subclasses look quite odd; and each little complex of template-plus-overrides significantly hampered design change in each of the others.

Looking back over my programming career I see that I’ve always had this tendency — I can see it in my old C++, Java and Ruby code. I wonder why? Is it the cost of extra classes, or my mathematical background, or coding habits ingrained before the rise of object-oriented languages? Who knows (and who cares).

So, note to self: Break out State/Strategy objects (and in Ruby this includes Procs) instead of always relying on TemplateMethod.

primitive obsession in ruby

I spent an hour or so this afternoon refactoring some code deep inside reek. I wanted to simplify the code that checks for the Uncommunicative Name smell, but it just wouldn’t fall out as clean as I would like. And at some points it seemed that I was going around in circles — a name is represented as a Symbol here, needs to be a String there, but when I add a call to to_s something else seems to need it back as a Symbol again. It’s the old feeling of squeezing water in a balloon: squeeze in any one place and the problem seems to pop up somewhere else. I put the problem down to tiredness and took a break — actually I took a drive in falling snow, which is not necessarily a cure for tiredness!

But the change of scene cleared my thoughts, and I realised what was going on: The troublesome names were a case of the Primitive Obsession smell. Several classes throughout reek “know” when a name travels around as a Symbol and when it travels around as a String. I needed a new class, Name, to hide the details of the actual representation. I added it (following the recipe in the ruby refactoring workbook, you’ll be glad to hear), and all was well. The new class even acquired some behaviour along the way, so all is well.

So, nothing new there; why am I even writing this at all? Because I asked myself why it had happened — why had such a huge case of Primitive Obsession escaped my attention for so many weeks, and why didn’t I pick it up this afternoon when it hit me repeatedly between the eyes (can a smell do that)? Is this another manifestation of my anosmia? Am I smell-blind to Primitive Obsession? I don’t think so.

Every day in my coaching practice I see code riddled with this same smell. I point it out, we have a quick ad-hoc training session and a couple of lunchtime dojos. I can spot this problem easily. The difference is that the teams I work with all use C# or Java or C++, where the types are stated clearly in the method signatures. But in Ruby that’s not the case: when you read a method in Ruby code, you have to infer the types of the actual arguments by looking at how they are used. Parameter names don’t always help here either (in the reek source code above, sometimes the names were called ‘sym’, other times ‘name’).

So I contend that dynamic typing makes Primitive Obsession harder to diagnose. Do you find that too? Does this smell linger longer in Ruby code than in Java code, for example?

And if my contention is true, what new diagnostic tools do we need? If we don’t have statically typed method signatures slapping us with the wet fish of int and string, are there any simple alternative cues we can look for?

refactoring workbook featured at on-ruby

Today Pat Eyler of On-Ruby fame has posted the second half of his interview with myself and Bill Wake. This part focusses on our upcoming Ruby Refactoring Workbook — why we wanted to write it, and what we learned about Ruby and refactoring while writing it.

As ever, Pat has done a seriously good job translating our ramblings into a coherent article — thanks Pat!

reek 0.3.1 released

I’ve just release reek version 0.3.1 with the following wonderful smelly goodness:

  • Uncommunicative Name now checks instance variables more thoroughly, and also warns about names of the form ‘x2’.
  • There’s a new warning about multiple identical calls within a single method…
  • …and consequently the scope of Feature Envy warnings has been reduced, so that it now only covers overuse of lvars within a single method. Hopefully the warnings will be a little less rabid than in previous versions!
  • Each checked code smell now has rdoc comments explaining what each smell is about and providing a simple example.
  • Chained iterators are no longer mis-reported as nested.

Available now at a gem repository near you.