Interesting link:
In Improving Code using Metric Fu the folks at devver.net give a little insight into how they have been using Reek and the other Ruby quality tools to improve their codebase.

A couple of days ago I made a little change to Reek and to my surprise 89 tests broke, which is bad. I tweeted about it, and soon @logosity had asked me to write about why it had happened. I know the same sort of thing happens to people all the time, and it can be quite traumatic in the otherwise placid flow of test-driven development. So I’m hoping it will be somewhat helpful to uncover the path that led me from blissful ignorance to test calamity with one keystroke. So without further ado, let’s do some root cause analysis…

Undesirable event:
I deleted an attr_reader that was called in only one place and 89 of Reek’s 360-odd rspec examples broke. Of these broken tests, only three were tests of SimulatedPolymorphism#examine_context, which was the deleted attribute’s one caller.

Why did the other 86 tests break unexpectedly?

Because they each indirectly invoked the one method that now could no longer work.

Why did they invoke the broken method?

Because they are integration tests, checking the co-ordinated behaviour of a dozen or so classes.

The broken method sits on a relatively new class, an instance of which is automatically included in the framework that those tests invoke.

The central class in Reek is the Sniffer, which obtains a syntax tree from a Source (eg. a File or a String or an IO stream) and then traverses it, firing events at registered SmellDetector objects as it encounters syntax elements they subscribe to. The broken method is in one of these listeners, and it is fired whenever the parser encounters a new class definition in the syntax tree.

Reek has a couple of hundred tests that each run a Sniffer over a code snippet, checking that the detectors find a specific set of code smells (and don’t find any others). The broken detector (SimulatedPolymorphism) is fired on any code fragment that includes a class definition; it turns out that only 86 of these couple hundred tests do that. The damage could have been much worse.

All of which means that the failing tests are “integration tests”, checking the overall behaviour of over a dozen classes under a variety of specific circumstances. Under some circumstances one of those classes is broken, so some of these integration tests fail.

Why are there so many integration tests in the unit test suite?

Because they are much easier to write than the tests for a single class would be. And they express the desired behaviour in ways that unit tests can’t. So I never took the time to write the unit tests.

Here’s an example of one of these tests:

describe ControlCouple do
  it 'should report a ternary check on a parameter' do
    code = 'def simple(arga) arga ? @ivar : 3 end'
    code.should reek_only_of(:ControlCouple, /arga/)
  end
end

The corresponding unit tests would confirm that:

  • the code parser and method context objects conspire together to record the method’s parameters
  • the code parser sends an event to the control couple detector when, and only when, it sees a conditional expression
  • the control couple detector can obtain the enclosing method’s parameters
  • the control couple detector records a smell warning only when the conditional expression tests the value of a method parameter
  • the smell warning includes the correct information
  • no other detector records any warnings for the given code fragment

Unit tests exist for some of these cases and not for others. The ease with which I can write readable tests has made me lazy: I often don’t write the unit tests, because I get new features into Reek much more quickly this way. The penalty occurs when a change breaks one of the detectors or (worse) something in the Sniffer, because then I get dozens of broken tests and it can be difficult to pinpoint the problem quickly.

There’s also another reason I got lazy: As Reek’s internal architecture stabilized many months ago, most of the new code I write consists of adding new smell detector classes, which usually slot right into the existing stable framework with few surprises or problems.

So the root cause of my 89 test failures is the combination of a well-tested, stable, practically defect-free framework together with a great way to quickly write readable integration test examples. Is that so bad after all, I wonder.

Am I going to fix the problem? Yes and no. Each time this happens in future I will add the missing unit tests that should have failed. But I don’t want to lose the expressive value of these integration tests, so I plan to (gradually) organise them into a different test suite more closely related to Reek’s cucumber tests. What would you do?

In Ruby, one easy way to check for palindromes would be to write a method such as this:


class String
 def palindrome?
   self == self.reverse
 end
end

But the C++ part of my brain screams that this is likely to be very inefficient. So, today’s kata: test-drive a palindrome? method for String, ensuring (with tests) that it is much faster than the above.

I’m currently debating with myself on how and where to take Reek next. Specifically, I’m thinking about the FeatureEnvy / UtilityFunction class of code smell and their relation with classes and class methods. I’ve begun unravelling some thoughts over on the ruby-reek mailing list, and I thought I’d seek your opinion too. Much of what follows may turn out to be complete tripe…

There are always two views of any code, imo: specification-time and run-time. Code smells are, I think, (very nearly) 100% in the specification-time space; they deal with the code as written: does this name communicate well, is this code fragment coherent with those around it, will these two fragments change at different times, and so on; aspects of the specification that don’t affect behaviour. Classes are the means by which we describe the behaviour of runtime objects. It’s nice and sexy that in Ruby we also have runtime objects representing those classes, but I don’t necessarily see them as being the same beasts. One is a specification, the other is a runtime mechanism.

So one view of FeatureEnvy and UtilityFunction is that they help to improve the cohesiveness of the specification, by encouraging us to move code fragments closer to the state/data on which they operate. And this is where it begins to get hazy for me…

Consider a “helper” method that is only called by the methods of a single class, but which makes no use of the state of that class’s instances; where should its specification live? Shoulld it live with the class that uses it, or should it move towards the data on which it operates? Usually such methods exist in order to help simplify the caller’s interaction with something — I’ll posit that such methods are usually Adapters. So which is worse — have the adapter live with the adaptee or with the only client? Should it only move to (or towards) the adaptee when a second client needs it, perhaps?

One might argue that moving it to the adaptee creates a case of SpeculativeGeneralisation; in which case it should remain with the only client until it is duplicated by some other class. On the other hand, moving the fragment towards the adaptee can help to encapsulate the adaptee better, often allowing it to expose less to the outside world. All of which seems to imply that we can’t determine the best response to seeing Feature Envy without looking at other smells, current and future.

One approach would be to not treat FeatureEnvy / UtilityFunction as smells, but instead to look at the adaptee and call foul if we see a getter or setter; and in the absence of getters and setters, wait for duplication to show us what code needs to be shared, and hence moved. I think I may try living with that approach for a month and see what happens…

I just watched a great video of a “performance kata” by Corey Haines. Highly recommended!

Watching Corey solve the kata differently than I would have done has taught me several new tricks. Notably spec -fs spec — I get a completely unreadable mess when I run that on my own code, so there’s some work to do there in terms of how I name my specs.

Also, my first instinct would not be to extract the do_replacement helper method. I noticed GeePawHill do that too during his Double Dawg Dare refactoring videos; so it’s common, and a couple of years ago I was doing it all the time. But recently I’ve tried to work the opposite way — if the tests need a helper, I take that as a sign that the API under test isn’t as developer-friendly as it might be. So I look for ways to DRY up the tests by making the test subject’s interface more fluid. The helper method approach certainly gets results more quickly, but I worry about the longer term maintainability of the code; maybe I shouldn’t; maybe I think too much instead of just doing TDD “as if I meant it”.

All of which means that performance kata are a fantastic way to share knowledge, develop new skills and hone techniques. Time for some more practice…

I just had news that our session “Developing a Sense of Smell” has been accepted for XPDays Benelux this year. I’ll be running the session with Lindsay McEwan, and we plan to explore the use of code smells as a tool for communicating about software design. Join us if you can!

For nine months now Corey Haines has been travelling around the Eastern US as a journeyman programmer. During that time he has run numerous code retreats, published numerous fantastic interviews with leading developers, and made dozens of blog posts and videos in which he reflects on the things he’s learned from the people he’s paired with. The body of Corey’s travails is already beginning to form an invaluable reference, and all of the insights he has pushed into the public domain add to our collective knowledge about how to develop software carefully (in both senses of the word).

Just as amazing as the project’s knowledge elicitation is the fact that Corey is self-funded. He pair programs for food and lodging, and just by doing that he has given back an enormous amount to the community. Well, now he’s skint. The journey will continue only if Corey can raise some funds.

So please visit Corey’s blog and pledge a few quid to keep the journeyman on the road. I did, and it didn’t hurt a bit.

And next year, Corey, how about we raise the cash to get you over to the UK for a tour?

A few months ago I asked around to find out how (or whether) people organised the Cucumber feature files in their project’s features folder. Nothing in particular turned up. But yesterday the good folks at Thoughtbot posted their own very interesting convention: They have a sub-folder of features/ for each Actor served by the application. For example:

features/admins/
features/api_clients/
features/users/
features/visitors/

I like this a lot, and I’ll be re-organising Reek’s features soon along the same lines. In fact, thinking about this has made me realise that there’s a whole class of user (API clients) not represented by Reek’s current set of Cucumber features.

And here’s an interesting thought experiment I’ll also be trying out: How about adding another folder level for the Actor’s goal? So the folders would be organised according to the commonalities among the feature descriptions: features/<as_a>/<in_order_to>/feature. For example:

features/
  administrator/
    manage_products/
      create_catalogue.feature
      retire_product_line.feature
      ...
    manage_users/
      grant_access.feature
      ...

Will that be useful, or just create more clutter? In general I dislike hierarchical classification, so I was keen to keep the sub-folder names as verbs. I’ll try it for Reek and report my feelings back here. Please let us know if you try it first…

I’ve just finished refactoring the code of Reek to (almost) eliminate a large dose of Shotgun Surgery, so I thought the time was ripe for a little reflection on what occurred and why.

What is Shotgun Surgery?

This is a code smell that occurs when you have to change loads of existing code in order to make a single change to the overall codebase.

In this case I wanted to change the reports generated by Reek’s spec matchers so that test failures are described more succinctly. To do so I had to change 7 classes, adding 4 methods (three of which all had the same name) and modifying several others. I pushed quickly to GREEN, at which point the code looked like I had driven a bulldozer through it from one end to the other and back again.

Ok, that happens all the time. Why is it a problem?

Because it creates drag. I had known for a while that certain kinds of change in this codebase would be painful to make, and the pain I imagined was subconsciously steering me away from making those kinds of change. I had been spending my time making changes that I knew were easier, putting off tackling the no-go areas of the codebase.

Furthermore, as I was tackling this smell I discovered a few defects in the code. They were edge cases, but they were definitely bugs. They had festered there in the dark corners where I had been afraid to go; and for all I know they had tripped up some new user of Reek and put them off forever.

And perhaps worst of all, it indicates that there’s a single design decision whose implementation (and consequences) is distributed throughout the codebase. So any similar or related decisions are going to be just as difficult and messy, and pretty soon they’ll all pile up together and render the code completely unmaintainable.

Why did it get so bad?

I think the main reason is that I’m not particularly well tuned to noticing Shotgun Surgery. (I believe I’m not alone in this.) We write a test, and it takes a bulldozer to get to green. We get to green, and we make a few localized refactorings to tidy up some of the methods we touched. But we often don’t see the bigger picture. “Hey, I just had to touch 6 classes to get that done; there’s a design issue here.” I’m sure I had previously failed to notice this or related smells on several previous occasions.

I had also failed to listen to my inner demon. I had “seen” several pieces of the eventual puzzle many weeks earlier, but I hadn’t addressed them because they didn’t seem serious. For example, I knew that some classes violated one of my central tenets of a good hexagonal architecture. But I told myself that was just a guideline — I shouldn’t push the code towards that shape unless the code told me it wanted to go. It turned out that the code had been screaming at me for some time!

How did I fix it?

The first, and most important, step is to identify the decision that hadn’t been properly and uniquely represented in the code. The aim is to refactor the code so that if I wished to change that decision again I would only need to make a single change to a single class. This is the hardest part of fixing Shotgun Surgery, and it’s a design step.

In this case I wanted to create a different kind of report, and I wanted to do it in the spec matcher. But quiet reports were currently a special case within full reports, implemented by reading an attribute from a global variable containing the user’s preferences. And the report itself was created way over the other side of the system. In a lazily evaluated cache.

I won’t bore you with the details. I decided to do this one “by the book”, inverting dependencies one by one until the Report object was created once, at the identified decision point in the code. Over the course of a dozen hours I changed all of the above 7 classes again; I introduced 4 new classes; and at the close of play none of the methods I had touched in the original change still remained.

Could this cost have been avoided?

I think this is a natural process — reasonablly well-refactored code suddenly receives a change request that goes against the grain, so there’s some origami required before the old and the new sit comfortably together. That much is inevitable.

But I’m sure I missed fainter warning signs earlier in the code’s life. Had I responded to them, this week’s change might have been much less traumatic. So in future I will try to be much more vigilant: the next time a test causes me to change more than one class, I’ll sort it out right there and then.

During my refactoring homework last evening I noticed a little tug-of-war between two different coding styles, and after a restless night I’ll try to analyse here what was going on…

Deep inside Reek is a Source class, whose instances are responsible for converting Ruby code into abstract syntax trees for later examination by the various code smell detectors. Source code can come from many different places, and so Source has a number of Factory Methods like this:

class Source
  def self.from_io(ios)
    new(ios.readlines.join)
  end

  def self.from_s(code)
    new(code)
  end

  def self.from_f(file)
    from_path(file.path)
  end

  def self.from_path(filename)
    new(IO.readlines(filename).join)
  end

  # ...
end

(The factory methods also do some other stuff, which I’ve elided for clarity.)

As the refactoring exercise proceeded, I found that one of these methods came to be called from only one place. So I considered inlining it, and that’s when the tug-of-war began. Part of me resisted inlining the method, even though that’s what the code seemed to want. I argued that the Factory Methods were “potentially useful”, that they represented the Source abstraction and helped to document it; the code pulled the opposite way. (My wife was sitting across the room, so I kept this argument quiet inside my head; maybe that’s part of my problem.) Anyway, to cut a long story short, at the end of my refactoring session the code had changed to (something very similar to) this:

class File
  def to_source
    Reek::Source.new(self.path)
  end
end

class IO
  def to_source
    Reek::Source.new(self.readlines.join)
  end
end

class String
  def to_source
    Reek::Source.new(self)
  end
end

The Gang of Four didn’t name this pattern, so I’ll call these methods “Converters”. (Please tell me if this pattern has a more well-known name.) They’re just like Factory Methods, but they sit on the class being converted from instead of on the class being converted to.

The argument between me and the code was between the forces in favour of each pattern. In favour of the original Factory Methods:

  • In C++/Java — the world in which the GoF did their work, and where my roots lie — this is the only place to put the methods;
  • the dependencies point from Source to its origins (File, String, IO et al), which seems quite natural;
  • the compiler checks that the Factory Methods are called with the right types of input object;
  • all the ways to create a Source object are together, so they are easy to find and compare.

In favour of Converters:

  • We could later introduce new ways to create a Source without opening up that class;
  • they are instance methods, and hence just a little more natural, testable and “object-oriented”;
  • methods such as to_s, to_a, to_i are idiomatic in Ruby, which enhances the Communication value of this approach;
  • you need a File (for example) in order to call the Converter. (Actually, I’m not sure this counts as an advantage, because it prevents duck typing.)

On balance, I feel the C++/Java form of Factory Method has many advantages. But I switched the code to use the Converter form, mainly to see what would happen. I could easily switch back again now I’ve seen the forces laid out for comparison in this blog post…