Here’s something I’ve seen many times in code right across the spectrum, and which I caught myself doing just the other day. It seems to occur particularly often in codebases that weren’t TDD’d, and which haven’t been kept lean with regular refactoring…
Imagine a codebase in which one particular text string occurs frequently. The string may be some kind of message header, for example, or an XML tag (both of these examples are usually accompanied by another repeated string, representing the message footer or the closing tag). As the codebase grows, you notice the string popping up more and more frequently, until the “duplication” bell goes off. What to do?
It turns out we’ve all been taught the answer: Create a manifest constant (a
#define or a
static final String or whatever, depending on the language), thereby giving the common string a name, and replace all instances of the string by use of the constant. There’s even a name for this refactoring in Martin Fowler’s catalogue: Replace Magic Number With Symbolic Constant. Duplication sorted, right? Wrong!
The duplication wasn’t in the repeated text of the string’s contents, but in the repeated use of the string. The resulting code still contains the original duplication. Worse, by hiding the string behind a constant, my feeling is that we’ve made the code less readable too.
The right approach – after using Inline Constant to put the original strings back into the code – is to look at the context in which each of those strings occurred. Chances are there will be only a small number of kinds of use of the string – perhaps every time a reply message is constructed, for example. If the message has a header and a footer, for example, I’ll look for ways to create a Template Method that builds the message, with a call-out hook to a function that supplies the message body. Ninety percent of the time that kind of approach works; but on the odd occasion when it isn’t appropriate, I’ll look for ways to encapsulate the string in a simple object, and then push the surrounding code onto that object. In either case the result I’m after is that the string is present inline in the code, but the behavioural context in which that string exists now has a name – hopefully a name that is also meaningful in the application’s domain – and the domain knowledge or policy represented by the associated code is no longer duplicated.
Duplication isn’t about text. It’s about knowledge and meaning.