the wrong duplication

Here’s something I’ve seen many times in code right across the spectrum, and which I caught myself doing just the other day. It seems to occur particularly often in codebases that weren’t TDD’d, and which haven’t been kept lean with regular refactoring…

Imagine a codebase in which one particular text string occurs frequently. The string may be some kind of message header, for example, or an XML tag (both of these examples are usually accompanied by another repeated string, representing the message footer or the closing tag). As the codebase grows, you notice the string popping up more and more frequently, until the “duplication” bell goes off. What to do?

It turns out we’ve all been taught the answer: Create a manifest constant (a #define or a static final String or whatever, depending on the language), thereby giving the common string a name, and replace all instances of the string by use of the constant. There’s even a name for this refactoring in Martin Fowler’s catalogue: Replace Magic Number With Symbolic Constant. Duplication sorted, right? Wrong!

The duplication wasn’t in the repeated text of the string’s contents, but in the repeated use of the string. The resulting code still contains the original duplication. Worse, by hiding the string behind a constant, my feeling is that we’ve made the code less readable too.

The right approach – after using Inline Constant to put the original strings back into the code – is to look at the context in which each of those strings occurred. Chances are there will be only a small number of kinds of use of the string – perhaps every time a reply message is constructed, for example. If the message has a header and a footer, for example, I’ll look for ways to create a Template Method that builds the message, with a call-out hook to a function that supplies the message body. Ninety percent of the time that kind of approach works; but on the odd occasion when it isn’t appropriate, I’ll look for ways to encapsulate the string in a simple object, and then push the surrounding code onto that object. In either case the result I’m after is that the string is present inline in the code, but the behavioural context in which that string exists now has a name – hopefully a name that is also meaningful in the application’s domain – and the domain knowledge or policy represented by the associated code is no longer duplicated.

Duplication isn’t about text. It’s about knowledge and meaning.

Advertisements

6 thoughts on “the wrong duplication

  1. Kevin,

    I didn’t realise until I read this post but I’ve been needing something like this for a while. I always wondered why I didn’t like long lists of static constants, always found them difficult to understand and prefered them inline (but not inline everywhere).

    You have hit the nail on the head! Good work!

  2. That fits within the remit of the Pragmatic Programmers’:

    DRY—Don’t Repeat Yourself: Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

    and usefully takes the idea to “the next level”. But this has made me think: how do we test for the other aspects of this? How do we ensure the representation is unambiguous, and that it is authoritative? Do there exist code fragments that exhibit these properties that can be shown to be refactored to avoid them? All the examples of DRY I have seen have focussed on SINGLE….

    All the whitespace I’ve added to this comment to blockquote that citation has been collapsed, losing paragraph structure. Hence all the “\n\n” stuff. If this accepts any markup at all, then it would be useful to link to what is acceptable.

  3. Hugh, Now you raise the point, it does seem that the Pragmatics have been a little tautologous! i take ‘authoritative’ to mean the same as ‘single’ in their statement. As for ‘unambiguous’, I have no idea what they meant…

    (BTW, comments do seem to accept simple HTML tags, so I’ve re-instated your blockquote…)

  4. I guess another way to describe this would be to notice that the original code had Connascence of Meaning (two or more places in the codebase have to agree on how to interpret that string). And the refactored code, with the extracted constant, now has Connascence of Meaning _and_ Connascence of Name.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s