Code duplication is often harmful - except when it isn't. Learn how to think about the trade-offs involved.

Good programmers know that code duplication should be avoided. There's a cost associated with duplicated code, so we have catchphrases like Don't Repeat Yourself (DRY) in order to remind ourselves that code duplication is evil.

It seems to me that some programmers see themselves as Terminators: out to eliminate all code duplication with extreme prejudice; sometimes, perhaps, without even considering the trade-offs involved. Every time you remove duplicated code, you add a level of indirection, and as you've probably heard before, all problems in computer science can be solved by another level of indirection, except for the problem of too many levels of indirection.

Removing code duplication is important, but it tends to add cognitive overhead. Therefore, it's important to understand why code duplication is harmful - or rather: when it's harmful.

Rates of change

Imagine that you copy a piece of code and paste it into ten other code bases, and then never touch that piece of code again. Is that harmful?

Probably not.

Here's one of my favourite examples. When protecting the invariants of objects, I always add Guard Clauses against nulls:

if (subject == null)
    throw new ArgumentNullException("subject");

In fact, I have a Visual Studio code snippet for this; I've been using this code snippet for years, which means that I have code like this Guard Clause duplicated all over my code bases. Most likely, there are thousands of examples of such Guard Clauses on my hard drive, with the only variation being the name of the parameter. I don't mind, because, in my experience, these two lines of code never change.

Yet many programmers see that as a violation of DRY, so instead, they introduce something like this:

Guard.AgainstNull(subject, "subject");
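
What would such a Guard class look like? Presumably something along these lines (a sketch of a hypothetical helper, not any particular library's API):

public static class Guard
{
    // Throw if the argument is null; otherwise do nothing.
    public static void AgainstNull(object argument, string parameterName)
    {
        if (argument == null)
            throw new ArgumentNullException(parameterName);
    }
}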

The end result is that you've slightly increased the cognitive overhead, but what have you gained? As far as I can tell: nothing. The code still has the same number of Guard Clauses; instead of idiomatic if statements, they are now method calls. It's hardly DRY when you have to repeat those calls to Guard.AgainstNull all over the place: you'd still be repeating yourself.

The point is that DRY is a catchphrase; it shouldn't be an excuse to avoid thinking explicitly about any given problem.

If the duplicated code is likely to change a lot, the cost of duplication is likely to be high, because you'll have to spend time making the same change in lots of different places - and if you forget one, you'll introduce bugs into the system. If the duplicated code is unlikely to change, perhaps the cost is low. As with all other risk management, you conceptually multiply the probability of the adverse event by the cost of the damage associated with that event. If the product is low, don't bother addressing the risk.

The Rule of Three

It's not a new observation that unconditional elimination of duplicated code can be harmful. The Rule of Three exists for this reason:

  1. Write a piece of code.
  2. Write the same piece of code again. Resist the urge to generalise.
  3. Write the same piece of code again. Now you are allowed to consider generalising it.

There are a couple of reasons why this is a valuable rule of thumb. One is that the fewer examples of the duplication you have, the less evidence you have that the duplication is real, rather than simply a coincidence.

Another reason is that even if the duplication is 'real' (and not coincidental), you may not have enough examples to enable you to make the correct refactoring. Often, even duplicated code comes with small variations:

  • The logic is the same, but a string value differs.
  • The logic is almost the same, but one duplicate performs an extra small step.
  • The logic looks similar, but operates on two different types of object.
  • etc.

How should you refactor? Should you introduce a helper method? Should the helper method take a method argument? Should you extract a class? Should you add an interface? Should you apply the Template Method pattern? Or the Strategy pattern?
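
To make the dilemma concrete, consider the second variation from the list above: the logic is almost the same, but one duplicate performs an extra small step. One candidate refactoring (a sketch only; the Report type and the method names are invented for the example) is a helper method that receives the varying step as a delegate:

public class Report
{
    public string Title { get; set; }
}

public static class ReportFormatter
{
    // The shared logic, extracted into a helper; the step that varies
    // between the two duplicates is passed in as a delegate - in effect
    // a lightweight Strategy.
    public static string Format(Report report, Action<Report> extraStep)
    {
        var formatted = report.Title.ToUpperInvariant();
        if (extraStep != null)
            extraStep(report);
        return formatted;
    }
}

Whether this is better than a Template Method, a full-blown Strategy interface, or no refactoring at all depends on which axis actually varies - and with only two examples, that's largely a guess.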

If you refactor prematurely, you may perform the wrong refactoring. Often, people introduce helper methods, and then, when they realise that the axis of variability was not what they expected, they add more and more parameters to the helper method, and more and more complexity to its implementation. This leads to ripple effects. Ripple effects lead to thrashing. Thrashing leads to poor maintainability. Poor maintainability leads to low productivity.
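
In code, the accretion often ends up looking something like this (an invented example, continuing the hypothetical helper from above):

// A prematurely extracted helper after a few rounds of 'just one more
// parameter': each newly discovered variation adds a flag and a branch,
// and every signature change ripples out to all existing call sites.
public static string Format(
    Report report,
    bool uppercase,
    bool includeDate,
    string dateFormat,
    Action<Report> extraStep)
{
    var formatted = uppercase ? report.Title.ToUpperInvariant() : report.Title;
    if (includeDate)
        formatted += " " + DateTime.Now.ToString(dateFormat);
    if (extraStep != null)
        extraStep(report);
    return formatted;
}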

This is, in my experience, the most important reason to follow the Rule of Three: wait until you have more facts available to you. You don't have to take the rule literally, either. You can wait until you have four, five, or six examples of the duplication, if the rate of change is low.

The parallel to statistics

If you've ever taken a course in statistics, you'll have learned that the less data you have, the less confidence you can have in any sort of analysis. Conversely, the more samples you have, the more confidence you can have if you're trying to find or verify some sort of correlation.

The same holds true for code duplication, I believe. The more samples of duplicated code you have, the better you understand what is truly duplicated, and what varies. The better you understand the axes of variability, the better the refactoring you can perform to get rid of the duplication.

Summary

Code duplication is costly - but only if the code changes. The cost of code duplication, thus, is C*p, where C is the cost incurred when you need to change the code, and p is the probability that you'll need to change it. In my experience, for example, the Null Guard Clause in this article has a cost of duplication of 0, because the probability that I'll need to change it is 0.
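
As a back-of-the-envelope illustration (the numbers are invented), suppose a change to a duplicated piece of code would cost four hours of work across all the copies:

volatile duplicate:  C = 4 hours, p = 0.5  =>  C*p = 2 hours
Null Guard Clause:   C = 4 hours, p = 0    =>  C*p = 0 hours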

There's a cost associated with removing duplication - particularly if you make the wrong refactoring. Thus, depending on the values of C and p, you may be better off allowing a bit of duplication, instead of trying to eradicate it as soon as you see it.

You may not be able to quantify C and p (I can't), but you should be able to estimate whether these values are small or large. This should help you decide if you need to eliminate the duplication right away, or if you'd be better off waiting to see what happens.


Comments

Markus Bullmann

I would add one more aspect to this article: Experience.

The more experience you gain, the more likely you are to perceive parts of your code that could benefit from a generalization - but not yet, because it would be premature.

I try to keep these places in mind and to simplify the potential later refactoring. When the code eventually meets the Rule of Three, I can apply those preparations to refactor efficiently. This leads to better code even if the code is never repeated a third time.

Needless to say, experience is always crucial, but I like to show people who are new to programming which decisions are based on experience and which are based on easy-to-follow guidelines like the Rule of Three.

2014-08-20 22:03 UTC

Markus, thank you for writing. Yes, I agree that experience is always helpful.

2014-08-21 8:48 UTC

Sergey Telshevsky

Wonderful article, but I'd like to say that I prefer sticking to the Zero-one-infinity rule instead of the Rule of Three.

That way I overcome my laziness about searching for duplicates and extracting them into a procedure. It also helps to keep code DRY without going insane, by applying it only to two or more lines of code: if it's a non-ternary one-liner, I'm OK with the duplication; if it's two lines of code, I may consider making a procedure out of it; and three lines or more are extracted every time I need to repeat them.

2014-12-23 14:00 UTC



Published

Thursday, 07 August 2014 20:11:00 UTC
