Temporary Field code smell by Mark Seemann
Temporary Field is a well-known code smell. Here's an example and remedy.
The Temporary Field code smell was described more than a decade ago, but I keep encountering it when doing code reviews. Despite its vintage, I couldn't find a good example, so I decided to provide one.
The code smell is described in Refactoring:
"Sometimes you see an object in which an instance variable is set only in certain circumstances. Such code is difficult to understand, because you expect an object to need all of its variables. [...]
"A common case of temporary field occurs when a complicated algorithm needs several variables. Because the programmer didn't want to pass around a huge parameter list (who does?), he put them in fields."
- Refactoring, Martin Fowler et al., Addison-Wesley 1999. p. 84
Unfortunately, Refactoring doesn't provide an example, and I couldn't find a good, self-contained example on the web either.
Example: estimate a duration #
In this example, a developer was asked to provide an estimate of a duration, based on a collection of previously observed durations. The requirements are these:
- There's a collection of previously observed durations. These must be used as statistics upon which to base the estimate.
- It's better to estimate too high than too low.
- Durations are assumed to be normal distributed.
- The estimate should be higher than the actual duration in more than 99% of the times.
- If there are no previous observations, a default estimate must be used as a fall-back mechanism.
It's not that bad, actually. Because the distribution is assumed to be normal, you can find a good estimate by calculating the average, and add three times the standard deviation.
Here's how our developer attempted to solve the problem:
public class Estimator { private readonly TimeSpan defaultEstimate; private IReadOnlyCollection<TimeSpan> durations; private TimeSpan average; private TimeSpan standardDeviation; public Estimator(TimeSpan defaultEstimate) { this.defaultEstimate = defaultEstimate; } public TimeSpan CalculateEstimate( IReadOnlyCollection<TimeSpan> durations) { if (durations == null) throw new ArgumentNullException(nameof(durations)); if (durations.Count == 0) return this.defaultEstimate; this.durations = durations; this.CalculateAverage(); this.CalculateStandardDeviation(); var margin = TimeSpan.FromTicks(this.standardDeviation.Ticks * 3); return this.average + margin; } private void CalculateAverage() { this.average = TimeSpan.FromTicks( (long)this.durations.Average(ts => ts.Ticks)); } private void CalculateStandardDeviation() { var variance = this.durations.Average(ts => Math.Pow( (ts - this.average).Ticks, 2)); this.standardDeviation = TimeSpan.FromTicks((long)Math.Sqrt(variance)); } }
The CalculateEstimate method directly uses the temporary field durations
, as well as implicitly the fields average
and standardDeviation
. The reason for this is to avoid passing parameters around. Both the average and the standard deviation depend on the durations
, but the standard deviation also depends on the average.
These dependencies are difficult to see in this code snippet:
this.durations = durations; this.CalculateAverage(); this.CalculateStandardDeviation();
What if I wanted to switch the order around?
this.durations = durations; this.CalculateStandardDeviation(); this.CalculateAverage();
This compiles! It also produces a result if you invoke the CalculateEstimate method. No exception is thrown, but the result returned is incorrect!
Not only is this code difficult to understand, it's also brittle. Furthermore, it's not thread-safe.
This was most likely done with the best of intentions. After all, in Clean Code you learn that methods with zero arguments are better than methods with one argument (which are better than methods with two arguments, and so on).
There are better ways to factor the code, though.
Extract Class #
In Refactoring, the suggested cure is to extract a class that contains only the temporary fields. This refactoring is called Extract Class. Your first step can be to introduce a private, nested class within the containing class. In the case of the example, you might call this class DurationStatistics. It can be used from CalculateEstimate in this way:
public TimeSpan CalculateEstimate( IReadOnlyCollection<TimeSpan> durations) { if (durations == null) throw new ArgumentNullException(nameof(durations)); if (durations.Count == 0) return this.defaultEstimate; var stats = new DurationStatistics(durations); var margin = TimeSpan.FromTicks(stats.StandardDeviation.Ticks * 3); return stats.Average + margin; }
It's now much clearer what's going on. You have some statistics based on the durations
, and those contain both the average and the standard deviation. The flow of data is also clear: you need the stats
to get the average and the standard deviation, and you need stats.StandardDeviation
in order to calculate the margin
, and the margin
before you can calculate the return value. If you attempt to move those lines of code around, it's no longer going to compile.
This solution is also thread-safe.
Here's the full code of the Estimator class, after the Extract Class refactoring:
public class Estimator { private readonly TimeSpan defaultEstimate; public Estimator(TimeSpan defaultEstimate) { this.defaultEstimate = defaultEstimate; } public TimeSpan CalculateEstimate( IReadOnlyCollection<TimeSpan> durations) { if (durations == null) throw new ArgumentNullException(nameof(durations)); if (durations.Count == 0) return this.defaultEstimate; var stats = new DurationStatistics(durations); var margin = TimeSpan.FromTicks(stats.StandardDeviation.Ticks * 3); return stats.Average + margin; } private class DurationStatistics { private readonly IReadOnlyCollection<TimeSpan> durations; private readonly Lazy<TimeSpan> average; private readonly Lazy<TimeSpan> standardDeviation; public DurationStatistics(IReadOnlyCollection<TimeSpan> durations) { if (durations == null) throw new ArgumentNullException(nameof(durations)); if (durations.Count == 0) throw new ArgumentException( "Empty collection not allowed.", nameof(durations)); this.durations = durations; this.average = new Lazy<TimeSpan>(this.CalculateAverage); this.standardDeviation = new Lazy<TimeSpan>(this.CalculateStandardDeviation); } public TimeSpan Average { get { return this.average.Value; } } public TimeSpan StandardDeviation { get { return this.standardDeviation.Value; } } private TimeSpan CalculateAverage() { return TimeSpan.FromTicks( (long)this.durations.Average(ts => ts.Ticks)); } private TimeSpan CalculateStandardDeviation() { var variance = this.durations.Average(ts => Math.Pow( (ts - this.average.Value).Ticks, 2)); return TimeSpan.FromTicks((long)Math.Sqrt(variance)); } } }
As you can see, the Estimator class now only has a single read-only field, which ensures that it isn't temporary.
The original intent of not passing parameters around is still preserved, so this solution still adheres to that Clean Code principle.
The DurationStatistics class lazily calculates the average and standard deviation, and memoizes their results (or rather, that's what Lazy<T> does).
Instances of DurationStatistics have a shorter lifetime than the containing Estimator object.
The DurationStatistics class is a private, nested class, but a next step might be to pull it out to a public class in its own right. You don't have to do this. Sometimes I leave such classes as private classes, because they only exist to organise the code better; their purpose and behaviour are still narrow in scope, and associated with the containing class. In other cases, such a refactoring may uncover a great way to model a particular type of problem. If that's the case, making it a public class could make the entire code base better, because you've now introduced a reusable concept.
Summary #
Class fields are intended to be used by the class. They constitute the data part of data with behaviour. If you have temporary values, either pass them around as arguments, or extract a new class to contain them.
Comments
How about passing everything as parameters and returning the calculated value as result, would this be considered dirty code?
Just trying to get the most out of this example.
Tom, thank you for writing. Great observation about passing arguments around instead! In this case, passing the necessary arguments around would also be a good solution. It would look like this:
This refactoring also eliminates the temporary fields, and is probably even easier to understand.
To be quite honest, that's what I would have done with this particular example if it had been production code. The reason I wanted to show the Extract Class refactoring instead is that passing arguments doesn't scale well when you add more intermediate values. In this example, the maximum number of arguments you have to pass is two, which is easily within acceptable limits, but what if you have 15 intermediate values?
Passing 15 method arguments around is well beyond most people's threshold, so instead they sometimes resort to temporary fields. The point here is that this isn't necessary, because you have the alternative of extracting classes.
Why didn't I show an example with 15 intermediate values, then? Well, first, I couldn't think of a realistic example, and even if I could, it would be so complicated that no one would read the article :)
Sounds reasonable - Passing one or two around is okay but at a certain point an Extract Class would make more sense.
No, it's a good example otherwise it would be too confusing as you mentioned. Interesting post, thanks!