Parametrised test primitive obsession code smell

Watch out for this code smell with some unit testing frameworks.

In a previous article you saw this state-based integration test:

[Theory]
[InlineData(1049, 19, 00, "juliad@example.net", "Julia Domna", 5)]
[InlineData(1130, 18, 15, "x@example.com", "Xenia Ng", 9)]
[InlineData( 956, 16, 55, "kite@example.edu", null, 2)]
[InlineData( 433, 17, 30, "shli@example.org", "Shanghai Li", 5)]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    int days,
    int hours,
    int minutes,
    string email,
    string name,
    int quantity)
{
    var at = DateTime.Now.Date + new TimeSpan(days, hours, minutes, 0);
    var db = new FakeDatabase();
    var sut = new ReservationsController(
        new SystemClock(),
        new InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
 
    var dto = new ReservationDto
    {
        Id = "B50DF5B1-F484-4D99-88F9-1915087AF568",
        At = at.ToString("O"),
        Email = email,
        Name = name,
        Quantity = quantity
    };
    await sut.Post(dto);
 
    var expected = new Reservation(
        Guid.Parse(dto.Id),
        DateTime.Parse(dto.At, CultureInfo.InvariantCulture),
        new Email(dto.Email),
        new Name(dto.Name ?? ""),
        dto.Quantity);
    Assert.Contains(expected, db.Grandfather);
}

This was the test after I improved it. Still, I wasn't satisfied with it. It has several problems. Take a few moments to consider it. Can you identify any problems? Which ones?

The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.

Size #

I know that you're not familiar with all the moving parts. You don't know how ReservationDto or Reservation are implemented. You don't know what InMemoryRestaurantDatabase is, or how ReservationsController behaves. Still, the issues I have in mind aren't specific to a particular code base.

I feel that the method is verging on being too big. Quantifiably, it doesn't fit in an 80x24 box, but that's just an arbitrary threshold anyway. Still, I think it's grown to a size that makes me uncomfortable.

If you aren't convinced, think of this code example as a stand-in for something larger. In the above test, a reservation contains five smaller values (Id, At, Email, Name, and Quantity). How would a similar test look if the object in question contains ten or twenty values?

In the decades I've been programming and consulting, I've seen plenty of code bases. Data objects made from twenty fields are hardly unusual.

What would a similar test look like if the dto and the expected object required twenty smaller values?

The test would be too big.

Primitive obsession #

A test like this one contains a mix of essential behaviour and implementation details. The behaviour that it verifies is that when you Post a valid dto, the data makes it all the way to the database.

Exactly how the dto or the expected value are constructed is less relevant for the test. Yet it's intermingled with the test of behaviour. The signal-to-noise ratio in the test isn't all that great. What can you do to improve things?

As given, it seems difficult to do much. The problem is primitive obsession. While this is a parametrised test, all the method parameters are primitives: integers and strings. The makes it hard to introduce useful abstractions.

In C# (and probably other languages as well) parametrised tests often suffer from primitive obsession. The most common data source is an attribute (AKA annotation), like xUnit.net's [InlineData] attribute. This isn't a limitation of xUnit.net, but rather of .NET attributes. You can only create attributes with primitive values and arrays.

What is a limitation of xUnit.net (and the other mainstream .NET testing frameworks, as far as I know) is that tests aren't first-class values. In Haskell, by contrast, it's easy to write parametrised tests using the normal language constructs exactly because tests are first-class values. (I hope that the next version of xUnit.net will support tests as first-class values.)

Imagine that instead of only five constituent fields, you'd have to write a parametrised test for objects with twenty primitive values. As long as you stick with attribute-based data sources, you'll be stuck with primitive values.

Granted, attributes like [InlineData] are lightweight, but over the years, my patience with them has grown shorter. They lock me into primitive obsession, and I don't appreciate that.

Essential test #

While tests as first-class values aren't an option in xUnit.net, you can provide other data sources for the [Theory] attribute than [InlineData]. It's not as lightweight, but it breaks the primitive obsession and re-enables normal code design techniques. It enables you to reduce the test itself to its essence. You no longer have to think in primitives, but can instead express the test unshackled by constraints. As a first pass, I'd like the test to look like this:

[Theory]
[ClassData(typeof(PostValidReservationWhenDatabaseIsEmptyTestCases))]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    ReservationDto validDto, Reservation expected)
{
    var db = new FakeDatabase();
    var sut = new ReservationsController(
        new SystemClock(),
        new InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
 
    await sut.Post(validDto);
 
    Assert.Contains(expected, db.Grandfather);
}

This version of the test eliminates the noise. How validDto and expected are constructed is an implementation detail that has little bearing on the behaviour being tested.

For a reader of the code, it's should now be clearer what's at stake here: If you Post a validDto the expected reservation should appear in the database.

Reducing the test code to its essentials made me realise something that hitherto had escaped me: that I could name the DTO by role. Instead of just dto, I could call the parameter validDto.

Granted, I could also have done that previously, but I didn't think of it. There's was so much noise in that test that I didn't stop to consider whether dto sufficiently communicated the role of that variable.

The less code, the easier it becomes to think such things through, I find.

In any case, the test code now much more succinctly expresses the essence of the desired behaviour. Notice how I started my refactoring by writing the desired test code. I've yet to implement the data source. Now that the data source expresses test data as full objects, I'm not so concerned with whether or not that's going to be possible. Of course it's possible.

Object data source #

You can define data sources for xUnit.net as classes or methods. In C# I usually reach for the [ClassData] option, since an object (in C#, that is) gives me better options for further decomposition. For example, I can define a class and delegate the details to helper methods:

private class PostValidReservationWhenDatabaseIsEmptyTestCases :
    TheoryData<ReservationDto, Reservation>
{
    public PostValidReservationWhenDatabaseIsEmptyTestCases()
    {
        AddWithName(1049, 19, 00, "juliad@example.net", "Julia Domna", 5);
        AddWithName(1130, 18, 15, "x@example.com", "Xenia Ng", 9);
        AddWithoutName(956, 16, 55, "kite@example.edu", 2);
        AddWithName(433, 17, 30, "shli@example.org", "Shanghai Li", 5);
    }
 
    // More members here...

Here, I'm taking advantage of xUnit.net's built-in TheoryData<T1, T2> base class, but that's just a convenience. All you have to do is to implement IEnumerable<object[]>.

As you can see, the constructor adds the four test cases by calling two private helper methods. Here's the first of those:

private const string id = "B50DF5B1-F484-4D99-88F9-1915087AF568";
 
private void AddWithName(
    int days,
    int hours,
    int minutes,
    string email,
    string name,
    int quantity)
{
    var at =
        DateTime.Now.Date + new TimeSpan(days, hours, minutes, 0);
    Add(new ReservationDto
        {
            Id = id,
            At = at.ToString("O"),
            Email = email,
            Name = name,
            Quantity = quantity
        },
        new Reservation(
            Guid.Parse(id),
            at,
            new Email(email),
            new Name(name),
            quantity));
}

The other helper method is almost identical, although it has a slight variation when it comes to the reservation name:

private void AddWithoutName(
    int days,
    int hours,
    int minutes,
    string email,
    int quantity)
{
    var at =
        DateTime.Now.Date + new TimeSpan(days, hours, minutes, 0);
    Add(new ReservationDto
        {
            Id = id,
            At = at.ToString("O"),
            Email = email,
            Quantity = quantity
        },
        new Reservation(
            Guid.Parse(id),
            at,
            new Email(email),
            new Name(""),
            quantity));
}

In total, this refactoring results in more code, so how is this an improvement?

The paradox of decomposition #

In object-oriented design, decomposition tends to lead to more code. If you want to isolate and make reusable a particular piece of behaviour, you'll usually introduce an interface or a base class. Even stateless functions need a static class to define them. (To be fair, functional programming isn't entirely devoid of such overhead associated with decomposition, but the cost tends to smaller.) This leads to more code, compared with the situation before decomposition.

This is a situation you may also encounter if you attempt to refactor to design patterns, or follow the SOLID principles. You'll have more code than when you started. This often leads to resistance to such 'code bloat'.

It's fine to resist code bloat. It's also fine to dislike 'complexity for complexity's sake'. Try to evaluate each potential code change based on advantages and disadvantages. I'm not insisting that the above refactoring is objectively better. I did feel, however, that I had a problem that I ought to address, and that this was a viable alternative. The result is more code, but each piece of code is smaller and simpler.

You can, conceivably, read the test method itself to get a feel for what it tests, even if you don't know all the implementation details. You can read the four statements in the PostValidReservationWhenDatabaseIsEmptyTestCases constructor without, I hope, understanding all the details about the two helper methods. And you can read AddWithName without understanding how AddWithoutName works, and vice versa, because these two methods don't depend on each other.

Conclusion #

In this article, I've described how the use of code annotations for parametrised tests tend to pull in the direction of primitive obsession. This is a force worth keeping an eye on, I think.

You saw how to refactor to class-based test data generation. This enables you to use objects instead of primitives, thus opening your design palette. You can now use all your usual object-oriented or functional design skills to factor the code in a way that's satisfactory.

Was it worth it in this case? Keep in mind that the original problem was already marginal. While the code didn't fit in a 80x24 box, it was only 33 lines of code (excluding the test data). Imagine, however, that instead of a five-field reservation, you'd be dealing with a twenty-field data class, and such a refactoring begins to look more compelling.

Is the code now perfect? It still isn't. I'm a little put off by the similarity of AddWithName and AddWithoutName. I'm also aware that there's a trace of production code duplicated in the test case, in the way that the test code duplicates how a valid ReservationDto relates to a Reservation. I'm on the fence whether I should do anything about this.

At the moment I'm inclined to heed the rule of three. The duplication is still too insubstantial to warrant refactoring, but it's worth keeping an eye on.

Comments

Mateusz #

Hi Mark, fair point! Indeed, sometimes [InlineData] is too simple and having test data creation moved to a separate class with all its advantages is a valid approach (using f.e. [ClassData]).

I just wanted to clarify on TheoryData class, I noticed that it was added in xUnit v3. But I can't find this new version anywhere on NuGet - are you using it from some early adopter feed? Thanks!

2021-05-22 12:37 UTC

Mark Seemann #

Mateusz, thank you for writing. The code shown in this article uses xUnit.net 2.4.1, which (apparently) includes the TheoryData base classes. I must admit that I don't know exactly when they were added to the library.

2021-05-23 18:39 UTC

Published: Monday, 18 January 2021 06:30:00 UTC

Parametrised test primitive obsession code smell by Mark Seemann